A list of research papers and other related resources on Vision-Language-Action/Navigation (VLA/VLN) models for UAVs.
Contributions are welcome!
- APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation (CVPR 2026)[paper][code] (Note: Dual system; history info)
- History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation (AAAI 2026)[paper][code] (Note: Two-stage: first locate the rough bearing, then search for fine-grained details; historical grid map)
- IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments (AAAI 2026)[paper][code] (Note: Datasets: IndoorUAV-VLN (long-horizon navigation tasks) and IndoorUAV-VLA (short-horizon planning tasks); IndoorUAV-Agent: first uses GPT-4o to segment the raw instruction, then a π0-based VLA for flight control, with visual feedback assisting the next round of reasoning)
- AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild (ICLR 2026)[paper][code] (Note: Dataset, pseudo-depth encoder)
- AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation (arXiv 2026.1)[paper][[code]] (Note: Dual system, memory)
- Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation (arXiv 2026.2)[paper][code] (Note: Dual system, similar to SPF)
- USS-Nav: Unified Spatio-Semantic Scene Graph for Lightweight UAV Zero-Shot Object Navigation (arXiv 2026.2)[paper][[code]] (Note: polyhedral 3D scene graph; semantics selects the region, a classical algorithm plans the path; runs efficiently on Jetson Orin NX)
- AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions (arXiv 2026.1)[paper][code] (Note: Dataset, AirVLN-R1, Tello)
- AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning (arXiv 2026.1)[paper][[code]] (Note: Detection & depth, exploration & exploitation, explore-then-home-in)
- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space (arXiv 2026.1)[paper][[code]] (Note: Imagining before moving)
- NavDreamer: Video Models as Zero-Shot 3D Navigators (arXiv 2026.2)[paper][code] (Note: language instruction → video generation → waypoint extraction → trajectory planning → real flight)
- EzReal: Enhancing Zero-Shot Outdoor Robot Navigation toward Distant Targets under Varying Visibility (ICRA 2026)[paper][code] (Note: Robots, object navigation; spot the silhouette, infer the direction, remember it, head toward it)
- [Review] UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility (Information Fusion 2025.3)[paper][code]
- See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation (CoRL 2025)[paper][code] (Note: Dual system, SPF)
- VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments (arXiv 2025.12)[paper][[code]] (Note: End-to-end, three-stage training strategy, onboard deployment)
- NavRL: Learning Safe Flight in Dynamic Environments (IEEE Robotics and Automation Letters 2025.4)[paper][code] (Note: Deep RL, uses depth info)
- ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions (IEEE Robotics and Automation Letters 2025.9)[paper][code] (Note: VLN + MPC, uses depth info)
- LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration (arXiv 2025.12)[paper][[code]] (Note: Uses history info)
- OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation (arXiv 2025.7)[paper][code] (Note: Dataset)
- TypeFly: Low-Latency Drone Planning With Large Language Models (IEEE Transactions on Mobile Computing 2025.9)[paper][code]
- Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology (OpenUAV) (ICLR 2025)[paper][code]
- MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning (arXiv 2025.11)[paper][code] (Note: couples monocular SLAM with perception-aware mapping and planning)
- OpenVLN: Open-world Aerial Vision-Language Navigation (arXiv 2025.11)[paper][[code]] (Note: uses reinforcement learning and a value model to tackle the dual challenges of data scarcity and long-horizon planning)
- UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue (arXiv 2025.3)[paper][code] (Note: VLM + NMPC)
- UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning (arXiv 2025.5)[paper][code]
- UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents (ACM MM Dataset Track 2025)[paper][code]
- AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation (ACM MM 2025)[paper][[code]]
- Open3D-VQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space (ACM MM 2025)[paper][code] (Note: Dataset)
- CityNav: A Large-Scale Dataset for Real-World Aerial Navigation (ICCV 2025)[paper][code]
- CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory (ACL 2025)[paper][code]
- VLM-Nav: Mapless UAV Navigation Using Monocular Vision Driven by Vision-Language Model (SSRN)[paper][code]
- Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation (AAAI 2025)[paper][code]
- UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation (HRI 2025)[paper][code]
- General-Purpose Aerial Intelligent Agents Empowered by Large Language Models (arXiv 2025.5)[paper][[code]]
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation (arXiv 2025.9; Best Paper Finalist at the IROS 2025 Active Perception Workshop)[paper][project]
- [Review] Large Language Models for UAVs: Current State and Pathways to the Future (IEEE Open Journal of Vehicular Technology 2024.8)[paper][[code]]
- AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models (arXiv 2024.8)[paper][[code]]
- TPML: Task Planning for Multi-UAV System with Large Language Models (ICCA 2024)[paper][code]
- EAI-SIM: An Open-Source Embodied AI Simulation Framework with Large Language Models (ICCA 2024)[paper][code]
- Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning (STMR) (Submitted to ICRA 2025)[paper][[code]]
- Visual Agents as Fast and Slow Thinkers (ICLR 2025)[paper][code]
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces (arXiv 2025)[paper][[code]]
- Helix: A "System 1, System 2" VLA for Whole Upper Body Control (figure.ai)[link]
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (CoRL 2024)[paper][project]
- Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models (Physical Intelligence (π)) (ICML 2025)[paper][blog]
- HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers (CoRL 2024)[paper][[code]]
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots (arXiv 2025.3)[paper][code][tech]
- GR00T N1.5: An Improved Open Foundation Model for Generalist Humanoid Robots [tech][code][blog]