TheBrainLab/Awesome-VLA-UAVs

Awesome-VLA-UAVs

A list of research papers and other related resources on Vision-Language-Action/Navigation (VLA/VLN) models for UAVs.

Contributions are welcome!

2026

  • APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation (CVPR 2026)[paper][code] (Note: Dual system; History info)

  • History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation (AAAI 2026)[paper][code] (Note: Two-stage: first locate the approximate direction, then find the specific details; historical grid map)

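  • IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments (AAAI 2026)[paper][code] (Note: Datasets: IndoorUAV-VLN (long-horizon navigation tasks) and IndoorUAV-VLA (short-horizon planning tasks); IndoorUAV-Agent: first uses GPT-4o to segment the original instruction, then uses a π0-based VLA for flight control, then applies visual feedback to assist the next round of reasoning)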

  • AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild (ICLR 2026)[paper][code] (Note: Dataset, Pseudo depth encoder)

  • AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation (arXiv 2026.1)[paper][[code]] (Note: Dual system, Memory)

  • Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation (arXiv 2026.2)[paper][code] (Note: Dual system, similar to SPF)

  • USS-Nav: Unified Spatio-Semantic Scene Graph for Lightweight UAV Zero-Shot Object Navigation (arXiv 2026.2)[paper][[code]] (Note: Polyhedral 3D spatial graph; semantics select the region, algorithms plan the path; runs efficiently on Jetson Orin NX)

  • AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions (arXiv 2026.1)[paper][code] (Note: Dataset, AirVLN-R1, Tello)

  • AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning (arXiv 2026.1)[paper][[code]] (Note: Detection & Depth, Exploration & Exploitation, exploration + target homing)

  • Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space (arXiv 2026.1)[paper][[code]] (Note: Imagining before movement)

  • NavDreamer: Video Models as Zero-Shot 3D Navigators (arXiv 2026.2)[paper][code] (Note: Language instruction → video generation → waypoint extraction → trajectory planning → real-world flight)

  • EzReal: Enhancing Zero-Shot Outdoor Robot Navigation toward Distant Targets under Varying Visibility (ICRA 2026)[paper][code] (Note: Robots, Object navigation; see the outline, identify the direction, remember the direction, seek the direction)

2025

  • [Review] UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility (Information Fusion 2025.3)[paper][code]

  • See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation (CoRL 2025)[paper][code] (Note: Dual system, SPF)

  • VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments (arXiv 2025.12)[paper][[code]] (Note: End-to-end, 3-stage training strategy, Onboard implementation)

  • NavRL: Learning Safe Flight in Dynamic Environments (IEEE Robotics and Automation Letters, 2025.4)[paper][code] (Note: Deep RL, Using depth info)

  • ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions (IEEE Robotics and Automation Letters, 2025.9)[paper][code] (Note: VLN + MPC, Using depth info)

  • LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration (arXiv 2025.12)[paper][[code]] (Note: Using history info)

  • OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation (arXiv 2025.7)[paper][code] (Note: Dataset)

  • TypeFly: Low-Latency Drone Planning With Large Language Models (IEEE Transactions on Mobile Computing 2025.9) [paper][code]

  • Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology (OpenUAV) (ICLR 2025)[paper][code]

  • MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning (arXiv 2025.11)[paper][code] (Note: Fusion of monocular SLAM with perception-coupled mapping and planning)

  • OpenVLN: Open-world Aerial Vision-Language Navigation (arXiv 2025.11)[paper][[code]] (Note: Uses reinforcement learning and a value model to address the dual challenges of data scarcity and long-horizon planning)

  • UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue (arXiv 2025.3)[paper][code] (Note: VLM + NMPC)

  • UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning (arXiv 2025.5)[paper][code]

  • UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents (ACM MM Dataset Track 2025)[paper][code]

  • AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation (ACM MM 2025)[paper][[code]]

  • Open3D-VQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space (ACM MM 2025)[paper][code] (Note: Dataset)

  • CityNav: A Large-Scale Dataset for Real-World Aerial Navigation (ICCV 2025)[paper][code]

  • CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory (ACL 2025)[paper][code]

  • VLM-Nav: Mapless UAV-Navigation Using Monocular Vision Driven by Vision-Language Model (SSRN)[paper][code]

  • Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation (AAAI 2025)[paper][code]

  • UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation (Int. Conf. on Human Robot Interaction, HRI 2025)[paper][code]

  • General-Purpose Aerial Intelligent Agents Empowered by Large Language Models (arXiv 2025.5)[paper][[code]]

  • RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation (arXiv 2025.9; Best Paper Finalist at IROS 2025 Active Perception Workshop)[paper][project]

2024

  • [Review] Large Language Models for UAVs: Current State and Pathways to the Future (IEEE Open Journal of Vehicular Technology 2024.8) [paper][[code]]

  • AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models (arXiv 2024.8)[paper][[code]]

  • TPML: Task Planning for Multi-UAV System with Large Language Models (2024 IEEE 18th International Conference on Control & Automation (ICCA))[paper][code]

  • EAI-SIM: An Open-Source Embodied AI Simulation Framework with Large Language Models (2024 IEEE 18th International Conference on Control & Automation (ICCA))[paper][code]

  • Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning (STMR) (Submitted to ICRA 2025)[paper][[code]]

2023

  • AerialVLN: Vision-and-Language Navigation for UAVs (ICCV 2023)[paper][code]

System1 + System2 Thinking

  • Visual Agents as Fast and Slow Thinkers (ICLR 2025)[paper][code]

  • Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces (arXiv 2025)[paper][[code]]

  • Helix: A "System 1, System 2" VLA for Whole Upper Body Control (figure.ai) [link]

  • DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Conference on Robot Learning (CoRL) 2024)[paper][project]

  • Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models (Physical Intelligence (π)) (ICML 2025)[paper][blog]

  • HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers (Conference on Robot Learning (CoRL) 2024)[paper][[code]]

  • GR00T N1: An Open Foundation Model for Generalist Humanoid Robots (arXiv 2025.3)[paper][code][tech]

  • GR00T N1.5: An Improved Open Foundation Model for Generalist Humanoid Robots [tech][code][blog]

Related Awesome lists
