Knowledge Vault 5 /88 - CVPR 2023
Planning-oriented Autonomous Driving
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Concept Graph & Resume using Claude 3 Opus | Chat GPT4o | Llama 3:

graph LR
classDef unified fill:#f9d4d4, font-weight:bold, font-size:14px
classDef autonomous fill:#d4f9d4, font-weight:bold, font-size:14px
classDef solutions fill:#d4d4f9, font-weight:bold, font-size:14px
classDef uniad fill:#f9f9d4, font-weight:bold, font-size:14px
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
A[Planning-oriented Autonomous Driving] --> B[UniAD: Full-stack autonomous driving framework 1]
A --> C[Autonomous driving: Weather, scenarios, tasks 2]
A --> D[Typical solutions: Standalone models, errors 3]
A --> E[Multitask frameworks: Shared backbone, uncoordinated 4]
A --> F[End-to-end: Policy from sensors, uninterpretable 5]
A --> G[UniAD: Integrate perception, prediction, planning 6]
G --> H[UniAD tasks: Formers, occupancy, planner 7]
G --> I[Unified query: Connects, coordinates tasks 8]
G --> J[Transformer modules: Complex interactions, attention 9]
H --> K[Track, map formers: End-to-end training 10]
H --> L[Motion former: Diverse relation modeling 11]
H --> M[Occupancy former: Predicts expectations, restricts 12]
H --> N[Planner: Attends features, adjusts path 13]
G --> O[Two-phase training: Stabilizes, shares results 14]
G --> P[Experiments: Validate tasks, benefit planning 15]
G --> Q[Planning: Lowest error, collision rate 16]
G --> R[Interpretability: Visualizations, recovers from errors 17]
G --> S[Unified query: Connects, coordinates tasks 18]
B --> T[Results: State-of-the-art vision-only 19]
A --> U[Future: Data, training, algorithms, systems 20]
A --> V[Foundation model potential: UniAD principles 21]
A --> W[Applications: Robotics, navigation, autonomous tasks 22]
A --> X[UniAD: Step towards foundation model 23]
A --> Y[Paper, poster available for details 24]
A --> Z[Q&A: Submit questions via box 25]
class B,G,H,I,J,K,L,M,N,O,P,Q,R,S,T unified
class C autonomous
class D,E,F solutions
class U,V,W,X,Y,Z future

Resume:

1.- UniAD: Unified full-stack autonomous driving framework coordinating perception, prediction, and planning tasks for safe driving.

2.- Autonomous driving challenges: Various weather, illuminations, and scenarios; tasks include perception, prediction, and planning.

3.- Typical solutions: Standalone models trained independently for each task, leading to accumulated errors.

4.- Multitask frameworks: Shared backbone for multiple tasks, efficient but lacks coordination between task heads.

5.- Vanilla end-to-end solutions: Learn policy directly from sensor inputs, good simulator results but lack interpretability in real-world scenarios.

6.- UniAD's approach: Integrate safety-critical perception and prediction tasks, organize in a hierarchy to maximize information flow to the planner.

7.- UniAD's task modules: TrackFormer, MapFormer, MotionFormer, OccFormer (occupancy), and Planner.

8.- Unified query design: Connects the entire pipeline and coordinates all tasks towards planning.

9.- Transformer-based task modules: Model complex interactions in driving scenes with attention mechanisms.

10.- TrackFormer and MapFormer: Build on previous research and treat agents and map elements as queries, which enables end-to-end training.

11.- MotionFormer: Models diverse relations with attention mechanisms (agent-agent, agent-map, and agent-ego).

12.- OccFormer (occupancy former): Predicts future occupancy and restricts each agent's attention to its corresponding BEV features.

13.- Planner: Uses an ego-vehicle query to attend to BEV features, predicts future waypoints, and adjusts the path to avoid potential collisions.

14.- Two-phase training: Stabilizes the training process and shares matching results across modules for convergence.

15.- Experiments: Validate the necessity of preceding tasks, showing they benefit each other and final planning.

16.- Planning performance: UniAD achieves the lowest L2 error and collision rate, outperforming LiDAR-based and previous end-to-end methods.

17.- Interpretability: Visualization of intermediate representations exhibits UniAD's interpretability and ability to recover from upstream errors.

18.- Unified query design: Connects and coordinates all tasks in the framework.

19.- Results: UniAD achieves state-of-the-art results on all investigated tasks with vision-only inputs.

20.- Future directions: Data and training strategy, shippable algorithms, and closed-loop systems.

21.- Foundation model for autonomous driving: Potential for a universal foundation model based on UniAD's principles and structures.

22.- Applications: Extending to a broad range of robotics, enabling machines to interact, navigate, and perform tasks autonomously and intelligently.

23.- Conclusion: UniAD is a step towards a foundation model for autonomous driving, opening up new possibilities in robotics.

24.- Additional information: Paper and poster session available for more details.

25.- Q&A: Speakers encourage questions through the Q&A box at the bottom of the screen.
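
Items 9, 12, and 13 describe queries (agents or the ego vehicle) attending to BEV features, optionally with a restriction on which features each query may see. The following is a minimal sketch of that idea, not UniAD's implementation: it assumes a single attention head with no learned projections, and the function names, the toy BEV cells, and the mask layout are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(queries, features, mask=None):
    """Toy single-head cross-attention (assumption: no learned projections).

    queries:  list of Q vectors (one per agent or ego query), each of length d
    features: list of N vectors (flattened BEV cells), each of length d
    mask:     optional Q x N list of booleans; False blocks a query-cell pair.
              Each query is assumed to have at least one allowed cell.
    Returns (updated_queries, attention_weights).
    """
    d = len(queries[0])
    outputs, weights = [], []
    for qi, q in enumerate(queries):
        scores = []
        for ci, f in enumerate(features):
            s = sum(a * b for a, b in zip(q, f)) / math.sqrt(d)
            if mask is not None and not mask[qi][ci]:
                s = float("-inf")  # blocked cell receives zero weight
            scores.append(s)
        w = softmax(scores)
        # Weighted sum of feature vectors becomes the updated query.
        out = [sum(w[ci] * features[ci][k] for ci in range(len(features)))
               for k in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy scene: 4 BEV cells, 2 agent queries.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
queries = [[1.0, 0.0], [0.0, 1.0]]
# Agent 0 may only look at cells 0 and 2; agent 1 only at cells 1 and 3.
mask = [[True, False, True, False],
        [False, True, False, True]]

updated, attn = masked_cross_attention(queries, features, mask)
```

With the mask in place, each agent's weights over blocked cells are exactly zero, so its updated query is built only from its own cells. Passing `mask=None` gives the unrestricted case in which a query attends to every BEV cell.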

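Item 16 reports planning quality as L2 error and collision rate. The sketch below shows one plausible way to compute both, under stated assumptions: L2 error is taken as the mean Euclidean distance between predicted and ground-truth waypoints, and a collision is counted when any planned waypoint falls into an occupied grid cell. The summary does not give the actual evaluation protocol (horizons, ego footprint, grid resolution), so the function names, the `cell_size` parameter, and the sample data are hypothetical.

```python
import math


def l2_error(pred, gt):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    if len(pred) != len(gt):
        raise ValueError("trajectories must have the same number of waypoints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def collides(traj, occupied_cells, cell_size=1.0):
    """True if any waypoint lands in an occupied grid cell.

    Assumption: the ego vehicle is treated as a point, and obstacles are
    given as a set of integer (col, row) grid indices.
    """
    for x, y in traj:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell in occupied_cells:
            return True
    return False


def collision_rate(trajs, occupied_per_sample, cell_size=1.0):
    """Fraction of planned trajectories that hit an occupied cell."""
    hits = sum(collides(t, occ, cell_size)
               for t, occ in zip(trajs, occupied_per_sample))
    return hits / len(trajs)


# Hypothetical data: the last predicted waypoint is 5 m from ground truth.
pred = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
gt = [(0.0, 1.0), (0.0, 2.0), (3.0, 7.0)]
err = l2_error(pred, gt)

# Two samples: the first plan crosses an occupied cell, the second does not.
plans = [pred, [(5.5, 5.5)]]
occupancy = [{(0, 2)}, {(0, 0)}]
rate = collision_rate(plans, occupancy)
```

Lower is better for both numbers. A real benchmark would typically report them per planning horizon and check the full ego footprint rather than a single point.
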
Knowledge Vault built by David Vivancos 2024