Knowledge Vault 5/42 - CVPR 2019
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

graph LR
  classDef navigation fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef grounding fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef reward fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef generalization fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef learning fill:#f9d4f9, font-weight:bold, font-size:14px
  A[Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation] --> B[Navigating 3D environments with language. 1]
  B --> C[Grounding language onto visuals. 2]
  A --> D[Success only at destination. 3]
  A --> E[Calculating rewards from history, instructions, visuals. 4]
  E --> F[Evaluates instruction reconstruction from trajectory. 5]
  E --> G[Encourages instruction-following. 6]
  A --> H[Models fail to generalize. 7]
  A --> I[Self-supervised exploration of unseen environments. 8]
  I --> J[Generates trajectories for critic evaluation. 9]
  I --> K[Stores best trajectories for imitation. 10]
  I --> L[Approximates better policy for new environments. 11]
  I --> M[One trajectory per instruction. 12]
  I --> N[Robot improves with house familiarity. 13]
  I --> O[Successful instruction-following after exploration. 14]
  A --> P[Outperforms baseline, improves unseen performance. 15]
  A --> Q[Reduces seen-unseen performance gap. 16]
  class B navigation
  class C grounding
  class D,E,F,G reward
  class H generalization
  class I,J,K,L,M,N,O,P,Q learning

Resume:

1.- Vision-and-Language Navigation (VLN): Navigating an embodied agent in a 3D environment by following natural language instructions.

2.- Cross-modal grounding: Grounding natural language instructions onto local visual scenes and global visual trajectories.
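
A minimal sketch of such grounding as soft attention from the current visual state over the encoded instruction words; the tensor names and shapes are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn.functional as F

def ground_instruction(visual_state, word_feats):
    # visual_state: (batch, dim) pooled feature of the current visual scene
    # word_feats:   (batch, num_words, dim) encoded instruction words
    scores = torch.bmm(word_feats, visual_state.unsqueeze(2)).squeeze(2)   # (batch, num_words)
    weights = F.softmax(scores, dim=1)                                     # attention over instruction words
    return torch.bmm(weights.unsqueeze(1), word_feats).squeeze(1)          # grounded text context, (batch, dim)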

3.- Sparse reward issue: A success signal is given only when the agent reaches the destination, ignoring whether the instruction was actually followed.
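
In the paper this sparse signal is densified into an extrinsic reward (notation paraphrased here): at intermediate steps the reward is the reduction in distance to the target, and the final step adds a success indicator:
$r_{\mathrm{ext}}(s_t, a_t) = D_{\mathrm{target}}(s_t) - D_{\mathrm{target}}(s_{t+1})$ for $t < T$, and $r_{\mathrm{ext}}(s_T, a_T) = \mathbb{1}\big[D_{\mathrm{target}}(s_{T+1}) \le d_{\mathrm{th}}\big]$,
where $D_{\mathrm{target}}(s)$ is the navigation distance from state $s$ to the target and $d_{\mathrm{th}}$ is the success threshold.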

4.- Reinforced Cross-Modal Matching (RCM): Method that trains the navigator with an extrinsic reward from the environment plus an intrinsic reward derived from the trajectory history, the instruction, and the visual context.

5.- Matching critic: Evaluates the extent to which the original instruction can be reconstructed from the generated trajectory.
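
Formally (paraphrasing the paper's notation), the matching critic $V_\beta$ is a sequence-to-sequence speaker model, and the intrinsic reward is the probability of reconstructing the instruction $\mathcal{X}$ from the executed trajectory $\pi$:
$R_{\mathrm{intr}} = V_\beta(\mathcal{X}, \pi) = p_\beta(\mathcal{X} \mid \pi)$.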

6.- Cycle reconstruction reward: Intrinsic reward used to train the navigator, encouraging instruction-following.
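
A minimal sketch of how the extrinsic and intrinsic rewards could drive a REINFORCE-style update of the navigator; the mixing weight delta, the baseline, and the tensor shapes are assumptions for illustration, not the paper's exact estimator.

import torch

def rcm_policy_loss(log_probs, extrinsic, intrinsic, baseline, delta=0.5):
    # log_probs: (T,) log-probabilities of the actions taken along the sampled trajectory
    # extrinsic, intrinsic, baseline: reward-to-go / value tensors broadcastable to (T,)
    advantage = extrinsic + delta * intrinsic - baseline    # mix environment and cycle-reconstruction signals
    return -(log_probs * advantage.detach()).sum()          # policy-gradient surrogate loss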

7.- Generalization issue: Models fail to generalize well to unseen environments.

8.- Self-Supervised Imitation Learning (SIL): Learning to explore unseen environments with self-supervision.

9.- Unlabeled instruction: Used in SIL to generate trajectories, which are evaluated by the matching critic.

10.- Replay buffer: Stores the best trajectories generated during SIL for the navigator to imitate.
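
Putting items 8-10 together, a minimal sketch of the SIL loop on an unseen environment; navigator.rollout, critic.score, and navigator.imitate are hypothetical interfaces, not the authors' code.

def self_supervised_imitation(navigator, critic, env, instructions, n_samples=5):
    replay_buffer = []
    for instr in instructions:                        # unlabeled instructions, no ground-truth routes
        rollouts = [navigator.rollout(env, instr) for _ in range(n_samples)]
        best = max(rollouts, key=lambda traj: critic.score(instr, traj))
        replay_buffer.append((instr, best))           # keep only the critic's best trajectory
    for instr, traj in replay_buffer:
        navigator.imitate(instr, traj)                # behaviour cloning on the agent's own good behaviour
    return navigator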

11.- Learning from past good behaviors: SIL allows the model to approximate a better policy for new environments.

12.- Test time: The navigator executes only one trajectory per instruction, so the better policy learned through SIL exploration pays off directly in practice.

13.- In-home robot: Example application where SIL can help the robot improve as it becomes familiar with the house.

14.- Example before and after SIL: Agent successfully follows instructions and reaches destination after exploring with SIL.

15.- Results on the unseen test set: The RCM model outperforms the Speaker-Follower baseline, and SIL significantly improves the SPL score.
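
SPL (Success weighted by Path Length) scores each episode by whether the agent succeeded and how efficient its path was:
$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$,
where $S_i$ indicates success on episode $i$, $\ell_i$ is the shortest-path distance to the goal, and $p_i$ is the length of the path the agent actually took.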

16.- Performance gap reduction: SIL helps reduce the performance gap between seen and unseen environments.

Knowledge Vault built by David Vivancos 2024