Concept Graph & Summary using Claude 3 Opus | ChatGPT-4o | Llama 3:
Summary:
1.- Embodied vision: Vision systems for agents that act purposefully in their environment, not just static systems. Combines perception and action.
2.- Embodied intelligence: Purposeful exchange of energy and information with the environment. Requires thinking about consequences of movement and uncertainty.
3.- Uncertainty in computer vision: Modern CV techniques like deep learning often fail to model uncertainty, yet handling it is crucial for embodied systems.
4.- Interaction with environment: In embodied vision, the agent's actions change the environment, objects, and data distribution it encounters.
5.- Robot learning for action policies: Much robotics research neglects advances in computer vision. More work needed on visual representations for RL.
6.- Self-supervised learning for RL: Self-supervised computer vision losses have been explored as auxiliary objectives for RL, but with mixed results. A better framework is needed.
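The auxiliary-loss idea in item 6 can be sketched minimally: a joint objective that adds a self-supervised reconstruction term to a policy-gradient surrogate. All function names and the `aux_weight` coefficient below are illustrative assumptions, not something specified in the talk:

```python
import numpy as np

def rl_loss(advantages, log_probs):
    # Policy-gradient surrogate: -E[A * log pi(a|s)]
    return float(-np.mean(advantages * log_probs))

def ssl_loss(obs, reconstruction):
    # Self-supervised auxiliary term: pixel reconstruction error
    return float(np.mean((obs - reconstruction) ** 2))

def joint_loss(advantages, log_probs, obs, recon, aux_weight=0.1):
    # Combine the RL objective with the self-supervised auxiliary loss;
    # aux_weight trades off control performance vs. representation quality
    return rl_loss(advantages, log_probs) + aux_weight * ssl_loss(obs, recon)
```

In practice the two terms would share an encoder; the difficulty the speaker alludes to is that the auxiliary gradient can conflict with the RL gradient rather than help it.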
7.- Sim-to-real transfer: Interaction with the environment is expensive in the real world. Simulation is cheaper but sim-to-real transfer is challenging.
8.- Video understanding for human activity: Going beyond recognition to understand human intentions, affordances, and anticipate actions.
9.- Egocentric video datasets: Large egocentric video datasets like Ego4D enable learning from human experience to inform embodied agent behavior.
10.- Embodied audiovisual learning: Important for embodied agents to learn spatial audio to understand 3D environments and interact.
11.- Abstracting away low-level control: Some argue abstracted actions are okay if the application allows. Others believe reasoning about dynamics/forces is essential.
12.- Photorealistic simulation: Has improved but still lacks realism in areas like physics and camera operation. Not a silver bullet.
13.- Data distribution shifts: In embodied vision, the visual data distribution continuously changes based on the agent's actions, unlike static datasets.
14.- World models and dreaming: Learning dynamics models from data to simulate environments. Extremely difficult, but high potential if possible.
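A minimal world-model sketch for item 14, assuming a linear ground-truth environment for tractability: fit s' = w1*s + w2*a from logged transitions, then "dream" rollouts using only the learned model, never touching the environment. All names and the linear form are illustrative assumptions:

```python
import numpy as np

def true_dynamics(s, a):
    # Hidden ground-truth environment (linear, for illustration only)
    return 0.9 * s + 0.5 * a

def fit_world_model(states, actions, next_states):
    # Least-squares fit of s' = w1*s + w2*a from logged transitions
    X = np.stack([states, actions], axis=1)
    w, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return w

def dream(w, s0, actions):
    # Roll out imagined states using only the learned model
    traj, s = [], s0
    for a in actions:
        s = w[0] * s + w[1] * a
        traj.append(s)
    return traj
```

Real world models must learn nonlinear, partially observed, high-dimensional dynamics, which is why the talk calls this extremely difficult despite its potential.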
15.- Benchmarking and reproducibility: Simulation enables benchmarking embodied vision systems, but reproducibility and sim-to-real remain significant open challenges.
16.- Value of simulation: A blessing for scaling up experience and evaluation; a curse in its lingering lack of realism. An important research tool nonetheless.
17.- Lack of humans in simulators: Current embodied AI simulators lack humans. Dynamic modeling of human behavior in simulated environments is a key opportunity.
18.- Assistive robotics timeline: A steady rollout of robotic systems helping in daily life is expected, but the timeline is uncertain and expectations are often overhyped.
19.- Self-driving progress: Significant strides like initial commercial operations, but ubiquitous autonomy still far off. Will emerge in focused domains first.
20.- Human-robot interaction: Often neglected in robotics research in favor of navigation and manipulation. Proper HRI critical for real-world deployment.
21.- Performance modeling and guarantees: Formal methods to model robotic system performance are critical for real-world deployment, but currently lacking.
22.- Task-driven representations: Embodied vision provides a concrete task to drive representation learning, not just accuracy for its own sake.
23.- Hierarchical representations: Potential need for hierarchical, symbolic, abstract representations to enable efficient reasoning and strong generalization.
24.- Language as a prior: Language models may provide useful priors or knowledge for embodied AI, but significant work remains to leverage them.
25.- Composable models: Future embodied AI likely requires composable models to handle complexity, similar to early AI paradigms, not just end-to-end neural nets.
26.- Robustness and generalization: Embodied AI systems need to be robust and generalize well to real-world deployment with novel situations.
27.- Integrating multiple modalities: Embodied perception should leverage multiple sensor modalities (vision, audio, touch, etc.) to better understand and act.
28.- Lifelong learning: Embodied agents have the opportunity to keep learning and adapting over their lifespan as they encounter new experiences.
29.- Simulation for human environments: Photorealistic simulation of human-centric spaces and activities could accelerate development of assistive embodied AI if done well.
30.- Rethinking problem formulations: As embodied AI advances, many existing computer vision problem setups and assumptions may need fundamental rethinking to integrate with real-world systems.
Knowledge Vault built by David Vivancos 2024