Concept Graph & Resume using Claude 3 Opus | Chat GPT4o | Llama 3:
Resume:
1.- Learning to see the human way involves modeling the physical world, not just finding patterns in images.
2.- Early computer vision focused on labeling in images what humans can label, a goal that is coherent and practical but limited.
3.- True human vision involves "seeing" things that are occluded or invisible by leveraging knowledge of physics and objecthood.
4.- Human intelligence is about modeling the world, not just data, to explain observations, imagine possibilities, and achieve goals through planning.
5.- Cognitive AI views intelligence as model-building, with learning as the construction of new models based on interactions with the world.
6.- Humans can infer invisible objects and properties in scenes by reasoning about physics, objecthood, and causality.
7.- AI has focused more on pattern recognition and function approximation, capturing only part of what constitutes intelligence.
8.- Two views of "seeing the human way": 1) labeling images like humans do, 2) making sense of the world from visual input.
9.- Core knowledge in humans, present early in infancy, includes intuitive physics, psychology, and other domains for reasoning about the world.
10.- Intuitive physics allows infants and adults to understand object permanence, solidity, support, stability, and causal interactions from visual observations.
11.- Mental simulation, akin to a "game engine in your head", may underlie human physical scene understanding and interaction planning.
12.- Cognitive AI aims to combine the strengths of probabilistic, symbolic, and neural approaches, integrated via techniques like probabilistic programming.
13.- Inverse graphics infers 3D scene structure from images, a foundation for human-like vision; recent progress comes from differentiable rendering.
14.- Ideal vision systems see "independently movable objects" to support reasoning about scene dynamics and affordances for action.
15.- Neural architectures incorporating inductive biases about physics, objecthood, and causality show promise for human-like visual scene understanding.
16.- Flexible, object-centric scene representations that combine logic, probability, and neural networks can overcome limitations of conventional recognition pipelines.
17.- Probabilistic programs can express rich generative models for physics-based vision, with programmable inference to adaptively solve scene understanding tasks.
18.- The "Bottle Cap Challenge" tests whether vision systems can segment and model novel objects with partial observability by leveraging physics understanding.
19.- The "General Game Inverse Graphics Challenge" tests transfer of visual understanding to novel virtual worlds with different appearance and physics.
20.- Progress on these challenges may come from integrating differentiable rendering, probabilistic programming, and insights from studies of human cognition.
21.- Humans effortlessly parse 3D structure, dynamics, and affordances in novel scenes and transfer knowledge to new environments in near-zero-shot ways.
22.- A key goal is vision systems that rapidly learn generative models to infer occluded objects and their properties by combining physical knowledge with a few observations.
23.- Biological vision likely relies on physics-based representations and simulation, not just pattern recognition, implicating areas beyond the ventral stream.
24.- Probabilistic approaches are crucial for quantifying uncertainty in vision to drive information-seeking behaviors when world models are inapplicable.
25.- Improved performance on tasks is not equivalent to human-level understanding; we must distinguish small steps from reaching key human abilities.
26.- Exciting progress is being made in physically-grounded AI systems that learn generative world models in a more human-like way.
27.- However, major open challenges remain in flexibly transferring knowledge to novel environments with different appearance and physical dynamics.
28.- Success on these challenges will require combining tools from modern ML, classic vision, cognitive science, and studies of biological intelligence.
29.- The field should aim to create systems that learn rapidly and transfer flexibly, not just achieve incremental gains on narrow tasks.
30.- Key to progress is precise formulation of challenges that push the boundaries of artificial visual understanding towards more human-like capabilities.
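Items 13, 17, and 22 describe vision as inverting a generative model so that occluded objects can still be inferred. As a rough illustration only, and not the method presented in the talk, the toy sketch below treats a 1D "image" as the output of a hypothetical renderer and computes an exact posterior over an object's position by enumeration. The names `render`, `log_likelihood`, and `posterior_over_position`, along with the pixel noise level and the uniform prior, are all assumptions made for this example.

```python
import math

WIDTH = 20  # pixels in the toy 1D "image" (assumed for illustration)
SIZE = 3    # object width in pixels (assumed)

def render(position):
    """Hypothetical renderer: lit pixels where the object sits, dark elsewhere."""
    return [1.0 if position <= x < position + SIZE else 0.0 for x in range(WIDTH)]

def log_likelihood(observed, rendered, visible, noise=0.2):
    """Gaussian pixel log-likelihood, scored only on visible (unoccluded) pixels."""
    return sum(
        -0.5 * ((o - r) / noise) ** 2
        for o, r, v in zip(observed, rendered, visible)
        if v
    )

def posterior_over_position(observed, visible):
    """Exact posterior under a uniform prior, by enumerating every position."""
    positions = range(WIDTH - SIZE + 1)
    weights = [math.exp(log_likelihood(observed, render(p), visible)) for p in positions]
    total = sum(weights)
    return {p: w / total for p, w in zip(positions, weights)}

def occluder_mask(start, end):
    """True where a pixel is visible; pixels in [start, end] are hidden."""
    return [not (start <= x <= end) for x in range(WIDTH)]

if __name__ == "__main__":
    # Partially occluded object: only its leftmost pixel is visible.
    partial = posterior_over_position(render(8), occluder_mask(9, 14))
    print("partial occlusion, most probable position:", max(partial, key=partial.get))

    # Fully occluded object: the posterior spreads over every hidden position.
    hidden = posterior_over_position(render(9), occluder_mask(8, 14))
    print("full occlusion, positions with mass > 0.01:",
          [p for p, w in hidden.items() if w > 0.01])
```

With partial occlusion a single visible pixel is enough to pin down the position, while with full occlusion the posterior stays spread across all positions behind the occluder. That spread is the kind of quantified uncertainty item 24 refers to. Realistic systems would replace enumeration with approximate inference over far richer scene models.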
Knowledge Vault built by David Vivancos 2024