Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:
Resume:
1.- Alan Turing's ideas on using machines to mimic the human mind, including the Turing test, universality of machines, memory requirements, and machine learning.
2.- Brains as decision-making devices that receive complex data and utilize it to make good decisions, with a focus on vision.
3.- Supervised learning approach of defining a task, collecting human decisions, and training machines to generate the same responses.
4.- ImageNet challenge of assigning images to 1000 categories. Performance has improved from 50% to 90% accuracy over 10 years.
5.- Question of whether high ImageNet performance implies brain-like visual decision making in neural networks.
6.- Testing the ability to generalize when the input data or task changes as a key marker of intelligence.
7.- Transfer learning: Reusing features from pretrained ImageNet models as fixed representations for other vision tasks (see Sketch 1 after this list).
8.- Successes of transfer learning in saliency prediction, pose estimation, and behavior tracking, showing useful generalization beyond ImageNet.
9.- Limitations: Adversarial examples, sensitivity to domains/backgrounds, and texture bias show that ImageNet features alone don't imply brain-like vision.
10.- Controlled experiments showing CNNs rely more on texture while humans rely more on shape for object recognition.
11.- Out-of-domain testing on noise perturbations reveals non-human-like sensitivity in standard CNNs compared to shape-based models (see Sketch 2 after this list).
12.- Removing texture bias through data augmentation makes CNNs more robust to noise, as humans are (see Sketch 3 after this list).
13.- Empirical findings that better out-of-domain accuracy on some datasets correlates with more human-like decisions.
14.- Counterfactual testing, which finds the smallest input changes that alter model decisions, is an even stronger test of human-like vision (see Sketch 4 after this list).
15.- Generative modeling enables this on MNIST, revealing human-interpretable perturbations at class boundaries.
16.- Controversial stimuli experiments that systematically compare model disagreements enable quantifying alignment with human decisions (see Sketch 5 after this list).
17.- A generative analysis-by-synthesis model (ABS) shows the best alignment with human interpretation of ambiguous digits.
18.- Scaling up generative models to natural images requires handling combinatorial complexity of objects and scenes.
19.- Compositional generative scene model learns to sequentially render background and objects from noisy unsupervised segmentation.
20.- Learned latent representation captures meaningful perceptual properties, enables plausible recombination and intervention on scenes.
21.- Exploring invariance manifolds in neural networks to study what information is preserved or discarded.
22.- Invertible neural networks allow synthesizing "metameric" images with the same output but different nuisance (non-class-specific) information (see Sketch 6 after this list).
23.- For standard CNNs, humans perceive metamers as identical to the nuisance image, not the class image, exposing misaligned invariances.
24.- Modified training that encourages the nuisance space to be invariant to class improves the human consistency of CNN invariances.
25.- Actively shaping invariances in neural networks is an important direction for making their decisions more human-like.
26.- Overall perspective on using data and inductive biases to constrain learnable decision rules towards intended human-like solutions.
27.- Object-centric generative models and compositionality across scales as key ingredients for generating training data.
28.- Implicit argument that more human-like decision making, not just benchmark performance, should be the goal of computer vision.
29.- Importance of out-of-domain and counterfactual testing to assess and improve human-consistency of vision models.
30.- Central role of generative models in future work to build more robust, generalizable and human-aligned computer vision systems.
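Sketch 1 - Transfer learning with frozen ImageNet features (point 7). A minimal sketch assuming PyTorch/torchvision and a hypothetical 10-class target task; it illustrates the general recipe, not code from the talk.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pretrained backbone (downloads weights on first use)
    # and freeze it so it acts as a fixed feature extractor.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for p in model.parameters():
        p.requires_grad = False

    # Replace the classification head for the hypothetical 10-class target task.
    model.fc = nn.Linear(model.fc.in_features, 10)

    # Only the new head is trained; one illustrative step on a dummy batch.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()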
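Sketch 2 - Out-of-domain noise testing (point 11). A sketch of measuring accuracy as Gaussian noise of increasing strength is added; the model, loader, and severity levels below are dummy stand-ins.

    import torch

    def accuracy_under_noise(model, loader, sigma):
        # Classification accuracy when inputs are perturbed by Gaussian noise.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                x_noisy = x + sigma * torch.randn_like(x)
                correct += (model(x_noisy).argmax(dim=1) == y).sum().item()
                total += y.numel()
        return correct / total

    # Dummy stand-ins so the sketch runs; replace with a real model and dataset.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    loader = [(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))]
    for sigma in (0.0, 0.1, 0.2, 0.4):  # illustrative severity levels
        print(f"sigma={sigma}: acc={accuracy_under_noise(model, loader, sigma):.3f}")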
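Sketch 3 - Texture-suppressing augmentation (point 12). One known approach (Stylized-ImageNet) randomizes texture via style transfer; the pipeline below is a simpler, hypothetical stand-in that blurs fine texture and jitters color so shape carries more of the signal.

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        transforms.GaussianBlur(kernel_size=9, sigma=(0.5, 2.0)),  # suppress fine texture
        transforms.ToTensor(),
    ])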
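Sketch 4 - Counterfactual search in a generative model's latent space (point 14). A hedged sketch: find a small latent change whose decoded image flips a classifier's decision. The decoder, classifier, and target class are dummy stand-ins, not the models from the talk.

    import torch

    def latent_counterfactual(z0, decode, classify, target, steps=200, lr=0.05, lam=10.0):
        # Seek a nearby latent whose decoded image is assigned to `target`,
        # while penalizing distance from the starting latent z0.
        z = z0.clone().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            loss = (torch.nn.functional.cross_entropy(classify(decode(z)), target)
                    + lam * (z - z0).pow(2).sum())
            opt.zero_grad()
            loss.backward()
            opt.step()
        return decode(z).detach()

    # Dummy stand-ins so the sketch runs end to end.
    decode = torch.nn.Linear(8, 784)      # latent -> flattened "image"
    classify = torch.nn.Linear(784, 10)   # image -> class logits
    counterfactual = latent_counterfactual(torch.randn(1, 8), decode, classify,
                                           torch.tensor([3]))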
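Sketch 5 - Controversial stimuli (point 16). Optimize an input so that two models disagree by construction (model A prefers one class, model B another); human judgments on such stimuli then quantify which model they side with. Both models and both classes are dummies here.

    import torch

    def controversial_stimulus(model_a, model_b, class_a, class_b, shape,
                               steps=300, lr=0.1):
        # Push model A toward class_a and model B toward class_b on the same input.
        x = torch.randn(shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            loss = (torch.nn.functional.cross_entropy(model_a(x), class_a)
                    + torch.nn.functional.cross_entropy(model_b(x), class_b))
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x.detach()

    model_a = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
    model_b = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
    stimulus = controversial_stimulus(model_a, model_b, torch.tensor([3]),
                                      torch.tensor([8]), shape=(1, 1, 28, 28))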
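Sketch 6 - Model metamers (point 22). The talk builds metamers with invertible networks; as a simpler, hypothetical stand-in, this sketch optimizes one image until the model's output matches that of another, yielding a pair the model cannot distinguish but humans might.

    import torch

    def model_metamer(model, source, reference, steps=300, lr=0.05):
        # Optimize `source` until the model's output matches that for `reference`:
        # the pair becomes metameric for the model, not necessarily for humans.
        x = source.clone().requires_grad_(True)
        opt = torch.optim.Adam([x], lr=lr)
        with torch.no_grad():
            target = model(reference)
        for _ in range(steps):
            loss = (model(x) - target).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x.detach()

    # Dummy model and random images standing in for a trained CNN and real photos.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
    metamer = model_metamer(model, torch.randn(1, 1, 28, 28), torch.randn(1, 1, 28, 28))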
Knowledge Vault built by David Vivancos 2024