Knowledge Vault 6/44 - ICML 2019
Reverse engineering neuroscience and cognitive science
Aude Oliva

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
    classDef main fill:#f9d4f9, font-weight:bold, font-size:14px
    classDef neuro fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef ai fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef optimization fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef generalization fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef future fill:#d4f9f9, font-weight:bold, font-size:14px
    Main[Reverse engineering neuroscience<br>and cognitive science]
    Main --> A[Neuroscience and<br>Brain Insights]
    Main --> B[AI and Neural<br>Networks]
    Main --> C[Optimization and<br>Training]
    Main --> D[Generalization and<br>Overfitting]
    Main --> E[Future Directions]
    A --> A1[Neuroscience informs AI system<br>design 1]
    A --> A2[Brain has specialized information<br>processing regions 2]
    A --> A3[Neuroimaging reveals sequential neural<br>representations 3]
    A --> A4[Sounds processed differently than<br>images 4]
    A --> A5[Brain connectivity changes from<br>birth 5]
    A --> A6[Human brain extremely complex,<br>constantly growing 9]
    B --> B1[Three-second videos match working<br>memory capacity 6]
    B --> B2[Memory game reveals consistently<br>memorable videos 7]
    B --> B3[GANs generate memorable images<br>using insights 8]
    B --> B4[CNN confidence misaligns with<br>human recognition 10]
    B --> B5[AVH metric correlates with<br>human recognition 11]
    B --> B6[AVH plateaus early in<br>CNN training 12]
    C --> C1[Neural network loss landscapes<br>decomposed 15]
    C --> C2[Wide networks have convex-like<br>optimization 16]
    C --> C3[Poor initialization creates negative<br>curvature region 17]
    C --> C4[Saddle points slow gradient<br>descent 19]
    C --> C5[Escaping saddles requires selective<br>neuron saturation 20]
    C --> C6[Initialization tradeoff: fast training<br>vs generalization 28]
    D --> D1[AVH reflects model generalization<br>ability 13]
    D --> D2[CNNs robust to AVH<br>adversarial attacks 14]
    D --> D3[Overparameterization generalizes well via<br>regularization 21]
    D --> D4[Null space protects from<br>overfitting 23]
    D --> D5[Sharp-flat minima don't explain<br>generalization 24]
    D --> D6[Large weights hurt generalization<br>via null-space 27]
    E --> E1[Deep networks learn hierarchical<br>distinctions 18]
    E --> E2[High-dimensional data covariance eigenvalues<br>split 22]
    E --> E3[Rectified networks show double<br>descent curve 25]
    E --> E4[Model error decomposed into<br>three parts 26]
    E --> E5[Deep nets learn complex<br>data structures 29]
    E --> E6[Future: learn generative structures,<br>avoid overfitting 30]
    class Main main
    class A,A1,A2,A3,A4,A5,A6 neuro
    class B,B1,B2,B3,B4,B5,B6 ai
    class C,C1,C2,C3,C4,C5,C6 optimization
    class D,D1,D2,D3,D4,D5,D6 generalization
    class E,E1,E2,E3,E4,E5,E6 future

Resume:

1.- Aude Oliva discusses how neuroscience and cognitive science can inform the design of artificial intelligence systems.

2.- The human brain has specialized regions for processing different types of information, like the visual and auditory cortices.

3.- Neuroimaging allows mapping brain activity in space and time when perceiving images, revealing a sequence of neural representations.

4.- Sounds are processed differently than images in the brain, recruiting more regions and persisting longer after the stimulus ends.

5.- The brain's connectivity changes massively from birth to adulthood, with significant pruning of neural connections, especially in the visual cortex.

6.- The 3-second videos in the Moments in Time dataset correspond to the capacity of human working memory for meaningful events.

7.- A memory game reveals some videos are consistently memorable while others are quickly forgotten, useful for designing memorable AI systems.

8.- GANs can be trained to generate images optimized to be memorable to humans by leveraging insights from human memory.

9.- The human brain is extremely complex with 100 billion neurons, 100 trillion connections, and 1,000 new neurons added daily.

10.- In CNNs, model confidence does not align well with human visual hardness of recognizing an image.

11.- However, the angular visual hardness (AVH) metric, based on angles between embeddings and weights, correlates strongly with human recognition.
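
As a concrete reading of item 11 (a minimal sketch, not the authors' implementation; the `angle` and `avh` helpers are illustrative names), AVH can be computed as the angle between an image's embedding and its true class's final-layer weight vector, normalized by the sum of angles to all class weights:

```python
import numpy as np

def angle(u, v):
    """Angle in radians between vectors u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def avh(embedding, class_weights, target):
    """Angular visual hardness: angle to the target class's weight
    vector, normalized by the sum of angles to all class weights."""
    angles = np.array([angle(embedding, w) for w in class_weights])
    return angles[target] / angles.sum()

# Toy example: 3 classes in a 2-D feature space
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x = np.array([0.9, 0.1])       # embedding pointing near class 0
print(avh(x, W, target=0))     # small ratio: visually easy
print(avh(x, W, target=2))     # large ratio: visually hard
```

The ratio lies strictly between 0 and 1; larger values mean the embedding points further from its class weight, i.e., a harder image in the sense of item 11.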

12.- AVH plateaus early in CNN training while accuracy continues improving, suggesting it is not directly optimized by the objective function.

13.- AVH reflects model generalization ability, with lower final AVH scores for better generalizing models like ResNet vs AlexNet.

14.- CNNs are robust to adversarial attacks in terms of AVH - large perturbations are needed to substantially change an image's AVH.

15.- The curvature tensor of neural network loss landscapes can be decomposed into a positive definite part G and indefinite part H.
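
One plausible formalization of this split (the standard Gauss-Newton decomposition; the talk's exact notation is not reproduced in this summary): for a loss $\ell(\theta) = \sum_i L(f_i(\theta))$,

```latex
\nabla^2 \ell(\theta)
  = \underbrace{\sum_i \nabla f_i(\theta)\, L''(f_i)\, \nabla f_i(\theta)^{\top}}_{G}
  \;+\; \underbrace{\sum_i L'(f_i)\, \nabla^2 f_i(\theta)}_{H}
```

Here $G$ is positive semidefinite whenever $L$ is convex, so all indefiniteness comes from the network-curvature term $H$, which is consistent with item 16's claim that optimization looks convex-like when $G$ dominates.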

16.- For properly initialized wide networks, the G part dominates the Hessian, making optimization more convex-like.

17.- Poor initialization leads to an initial negative curvature region that networks must escape, creating a gap between G and H.

18.- Deep linear networks exhibit "progressive differentiation" - sequentially learning finer category distinctions aligned with a ground-truth hierarchy.

19.- Saddle points slow gradient descent in neural nets and arise from degeneracies to simpler models when neurons or weights are redundant.
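
The slowdown in item 19 is easy to reproduce on the textbook saddle f(x, y) = x² − y² (a toy sketch; the function and step size are made up for illustration, not from the talk): the closer gradient descent starts to the saddle along the unstable direction, the longer it lingers.

```python
def gd_escape_steps(y0, lr=0.1, tol=1.0):
    """Steps for gradient descent on f(x, y) = x**2 - y**2 to push
    |y| past tol, starting near the saddle point at the origin."""
    x, y = 1.0, y0
    steps = 0
    while abs(y) < tol:
        x -= lr * 2 * x    # stable direction: contracts toward 0
        y += lr * 2 * y    # unstable direction: grows geometrically
        steps += 1
    return steps

# Starting closer to the saddle along y takes proportionally
# (in log distance) more steps to escape
print(gd_escape_steps(1e-3), gd_escape_steps(1e-9))
```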

20.- Escaping saddle points requires saturating some but not all neurons' activations to introduce helpful nonlinearities.

21.- Overparameterized networks can generalize well, defying traditional statistical learning theory, due to implicit regularization from optimization.

22.- In high dimensions, eigenvalues of the data covariance split into a bulk and a spike at zero as data becomes scarce.
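
Item 22's spike at zero shows up directly in a toy simulation (the sizes here are made up): with fewer samples than dimensions, the empirical covariance has rank at most n, so d − n eigenvalues sit at exactly zero while the rest form the bulk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20                      # dimension larger than sample count
X = rng.standard_normal((n, d))    # n samples in d dimensions
cov = X.T @ X / n                  # empirical covariance (d x d)
eig = np.linalg.eigvalsh(cov)

n_zero = int(np.sum(eig < 1e-10))  # count the spike at zero
print(n_zero)                      # d - n = 30 zero eigenvalues
```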

23.- Neural nets are protected from overfitting by a data null space of zero eigenvalues and an eigengap separating nonzero eigenvalues from zero.

24.- Sharp vs flat minima don't fully explain generalization; zero eigenvalue directions are always "flat" but can still hurt test error.

25.- Rectified linear networks also show the "double descent" generalization curve, with test error peaking when the parameter count matches the dataset size.

26.- Model error can be decomposed into approximation, estimation, and null space errors, the last due to variability in data-free directions.
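
One way to write this decomposition (a hedged sketch; the talk's precise definitions are not given in this summary), splitting the learned function $\hat f$ into components inside and orthogonal to the span of the data:

```latex
\text{Error}(\hat f)
  = \underbrace{\|f^{*} - f_{\mathcal{F}}\|^{2}}_{\text{approximation}}
  + \underbrace{\|f_{\mathcal{F}} - \hat f_{\parallel}\|^{2}}_{\text{estimation}}
  + \underbrace{\|\hat f_{\perp}\|^{2}}_{\text{null space}}
```

The null-space term depends only on what initialization leaves in data-free directions, which is why item 27 says large initial weights inflate it.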

27.- Large initial weights inflate the null space error, hurting generalization; small initialization is needed for overparameterized nets.

28.- Consequently, a tradeoff emerges between fast training (large initialization) and good generalization (small initialization) in deep learning.

29.- Deep nets can learn complex representations like hierarchies and graphs from the structure of training data without explicit encoding.

30.- Future work aims to encourage deep nets to learn true generative structures and avoid overfitting to noise in complex real-world settings.

Knowledge Vault built by David Vivancos 2024