Reverse engineering neuroscience and cognitive science

Aude Oliva

**Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:**

```mermaid
graph LR
    classDef main fill:#f9d4f9, font-weight:bold, font-size:14px
    classDef neuro fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef ai fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef optimization fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef generalization fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef future fill:#d4f9f9, font-weight:bold, font-size:14px
    Main[Reverse engineering neuroscience and cognitive science] --> A[Neuroscience and Brain Insights]
    Main --> B[AI and Neural Networks]
    Main --> C[Optimization and Training]
    Main --> D[Generalization and Overfitting]
    Main --> E[Future Directions]
    A --> A1[Neuroscience informs AI system design 1]
    A --> A2[Brain has specialized information processing regions 2]
    A --> A3[Neuroimaging reveals sequential neural representations 3]
    A --> A4[Sounds processed differently than images 4]
    A --> A5[Brain connectivity changes from birth 5]
    A --> A6[Human brain extremely complex, constantly growing 9]
    B --> B1[Three-second videos match working memory capacity 6]
    B --> B2[Memory game reveals consistently memorable videos 7]
    B --> B3[GANs generate memorable images using insights 8]
    B --> B4[CNN confidence misaligns with human recognition 10]
    B --> B5[AVH metric correlates with human recognition 11]
    B --> B6[AVH plateaus early in CNN training 12]
    C --> C1[Neural network loss landscapes decomposed 15]
    C --> C2[Wide networks have convex-like optimization 16]
    C --> C3[Poor initialization creates negative curvature region 17]
    C --> C4[Saddle points slow gradient descent 19]
    C --> C5[Escaping saddles requires selective neuron saturation 20]
    C --> C6["Initialization tradeoff: fast training vs generalization 28"]
    D --> D1[AVH reflects model generalization ability 13]
    D --> D2[CNNs robust to AVH adversarial attacks 14]
    D --> D3[Overparameterization generalizes well via regularization 21]
    D --> D4[Null space protects from overfitting 23]
    D --> D5["Sharp-flat minima don't explain generalization 24"]
    D --> D6[Large weights hurt generalization via null-space 27]
    E --> E1[Deep networks learn hierarchical distinctions 18]
    E --> E2[High-dimensional data covariance eigenvalues split 22]
    E --> E3[Rectified networks show double descent curve 25]
    E --> E4[Model error decomposed into three parts 26]
    E --> E5[Deep nets learn complex data structures 29]
    E --> E6["Future: learn generative structures, avoid overfitting 30"]
    class Main main
    class A,A1,A2,A3,A4,A5,A6 neuro
    class B,B1,B2,B3,B4,B5,B6 ai
    class C,C1,C2,C3,C4,C5,C6 optimization
    class D,D1,D2,D3,D4,D5,D6 generalization
    class E,E1,E2,E3,E4,E5,E6 future
```

**Resume:**

**1.-** Aude Oliva discusses how neuroscience and cognitive science can inform the design of artificial intelligence systems.

**2.-** The human brain has specialized regions for processing different types of information, like the visual and auditory cortices.

**3.-** Neuroimaging allows mapping brain activity in space and time when perceiving images, revealing a sequence of neural representations.

**4.-** Sounds are processed differently than images in the brain, recruiting more regions and persisting longer after the stimulus ends.

**5.-** The brain's connectivity changes massively from birth to adulthood, with significant pruning of neural connections, especially in the visual cortex.

**6.-** The 3-second videos in the Moments in Time dataset correspond to the capacity of human working memory for meaningful events.

**7.-** A memory game reveals some videos are consistently memorable while others are quickly forgotten, useful for designing memorable AI systems.

**8.-** GANs can be trained to generate images optimized to be memorable to humans by leveraging insights from human memory.

**9.-** The human brain is extremely complex with 100 billion neurons, 100 trillion connections, and 1,000 new neurons added daily.

**10.-** In CNNs, model confidence does not align well with how visually hard humans find an image to recognize.

**11.-** However, the angular visual hardness (AVH) metric, based on angles between embeddings and weights, correlates strongly with human recognition.
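A minimal sketch of the angular idea, assuming AVH is the angle between a sample's feature embedding and its true-class weight vector, normalized by the sum of its angles to all class weights (the exact definition used in the talk may differ; the data here is synthetic):

```python
import numpy as np

def angular_visual_hardness(embedding, class_weights, true_class):
    """Angle to the true-class weight, normalized by angles to all
    classes (illustrative assumption, not the talk's exact formula)."""
    cos = class_weights @ embedding / (
        np.linalg.norm(class_weights, axis=1) * np.linalg.norm(embedding))
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return angles[true_class] / angles.sum()

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 64))            # 10 classes, 64-dim embeddings
x = w[3] + 0.1 * rng.normal(size=64)     # sample close to class-3 weights
avh = angular_visual_hardness(x, w, 3)
print(avh)                               # small ratio: a visually "easy" sample
```

A sample far from every class weight would score near the uniform value 1/10, matching the intuition that larger angular hardness tracks harder recognition.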

**12.-** AVH plateaus early in CNN training while accuracy continues improving, suggesting it is not directly optimized by the objective function.

**13.-** AVH reflects model generalization ability, with lower final AVH scores for better generalizing models like ResNet vs AlexNet.

**14.-** CNNs are robust to adversarial attacks in terms of AVH: large perturbations are needed to substantially change an image's AVH.

**15.-** The curvature tensor of neural network loss landscapes can be decomposed into a positive definite part G and indefinite part H.

**16.-** For properly initialized wide networks, the G part dominates the Hessian, making optimization more convex-like.
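Assuming the G/H split in points 15-16 is the standard Gauss-Newton decomposition (a plausible reading, not confirmed by the source), it can be sketched as follows. For a loss $L(w) = \sum_i \ell\big(f(x_i; w), y_i\big)$ with network outputs $f$ and Jacobians $J_i = \partial f(x_i; w)/\partial w$, the chain rule gives

$$
\nabla^2_w L \;=\; \underbrace{\sum_i J_i^\top \,\nabla^2_f \ell_i \, J_i}_{G \;\succeq\; 0}
\;+\; \underbrace{\sum_i \sum_k \frac{\partial \ell_i}{\partial f_k}\, \nabla^2_w f_k(x_i)}_{H \text{ (indefinite)}}.
$$

For convex $\ell$ the first term $G$ is positive semidefinite, while $H$ carries the network's curvature weighted by the residual derivatives $\partial \ell_i / \partial f_k$; when a well-initialized wide network keeps those residual terms small, $G$ dominates and optimization behaves convex-like, consistent with point 16.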

**17.-** Poor initialization leads to an initial negative curvature region that networks must escape, creating a gap between G and H.

**18.-** Deep linear networks exhibit "progressive differentiation" - sequentially learning finer category distinctions aligned with a ground-truth hierarchy.

**19.-** Saddle points slow gradient descent in neural nets and arise from degeneracies to simpler models when neurons or weights are redundant.

**20.-** Escaping saddle points requires saturating some but not all neurons' activations to introduce helpful nonlinearities.

**21.-** Overparameterized networks can generalize well, defying traditional statistical learning theory, due to implicit regularization from optimization.

**22.-** In high dimensions, eigenvalues of the data covariance split into a bulk and a spike at zero as data becomes scarce.

**23.-** Neural nets are protected from overfitting by a data null space of zero eigenvalues and an eigengap separating nonzero eigenvalues from zero.
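The eigenvalue split in points 22-23 is easy to see numerically. This toy construction (mine, not from the talk) draws fewer samples than dimensions and checks that the sample covariance has a nonzero bulk separated by an eigengap from a spike of exact zeros, the data null space:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                         # n samples in d dimensions, n < d
X = rng.normal(size=(n, d))
cov = X.T @ X / n                      # sample covariance, rank at most n
eig = np.sort(np.linalg.eigvalsh(cov))[::-1]

bulk = eig[eig > 1e-10]                # nonzero "bulk" of the spectrum
null_dim = d - len(bulk)               # spike at zero: the data null space
eigengap = bulk.min()                  # smallest bulk eigenvalue, well above zero
print(len(bulk), null_dim, eigengap)
```

With `n = 20` and `d = 100` the bulk has exactly 20 eigenvalues and the null space 80 dimensions; gradients computed from the data cannot move weights along those 80 directions, which is the protection point 23 describes.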

**24.-** Sharp vs flat minima don't fully explain generalization; zero eigenvalue directions are always "flat" but can still hurt test error.

**25.-** Rectified linear networks also show the "double descent" generalization curve, with test error peaking when the parameter count matches the dataset size.

**26.-** Model error can be decomposed into approximation, estimation, and null space errors, the last due to variability in data-free directions.

**27.-** Large initial weights inflate the null space error, hurting generalization; small initialization is needed for overparameterized nets.

**28.-** Consequently, a tradeoff emerges between fast training (large initialization) and good generalization (small initialization) in deep learning.

**29.-** Deep nets can learn complex representations like hierarchies and graphs from the structure of training data without explicit encoding.

**30.-** Future work aims to encourage deep nets to learn true generative structures and avoid overfitting to noise in complex real-world settings.

Knowledge Vault built by David Vivancos 2024