Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Aude Oliva discusses how neuroscience and cognitive science can inform the design of artificial intelligence systems.
2.- The human brain has specialized regions for processing different types of information, like the visual and auditory cortices.
3.- Neuroimaging allows mapping brain activity in space and time when perceiving images, revealing a sequence of neural representations.
4.- Sounds are processed differently than images in the brain, recruiting more regions and persisting longer after the stimulus ends.
5.- The brain's connectivity changes massively from birth to adulthood, with significant pruning of neural connections, especially in the visual cortex.
6.- The 3-second videos in the Moments in Time dataset correspond to the capacity of human working memory for meaningful events.
7.- A memory game reveals some videos are consistently memorable while others are quickly forgotten, useful for designing memorable AI systems.
8.- GANs can be trained to generate images optimized to be memorable to humans by leveraging insights from human memory.
9.- The human brain is extremely complex with 100 billion neurons, 100 trillion connections, and 1,000 new neurons added daily.
10.- In CNNs, model confidence does not align well with how hard an image is for humans to recognize (its human visual hardness).
11.- However, the angular visual hardness (AVH) metric, based on the angles between feature embeddings and class weight vectors, correlates strongly with human recognition difficulty (see the sketch after this list).
12.- AVH plateaus early in CNN training while accuracy continues improving, suggesting it is not directly optimized by the objective function.
13.- AVH reflects model generalization ability, with lower final AVH scores for better generalizing models like ResNet vs AlexNet.
14.- CNNs are robust to adversarial attacks in terms of AVH - large perturbations are needed to substantially change an image's AVH.
15.- The curvature (Hessian) of neural network loss landscapes can be decomposed into a positive semi-definite part G and an indefinite part H (written out after this list).
16.- For properly initialized wide networks, the G part dominates the Hessian, making optimization more convex-like.
17.- Poor initialization leads to an initial negative curvature region that networks must escape, creating a gap between G and H.
18.- Deep linear networks exhibit "progressive differentiation" - sequentially learning finer category distinctions aligned with a ground-truth hierarchy (the mode learning timescales are given after this list).
19.- Saddle points slow gradient descent in neural nets and arise from degeneracies to simpler models when neurons or weights are redundant.
20.- Escaping saddle points requires saturating some but not all neurons' activations to introduce helpful nonlinearities.
21.- Overparameterized networks can generalize well, defying traditional statistical learning theory, due to implicit regularization from optimization.
22.- In high dimensions, eigenvalues of the data covariance split into a bulk and a spike at zero as data becomes scarce.
23.- Neural nets are protected from overfitting by a data null space of zero eigenvalues and an eigengap separating the nonzero eigenvalues from zero (illustrated numerically after this list).
24.- Sharp vs flat minima don't fully explain generalization; zero eigenvalue directions are always "flat" but can still hurt test error.
25.- Rectified linear networks also show the "double descent" generalization curve, with test error peaking near the interpolation threshold where the degree of overparameterization matches the dataset size (see the sketch after this list).
26.- Model error can be decomposed into approximation, estimation, and null space errors, the last due to variability in data-free directions.
27.- Large initial weights inflate the null space error, hurting generalization; small initialization is needed for overparameterized nets (a toy demonstration follows this list).
28.- Consequently, a tradeoff emerges between fast training (large initialization) and good generalization (small initialization) in deep learning.
29.- Deep nets can learn complex representations like hierarchies and graphs from the structure of training data without explicit encoding.
30.- Future work aims to encourage deep nets to learn true generative structures and avoid overfitting to noise in complex real-world settings.
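The AVH metric from items 10-14 can be sketched in a few lines. This is a minimal illustration that assumes AVH is the angle between an image's feature embedding and its true-class weight vector, normalized by the sum of its angles to all class weight vectors; the function and variable names are illustrative, not code from the talk.

```python
import numpy as np

def angular_visual_hardness(embedding, class_weights, true_label):
    """Angle between a feature embedding and the true-class weight vector,
    normalized by the sum of angles to all class weight vectors.
    Larger values correspond to images the model finds harder."""
    def angle(u, v):
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cos, -1.0, 1.0))

    angles = np.array([angle(embedding, w) for w in class_weights])
    return angles[true_label] / angles.sum()

# Toy usage: 4 classes, 8-dimensional embedding space.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # one weight vector per class
f = W[2] + 0.3 * rng.normal(size=8)  # embedding close to class 2
print(angular_visual_hardness(f, W, true_label=2))
```

Because AVH depends only on angles, rescaling the embeddings or weights leaves it unchanged, which is consistent with item 12's observation that AVH can plateau while the training objective keeps improving.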
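Items 15-17 describe splitting the loss Hessian into a positive semi-definite part G and an indefinite part H. A standard way to write such a split is the Gauss-Newton decomposition; treating it as the split the talk refers to is an assumption, but it matches the description:

```latex
% L(\theta) = \sum_i \ell\big(f(x_i;\theta),\, y_i\big)
\nabla^2_\theta L
  = \underbrace{\sum_i J_i^{\top}\,\big(\nabla^2_f \ell\big)\, J_i}_{G\ \text{(positive semi-definite when } \ell \text{ is convex in } f)}
  \;+\;
  \underbrace{\sum_i \sum_k \frac{\partial \ell}{\partial f_k}\,\nabla^2_\theta f_k(x_i;\theta)}_{H\ \text{(indefinite)}},
  \qquad J_i = \frac{\partial f(x_i;\theta)}{\partial \theta}.
```

When G dominates (item 16), the landscape looks locally like a convex quadratic; when H contributes large negative eigenvalues (item 17), the network starts in a negative-curvature region it must first escape.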
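Item 18's progressive differentiation is usually stated through the learning dynamics of each input-output singular mode of a deep linear network. Under the standard assumptions for that analysis (whitened inputs, small balanced initialization a_0, time constant tau set by the learning rate), a mode of strength s follows a sigmoidal trajectory and is learned on a timescale roughly inversely proportional to s:

```latex
a_s(t) \;=\; \frac{s\, e^{2 s t/\tau}}{e^{2 s t/\tau} - 1 + s/a_0},
\qquad
t_s \;\approx\; \frac{\tau}{2 s}\,\ln\frac{s}{a_0}.
```

Coarse category distinctions correspond to large singular values and are learned first; finer distinctions, with smaller singular values, emerge later, and the slow early phase of each sigmoid corresponds to the saddle-point plateaus of items 19-20.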
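A minimal numpy illustration of items 22-23 (the dimensions and Gaussian data are arbitrary choices): when samples are scarce relative to the dimension, the sample covariance has a bulk of nonzero eigenvalues plus a large set of exact zeros, the data null space, separated from the bulk by an eigengap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                          # far fewer samples than dimensions
X = rng.normal(size=(n, d))
cov = X.T @ X / n                       # d x d sample covariance
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

print("bulk (nonzero) eigenvalues:   ", np.sum(eigvals > 1e-10))   # = n
print("zero eigenvalues (null space):", np.sum(eigvals <= 1e-10))  # = d - n
print("smallest nonzero eigenvalue (eigengap):", round(eigvals[n - 1], 3))
```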
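Item 25's double descent curve can be reproduced with a toy model; this sketch uses random tanh features and minimum-norm least squares rather than a rectified network, so it is an analogy for the shape of the curve, not the talk's exact setup. Test error typically spikes near the interpolation threshold, where the number of features matches the number of training samples, and falls again as the model becomes more overparameterized.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_input = 40, 500, 10
w_true = rng.normal(size=d_input)

X_tr = rng.normal(size=(n_train, d_input))
X_te = rng.normal(size=(n_test, d_input))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)    # noisy training labels
y_te = X_te @ w_true                                      # clean test targets

for n_features in [10, 20, 40, 80, 160, 640]:             # 40 = interpolation threshold
    V = rng.normal(size=(d_input, n_features)) / np.sqrt(d_input)
    Phi_tr, Phi_te = np.tanh(X_tr @ V), np.tanh(X_te @ V)
    beta = np.linalg.pinv(Phi_tr) @ y_tr                  # minimum-norm fit
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"features={n_features:4d}  test MSE={test_mse:.3f}")
```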
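Items 26-28 can be demonstrated with overparameterized linear regression (a toy stand-in, not the talk's model): gradient descent never updates the weight component lying in the data null space, so whatever that component is at initialization persists and shows up as null space error at test time; large initializations therefore hurt, while small ones converge near the minimum-norm solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                               # fewer samples than parameters
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true

def train(init_scale, lr=1e-3, steps=20000):
    w = init_scale * rng.normal(size=d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n      # gradient of the mean squared error
    return w

X_test = rng.normal(size=(1000, d))
for scale in [0.0, 0.1, 3.0]:
    w = train(scale)
    test_mse = np.mean((X_test @ w - X_test @ w_true) ** 2)
    print(f"init scale {scale:3.1f}: train residual {np.linalg.norm(X @ w - y):.1e}, "
          f"test MSE {test_mse:.1f}")
```

All three runs fit the training data essentially perfectly, but the large-initialization run carries a large data-free (null space) component in its final weights and generalizes markedly worse.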
Knowledge Vault built by David Vivancos 2024