The End Of Knowledge - Vault 2 - ICLR (2014-2023)

graph LR
    classDef main fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef visual fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef receptive fill:#d4d4f9, font-weight:bold, font-size:14px;
    classDef cnn fill:#f9f9d4, font-weight:bold, font-size:14px;
    classDef retina fill:#f9d4f9, font-weight:bold, font-size:14px;
    classDef model fill:#d4f9f9, font-weight:bold, font-size:14px;
    classDef biology fill:#f9d4d4, font-weight:bold, font-size:14px;
    A[Jack Lindsey et al ICLR 2019] --> B[Visual representations in brain using CNNs 1]
    A --> C[Neurons: receptive fields for stimuli patterns 2]
    C --> D[Retina: center-surround, V1: edge-detecting 3]
    B --> E[Why receptive fields differ in shapes 4]
    A --> F[CNNs map onto human/monkey visual cortex 5]
    F --> G[CNNs skip center-surround, perform edge detection 6]
    A --> H[4-layer CNN explores biological receptive fields 7]
    H --> I[Baseline model learns edge-detecting filters 8]
    A --> J[Retina, brain distinct, optic nerve bottleneck 9]
    J --> K[Retina output fewer neurons than photoreceptors, V1 10]
    H --> L[Bottleneck constraint produces center-surround 11]
    L --> M[After expansion, edge-like receptive fields reemerge 12]
    H --> N[Constraints account for receptive field differences 13]
    H --> O[Model simplified, nonlinearity needs consideration 14]
    A --> P[Humans, macaques: quasi-linear center-surround 15]
    P --> Q[Mice: nonlinear cell types 15]
    Q --> R[Linear cells: no semantic features 16]
    Q --> S[Nonlinear cells: specific functions 16]
    A --> T[Brain, visual system sophistication varies 17]
    T --> U[Mice: 1M visual cortex neurons 17]
    T --> V[Humans: billions of neurons 17]
    A --> W[Retinal representations change with downstream sophistication 18]
    W --> X[Deep networks: retina preserves input 19]
    W --> Y[Shallow networks: retina extracts features 19]
    X --> Z[Retinal linearity increases with network depth 20]
    Y --> AA[Separability decreases with network depth 20]
    Y --> AB[Shallow networks induce task-relevant retinal features 21]
    A --> AC[Biology: linear retina in sophisticated cortex animals 22]
    AC --> AD[Reverse in smaller mammals 22]
    H --> AE[Trends arise from visual system-layer interaction 23]
    AE --> AF[No bottleneck: no linearity, separability trends 24]
    A --> AG[Model accounts for receptive fields, species differences 25]
    AG --> AH[Architecture, layer variations mimic mammalian differences 26]
    A --> AI[Details in poster session 27]
    A --> AJ[Utility of linear neurons questioned 28]
    AJ --> AK[Networks on retinal outputs outperform raw inputs 29]
    AK --> AL[Center-surround may benefit optimization, semantic separation 30]
    class A main;
    class B,C,D,E,P,Q,R,S,T,U,V,AC,AD visual;
    class F,G cnn;
    class H,I,N,O,AE,AF,AG,AH,AI model;
    class J,K,W,X,Y,Z,AA,AB,AJ,AK,AL retina;
    class L,M receptive;
    class A,B main;
    class C,D,E,P,Q,R,S,T,U,V,AC,AD visual;
    class F,G,H,I,N,O,AE,AF,AG,AH,AI model;
    class J,K,W,X,Y,Z,AA,AB,AJ,AK,AL retina;
    class L,M receptive;

Summary:

1.-The talk explores using convolutional neural networks (CNNs) to understand visual representations in the brain, focusing on the retina and visual cortex.

2.-Neurons in visual processing layers are characterized by receptive fields, describing spatial patterns of stimuli that activate or inhibit them.

3.-Retinal ganglion cells exhibit center-surround receptive fields, while neurons in the first layer of visual cortex (V1) show edge-detecting patterns.

4.-The talk aims to understand why receptive fields have these shapes and differ between the retina and early visual cortex.

5.-Convolutional neural networks trained on image processing tasks learn representations that map well onto those in human or monkey visual cortex.

6.-However, early layer filters in trained CNNs typically perform edge detection, skipping the characteristic center-surround retina-like stage seen in biological systems.

7.-The study uses a simple 4-layer CNN trained on CIFAR-10 to explore architectural changes that induce biologically observed receptive field patterns.

8.-The baseline model's early layer neurons learn edge-detecting filters, prompting the question of what's missing to produce more biological results.

9.-The retina and brain are distinct entities connected by the optic nerve, which imposes a physical bottleneck on communication.

10.-The retina's output layer has fewer neurons than both the photoreceptors that precede it and the first layer of visual cortex (V1) that follows it.

11.-Incorporating a bottleneck constraint in the model by reducing neurons in the "retina output" layer leads to emergent center-surround receptive fields.

12.-The subsequent layer, after dimensionality expansion akin to V1, exhibits reemergent edge-like receptive fields.

13.-Architectural constraints can account for differences in receptive field patterns across visual processing layers.

14.-The model is simplified, and additional complexities like non-linearity in early layer neurons need to be considered.

15.-In humans and macaques, quasi-linear center-surround models describe retinal ganglion cell outputs well, while mice have many non-linear cell types.

16.-Linear cells don't extract semantically interesting features, while non-linear cells (like W3 cells in mice) may serve specific functions.

17.-The sophistication of the brain and visual system differs between mice (1 million visual cortex neurons) and humans (a few billion).

18.-The study explores how retinal representations change with the sophistication of the downstream visual processing pipeline.

19.-In models with deeper downstream components, the retinal layer preserves input characteristics, while shallower networks induce non-linear feature extraction.

20.-Quantitatively, retinal response linearity increases with downstream network depth while object class separability at the retinal output decreases, suggesting that deeper downstream networks push the retina toward information preservation rather than feature extraction.

21.-Shallower downstream networks induce more task-relevant feature extraction in the retinal stage.

22.-This maps onto biology: animals with a more sophisticated visual cortex exhibit more linear retinal responses, while smaller mammals show the reverse.

23.-These trends are not generic properties of CNNs but arise from the interaction between visual system sophistication and early layer dimensionality.

24.-Models without the retinal output bottleneck do not show pronounced trends in linearity and separability with downstream depth.

25.-The unified model accounts for center-surround and edge-detecting receptive fields, and cross-species differences in retinal linearity and feature extraction.

26.-Varying the architecture and the depth of the downstream layers reproduces differences akin to those between more and less sophisticated mammals.

27.-More details on the model and specific questions are available in the poster session.

28.-The biological utility of neurons that only project linearly is questioned, since on their own they do not appear to perform meaningful computation.

29.-Experiments show that networks trained on retinal outputs outperform those trained on raw inputs, even with linearized retinal outputs.

30.-Center-surround retinal representations may benefit optimization and learning semantic class separations needed for useful behaviors, but more work is needed to understand this.
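
To make the two receptive-field shapes in point 3 concrete, here is a minimal sketch that builds a center-surround kernel and an edge-detecting kernel by hand. The difference-of-Gaussians and Gabor forms are standard textbook idealizations and are assumptions here, not filters taken from the talk; the function names and all parameter values (`size`, `sigma_center`, `sigma_surround`, `wavelength`) are hypothetical.

```python
import math

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference of Gaussians: narrow excitatory center minus broad inhibitory surround.
    Illustrative idealization of a center-surround receptive field (assumed form)."""
    half = size // 2

    def gauss(r2, sigma):
        # Normalized 2D Gaussian evaluated at squared radius r2.
        return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

    return [[gauss(x * x + y * y, sigma_center) - gauss(x * x + y * y, sigma_surround)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def edge_kernel(size=9, sigma=2.0, wavelength=6.0):
    """Odd-symmetric Gabor that varies along x, so it responds to vertical edges.
    Illustrative idealization of a V1-like edge detector (assumed form)."""
    half = size // 2
    return [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
             * math.sin(2 * math.pi * x / wavelength)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

dog = dog_kernel()    # positive in the middle, negative toward the rim
edge = edge_kernel()  # sign flips across the vertical midline
```

The center-surround kernel is rotationally symmetric and has no preferred orientation, whereas the edge kernel changes sign from one side to the other, which is the qualitative difference the talk sets out to explain.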
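
The bottleneck described in points 7, 11 and 12 can be pictured by counting units per layer. The sketch below is only bookkeeping under stated assumptions: the channel counts (32 per layer, 1 at the bottleneck), stride-1 convolutions with same padding, and the helper name `conv_stack_units` are all hypothetical, and only the 32x32 image size follows from the use of CIFAR-10.

```python
def conv_stack_units(channels, image_size=32):
    """Units per layer for stride-1, same-padded conv layers.
    Under that assumption the spatial size stays image_size x image_size,
    so the unit count is simply channels * image_size**2."""
    return [c * image_size * image_size for c in channels]

# Hypothetical channel counts for the four conv layers:
# [retina layer 1, retina output, V1-like layer, deeper layer]
layer_names = ["retina layer 1", "retina output", "V1-like layer", "deeper layer"]
baseline = conv_stack_units([32, 32, 32, 32])      # no bottleneck
bottlenecked = conv_stack_units([32, 1, 32, 32])   # narrow "optic nerve" stage

for name, b, n in zip(layer_names, baseline, bottlenecked):
    print(f"{name:15s} baseline={b:6d}  bottleneck={n:6d}")
```

In the bottlenecked variant the retina output is far narrower than the layers on either side of it, and the V1-like layer expands the representation again, which is the arrangement under which the summary reports center-surround filters followed by reemergent edge-like filters.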
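
Point 20 speaks of measuring response linearity. The summary does not say which metric the talk uses, so the following is a stand-in under that caveat: it scores how well a single straight line explains a unit's response to a scalar input drive, using the R^2 of a least-squares fit. The function name `linear_fit_r2` and the two toy response functions are hypothetical.

```python
import random

def linear_fit_r2(xs, ys):
    """R^2 of the least-squares line y ~ a*x + b.
    A value of 1.0 means the response is exactly linear in the input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

random.seed(0)  # fixed seed so the toy stimulus is reproducible
drive = [random.uniform(-1.0, 1.0) for _ in range(1000)]

quasi_linear = [2.0 * x + 0.5 for x in drive]  # toy quasi-linear unit
rectified = [max(0.0, x) for x in drive]       # toy rectifying (nonlinear) unit

r2_linear = linear_fit_r2(drive, quasi_linear)
r2_rectified = linear_fit_r2(drive, rectified)
```

Here `r2_linear` comes out at essentially 1, while `r2_rectified` is clearly lower because half of the input range is clipped to zero. A score of this kind, averaged over retinal-layer units, is one way the trend with downstream depth could be tracked.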

Knowledge Vault built by David Vivancos 2024