Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-The concept of a "grandmother neuron" - a single neuron that corresponds to a specific concept like one's grandmother.
2.-Local representations have a one-to-one relationship between a concept and a neuron's firing. Evidence exists for this in neuroscience.
3.-An alternative is distributed representations, where a concept is represented by a pattern of activity across several neurons, each of which is also reused for other concepts.
4.-It's hard to measure representation types in the brain, but we can measure them in artificial neural networks we study.
5.-Previous work by Szegedy et al. in 2013 suggested representations are distributed, but the authors think this is an incomplete story.
6.-The authors previously showed examples of locally represented concepts - e.g. spider, water, and text detectors - in the intermediate layers of some networks.
7.-To test representation types, you could assemble labeled datasets of concepts and see if single neurons always fire for one concept.
8.-This requires lots of labeled data. Zhao et al. did this with Mechanical Turk for some concepts humans recognize, like lamps.
9.-But this approach doesn't scale well and is biased toward human-recognizable concepts. What about unnamed concepts that are still important?
10.-Key assumption: concepts are feature subspaces that are reliably learned across multiple networks. This allows probing representations by comparing networks trained from different random initializations.
11.-With local codes, features in one network should match features in another up to a permutation of units; with fully distributed codes, each network may use an arbitrarily rotated basis (contrasted in the permutation-vs-rotation sketch after this list).
12.-Partially distributed hypothesis: networks reliably learn the same low-dimensional subspaces, but the basis within each subspace is arbitrarily rotated from one network to the other.
13.-The study uses AlexNet trained on ImageNet with identical architecture but different initializations. Performance is very similar between networks.
14.-To find one-to-one unit matches between networks, they compute correlation statistics of neuron activations after running ImageNet images through both networks (see the correlation-and-matching sketch after this list).
15.-High correlations indicate neurons in the two networks are firing for the same concepts, suggesting those individual concepts are important.
16.-Two alignments are computed: greedy matching (each neuron picks its maximum-correlation partner) and maximum-weight bipartite matching (a unique one-to-one assignment between the networks).
17.-Where both matching methods agree and the correlation is high, this suggests a local code is in use, as that representation type would predict.
18.-Some units in one network have no high-correlation match in the other, indicating features unique to each network, which may help explain why ensembles of networks work well.
19.-Where the matching methods disagree, partially distributed codes may be in use - e.g. one network using more units than the other to span the same subspace.
20.-One-to-one matching explains some but not all of the network. Next, they look for small subsets of units in one network that predict units in the other.
21.-Mapping layers are trained to predict one network's activations from the other's. Increasing a sparsity penalty reveals that small subsets of units can still predict well (see the sparse-mapping sketch after this list).
22.-Hierarchical clustering aligns the two networks to reveal co-predictive clusters - e.g. a four-dimensional subspace of edge filters (see the clustering sketch after this list).
23.-In summary, they find some evidence of local codes, hints of partially distributed codes in some layers, and some still unexplained aspects.
24.-This is an interesting research direction - training multiple networks and comparing them to understand the learned representations.
25.-Future work could better understand partially distributed codes, examine how this varies with architecture, and potentially encourage certain representation types during training.
26.-Code is available online. Thanks given to co-authors.
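Permutation-vs-rotation sketch (points 11-12): a minimal Python illustration, using synthetic stand-in activations rather than real network data, of why a local code survives one-to-one unit matching under permutation while an arbitrarily rotated (distributed) basis does not. All array sizes, names, and noise levels here are hypothetical; only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_units = 10_000, 8

# Hypothetical "true" features that both networks learn reliably.
features = rng.standard_normal((n_samples, n_units))

# Net A assigns each feature its own unit (a local code).
acts_a = features

# Net B, local-code case: the same features, with the units shuffled by a permutation.
acts_b_perm = features[:, rng.permutation(n_units)]

# Net B, distributed-code case: the same subspace, but an arbitrarily rotated basis.
rotation, _ = np.linalg.qr(rng.standard_normal((n_units, n_units)))
acts_b_rot = features @ rotation

def best_abs_corr(x, y):
    """For each unit in x, the best absolute correlation with any single unit in y."""
    corr = np.corrcoef(x.T, y.T)[:n_units, n_units:]
    return np.abs(corr).max(axis=1)

print("permuted (local code): ", best_abs_corr(acts_a, acts_b_perm).round(2))
print("rotated (distributed): ", best_abs_corr(acts_a, acts_b_rot).round(2))
# The permuted case gives ~1.0 for every unit; the rotated case gives much weaker best matches.
```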
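Correlation-and-matching sketch (points 14-18): a minimal sketch of computing unit-to-unit correlations and aligning two networks with greedy and maximum-weight bipartite matching. The activation matrices are simulated stand-ins, not actual AlexNet activations, and the layer sizes, noise levels, and threshold are made up for illustration; scipy's linear_sum_assignment provides the bipartite solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_correlation(acts_a, acts_b):
    """Pearson correlation of every unit in net A with every unit in net B.
    acts_a, acts_b: (n_images, n_units) activations from the same inputs."""
    a = (acts_a - acts_a.mean(0)) / acts_a.std(0)
    b = (acts_b - acts_b.mean(0)) / acts_b.std(0)
    return (a.T @ b) / len(acts_a)

def greedy_match(corr):
    """Each A unit independently picks its highest-correlation B unit."""
    return corr.argmax(axis=1)

def bipartite_match(corr):
    """One-to-one assignment maximizing total correlation (Hungarian algorithm)."""
    _, cols = linear_sum_assignment(-corr)   # negate because the solver minimizes
    return cols

# Hypothetical activations: 64 units are shared between the nets (up to a
# permutation plus noise), 32 units are unique to each net.
rng = np.random.default_rng(1)
n_images = 5000
shared = rng.standard_normal((n_images, 64))
acts_a = np.hstack([shared, rng.standard_normal((n_images, 32))])
acts_b = np.hstack([shared[:, rng.permutation(64)], rng.standard_normal((n_images, 32))])
acts_a += 0.1 * rng.standard_normal(acts_a.shape)
acts_b += 0.1 * rng.standard_normal(acts_b.shape)

corr = cross_correlation(acts_a, acts_b)
n_total = acts_a.shape[1]
greedy, hungarian = greedy_match(corr), bipartite_match(corr)
agree = greedy == hungarian
matched_corr = corr[np.arange(n_total), greedy]
print(f"matching methods agree on {agree.mean():.0%} of units")
print(f"units with a high-correlation match (>0.5): {(matched_corr > 0.5).sum()} / {n_total}")
```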
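Sparse-mapping sketch (point 21): a minimal sketch of a mapping layer with an L1 sparsity penalty, using scikit-learn's Lasso as a stand-in for the actual mapping-layer training; the synthetic activations, unit counts, and alpha values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n_images, n_units = 2000, 64

# Hypothetical activations: each net-B unit is a noisy combination of ~3 net-A units.
acts_a = rng.standard_normal((n_images, n_units))
weights = np.zeros((n_units, n_units))
for j in range(n_units):
    idx = rng.choice(n_units, size=3, replace=False)
    weights[idx, j] = rng.standard_normal(3)
acts_b = acts_a @ weights + 0.1 * rng.standard_normal((n_images, n_units))

# Mapping layer: predict net B's activations from net A's under an L1 sparsity penalty.
# Increasing alpha forces the map to use fewer A units per B unit.
for alpha in (0.001, 0.01, 0.1):
    mapping = Lasso(alpha=alpha, max_iter=5000).fit(acts_a, acts_b)
    used = (mapping.coef_ != 0).sum(axis=1).mean()        # avg. A units per B unit
    r2 = r2_score(acts_b, mapping.predict(acts_a))
    print(f"alpha={alpha}: ~{used:.1f} A-units per B-unit, R^2={r2:.2f}")
```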
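Clustering sketch (points 12 and 22): a minimal sketch of hierarchical clustering over the units of both networks, assuming four reliably learned four-dimensional subspaces (a stand-in for e.g. a small subspace of edge filters) that each network represents in its own rotated basis. The data is synthetic and scipy's linkage/fcluster perform the clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
n_images, block, n_blocks = 5000, 4, 4
shared = rng.standard_normal((n_images, n_blocks * block))

def rotate_within_blocks(x):
    """Rotate each 4-unit block independently: same subspaces, different basis."""
    out = np.empty_like(x)
    for k in range(n_blocks):
        rot = np.linalg.qr(rng.standard_normal((block, block)))[0]
        out[:, k * block:(k + 1) * block] = x[:, k * block:(k + 1) * block] @ rot
    return out

acts_a, acts_b = rotate_within_blocks(shared), rotate_within_blocks(shared)

# Cluster all 32 units (16 per net) by how similarly they activate; units that
# span the same subspace should land in the same cluster.
units = np.hstack([acts_a, acts_b]).T                     # (32 units, n_images)
distance = 1.0 - np.abs(np.corrcoef(units))
tree = linkage(distance[np.triu_indices_from(distance, k=1)], method="average")
labels = fcluster(tree, t=n_blocks, criterion="maxclust")
print("net A unit clusters:", labels[:16])
print("net B unit clusters:", labels[16:])   # matching labels mark shared low-D subspaces
```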
Knowledge Vault built by David Vivancos 2024