Knowledge Vault 2/82 - ICLR 2014-2023
Been Kim ICLR 2022 - Invited Talk - Beyond interpretability: developing a language to shape our relationships with AI

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

graph LR
classDef medicine fill:#f9d4d4,font-weight:bold,font-size:14px;
classDef science fill:#d4f9d4,font-weight:bold,font-size:14px;
classDef practical fill:#d4d4f9,font-weight:bold,font-size:14px;
classDef theoretical fill:#f9f9d4,font-weight:bold,font-size:14px;
classDef future fill:#f9d4f9,font-weight:bold,font-size:14px;
A[Been Kim ICLR 2022] --> B[Medicine: TCAV aligns predictions with knowledge. 1]
A --> C[Science: TCAV bridges ML and experts. 2]
A --> D[Practical: TCAV's Google popularity, UNESCO award. 3]
A --> E[Science: Decomposing embeddings reveals concepts. 4]
E --> F[Science: Model attends to class-specific features. 5]
E --> G[Theoretical: Measuring concept completeness is possible. 6]
E --> H[Science: DISSECT trains model on classification gradients. 7]
E --> I[Theoretical: Current expansion methods' limitations exist. 8]
A --> J[Practical: AlphaZero study provides insight into expansion. 9]
J --> K[Science: Human chess concepts exist in AlphaZero. 10]
J --> L[Science: AlphaZero's development differs from humans'. 11]
J --> M[Practical: NMF tool explores AlphaZero's representations. 12]
A --> N[Theoretical: Human-machine misalignment inspires creativity. 13]
N --> O[Practical: Mood Board enables visual dialogue. 14]
N --> P[Practical: Machine's perspective helps artists see differently. 15]
N --> Q[Practical: Concept Camera sees through conceptual eyes. 16]
N --> R[Theoretical: Human-machine dialogue projects expand knowledge. 17]
A --> S[Practical: Collaborators shaped language influence opportunity. 18]
A --> T[Theoretical: ML visualization implies skepticism, testing need. 19]
T --> U[Theoretical: Parallel science, engineering efforts surface errors. 20]
T --> V[Theoretical: Human psychology collaboration crucial given biases. 21]
T --> W[Theoretical: Balance inherent interpretability, post-hoc explanations. 22]
T --> X[Practical: Bug insertion tests explanation methods. 23]
T --> Y[Theoretical: Saliency maps may reflect data variance. 24]
T --> Z[Theoretical: Human-machine perception differences underlie limitations. 25]
A --> AA[Practical: TCAV applied to diverse data types. 26]
A --> AB[Future: Abstract interpretability language enables alignment. 27]
AB --> AC[Future: Language allows accessible expert, layperson communication. 28]
A --> AD[Science: TCAV improves NLP model understanding. 29]
A --> AE[Theoretical: Methods should consider human cognition. 30]
class B medicine;
class C,E,F,H,K,L,AD science;
class D,J,M,O,P,Q,S,X,AA practical;
class G,I,N,R,T,U,V,W,Y,Z,AE theoretical;
class AB,AC future;

Resume:

1.-Papers using TCAV in medicine and science provide the best evidence, allowing model predictions to align with current medical knowledge and guidelines.
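
A minimal sketch of the TCAV idea behind points 1-3 (not the official tensorflow/tcav implementation; activations and gradients here are synthetic stand-ins): a concept activation vector (CAV) is the normal of a linear classifier separating concept examples from random ones, and the TCAV score counts how often class gradients point along it.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical layer activations: 100 concept examples vs. 100 random examples.
concept_acts = rng.normal(0.5, 1.0, size=(100, 64))
random_acts = rng.normal(0.0, 1.0, size=(100, 64))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

# The CAV is the normal vector of a linear classifier separating the two sets.
cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]

# TCAV score: fraction of class examples whose logit gradient (w.r.t. this
# layer) has a positive directional derivative along the CAV.
grads = rng.normal(size=(50, 64))  # stand-in for real per-example gradients
tcav_score = float(np.mean(grads @ cav > 0))
print(f"TCAV score: {tcav_score:.2f}")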

2.-Using concepts familiar to doctors makes the language work for both machine learning researchers and experts in other fields.

3.-The TCAV work is widely popular at Google, was highlighted by Sundar Pichai, and won a UNESCO Netexplo Award for its potential impact.

4.-To expand knowledge, concept-discovery methods decompose the embedding space of examples using PCA or clustering, revealing machine concepts in human-understandable ways.
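
A hedged sketch of the decomposition step point 4 describes, with synthetic activations standing in for a real network layer: PCA compresses the embedding space, clustering groups examples the model treats as similar, and each cluster is a candidate machine concept.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
acts = rng.normal(size=(1000, 128))  # hypothetical activations of 1000 image segments

# Reduce to the main directions of variation, then group similar segments.
reduced = PCA(n_components=10).fit_transform(acts)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(reduced)

# Each cluster is a candidate machine concept; a human names it by viewing
# the segments nearest each cluster center.
for k in range(5):
    print(f"candidate concept {k}: {np.sum(labels == k)} segments")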

5.-A trained model pays attention to tiles on a platform for one class, and humans holding dumbbells for the dumbbell class.

6.-Measuring completeness of discovered concepts is possible, though machines' concepts may be too wild to express using available images.
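
One way point 6's completeness measurement can be operationalized (a toy stand-in, not the exact published procedure): project activations onto the discovered concept directions, then test how much of the model's behavior the concept scores alone can recover.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
acts = rng.normal(size=(500, 64))    # synthetic layer activations
cavs = rng.normal(size=(6, 64))      # six discovered concept directions
labels = (acts @ cavs[0] + acts @ cavs[1] > 0).astype(int)  # toy model outputs

# If predictions can be recovered from concept scores alone, the concept
# set is (by this proxy) complete.
concept_scores = acts @ cavs.T
clf = LogisticRegression(max_iter=1000).fit(concept_scores, labels)
print(f"completeness proxy: {clf.score(concept_scores, labels):.2f}")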

7.-The DISSECT paper trains a generative model using gradients of a trained classification model to draw the machine's learned concepts.

8.-Limitations exist in current methods to expand knowledge, such as validating new concepts on limited synthetic datasets or with domain experts.

9.-An in-depth study of how the self-trained chess model AlphaZero sees the world provides insight into expanding the shared basis of concepts between machines and humans.

10.-Human chess concepts like material imbalance and being in check exist in AlphaZero, but when and where they are learned varies.

11.-AlphaZero's chess development differs from humans', with more diverse opening moves and an "aha" moment where skills explode and style emerges.

12.-A tool using non-negative matrix factorization (NMF) allows exploring AlphaZero's representational space, marking a first step towards many potential follow-up works.
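
A minimal illustration of the factorization behind point 12's tool, on synthetic non-negative (e.g., post-ReLU) activations rather than AlphaZero's real ones: NMF expresses each position's activation vector as a mix of a few recurring patterns.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
acts = np.abs(rng.normal(size=(500, 256)))  # |positions| x |channels| activations

nmf = NMF(n_components=8, init="nndsvda", max_iter=500)
W = nmf.fit_transform(acts)  # per-position loadings on 8 factors
H = nmf.components_          # factors: recurring activation patterns

# Positions that load heavily on one factor can be inspected to see which
# chess notion (material, king safety, ...) that factor tracks.
print(W.shape, H.shape)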

13.-Lack of alignment between humans and machines could inspire human creativity, as explored in an open-source project with designers and artists.

14.-Mood Board Search enables visual dialogue, with humans providing seed images and machines responding based on their different representational mapping.
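
A toy sketch of the interaction loop in point 14 (the real tool builds on the CAV machinery above; here a simple mean-direction cosine ranking over synthetic embeddings stands in): the seeds define a direction in the machine's embedding space, and its "response" is the corpus ranked along that direction.

import numpy as np

rng = np.random.default_rng(4)
seed_embs = rng.normal(size=(5, 128))   # embeddings of user-chosen seed images
corpus = rng.normal(size=(1000, 128))   # embeddings of the search corpus

# Average the seeds into a direction, then rank by cosine alignment.
direction = seed_embs.mean(axis=0)
direction /= np.linalg.norm(direction)
scores = (corpus @ direction) / np.linalg.norm(corpus, axis=1)
print(np.argsort(scores)[::-1][:10])  # indices of the machine's top picks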

15.-Artists found the machine's differing perspective helped them see their own photography in new ways and escape the ordinary.

16.-Concept Camera, another open-sourced app, allows seeing through your camera from someone else's conceptual eyes.

17.-Projects bringing out surprising insights in humans represent a different way to expand knowledge through concept-based human-machine dialogue.

18.-Many collaborators over the years helped shape this opportunity to influence human and machine thinking, and their future relationship, through language.

19.-Implications of the work on using visualization to diagnose machine learning errors underscore the need for skepticism and extensive testing.

20.-Parallel efforts between science and engineering, both theoretical and practical, are needed to surface errors and develop interpretability tools.

21.-Collaboration with experts in human psychology is crucial given human biases and the challenge of understanding ourselves as we develop machines.

22.-Balanced consideration of both inherently interpretable models and post-hoc explanation methods is warranted given the current state of knowledge.

23.-Testing explanation methods by intentionally inserting bugs is important to verify they actually detect known problems before practical use.
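
A skeleton of the bug-insertion test from point 23, with a linear model and weight-magnitude attribution as stand-ins for a real model and explanation method: plant a feature that leaks the label, then verify the explanation ranks it first.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 2, size=400)
X[:, 7] = y * 5.0  # the planted bug: feature 7 leaks the label

clf = LogisticRegression(max_iter=1000).fit(X, y)
attribution = np.abs(clf.coef_[0])  # stand-in explanation: |weights|

# A usable explanation method must flag the planted feature first.
assert attribution.argmax() == 7, "explanation missed the planted bug"
print("explanation correctly flags the planted feature")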

24.-Saliency maps may reflect data distribution variance rather than prediction-relevant information, so testing is key to determine explanation method fit.
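
One concrete test implied by point 24 is a model-randomization check: if a saliency map barely changes when trained weights are replaced with random ones, it reflects the data or the method rather than what the model learned. A skeleton, with a trivial stand-in saliency function:

import numpy as np

rng = np.random.default_rng(6)

def saliency(weights, x):
    # Stand-in input-gradient saliency; for a linear scorer f(x) = w . x
    # the gradient is w regardless of the input.
    return np.abs(weights)

x = rng.normal(size=64)
trained_w = rng.normal(size=64)  # pretend these were learned
random_w = rng.normal(size=64)   # freshly re-initialized weights

corr = np.corrcoef(saliency(trained_w, x), saliency(random_w, x))[0, 1]
print(f"trained-vs-random saliency correlation: {corr:.2f}")  # high = red flag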

25.-Fundamental differences in how humans and machines perceive pixel-level data may underlie some observed saliency map limitations.

26.-General interpretability tools like TCAV have been successfully applied across diverse data types including language, audio, images, and medical data.

27.-An abstract interpretability language envisioned to enable broad alignment is not yet realized but remains an aspirational long-term goal.

28.-The language could eventually allow both experts and laypeople to communicate insights to machine learning models in accessible ways.

29.-Potential applications of TCAV to improve NLP model understanding, like for abusive language detection, are an intriguing area of study.

30.-Effective interpretability methods should consider human cognition, such as leveraging how quickly we parse visual information or respecting domain-relevant constraints on sequential thinking.

Knowledge Vault built by David Vivancos 2024