Knowledge Vault 6/30 - ICML 2017
Interpretable Machine Learning
Been Kim & Finale Doshi-Velez

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
    classDef importance fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef approaches fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef models fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef posthoc fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef challenges fill:#f9d4f9, font-weight:bold, font-size:14px
    Main[Interpretable Machine Learning]
    Main --> A[Interpretable ML avoids harmful consequences 1]
    A --> B[Crucial in high-stakes domains like healthcare 2]
    A --> C[ML community increasingly developing interpretability tools 3]
    A --> D[Complex systems understanding issue predates ML 4]
    A --> E[Ubiquitous ML makes interpretability more important 5]
    Main --> F[Needed for underspecified problems lacking cost functions 6]
    F --> G[Examples: self-driving cars, debugging, scientific discovery 7]
    F --> H[Not needed for well-studied problems 8]
    Main --> I[Approaches: pre-modeling, inherently interpretable, post-hoc explanations 9]
    I --> J[Facets: open-source tool for dataset visualization 10]
    J --> K[Exploratory data analysis visualizes dataset properties 11]
    J --> L[MMD-critic selects prototypical and critical datapoints 12]
    I --> M[Inherently interpretable: rule-based, per-feature, monotonic models 13]
    M --> N[Rule-based models can become complex 14]
    M --> O[Generalized additive models: complex but interpretable 15]
    M --> P[Case-based models use examples to explain 16]
    P --> Q[Limitations: lacking representatives, human overgeneralization 17]
    I --> R[Post-hoc: explain models after theyre built 18]
    R --> S[Sensitivity analysis: perturb inputs, observe outputs 19]
    R --> T[LIME: local interpretable model-agnostic explanations 20]
    R --> U[Saliency maps: gradient of output to input 21]
    R --> V[Integrated gradients: attribution using path integral 22]
    R --> W[Concept activation vectors: align with human concepts 23]
    R --> X[Influence functions: estimate training point impact 24]
    Main --> Y[Monotonic models encode domain knowledge 25]
    Y --> Z[Example-based explanations for complex data points 26]
    Z --> AA[Experts can update prototypes and criticism 27]
    Main --> AB[Interpretable models limitations in relationship representation 28]
    AB --> AC[Feature sparsity and monotonicity limit expressiveness 29]
    AB --> AD[Open questions: interdisciplinary collaboration needed 30]
    class A,B,C,D,E importance
    class F,G,H,I approaches
    class J,K,L,M,N,O,P,Q models
    class R,S,T,U,V,W,X posthoc
    class Y,Z,AA,AB,AC,AD challenges

Resume:

1.- Interpretable machine learning aims to help people understand what complex machine learning models are doing, so that unintended harmful consequences can be avoided.

2.- Interpretability is important when machine learning is used in high-stakes domains like healthcare where mistakes can be very costly.

3.- The machine learning community has been increasingly working on interpretability tools and techniques over the past decade.

4.- Humans failing to fully understand complex systems is not a new problem; it predates modern machine learning, as with expert systems in the 1980s.

5.- The abundance of data and cheap computation today makes machine learning ubiquitous and interpretability more important than ever.

6.- Interpretability is needed when the problem is fundamentally underspecified and the true objective cannot be cleanly written down as a cost function.

7.- Examples of underspecified problems are self-driving cars, debugging models, and scientific discovery where the right answers aren't fully known.

8.- Interpretability is not needed when expected loss can be reasoned about or the problem is sufficiently well-studied.

9.- Approaches to interpretability include doing it before modeling (data analysis), inherently interpretable models, and post-hoc explanations of black-box models.

10.- Facets is an open-source tool from Google to help visualize and understand datasets before modeling.

11.- Exploratory data analysis, coined by John Tukey, refers to visualizing and investigating properties of the data.
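
A minimal sketch of such an exploratory pass in Python with pandas; the file name data.csv and the particular summaries shown are placeholders, not tools from the tutorial:

    import pandas as pd

    df = pd.read_csv("data.csv")         # placeholder path for whatever dataset is being studied
    print(df.describe(include="all"))    # ranges, means, and category counts per column
    print(df.isna().mean())              # fraction of missing values per column
    df.hist(figsize=(10, 6))             # quick per-feature distributions (needs matplotlib)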

12.- MMD-critic is a method to select prototypical and critical data points to efficiently understand datasets.
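
The published MMD-critic method optimizes a maximum mean discrepancy objective between the selected prototypes and the data. The following is only a rough greedy numpy sketch of the prototype-selection idea, with an arbitrary RBF bandwidth, not the authors' implementation:

    import numpy as np

    def rbf_kernel(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def greedy_prototypes(X, m, gamma=0.5):
        """Greedily pick m points whose kernel mean best matches the full dataset."""
        K = rbf_kernel(X, X, gamma)
        col_mean = K.mean(axis=0)              # average similarity of each point to the data
        selected = []
        for _ in range(m):
            best, best_gain = None, -np.inf
            for j in range(len(X)):
                if j in selected:
                    continue
                S = selected + [j]
                # MMD^2-style objective: close to the data, not redundant within S
                gain = 2 * col_mean[S].mean() - K[np.ix_(S, S)].mean()
                if gain > best_gain:
                    best, best_gain = j, gain
            selected.append(best)
        return selected

Criticisms would then be chosen as the points the selected prototypes explain worst, using a similar witness-function score.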

13.- Inherently interpretable models include rule-based models, per-feature models like linear/logistic regression, and monotonic models.

14.- Rule-based models like decision trees and rule lists can still get quite complex and difficult for humans to parse.
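
As an illustration, a shallow scikit-learn decision tree prints as a handful of readable if-then rules, and the rule set grows quickly with depth; the Iris data and the max_depth value here are arbitrary choices:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2).fit(data.data, data.target)  # shallow => readable
    print(export_text(tree, feature_names=data.feature_names))
    # Raising max_depth multiplies the branches and quickly erodes interpretability.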

15.- Generalized additive models learn shape functions for each feature to allow complex but interpretable relationships.
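
A rough way to build such a model from standard scikit-learn parts is a per-feature spline basis feeding a linear model; this sketch on synthetic data approximates the idea rather than reproducing the tutorial's tooling:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import SplineTransformer
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500)   # additive ground truth

    # SplineTransformer expands each feature separately, so the linear model stays additive:
    # the fit amounts to one learned shape function per feature.
    gam_like = make_pipeline(SplineTransformer(n_knots=8, degree=3), RidgeCV())
    gam_like.fit(X, y)
    # Each shape function can be plotted by sweeping one feature with the others held fixed.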

16.- Case-based interpretable models use examples to explain, such as prototypes that represent clusters and criticisms that highlight points the prototypes do not capture well.

17.- Limitations of case-based models include potentially lacking representative examples and humans overgeneralizing from single cases.

18.- Post-hoc interpretability approaches aim to explain models after they are built, like sensitivity analysis and saliency maps.

19.- Sensitivity analysis involves perturbing inputs and seeing the impact on outputs to understand feature importance and interactions.
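
In its simplest one-feature-at-a-time form this needs only a few lines around any black-box prediction function; predict_fn and the step size eps below are illustrative assumptions:

    import numpy as np

    def one_at_a_time_sensitivity(predict_fn, x, eps=0.1):
        """Nudge each feature of x by eps and record the change in the model output."""
        base = predict_fn(x.reshape(1, -1))[0]
        effects = np.zeros(len(x))
        for i in range(len(x)):
            x_pert = x.copy()
            x_pert[i] += eps
            effects[i] = predict_fn(x_pert.reshape(1, -1))[0] - base
        return effects   # large |effect| => the prediction is sensitive to that feature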

20.- LIME explains a classifier's decision on a data point by perturbing it and fitting an interpretable model locally.
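
The core idea can be sketched without the lime library: sample around the point, weight the samples by proximity, and fit a weighted linear surrogate. The sampling scale and kernel width below are arbitrary choices, not LIME's defaults:

    import numpy as np
    from sklearn.linear_model import Ridge

    def local_surrogate(predict_fn, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
        """LIME-style local explanation: weighted linear fit around x."""
        rng = np.random.default_rng(seed)
        X_pert = x + rng.normal(0.0, scale, size=(n_samples, len(x)))   # perturb the point
        y_pert = predict_fn(X_pert)                                     # query the black box
        dist = np.linalg.norm(X_pert - x, axis=1)
        weights = np.exp(-(dist ** 2) / kernel_width ** 2)              # nearby samples count more
        surrogate = Ridge(alpha=1.0).fit(X_pert, y_pert, sample_weight=weights)
        return surrogate.coef_                                          # local feature importances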

21.- Saliency maps take the gradient of the output with respect to the input to show the influence of each feature.
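
With any differentiable model this is a single backward pass; the tiny PyTorch network below is a stand-in for illustration, not a model from the tutorial:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))   # stand-in classifier
    x = torch.randn(1, 4, requires_grad=True)

    score = model(x)[0].max()        # score of the highest-scoring class
    score.backward()                 # d(score) / d(input)
    saliency = x.grad.abs().squeeze()
    print(saliency)                  # per-feature influence magnitudes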

22.- Integrated gradients attribute the prediction of a deep network to its input features using a path integral.
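
The path integral is typically approximated by averaging gradients along a straight line from a baseline to the input; a hedged PyTorch sketch with an arbitrary all-zeros baseline and 50 steps:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))   # stand-in network
    x = torch.randn(4)               # input to explain
    baseline = torch.zeros(4)        # reference input (a common but arbitrary choice)
    target, steps = 1, 50            # class to attribute, number of integration steps

    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point)[target].backward()
        grads.append(point.grad)
    attributions = (x - baseline) * torch.stack(grads).mean(dim=0)   # Riemann approximation
    print(attributions, attributions.sum())   # sum roughly matches score(x) - score(baseline)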

23.- Concept activation vectors show how internal neural representations align with human-interpretable concepts.
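
A concept activation vector is the normal of a linear classifier trained to separate a concept's activations from random activations at some layer; in the sketch below, acts_concept and acts_random are hypothetical activation matrices extracted from such a layer:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def concept_activation_vector(acts_concept, acts_random):
        """Linear probe in activation space; its normal vector is the concept direction."""
        X = np.vstack([acts_concept, acts_random])
        y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        cav = clf.coef_[0]
        return cav / np.linalg.norm(cav)

    # A TCAV-style score would then count how often the class score's gradient
    # (taken with respect to these activations) points in the direction of the CAV.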

24.- Influence functions estimate the impact of each training point on a model's predictions for understanding and debugging.
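
For a small L2-regularized logistic regression the classic estimate, -grad_test' H^(-1) grad_i, can be computed directly. A hedged numpy sketch assuming labels in {0, 1}; scaling conventions vary across papers:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def influence_on_test_loss(X, y, x_test, y_test, C=1.0):
        """Estimate how removing each training point would change the loss on one test point:
        influence_i = -grad_test . H^{-1} grad_i  (up to a scaling convention)."""
        clf = LogisticRegression(C=C, fit_intercept=False).fit(X, y)   # y must be in {0, 1}
        w = clf.coef_[0]
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

        p = sigmoid(X @ w)
        grads = (p - y)[:, None] * X                                      # per-example loss gradients
        H = (X * (p * (1 - p))[:, None]).T @ X + np.eye(X.shape[1]) / C   # loss Hessian + L2 term
        grad_test = (sigmoid(x_test @ w) - y_test) * x_test

        return -grads @ np.linalg.solve(H, grad_test)                     # one value per training point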

25.- Monotonic models enforce monotonic relationships between certain features and the output, encoding domain knowledge for better learning with less data.
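
scikit-learn's histogram-based gradient boosting exposes such constraints through its monotonic_cst parameter; a small sketch on synthetic data where the target is known to increase with the first feature:

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(1000, 2))
    y = 2 * X[:, 0] + np.sin(5 * X[:, 1]) + rng.normal(0, 0.1, 1000)

    # monotonic_cst: +1 = non-decreasing effect, -1 = non-increasing, 0 = unconstrained.
    model = HistGradientBoostingRegressor(monotonic_cst=[1, 0]).fit(X, y)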

26.- Example-based explanations work well for complex data points like pieces of code that domain experts can readily understand.

27.- Experts like doctors and data scientists can interactively update prototypes and criticisms to align explanations with their knowledge.

28.- Inherently interpretable models may not always be able to represent relationships in a sparse, simulatable way.

29.- Feature sparsity and monotonicity can be useful for interpretability but have limitations in expressive power.

30.- The tutorial raises open questions and discussions around interpretability and calls for more interdisciplinary collaboration, such as with HCI.

Knowledge Vault built by David Vivancos 2024