Knowledge Vault 6/50 - ICML 2019
On Calibration and Fairness
Kilian Weinberger
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

```mermaid
graph LR
  classDef main fill:#f9d4f9, font-weight:bold, font-size:14px
  classDef calibration fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef fairness fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef adversarial fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef models fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef techniques fill:#d4f9f9, font-weight:bold, font-size:14px

  Main[On Calibration and Fairness] --> A[Calibration]
  Main --> B[Fairness]
  Main --> C[Adversarial Examples]
  Main --> D[Model Characteristics]
  Main --> E[Techniques and Metrics]

  A --> A1[Calibration: matching predicted and actual probabilities 1]
  A --> A2[Deep learning models often overconfident 2]
  A --> A3[Temperature scaling calibrates neural networks 3]
  A --> A4[Group calibration for different demographics 5]
  A --> A5[ECE measures calibration quality 18]
  A --> A6[Log likelihood can cause overconfidence 20]

  B --> B1[Fairness: equal treatment across demographic groups 4]
  B --> B2[Impossibility theorem: calibration vs equal rates 6]
  B --> B3[COMPAS predicts criminal recidivism 22]
  B --> B4[Fairness constraints ensure equal treatment 30]
  B --> B5[False positive/negative rates evaluate performance 17]
  B --> B6[Overfitting: good training, poor generalization 19]

  C --> C1[Adversarial examples cause misclassification confidently 7]
  C --> C2[White box attacks use model gradients 8]
  C --> C3[Black box attacks use only predictions 9]
  C --> C4[SimBA: efficient limited-query adversarial examples 10]
  C --> C5[Over-optimization pushes examples into misclassified region 13]
  C --> C6[Adversarial transferability: attacks fool other models 14]

  D --> D1[DenseNet: modern neural network architecture 21]
  D --> D2[Feature extractors exploited by adversarial examples 23]
  D --> D3[Google Cloud API: black box model 25]
  D --> D4[Logits: unnormalized neural network outputs 28]
  D --> D5[Softmax converts logits to probabilities 29]
  D --> D6[Natural images robust to small perturbations 11]

  E --> E1[Detecting adversarials using noise robustness differences 12]
  E --> E2[Gray box: adversary unaware of detection 15]
  E --> E3[White box attacks optimize against detection 16]
  E --> E4[Gradient descent creates white box adversarials 24]
  E --> E5[Gaussian noise tests robustness, detects adversarials 26]
  E --> E6[PGD, Carlini-Wagner generate adversarial examples 27]

  class Main main
  class A,A1,A2,A3,A4,A5,A6 calibration
  class B,B1,B2,B3,B4,B5,B6 fairness
  class C,C1,C2,C3,C4,C5,C6 adversarial
  class D,D1,D2,D3,D4,D5,D6 models
  class E,E1,E2,E3,E4,E5,E6 techniques
```

Resume:

1.- Calibration: Ensuring predicted probabilities match the observed frequencies of outcomes (e.g., events assigned 70% confidence should occur about 70% of the time).

2.- Deep learning models: Often overconfident in predictions compared to older neural networks.

3.- Temperature scaling: Simple method to calibrate deep neural networks by dividing the logits by a single constant T, tuned on a validation set, before the softmax (see the sketch after this list).

4.- Fairness: Ensuring equal treatment across different demographic groups in machine learning predictions.

5.- Group calibration: Calibrating predictions separately for different demographic groups.

6.- Impossibility theorem: Cannot achieve both group-wise calibration and equal false positive/negative rates across demographic groups, except in degenerate cases such as equal base rates or a perfect predictor.

7.- Adversarial examples: Imperceptible changes to inputs that cause machine learning models to misclassify with high confidence.

8.- White box attacks: Creating adversarial examples with access to model gradients.

9.- Black box attacks: Creating adversarial examples without access to model internals, only predictions.

10.- Simple Black-box Attack (SimBA): Efficient method for creating adversarial examples with a limited number of queries to the target model, stepping along random orthonormal directions and keeping only steps that lower the true class's predicted probability (see the sketch after this list).

11.- Robustness to noise: Natural images maintain classification under small random perturbations.

12.- Detecting adversarial examples: Leveraging the difference in noise robustness between natural and adversarial images (see the sketch after this list).

13.- Over-optimization: Adversarial examples pushed far into misclassified region to evade detection.

14.- Adversarial transferability: Adversarial examples crafted against one model often transfer to, and fool, other models.

15.- Gray box attacks: Adversary unaware of detection method being used.

16.- White box attacks against detection: Adversary aware of and optimizing against specific detection method.

17.- False positive/negative rates: Metrics for evaluating fairness and detection performance.

18.- Expected Calibration Error (ECE): Measure of calibration quality that bins predictions by confidence and averages the gap between confidence and accuracy across the bins (see the sketch after this list).

19.- Overfitting: Phenomenon where model performs well on training data but poorly on new data.

20.- Log likelihood: Training objective (negative log likelihood, i.e., cross-entropy) whose continued minimization after accuracy saturates can lead to overconfidence.

21.- DenseNet: Deep learning architecture mentioned as an example of modern neural networks.

22.- COMPAS system: Commercial automated system for predicting criminal recidivism risk, used as an example in the fairness discussion.

23.- Feature extractors: Components of machine learning models that can be exploited by adversarial examples.

24.- Gradient descent: Optimization method used to create white box adversarial examples by following the loss gradient with respect to the input.

25.- Google Cloud API: Example of a black box model that can be attacked with limited queries.

26.- Gaussian noise: Random perturbations used to test robustness of images and detect adversarial examples.

27.- PGD (projected gradient descent) and Carlini-Wagner attacks: Standard gradient-based methods for generating adversarial examples (see the PGD sketch after this list).

28.- Logits: Unnormalized outputs of neural networks before final activation function.

29.- Softmax: Function used to convert logits into probability distributions.

30.- Fairness constraints: Conditions imposed on models to ensure equal treatment across demographics.
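
Code sketches (illustrative, not from the talk):

Item 3: a minimal NumPy sketch of temperature scaling, assuming validation logits and labels are already available as arrays; the single constant T is found here by a simple grid search on the validation negative log likelihood (in practice it is usually fit with an optimizer).

```python
import numpy as np

def softmax(logits, T=1.0):
    """Convert logits to probabilities, optionally dividing by a temperature T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Average negative log likelihood of the true labels at temperature T."""
    probs = softmax(logits, T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the single constant T that minimizes validation NLL (grid search)."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Usage with made-up data (deliberately overconfident logits):
rng = np.random.default_rng(0)
val_logits = 5.0 * rng.normal(size=(1000, 10))
val_labels = rng.integers(0, 10, size=1000)
T = fit_temperature(val_logits, val_labels)
calibrated_probs = softmax(val_logits, T)   # reuse this T on test logits
```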
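
Item 18: a minimal sketch of the usual equal-width-binning estimate of ECE, assuming `probs` is an (N, K) array of predicted class probabilities and `labels` an array of true classes.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin predictions by confidence, then average |accuracy - confidence|
    over the bins, weighting each bin by the fraction of samples it contains."""
    confidences = probs.max(axis=1)                  # model confidence per sample
    accuracies = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap               # weight by bin size
    return ece
```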
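
Items 8, 24 and 27: a minimal PyTorch sketch of an L-infinity PGD white box attack; `model` stands for any classifier that returns logits, and the epsilon, step size, and step count are illustrative defaults rather than values from the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L_inf PGD: repeatedly take a gradient ascent step on the loss with respect
    to the input pixels, then project back into the eps-ball around the original
    image and clip to the valid pixel range."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()              # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)         # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                    # stay a valid image
    return x_adv.detach()
```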
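
Item 10: a minimal sketch of the pixel-basis variant of SimBA, assuming a hypothetical black box function `query_probs(x)` that returns the target model's class probabilities; the step size and query budget are illustrative, and the paper also uses a DCT basis instead of raw pixels.

```python
import numpy as np

def simba_pixel(query_probs, x, y, eps=0.2, max_queries=2000, seed=0):
    """SimBA (pixel basis): visit pixels in random order, try +/- eps along each,
    and keep a step only if it lowers the black box model's probability for the
    true class y. Needs predicted probabilities only, never gradients."""
    rng = np.random.default_rng(seed)
    x_adv = x.astype(float)
    p_y = query_probs(x_adv)[y]
    queries = 1
    for idx in rng.permutation(x_adv.size):          # random order over the pixel basis
        if queries >= max_queries:
            break
        for sign in (1.0, -1.0):
            candidate = x_adv.copy()
            candidate.flat[idx] = np.clip(candidate.flat[idx] + sign * eps, 0.0, 1.0)
            p = query_probs(candidate)[y]
            queries += 1
            if p < p_y:                              # keep only steps that hurt class y
                x_adv, p_y = candidate, p
                break                                # move on to the next direction
    return x_adv
```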
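
Items 11, 12 and 26: a minimal sketch of noise-based detection, assuming a hypothetical `predict_probs(batch)` function returning class probabilities; the idea is that predictions on natural images barely move under small Gaussian noise while predictions on adversarial examples tend to shift a lot, so a threshold on that shift (chosen on held-out natural images) can flag suspicious inputs.

```python
import numpy as np

def noise_sensitivity(predict_probs, x, sigma=0.05, n_samples=32, seed=0):
    """Average change of the predicted probability vector when small Gaussian
    noise is added to the input; natural images tend to score low, adversarial
    examples tend to score high."""
    rng = np.random.default_rng(seed)
    p_clean = predict_probs(x[None])[0]                       # clean prediction
    noisy = np.clip(x[None] + rng.normal(0.0, sigma, size=(n_samples,) + x.shape), 0.0, 1.0)
    p_noisy = predict_probs(noisy)                            # predictions under noise
    return np.abs(p_noisy - p_clean).sum(axis=1).mean()       # mean L1 shift

def flag_adversarial(predict_probs, x, threshold=0.3):
    """Flag an input whose prediction is unusually sensitive to noise; the
    threshold is a placeholder that would be set on held-out natural images."""
    return noise_sensitivity(predict_probs, x) > threshold
```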

Knowledge Vault built by David Vivancos 2024