On Calibration and Fairness

Kilian Weinberger

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
    classDef main fill:#f9d4f9, font-weight:bold, font-size:14px
    classDef calibration fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef fairness fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef adversarial fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef models fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef techniques fill:#d4f9f9, font-weight:bold, font-size:14px
    Main[On Calibration and Fairness] --> A[Calibration]
    Main --> B[Fairness]
    Main --> C[Adversarial Examples]
    Main --> D[Model Characteristics]
    Main --> E[Techniques and Metrics]
    A --> A1[Calibration: matching predicted and actual probabilities 1]
    A --> A2[Deep learning models often overconfident 2]
    A --> A3[Temperature scaling calibrates neural networks 3]
    A --> A4[Group calibration for different demographics 5]
    A --> A5[ECE measures calibration quality 18]
    A --> A6[Log likelihood can cause overconfidence 20]
    B --> B1[Fairness: equal treatment across demographic groups 4]
    B --> B2[Impossibility theorem: calibration vs equal rates 6]
    B --> B3[COMPAS predicts criminal recidivism 22]
    B --> B4[Fairness constraints ensure equal treatment 30]
    B --> B5[False positive/negative rates evaluate performance 17]
    B --> B6[Overfitting: good training, poor generalization 19]
    C --> C1[Adversarial examples cause misclassification confidently 7]
    C --> C2[White box attacks use model gradients 8]
    C --> C3[Black box attacks use only predictions 9]
    C --> C4[SimBA: efficient limited-query adversarial examples 10]
    C --> C5[Over-optimization pushes examples into misclassified region 13]
    C --> C6[Adversarial transferability: creating new from existing 14]
    D --> D1[DenseNet: modern neural network architecture 21]
    D --> D2[Feature extractors exploited by adversarial examples 23]
    D --> D3[Google Cloud API: black box model 25]
    D --> D4[Logits: unnormalized neural network outputs 28]
    D --> D5[Softmax converts logits to probabilities 29]
    D --> D6[Natural images robust to small perturbations 11]
    E --> E1[Detecting adversarials using noise robustness differences 12]
    E --> E2[Gray box: adversary unaware of detection 15]
    E --> E3[White box attacks optimize against detection 16]
    E --> E4[Gradient descent creates white box adversarials 24]
    E --> E5[Gaussian noise tests robustness, detects adversarials 26]
    E --> E6[PGD, Carlini-Wagner generate adversarial examples 27]
    class Main main
    class A,A1,A2,A3,A4,A5,A6 calibration
    class B,B1,B2,B3,B4,B5,B6 fairness
    class C,C1,C2,C3,C4,C5,C6 adversarial
    class D,D1,D2,D3,D4,D5,D6 models
    class E,E1,E2,E3,E4,E5,E6 techniques
```


**Resume:**

**1.-** Calibration: Ensuring predicted probabilities match the actual frequencies of outcomes; for example, among all predictions made with 70% confidence, roughly 70% should turn out to be correct.

**2.-** Deep learning models: Often overconfident in predictions compared to older neural networks.

**3.-** Temperature scaling: Simple method to calibrate deep neural networks by dividing the logits by a single scalar temperature, learned on held-out data, before the softmax.
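
A minimal sketch of the idea, not the implementation from the lecture: the temperature is fit on held-out validation logits, and a simple grid search stands in for the usual gradient-based fit of the negative log likelihood. `val_logits`, `val_labels`, and `test_logits` are hypothetical placeholders.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Scale logits by the temperature T, then apply a numerically stable softmax.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    # Choose the T that minimizes negative log likelihood on the validation set.
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        p = softmax(val_logits, T)
        nll = -np.mean(np.log(p[np.arange(len(val_labels)), val_labels] + 1e-12))
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

# Hypothetical usage: val_logits is (N, K), val_labels holds integer class ids.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = softmax(test_logits, T)
```

Note that the temperature only rescales confidences; the arg-max prediction, and hence accuracy, is unchanged.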

**4.-** Fairness: Ensuring equal treatment across different demographic groups in machine learning predictions.

**5.-** Group calibration: Calibrating predictions separately for different demographic groups.

**6.-** Impossibility theorem: Unless the groups share the same base rate or the classifier is perfect, no model can simultaneously be calibrated within each group and have equal false positive/negative rates across demographics.
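
One way to see why (a sketch of the standard argument, not a derivation from the lecture): for a binary classifier evaluated within a group with base rate $p = P(Y=1)$, the error rates and the precision (PPV) are tied together by

```latex
\[
  \mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
\]
```

If two groups have different base rates $p$, they cannot all share the same PPV (a calibration-style condition) together with the same FPR and FNR, except in the degenerate case of a perfect classifier.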

**7.-** Adversarial examples: Imperceptible changes to inputs that cause machine learning models to misclassify with high confidence.

**8.-** White box attacks: Creating adversarial examples with access to model gradients.

**9.-** Black box attacks: Creating adversarial examples without access to model internals, only predictions.

**10.-** Simple Black Box Attack (SimBA): Efficient method for creating adversarial examples with limited queries to target model.
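
A heavily simplified sketch of the SimBA loop (pixel-basis variant): pick one random coordinate at a time, try a step of ±eps, and keep it only if the target model's confidence in the true class drops. `model_probs` is a hypothetical function returning the model's probability vector for an input in [0, 1].

```python
import numpy as np

def simba(x, true_label, model_probs, eps=0.2, max_queries=10000):
    """Simple black-box attack: perturb one random pixel coordinate at a time,
    keeping any step that lowers the model's probability for the true class."""
    x_adv = x.copy()
    p = model_probs(x_adv)[true_label]
    coords = np.random.permutation(x.size)   # random order over the pixel basis
    queries = 0
    for idx in coords:
        if queries >= max_queries:
            break
        for step in (eps, -eps):
            candidate = x_adv.copy()
            candidate.flat[idx] = np.clip(candidate.flat[idx] + step, 0.0, 1.0)
            p_new = model_probs(candidate)[true_label]
            queries += 1
            if p_new < p:                     # keep the step if confidence dropped
                x_adv, p = candidate, p_new
                break
    return x_adv
```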

**11.-** Robustness to noise: Natural images maintain classification under small random perturbations.

**12.-** Detecting adversarial examples: Leveraging differences in noise robustness between natural and adversarial images.
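
A hedged sketch of that idea: estimate how much the top-class probability drops under small Gaussian perturbations, and flag inputs whose confidence collapses. `model_probs`, `sigma`, and the threshold are illustrative placeholders, not values from the lecture.

```python
import numpy as np

def noise_sensitivity(x, model_probs, sigma=0.05, n_samples=20):
    """Average drop in top-class probability under small Gaussian perturbations."""
    p = model_probs(x)
    top = int(np.argmax(p))
    drops = []
    for _ in range(n_samples):
        noisy = np.clip(x + np.random.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
        drops.append(p[top] - model_probs(noisy)[top])
    return float(np.mean(drops))

def flag_as_adversarial(x, model_probs, threshold=0.3):
    # Natural images tend to keep their confidence under noise; adversarial ones
    # often sit in a thin misclassified region and lose it quickly.
    return noise_sensitivity(x, model_probs) > threshold
```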

**13.-** Over-optimization: Adversarial examples pushed far into misclassified region to evade detection.

**14.-** Adversarial transferability: Whether adversarial examples crafted against one model also fool other models.

**15.-** Gray box attacks: Adversary unaware of detection method being used.

**16.-** White box attacks against detection: Adversary aware of and optimizing against specific detection method.

**17.-** False positive/negative rates: Metrics for evaluating fairness and detection performance.
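
A small sketch of how these rates are computed separately per demographic group; the array names are placeholders, and labels and predictions are assumed to be 0/1.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """False positive and false negative rates, computed separately per group."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        fpr = float(np.mean(yp[yt == 0] == 1)) if np.any(yt == 0) else float("nan")
        fnr = float(np.mean(yp[yt == 1] == 0)) if np.any(yt == 1) else float("nan")
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

# Hypothetical usage: group_error_rates(y_true, y_pred, group)
# -> {group_id: {"FPR": ..., "FNR": ...}, ...}
```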

**18.-** Expected Calibration Error (ECE): Measure of calibration quality, comparing predicted to actual probabilities.
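
A minimal sketch of ECE with equal-width confidence bins, one common formulation (the lecture may bin differently); `confidences` are the predicted top-class probabilities and `correct` marks whether each prediction was right. Applying it separately per demographic group gives the group-wise calibration check of item 5.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Weighted average gap between mean confidence and accuracy across bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```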

**19.-** Overfitting: Phenomenon where model performs well on training data but poorly on new data.

**20.-** Log likelihood: Training objective that keeps pushing predicted probabilities toward 0 or 1 even after accuracy has saturated, which can lead to overconfidence.

**21.-** DenseNet: Deep learning architecture mentioned as an example of modern neural networks.

**22.-** COMPAS system: Automated risk-assessment system for predicting criminal recidivism, used as an example in the fairness discussion.

**23.-** Feature extractors: Components of machine learning models that can be exploited by adversarial examples.

**24.-** Gradient descent: Optimization method used in creating white box adversarial examples.

**25.-** Google Cloud API: Example of a black box model that can be attacked with limited queries.

**26.-** Gaussian noise: Random perturbations used to test robustness of images and detect adversarial examples.

**27.-** PGD (projected gradient descent) and Carlini-Wagner attacks: Common methods for generating adversarial examples.
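
For this item, and for the gradient-based white-box attacks of items 8 and 24, a minimal untargeted PGD sketch in PyTorch; `model`, `x`, `y`, and the hyperparameters are placeholders, and the real attacks add options such as random starts, other norms, and the Carlini-Wagner loss.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted PGD: repeatedly step in the direction that increases the loss,
    projecting back into an L-infinity ball of radius eps around the input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project onto the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()  # keep a valid image
    return x_adv
```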

**28.-** Logits: Unnormalized outputs of neural networks before final activation function.

**29.-** Softmax: Function used to convert logits into probability distributions.

**30.-** Fairness constraints: Conditions imposed on models to ensure equal treatment across demographics.

Knowledge Vault built by David Vivancos 2024