Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar ICLR 2017 - Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
graph LR
classDef privacy fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef pate fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef student fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef experiments fill:#f9f9d4, font-weight:bold, font-size:14px;
A[Nicolas Papernot et al ICLR 2017] --> B[Private learning: membership, data extraction threats. 1]
A --> C[Threat model: white-box, black-box adversaries. 2]
A --> D[Quantifying privacy: randomized algorithms reveal input. 3]
A --> E[Goals: generic, intuitive privacy preservation. 4]
A --> F[PATE: Private Aggregation of Teacher Ensembles. 5]
F --> G[PATE: ensemble predictions from disjoint partitions. 6]
G --> H[Aggregation: vote count, maximum votes output. 7]
H --> I[Teacher agreement: small privacy cost. 8]
F --> J[Noisy aggregation: Laplacian noise for privacy. 9]
F --> K[Aggregated teacher trains student on public data. 10]
K --> L[Student needed: aggregation increases privacy loss. 11]
K --> M[Student deployed, protects training data privacy. 12]
A --> N[Differential privacy quantifies privacy guarantees. 13]
N --> O[Moments accountant analyzes PATE's privacy. 14]
O --> P[Strong quorums: small privacy costs, data-dependent. 15]
A --> Q[PATE-G: generative variant using GANs. 16]
Q --> R[GANs: generator outputs synthetic data, discriminator classifies. 17]
R --> S[Semi-supervised GAN: discriminator predicts class, real/fake. 18]
Q --> T[PATE-G student: GAN-based semi-supervised learning. 19]
T --> U[Deployed discriminator predicts, aims to protect privacy. 20]
A --> V[Experiments: MNIST, SVHN, UCI Adult, Diabetes datasets. 21]
V --> W[Results: aggregated teacher accuracy on test sets. 22]
V --> X[Privacy-utility tradeoff: student accuracy vs privacy strength. 23]
V --> Y[PATE: student accuracy close to non-private baselines. 24]
A --> Z[More details: code repository, conference poster. 25]
class B,C,D,E,N,O,P privacy;
class F,G,H,I,J,Q,R,S pate;
class K,L,M,T,U student;
class V,W,X,Y,Z experiments;


1.-Models trained on private data are exposed to threats such as membership inference attacks and training-data extraction attacks.

2.-The threat model covers two kinds of adversary: black-box adversaries, who can make unlimited queries to the model, and white-box adversaries, who can also access the model's internals.

3.-Privacy is quantified by analyzing a randomized algorithm: the more the distribution of its answers changes when the input data change, the more those answers reveal about the input.

4.-The design goals are to preserve training data privacy with differential privacy guarantees in a way that is generic and intuitive.

5.-PATE stands for Private Aggregation of Teacher Ensembles. It splits the sensitive data into disjoint partitions and trains a separate teacher model on each partition.

6.-In PATE, predictions from an ensemble of teacher models trained on disjoint data partitions are aggregated.

7.-The aggregation counts the teachers' votes for each class and outputs the class with the most votes.

8.-Intuitively, if most teachers agree on the label, the outcome does not depend on any specific data partition, so the privacy cost is small.

9.-To provide differential privacy, the aggregation is made noisy: Laplacian noise is added to the vote counts before the maximum is taken.

10.-The student model is trained on available public data, which is labeled by querying the aggregated teacher.

11.-A student model is needed for two reasons: every prediction made by the aggregated teacher increases the privacy loss, and inspecting the teachers' internals could reveal private data.

12.-At inference time, the student model is deployed and available to the adversary. It aims to provide privacy for the training data.

13.-Differential privacy quantifies privacy guarantees. An algorithm is differentially private if datasets that differ in a single record produce statistically close output distributions.

14.-The moments accountant technique is applied to analyze the differential privacy guarantees of PATE.

15.-Strong quorums (broad agreement among teachers) result in small privacy costs. The privacy bound is therefore data-dependent: it varies with the votes actually cast.

16.-PATE-G is a generative variant of PATE that uses GANs.

17.-GANs have a generator that outputs synthetic data and a discriminator that classifies data as real or fake.

18.-In semi-supervised GAN training, the discriminator also predicts the class for real data in addition to the real/fake distinction.

19.-In PATE-G, the student is trained using GAN-based semi-supervised learning by querying the aggregated teacher.

20.-For deployment, the discriminator component of the GAN student is used to make predictions while aiming to protect privacy.

21.-Experiments evaluate PATE on the MNIST, SVHN, UCI Adult, and UCI Diabetes datasets with various model architectures.

22.-Results show the accuracy of the aggregated teacher ensemble on the test sets.

23.-There is a privacy-utility tradeoff between the student model's accuracy and the strength of privacy guarantees.

24.-PATE achieves student accuracy close to non-private baselines while providing meaningful differential privacy guarantees.

25.-More details are available in the linked code repository and the authors' poster at the conference.
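
Points 7 to 10 above can be illustrated with a minimal sketch. It is not the authors' code: the names `noisy_aggregate`, `label_public_data`, and `gamma`, the vote counts, and the choice of 250 teachers are all assumptions made for illustration. Laplace noise is drawn with the standard library only, using the fact that the difference of two independent exponential variables is Laplace-distributed.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential variables with rate
    1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_aggregate(teacher_labels, num_classes, gamma, rng):
    """Return the class with the highest noisy vote count for one query.

    teacher_labels: list with one predicted class index per teacher.
    gamma: assumed name for the privacy parameter; the noise scale is
           1/gamma, so a smaller gamma adds more noise.
    """
    # Point 7: count the votes each class receives.
    votes = [0] * num_classes
    for label in teacher_labels:
        votes[label] += 1
    # Point 9: add independent Laplace noise to every count.
    noisy_votes = [v + laplace_noise(1.0 / gamma, rng) for v in votes]
    # Output the class whose noisy count is largest.
    return max(range(num_classes), key=lambda c: noisy_votes[c])


def label_public_data(all_teacher_labels, num_classes, gamma, rng):
    """Point 10: label each public sample with one aggregated-teacher query."""
    return [noisy_aggregate(labels, num_classes, gamma, rng)
            for labels in all_teacher_labels]


rng = random.Random(0)
# Hypothetical votes from 250 teachers on two public samples.
strong_quorum = [3] * 240 + [7] * 10   # teachers almost unanimous
weak_quorum = [3] * 130 + [7] * 120    # teachers split
student_labels = label_public_data(
    [strong_quorum, weak_quorum], num_classes=10, gamma=1.0, rng=rng)
print(student_labels)
```

With a strong quorum the noise almost never changes the winning class, which matches the intuition in point 8. Lowering `gamma` increases the noise, and the label for the split vote then becomes noticeably less stable. Because every call to `noisy_aggregate` is one more query against the teachers, the sketch also hints at why the student is trained from a limited set of labeled public samples (point 11).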

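Point 13 can also be stated formally. The block below gives the standard definition of (ε, δ)-differential privacy; the symbols M (the randomized algorithm), d and d' (two datasets that differ in a single record), and S (any set of possible outputs) are notation chosen here and may not match the talk's slides.

```latex
\Pr[\mathcal{M}(d) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(d') \in S] + \delta
\qquad \text{for all adjacent } d, d' \text{ and all } S \subseteq \mathrm{Range}(\mathcal{M})
```

Smaller values of ε and δ force the two output distributions closer together, which corresponds to a stronger privacy guarantee.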
