Robustness Meets Algorithms (and Vice-Versa)

Ankur Moitra

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
    classDef supervised fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef interactive fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef contextual fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef algorithms fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef practical fill:#f9d4f9, font-weight:bold, font-size:14px
    Main["Robustness Meets Algorithms and Vice-Versa"]
    Main --> A["Supervised learning ignores valuable interaction data 1"]
    Main --> B["Interactive learning leverages interaction data 2"]
    B --> C["Algorithm learns from features, actions, rewards 3"]
    B --> D["Full RL: special domains, large samples 4"]
    Main --> E["Contextual bandits: right signal, handle non-stationarity 5"]
    E --> F["Fits real-world problems: recommendations, ads, education 6"]
    E --> G["Tutorial: algorithms, theory, evaluation, exploration 7"]
    E --> H["Observe features, choose action, receive reward 8"]
    H --> I["Policies map features to actions 9"]
    H --> J["Offline evaluation using inverse propensity scoring 10"]
    Main --> K["Offline learning: importance weighted multi-class classification 11"]
    K --> L["Exploration algorithms: epsilon-greedy, Thompson sampling 12"]
    K --> M["Progressive validation for unbiased offline evaluation 13"]
    K --> N["Rejection sampling evaluates full interaction loop 14"]
    Main --> O["Failure modes: mismatched probabilities, non-stationarity 15"]
    O --> P["Learning systems needed: modular, scalable, general 16"]
    P --> Q["Decision Service: open-source contextual bandit system 17"]
    P --> R["Other systems: NEXT, StreamingBandit, different capabilities 18"]
    Main --> S["Non-stationarity: key issue requiring special techniques 19"]
    S --> T["Combinatorial actions need semi-bandits, submodularity approaches 20"]
    S --> U["Reward specification critical, map goals to proxies 21"]
    S --> V["Smart encoding reduces variance, improves efficiency 22"]
    Main --> W["Workable recipes exist for common scenarios 23"]
    W --> X["Complex.com case study: substantial real-world benefits 24"]
    W --> Y["Offline validation enables rapid evaluation 25"]
    W --> Z["Contextual bandits fit for broad consumption 26"]
    Main --> AA["Research needed: automatic algorithms, expanding RL 27"]
    AA --> AB["Contextual bandits: reliable, robust for applications 28"]
    AB --> AC["Example: personalizing EEG-based typing for disabled 29"]
    AB --> AD["Research benefited from many collaborators 30"]
    class A,B,C,D supervised
    class E,F,G,H,I,J contextual
    class K,L,M,N algorithms
    class O,P,Q,R,S,T,U,V practical
    class W,X,Y,Z,AA,AB,AC,AD interactive
```


**Resume:**

**1.-** Robustness can mean different things in different contexts. This talk explores robustness in algorithm design for high-dimensional problems.

**2.-** In 1D robust parameter estimation, the median and median absolute deviation are more robust than the mean and variance.
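
As a concrete contrast (a minimal numpy sketch, not code from the talk), a handful of gross outliers ruins the mean and standard deviation while barely moving the median and MAD:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 Gaussian samples plus 20 gross outliers.
data = np.concatenate([rng.normal(loc=5.0, scale=2.0, size=1000),
                       np.full(20, 1e6)])

# Non-robust estimates: dragged arbitrarily far by the outliers.
print("mean/std:   ", data.mean(), data.std())

# Robust counterparts: median and median absolute deviation (MAD).
med = np.median(data)
mad = np.median(np.abs(data - med))
# 1.4826 * MAD is a consistent estimate of sigma for Gaussian data.
print("median/MAD: ", med, 1.4826 * mad)
```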

**3.-** In high dimensions, a tradeoff emerges between provable robustness guarantees and computational tractability of estimation algorithms.

**4.-** A general recipe for robust high-dimensional estimation: 1) find a good parameter distance, 2) detect corruptions, 3) design a win-win algorithm: either the naive estimate is already good, or the detected corruptions can be filtered away to make progress.

**5.-** For robust mean estimation with covariance = I, use the empirical mean, but detect corruptions via the second moment: a large top eigenvalue of the empirical covariance flags outliers. Filter them and repeat.
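
A schematic numpy sketch of this filter-and-repeat loop (the eigenvalue threshold and the fixed 5% trimming rule are simplified placeholders, not the talk's exact algorithm):

```python
import numpy as np

def filtered_mean(X, threshold=2.0, max_iters=50):
    """Robust mean estimation by iterative filtering (schematic).

    Assumes the inliers are roughly N(mu, I). While the second moment
    betrays corruptions (the top eigenvalue of the empirical
    covariance is much larger than 1), drop the points that score
    highest along that eigenvector, then recompute.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iters):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] <= threshold:      # second moment looks clean: win
            break
        v = eigvecs[:, -1]                # direction of excess variance
        scores = ((X - mu) @ v) ** 2      # outlyingness along v
        X = X[scores < np.quantile(scores, 0.95)]  # filter: also a win
    return X.mean(axis=0)
```

The two branches are exactly the win-win of point 4: either the second moment certifies the empirical mean, or filtering removes mostly corrupted points.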

**6.-** For robust covariance estimation, exploit the structure of the 4th moment tensor to detect corruptions in the empirical covariance.

**7.-** Can get dimension-independent error of ε·polylog(1/ε) for robustly estimating Gaussian mean and covariance in polynomial time.

**8.-** Statistical query lower bounds show ε·√log(1/ε) error is needed for computationally efficient robust Gaussian estimation.

**9.-** With >50% corruptions, can still do robust Gaussian mean estimation by outputting a small list containing a good estimate.

**10.-** Combining sparsity with robustness assumptions improves sample complexity to logarithmic in dimension for sparse robust mean estimation.

**11.-** Robust Gaussian estimation algorithms, despite distributional assumptions, can help with exploratory data analysis on real datasets.

**12.-** The stochastic block model generates graphs with planted community structure based on within-community and across-community edge probabilities.
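
A minimal numpy sketch of sampling from this model in the sparse parameterization (within-community probability a/n, across-community probability b/n) that the thresholds below refer to; the function name and defaults are illustrative:

```python
import numpy as np

def sample_sbm(n, a, b, rng=None):
    """Two-community stochastic block model on n nodes (schematic).

    Same-community pairs are joined with probability a/n, different-
    community pairs with probability b/n. Returns the adjacency
    matrix and the planted +1/-1 community labels.
    """
    rng = rng or np.random.default_rng()
    labels = rng.choice([-1, 1], size=n)
    same = np.equal.outer(labels, labels)
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels
```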

**13.-** Many algorithms solve partial recovery in the stochastic block model when (a-b)^2 > 2(a+b). This was conjectured optimal.

**14.-** Belief propagation and statistical physics techniques predict the (a-b)^2 > 2(a+b) threshold in the stochastic block model.

**15.-** The (a-b)^2 > 2(a+b) threshold was proven tight for the stochastic block model. Convex relaxations don't match it.

**16.-** Semi-random models allow "helpful" changes to the stochastic block model, e.g. strengthening within-community edges and weakening between-community edges.

**17.-** Simple algorithms like common neighbor counting fail in semi-random models due to the allowed edge probability changes.
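
The heuristic in question is roughly the following (a toy sketch, not the talk's formulation): same-community pairs share more neighbors in expectation, so cluster pairs with high common-neighbor counts. It is exactly this statistic that the adversary's edge-probability changes can distort.

```python
import numpy as np

def common_neighbors(A):
    """Common-neighbor counts for every pair of nodes.

    Entry (i, j) of A @ A counts length-two paths from i to j, i.e.
    the number of neighbors i and j share. The naive heuristic puts
    high-count pairs in the same community.
    """
    A = np.asarray(A)
    counts = A @ A
    np.fill_diagonal(counts, 0)  # ignore self-pairs
    return counts
```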

**18.-** Algorithms that match the (a-b)^2 > 2(a+b) threshold fail in semi-random models. The information-theoretic threshold increases.

**19.-** Semi-definite programs still solve partial recovery in semi-random models, sacrificing sharpness but gaining robustness compared to belief propagation.
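
One standard form of such a semidefinite program, as a cvxpy sketch (requires an SDP-capable solver such as SCS; the eigenvector rounding is a common choice, not necessarily the talk's):

```python
import cvxpy as cp
import numpy as np

def sdp_partition(A):
    """SDP relaxation for two balanced communities (schematic).

    Relax the rank-one matrix x x^T (x in {-1,+1}^n) to a PSD
    matrix X with unit diagonal and balanced communities, and
    maximize agreement with the adjacency matrix A.
    """
    n = A.shape[0]
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(X) == 1, cp.sum(X) == 0]
    cp.Problem(cp.Maximize(cp.trace(A @ X)), constraints).solve()
    # Round: take signs of the top eigenvector of the optimal X.
    _, V = np.linalg.eigh(X.value)
    return np.sign(V[:, -1])
```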

**20.-** The broadcast tree model captures local structure of the stochastic block model. It has a sharp recovery threshold.
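
A minimal sketch of this model, assuming its standard formulation: the root's color is passed to each child, flipping independently with some probability (names and parameters are illustrative):

```python
import random

def broadcast_tree(depth, arity, flip_prob, color=1, rng=None):
    """Sample a broadcast tree as nested lists with +1/-1 leaves.

    Each child inherits its parent's color, flipped independently
    with probability flip_prob; leaves sit at the given depth.
    """
    rng = rng or random.Random(0)
    if depth == 0:
        return color
    return [broadcast_tree(depth - 1, arity, flip_prob,
                           color if rng.random() > flip_prob else -color,
                           rng)
            for _ in range(arity)]
```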

**21.-** Kesten-Stigum bound: in the broadcast tree model, the root color is recoverable iff (a-b)^2 > 2(a+b). The optimal algorithm is majority.

**22.-** In a semi-random broadcast tree model allowing "helpful" changes, the Kesten-Stigum bound and majority reconstruction break down.

**23.-** Recursive majority still works in the semi-random broadcast tree model, showing a robustness vs sharpness tradeoff.
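
The contrast in points 21-23, in code: plain majority aggregates all leaves at once, while recursive majority aggregates level by level. A toy sketch assuming trees given as nested lists with +1/-1 leaves (e.g. as produced by the broadcast-tree sketch above):

```python
def leaf_majority(tree):
    """Plain majority: the sign of the sum of all leaf colors."""
    if isinstance(tree, int):
        return tree
    leaves, stack = [], [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            leaves.append(node)
        else:
            stack.extend(node)
    return 1 if sum(leaves) >= 0 else -1

def recursive_majority(tree):
    """Recursive majority: take majorities bottom-up, subtree by subtree.

    Less sharp than plain majority at the Kesten-Stigum threshold,
    but it keeps working in the semi-random broadcast tree model
    where plain majority breaks down.
    """
    if isinstance(tree, int):
        return tree
    votes = [recursive_majority(child) for child in tree]
    return 1 if sum(votes) >= 0 else -1

# A depth-2 ternary tree with hand-picked +1/-1 leaves.
tree = [[1, 1, -1], [1, -1, -1], [1, 1, 1]]
print(leaf_majority(tree), recursive_majority(tree))
```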

**24.-** Studying semi-random models provides insight into which algorithms are overtuned to the exact distributional assumptions of average-case models.

**25.-** Semi-random models are a way to "stress test" average-case algorithmic guarantees by adding realistic deviations to the model.

**26.-** Matrix completion is the problem of recovering a low-rank matrix from sparse observations. Two main approaches: convex relaxation and alternating minimization.
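
A compact numpy sketch of the alternating-minimization side (the ridge term and iteration count are illustrative choices): fix V and solve least squares for each row of U over its observed entries, then swap roles.

```python
import numpy as np

def altmin_complete(M_obs, mask, rank, iters=50, reg=1e-6):
    """Alternating minimization for matrix completion (schematic).

    Fit M ~ U @ V.T using only entries where mask is True,
    alternating ridge-regularized least squares for the rows of U
    (with V fixed) and the rows of V (with U fixed).
    """
    m, n = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.normal(size=(m, rank))
    V = rng.normal(size=(n, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(m):
            Vc = V[mask[i]]
            U[i] = np.linalg.solve(Vc.T @ Vc + I, Vc.T @ M_obs[i, mask[i]])
        for j in range(n):
            Ur = U[mask[:, j]]
            V[j] = np.linalg.solve(Ur.T @ Ur + I, Ur.T @ M_obs[mask[:, j], j])
    return U @ V.T
```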

**27.-** Convex relaxation for matrix completion is more robust to an adversary revealing additional entries. Alternating minimization breaks down.
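
For contrast, the convex-relaxation side in a few lines of cvxpy, sketched under the assumption of noiseless observations: minimize the nuclear norm subject to matching the observed entries.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_complete(M_obs, mask):
    """Convex relaxation for matrix completion (schematic).

    Minimize the nuclear norm (sum of singular values) of X subject
    to agreeing with M_obs on the observed entries.
    """
    W = mask.astype(float)  # 1 on observed entries, 0 elsewhere
    X = cp.Variable(M_obs.shape)
    constraints = [cp.multiply(W, X) == W * M_obs]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value
```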

**28.-** No single notion of robustness is universally appropriate. Studying different robustness models provides insight into algorithms.

**29.-** An open question is to get matrix completion algorithms with the computational benefits of alternating minimization and the robustness of convex relaxation.

**30.-** Investigating appropriate notions of robustness for problems in algorithms and machine learning is a promising research direction.

Knowledge Vault built by David Vivancos 2024