Robustness Meets Algorithms (and Vice-Versa)

Ankur Moitra

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
    classDef supervised fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef interactive fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef contextual fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef algorithms fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef practical fill:#f9d4f9, font-weight:bold, font-size:14px
    Main["Robustness Meets Algorithms and Vice-Versa"]
    Main --> A["Supervised learning ignores valuable interaction data 1"]
    Main --> B["Interactive learning leverages interaction data 2"]
    B --> C["Algorithm learns from features, actions, rewards 3"]
    B --> D["Full RL: special domains, large samples 4"]
    Main --> E["Contextual bandits: right signal, handle non-stationarity 5"]
    E --> F["Fits real-world problems: recommendations, ads, education 6"]
    E --> G["Tutorial: algorithms, theory, evaluation, exploration 7"]
    E --> H["Observe features, choose action, receive reward 8"]
    H --> I["Policies map features to actions 9"]
    H --> J["Offline evaluation using inverse propensity scoring 10"]
    Main --> K["Offline learning: importance weighted multi-class classification 11"]
    K --> L["Exploration algorithms: epsilon-greedy, Thompson sampling 12"]
    K --> M["Progressive validation for unbiased offline evaluation 13"]
    K --> N["Rejection sampling evaluates full interaction loop 14"]
    Main --> O["Failure modes: mismatched probabilities, non-stationarity 15"]
    O --> P["Learning systems needed: modular, scalable, general 16"]
    P --> Q["Decision Service: open-source contextual bandit system 17"]
    P --> R["Other systems: NEXT, StreamingBandit, different capabilities 18"]
    Main --> S["Non-stationarity: key issue requiring special techniques 19"]
    S --> T["Combinatorial actions need semi-bandits, submodularity approaches 20"]
    S --> U["Reward specification critical, map goals to proxies 21"]
    S --> V["Smart encoding reduces variance, improves efficiency 22"]
    Main --> W["Workable recipes exist for common scenarios 23"]
    W --> X["Complex.com case study: substantial real-world benefits 24"]
    W --> Y["Offline validation enables rapid evaluation 25"]
    W --> Z["Contextual bandits fit for broad consumption 26"]
    Main --> AA["Research needed: automatic algorithms, expanding RL 27"]
    AA --> AB["Contextual bandits: reliable, robust for applications 28"]
    AB --> AC["Example: personalizing EEG-based typing for disabled 29"]
    AB --> AD["Research benefited from many collaborators 30"]
    class A,B,C,D supervised
    class E,F,G,H,I,J contextual
    class K,L,M,N algorithms
    class O,P,Q,R,S,T,U,V practical
    class W,X,Y,Z,AA,AB,AC,AD interactive
```


**Resume:**

**1.-** Robustness can mean different things in different contexts. This talk explores robustness in algorithm design for high-dimensional problems.

**2.-** In 1D robust parameter estimation, the median and median absolute deviation are more robust than the mean and variance.
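
As a concrete contrast (a minimal numpy sketch, not code from the talk), a handful of gross outliers ruins the mean and standard deviation while barely moving the median and MAD:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 Gaussian samples plus 20 gross outliers.
data = np.concatenate([rng.normal(loc=5.0, scale=2.0, size=1000),
                       np.full(20, 1e6)])

# Non-robust estimates: dragged arbitrarily far by the outliers.
print("mean/std:   ", data.mean(), data.std())

# Robust counterparts: median and median absolute deviation (MAD).
med = np.median(data)
mad = np.median(np.abs(data - med))
# 1.4826 * MAD is a consistent estimate of sigma for Gaussian data.
print("median/MAD: ", med, 1.4826 * mad)
```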

**3.-** In high dimensions, a tradeoff emerges between provable robustness guarantees and computational tractability of estimation algorithms.

**4.-** A general recipe for robust high-dimensional estimation: 1) find a good parameter distance, 2) detect corruptions, 3) design a win-win algorithm: either the naive estimate is already good, or the detected corruptions can be filtered away to make progress.

**5.-** For robust mean estimation with covariance = I, use the empirical mean, but detect corruptions via the second moment: a large top eigenvalue of the empirical covariance flags outliers. Filter them and repeat.
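
A schematic numpy sketch of this filter-and-repeat loop (the eigenvalue threshold and the fixed 5% trimming rule are simplified placeholders, not the talk's exact algorithm):

```python
import numpy as np

def filtered_mean(X, threshold=2.0, max_iters=50):
    """Robust mean estimation by iterative filtering (schematic).

    Assumes the inliers are roughly N(mu, I). While the second moment
    betrays corruptions (the top eigenvalue of the empirical
    covariance is much larger than 1), drop the points that score
    highest along that eigenvector, then recompute.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iters):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] <= threshold:      # second moment looks clean: win
            break
        v = eigvecs[:, -1]                # direction of excess variance
        scores = ((X - mu) @ v) ** 2      # outlyingness along v
        X = X[scores < np.quantile(scores, 0.95)]  # filter: also a win
    return X.mean(axis=0)
```

The two branches are exactly the win-win of point 4: either the second moment certifies the empirical mean, or filtering removes mostly corrupted points.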

**6.-** For robust covariance estimation, exploit the structure of the 4th moment tensor to detect corruptions in the empirical covariance.

**7.-** Can get dimension-independent error of ε·polylog(1/ε) for robustly estimating Gaussian mean and covariance in polynomial time.

**8.-** Statistical query lower bounds show ε·√log(1/ε) error is needed for computationally efficient robust Gaussian estimation.

**9.-** With >50% corruptions, can still do robust Gaussian mean estimation by outputting a small list containing a good estimate.

**10.-** Combining sparsity with robustness assumptions improves sample complexity to logarithmic in dimension for sparse robust mean estimation.

**11.-** Robust Gaussian estimation algorithms, despite distributional assumptions, can help with exploratory data analysis on real datasets.

**12.-** The stochastic block model generates graphs with planted community structure based on within-community and across-community edge probabilities.
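
A minimal numpy sketch of sampling from this model in the sparse parameterization (within-community probability a/n, across-community probability b/n) that the thresholds below refer to; the function name and defaults are illustrative:

```python
import numpy as np

def sample_sbm(n, a, b, rng=None):
    """Two-community stochastic block model on n nodes (schematic).

    Same-community pairs are joined with probability a/n, different-
    community pairs with probability b/n. Returns the adjacency
    matrix and the planted +1/-1 community labels.
    """
    rng = rng or np.random.default_rng()
    labels = rng.choice([-1, 1], size=n)
    same = np.equal.outer(labels, labels)
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels
```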

**13.-** Many algorithms solve partial recovery in the stochastic block model when (a-b)^2 > 2(a+b). This was conjectured optimal.

**14.-** Belief propagation and statistical physics techniques predict the (a-b)^2 > 2(a+b) threshold in the stochastic block model.

**15.-** The (a-b)^2 > 2(a+b) threshold was proven tight for the stochastic block model. Convex relaxations don't match it.

**16.-** Semi-random models allow "helpful" changes to the stochastic block model, e.g. strengthening within-community edges and weakening between-community edges.

**17.-** Simple algorithms like common neighbor counting fail in semi-random models due to the allowed edge probability changes.
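
The heuristic in question is roughly the following (a toy sketch, not the talk's formulation): same-community pairs share more neighbors in expectation, so cluster pairs with high common-neighbor counts. It is exactly this statistic that the adversary's edge-probability changes can distort.

```python
import numpy as np

def common_neighbors(A):
    """Common-neighbor counts for every pair of nodes.

    Entry (i, j) of A @ A counts length-two paths from i to j, i.e.
    the number of neighbors i and j share. The naive heuristic puts
    high-count pairs in the same community.
    """
    A = np.asarray(A)
    counts = A @ A
    np.fill_diagonal(counts, 0)  # ignore self-pairs
    return counts
```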

**18.-** Algorithms that match the (a-b)^2 > 2(a+b) threshold fail in semi-random models. The information-theoretic threshold increases.

**19.-** Semi-definite programs still solve partial recovery in semi-random models, sacrificing sharpness but gaining robustness compared to belief propagation.
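
One standard form of such a semidefinite program, as a cvxpy sketch (requires an SDP-capable solver such as SCS; the eigenvector rounding is a common choice, not necessarily the talk's):

```python
import cvxpy as cp
import numpy as np

def sdp_partition(A):
    """SDP relaxation for two balanced communities (schematic).

    Relax the rank-one matrix x x^T (x in {-1,+1}^n) to a PSD
    matrix X with unit diagonal and balanced communities, and
    maximize agreement with the adjacency matrix A.
    """
    n = A.shape[0]
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(X) == 1, cp.sum(X) == 0]
    cp.Problem(cp.Maximize(cp.trace(A @ X)), constraints).solve()
    # Round: take signs of the top eigenvector of the optimal X.
    _, V = np.linalg.eigh(X.value)
    return np.sign(V[:, -1])
```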

**20.-** The broadcast tree model captures local structure of the stochastic block model. It has a sharp recovery threshold.
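
A minimal sketch of this model, assuming its standard formulation: the root's color is passed to each child, flipping independently with some probability (names and parameters are illustrative):

```python
import random

def broadcast_tree(depth, arity, flip_prob, color=1, rng=None):
    """Sample a broadcast tree as nested lists with +1/-1 leaves.

    Each child inherits its parent's color, flipped independently
    with probability flip_prob; leaves sit at the given depth.
    """
    rng = rng or random.Random(0)
    if depth == 0:
        return color
    return [broadcast_tree(depth - 1, arity, flip_prob,
                           color if rng.random() > flip_prob else -color,
                           rng)
            for _ in range(arity)]
```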

**21.-** Kesten-Stigum bound: in the broadcast tree model, the root color is recoverable iff (a-b)^2 > 2(a+b). The optimal algorithm is majority.

**22.-** In a semi-random broadcast tree model allowing "helpful" changes, the Kesten-Stigum bound and majority reconstruction break down.

**23.-** Recursive majority still works in the semi-random broadcast tree model, showing a robustness vs sharpness tradeoff.
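
The contrast in points 21-23, in code: plain majority aggregates all leaves at once, while recursive majority aggregates level by level. A toy sketch assuming trees given as nested lists with +1/-1 leaves (e.g. as produced by the broadcast-tree sketch above):

```python
def leaf_majority(tree):
    """Plain majority: the sign of the sum of all leaf colors."""
    if isinstance(tree, int):
        return tree
    leaves, stack = [], [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            leaves.append(node)
        else:
            stack.extend(node)
    return 1 if sum(leaves) >= 0 else -1

def recursive_majority(tree):
    """Recursive majority: take majorities bottom-up, subtree by subtree.

    Less sharp than plain majority at the Kesten-Stigum threshold,
    but it keeps working in the semi-random broadcast tree model
    where plain majority breaks down.
    """
    if isinstance(tree, int):
        return tree
    votes = [recursive_majority(child) for child in tree]
    return 1 if sum(votes) >= 0 else -1

# A depth-2 ternary tree with hand-picked +1/-1 leaves.
tree = [[1, 1, -1], [1, -1, -1], [1, 1, 1]]
print(leaf_majority(tree), recursive_majority(tree))
```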

**24.-** Studying semi-random models provides insight into which algorithms are overtuned to the exact distributional assumptions of average-case models.

**25.-** Semi-random models are a way to "stress test" average-case algorithmic guarantees by adding realistic deviations to the model.

**26.-** Matrix completion is the problem of recovering a low-rank matrix from sparse observations. Two main approaches: convex relaxation and alternating minimization.
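
A compact numpy sketch of the alternating-minimization side (the ridge term and iteration count are illustrative choices): fix V and solve least squares for each row of U over its observed entries, then swap roles.

```python
import numpy as np

def altmin_complete(M_obs, mask, rank, iters=50, reg=1e-6):
    """Alternating minimization for matrix completion (schematic).

    Fit M ~ U @ V.T using only entries where mask is True,
    alternating ridge-regularized least squares for the rows of U
    (with V fixed) and the rows of V (with U fixed).
    """
    m, n = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.normal(size=(m, rank))
    V = rng.normal(size=(n, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(m):
            Vc = V[mask[i]]
            U[i] = np.linalg.solve(Vc.T @ Vc + I, Vc.T @ M_obs[i, mask[i]])
        for j in range(n):
            Ur = U[mask[:, j]]
            V[j] = np.linalg.solve(Ur.T @ Ur + I, Ur.T @ M_obs[mask[:, j], j])
    return U @ V.T
```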

**27.-** Convex relaxation for matrix completion is more robust to an adversary revealing additional entries. Alternating minimization breaks down.
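
For contrast, the convex-relaxation side in a few lines of cvxpy, sketched under the assumption of noiseless observations: minimize the nuclear norm subject to matching the observed entries.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_complete(M_obs, mask):
    """Convex relaxation for matrix completion (schematic).

    Minimize the nuclear norm (sum of singular values) of X subject
    to agreeing with M_obs on the observed entries.
    """
    W = mask.astype(float)  # 1 on observed entries, 0 elsewhere
    X = cp.Variable(M_obs.shape)
    constraints = [cp.multiply(W, X) == W * M_obs]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value
```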

**28.-** No single notion of robustness is universally appropriate. Studying different robustness models provides insight into algorithms.

**29.-** An open question is to get matrix completion algorithms with the computational benefits of alternating minimization and the robustness of convex relaxation.

**30.-** Investigating appropriate notions of robustness for problems in algorithms and machine learning is a promising research direction.

Knowledge Vault built by David Vivancos 2024