Two high stakes challenges in machine learning

Léon Bottou

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px
classDef policysearch fill:#d4d4f9, font-weight:bold, font-size:14px
classDef gradients fill:#f9f9d4, font-weight:bold, font-size:14px
classDef em fill:#f9d4f9, font-weight:bold, font-size:14px
classDef advanced fill:#d4f9f9, font-weight:bold, font-size:14px
Main[Two high stakes challenges in machine learning]
Main --> A[Introduction and Motivation]
A --> A1[Autonomous robots need complex skill learning 1]
A --> A2[Challenges: high-dimensional spaces, data costs, safety 2]
A --> A3[Value-based RL: unstable, extensive exploration 3]
A --> A4[Policy search: parameterized, correlated, local updates 4]
A --> A5[Taxonomy: model-free vs model-based methods 5]
A --> A6[Outline: taxonomy, methods, extensions, model-based 6]
Main --> B[Policy Search Fundamentals]
B --> B1[Policy representations: trajectories, controllers, networks 7]
B --> B2[Model-free vs model-based: samples vs learning 8]
B --> B3[Step vs episode-based exploration: action/parameter space 9]
B --> B4[Policy update: direct optimization or EM 10]
B --> B5[Exploration: balance smoothness and variability 11]
B --> B6[Correlated parameter exploration yields smoother trajectories 12]
Main --> C[Policy Gradient Methods]
C --> C1[Conservative vs greedy updates: exploration-exploitation tradeoff 13]
C --> C2[Policy gradients: log-likelihood trick estimates gradient 14]
C --> C3[Baseline subtraction reduces variance without bias 15]
C --> C4[Step-based gradients use state-action value function 16]
C --> C5[State-dependent baseline further reduces variance 17]
C --> C6[Metric choice impacts update step size 18]
Main --> D[Advanced Policy Gradient Techniques]
D --> D1[Natural gradients: Fisher information normalizes gradient 19]
D --> D2[Natural actor-critic: gradients with function approximation 20]
D --> D3[State-value function reduces advantage function variance 21]
D --> D4[Policy gradients learn motor skills slowly 22]
Main --> E[Expectation-Maximization Methods]
E --> E1[EM-based search: reward-weighted maximum likelihood 23]
E --> E2[EM works for step/episode-based settings 24]
E --> E3[Reward weighting: baseline subtraction, rescaling 25]
E --> E4[Moment projection: KL minimization, closed-form updates 26]
Main --> F[Advanced Topics and Applications]
F --> F1[Applications: complex robot skills forthcoming 27]
F --> F2[Contextual search learns generalizable, adaptable skills 28]
F --> F3[Hierarchical search: high-level sequencing, low-level primitives 29]
F --> F4[Model-based search: PILCO, guided policy search 30]
class Main main
class A,A1,A2,A3,A4,A5,A6 intro
class B,B1,B2,B3,B4,B5,B6 policysearch
class C,C1,C2,C3,C4,C5,C6,D,D1,D2,D3,D4 gradients
class E,E1,E2,E3,E4 em
class F,F1,F2,F3,F4 advanced
```
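Nodes C2 and C3 of the graph name the log-likelihood (REINFORCE) trick and baseline subtraction. As a minimal illustrative sketch — not from the talk, using an invented one-step Gaussian-policy bandit with a toy reward — the two ideas look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: Gaussian policy pi(a) = N(theta, sigma^2); reward peaks at a = 2.
theta, sigma, lr = 0.0, 1.0, 0.05
reward = lambda a: -(a - 2.0) ** 2

for _ in range(2000):
    actions = theta + sigma * rng.standard_normal(32)  # sample a batch of actions
    rewards = reward(actions)
    baseline = rewards.mean()                          # baseline subtraction (node C3)
    # Log-likelihood trick (node C2): grad log pi(a) wrt the mean is (a - theta) / sigma^2.
    grad = np.mean((rewards - baseline) * (actions - theta) / sigma**2)
    theta += lr * grad                                 # gradient ascent on expected reward

# theta ends up near the optimum a = 2
print(f"learned theta: {theta:.2f}")
```

Subtracting the batch-mean baseline leaves the gradient estimate unbiased but shrinks its variance, which is exactly the tradeoff nodes C3 and C5 refer to.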


**Resume:**

**1.-** Challenges: software engineering and experimentation.

**2.-** Abstraction helps manage engineering complexity.

**3.-** Abstractions can leak, requiring deeper understanding.
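A classic concrete leak (my illustration, not from the talk): floating-point numbers present themselves as real numbers, but rounding seeps through the abstraction, and coping with it requires understanding the layer below:

```python
import math

# Floating point abstracts the reals, but rounding leaks through:
print(0.1 + 0.2 == 0.3)                         # False: 0.1 + 0.2 is 0.30000000000000004
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False: addition is not associative

# Working around the leak means comparing with an explicit tolerance:
print(math.isclose(0.1 + 0.2, 0.3))             # True
```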

**4.-** Math abstractions don't leak, aiding design.

**5.-** Software built on clean abstractions.

**6.-** Programming vs. learning: different computing approaches.

**7.-** Perceptrons lost to programming initially.

**8.-** Humans excel where specifications are elusive.

**9.-** ML needs software to have impact.

**10.-** Trained models make weak software components.

**11.-** Learning algorithms entangle complex systems.

**12.-** Examples illustrate integration problems with ML.

**13.-** ML in software: challenges remain.

**14.-** ML mixes science and engineering aspects.

**15.-** ML lacks specifications, relies on data.

**16.-** ML relies on a single experimental paradigm.

**17.-** This single paradigm contrasts with other sciences.

**18.-** Datasets have bias, can't be curated.

**19.-** Training data never covers all cases.

**20.-** Models fail on unseen edge cases.

**21.-** Computer vision isn't purely statistical.

**22.-** Evaluating AI-like tasks is difficult.

**23.-** Rethink experiment paradigm for ML progress.

**24.-** ML challenges are about process.

**25.-** ML engineering may prioritize productivity.

**26.-** Targeted experiments could reveal model reasoning.
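One way such a targeted experiment can look — a hedged sketch with an invented model and data, not from the talk: hold the genuine signal fixed, perturb only a nuisance feature the model should ignore, and measure how often predictions flip.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a trained classifier; by construction it leans on feature 2,
# a spurious "background" feature (weights are invented for illustration).
weights = np.array([0.2, 0.1, 3.0])
predict = lambda X: (X @ weights > 0).astype(int)

# Targeted experiment: keep the true signal (features 0-1) fixed and flip
# only the nuisance feature, then count how many predictions change.
X = rng.standard_normal((1000, 3))
X_flipped = X.copy()
X_flipped[:, 2] = -X_flipped[:, 2]

flip_rate = np.mean(predict(X) != predict(X_flipped))
print(f"{flip_rate:.0%} of predictions change when only the nuisance feature flips")
```

A high flip rate reveals that the model's "reasoning" depends on the nuisance feature — evidence a single aggregate test-set accuracy would never surface.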

**27.-** Diverse experiments, discussing limits openly.

**28.-** Contracts could make ML more robust.
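A minimal sketch of what such a contract could look like (all names and thresholds are illustrative assumptions, not from the talk): a precondition that rejects inputs outside the range seen in training, and a postcondition that abstains instead of returning a low-confidence guess.

```python
from dataclasses import dataclass

@dataclass
class ContractedModel:
    predict_proba: callable   # the underlying trained model (assumed given)
    input_range: tuple        # (low, high) observed on the training data
    min_confidence: float = 0.8

    def __call__(self, x):
        low, high = self.input_range
        # Precondition: refuse inputs the model was never trained on.
        if not all(low <= v <= high for v in x):
            raise ValueError("precondition violated: input outside training range")
        probs = self.predict_proba(x)
        label = max(range(len(probs)), key=probs.__getitem__)
        # Postcondition: a confident prediction or an explicit abstention.
        if probs[label] < self.min_confidence:
            return None
        return label

# Usage with a stand-in model that is confident only near the origin:
toy_model = lambda x: [0.95, 0.05] if abs(x[0]) < 1 else [0.55, 0.45]
model = ContractedModel(toy_model, input_range=(-5.0, 5.0))
print(model([0.5, 0.0]))   # 0: confident prediction
print(model([3.0, 0.0]))   # None: abstains, confidence below threshold
```

Making failure modes explicit — reject or abstain rather than silently guess — is one way a learned component could behave more like a conventional software component.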

**29.-** Reusing ML work remains challenging.

**30.-** Key challenges shape ML's future impact.

Knowledge Vault built by David Vivancos 2024