Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px
classDef policysearch fill:#d4d4f9, font-weight:bold, font-size:14px
classDef gradients fill:#f9f9d4, font-weight:bold, font-size:14px
classDef em fill:#f9d4f9, font-weight:bold, font-size:14px
classDef advanced fill:#d4f9f9, font-weight:bold, font-size:14px
Main[Two high stakes challenges in machine learning]
Main --> A[Introduction and Motivation]
A --> A1[Autonomous robots need complex skill learning 1]
A --> A2[Challenges: high-dimensional spaces, data costs, safety 2]
A --> A3[Value-based RL: unstable, extensive exploration 3]
A --> A4[Policy search: parameterized, correlated, local updates 4]
A --> A5[Taxonomy: model-free vs model-based methods 5]
A --> A6[Outline: taxonomy, methods, extensions, model-based 6]
Main --> B[Policy Search Fundamentals]
B --> B1[Policy representations: trajectories, controllers, networks 7]
B --> B2[Model-free vs model-based: samples vs learning 8]
B --> B3[Step vs episode-based exploration: action/parameter space 9]
B --> B4[Policy update: direct optimization or EM 10]
B --> B5[Exploration: balance smoothness and variability 11]
B --> B6[Correlated parameter exploration yields smoother trajectories 12]
Main --> C[Policy Gradient Methods]
C --> C1[Conservative vs greedy updates: exploration-exploitation tradeoff 13]
C --> C2[Policy gradients: log-likelihood trick estimates gradient 14]
C --> C3[Baseline subtraction reduces variance without bias 15]
C --> C4[Step-based gradients use state-action value function 16]
C --> C5[State-dependent baseline further reduces variance 17]
C --> C6[Metric choice impacts update step size 18]
Main --> D[Advanced Policy Gradient Techniques]
D --> D1[Natural gradients: Fisher information normalizes gradient 19]
D --> D2[Natural actor-critic: gradients with function approximation 20]
D --> D3[State-value function reduces advantage function variance 21]
D --> D4[Policy gradients learn motor skills slowly 22]
Main --> E[Expectation-Maximization Methods]
E --> E1[EM-based search: reward-weighted maximum likelihood 23]
E --> E2[EM works for step/episode-based settings 24]
E --> E3[Reward weighting: baseline subtraction, rescaling 25]
E --> E4[Moment projection: KL minimization, closed-form updates 26]
Main --> F[Advanced Topics and Applications]
F --> F1[Applications: complex robot skills forthcoming 27]
F --> F2[Contextual search learns generalizable, adaptable skills 28]
F --> F3[Hierarchical search: high-level sequencing, low-level primitives 29]
F --> F4[Model-based search: PILCO, guided policy search 30]
class Main main
class A,A1,A2,A3,A4,A5,A6 intro
class B,B1,B2,B3,B4,B5,B6 policysearch
class C,C1,C2,C3,C4,C5,C6,D,D1,D2,D3,D4 gradients
class E,E1,E2,E3,E4 em
class F,F1,F2,F3,F4 advanced
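The log-likelihood trick (concept 14) and baseline subtraction (concept 15) can be made concrete in a few lines. The sketch below is a generic episode-based REINFORCE update, not an algorithm quoted from the source; the Gaussian policy, toy linear dynamics, quadratic reward, and all constants (sigma, alpha, horizon, batch size) are assumptions for illustration.

import numpy as np

# Minimal episode-based REINFORCE with a baseline.
# Policy: Gaussian over one action, pi(a|s) = N(theta . s, sigma^2), so
# grad_theta log pi(a|s) = (a - theta . s) * s / sigma^2  (log-likelihood trick).
rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters
sigma = 0.5           # fixed exploration noise
alpha = 0.01          # learning rate

def rollout(theta, horizon=20):
    """One episode in a toy 2-D linear system; returns (sum of grad log pi, return)."""
    s = rng.normal(size=2)
    grad_sum, ret = np.zeros(2), 0.0
    for _ in range(horizon):
        a = theta @ s + sigma * rng.normal()
        grad_sum += (a - theta @ s) * s / sigma**2
        ret += -(s @ s) - 0.1 * a**2              # quadratic cost as reward
        s = 0.9 * s + np.array([0.1, 0.05]) * a + 0.01 * rng.normal(size=2)
    return grad_sum, ret

for _ in range(200):
    grads, rets = zip(*(rollout(theta) for _ in range(10)))
    b = np.mean(rets)  # baseline: mean return over the batch
    # Subtracting b leaves the gradient estimate unbiased (E[grad log pi] = 0)
    # while typically reducing its variance.
    g = np.mean([gi * (ri - b) for gi, ri in zip(grads, rets)], axis=0)
    theta += alpha * g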
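Concept 19 (natural gradients) has a compact mathematical core worth writing out; the equations below are the standard definitions, restated here rather than quoted from the source:

\tilde{\nabla}_\theta J = F_\theta^{-1} \nabla_\theta J, \qquad F_\theta = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right]

Normalizing by the Fisher information F_\theta makes each update an approximately fixed-size step in KL divergence between successive policies rather than in raw parameter space, which is also why the choice of metric (concept 18) governs the effective step size.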
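Concepts 23-26 (EM-based policy search as reward-weighted maximum likelihood) reduce, for an episode-based Gaussian search distribution, to a closed-form weighted fit. A minimal sketch under stated assumptions: the black-box objective, the temperature beta, and the sample size are invented for the example; only the exponential reweighting and the moment-projection update reflect the technique itself.

import numpy as np

rng = np.random.default_rng(1)
mu, cov = np.zeros(3), np.eye(3)   # Gaussian search distribution over parameters
beta = 5.0                         # inverse temperature for reward rescaling

def episode_return(w):
    """Toy black-box return with its peak at w = (1, -1, 0.5)."""
    return -np.sum((w - np.array([1.0, -1.0, 0.5]))**2)

for _ in range(100):
    W = rng.multivariate_normal(mu, cov, size=50)      # episode-based exploration
    R = np.array([episode_return(w) for w in W])
    # E-step: rescale returns into positive weights (max-subtracted, then
    # exponentiated) so that high-return samples dominate.
    d = np.exp(beta * (R - R.max()))
    d /= d.sum()
    # M-step: moment projection, i.e. the closed-form weighted maximum-likelihood
    # Gaussian fit that minimizes the KL divergence to the reweighted samples.
    mu = d @ W
    diff = W - mu
    cov = (diff * d[:, None]).T @ diff + 1e-6 * np.eye(3)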
Summary:
1. Challenges: software engineering and experimentation.
2. Abstraction helps manage engineering complexity.
3. Abstractions can leak, requiring deeper understanding.
4. Mathematical abstractions don't leak, aiding design.
5. Software is built on clean abstractions.
6. Programming vs. learning: different approaches to computing.
7. Perceptrons initially lost out to programming.
8. Humans excel where specifications are elusive.
9. ML needs software to have impact.
10. Trained models make weak software components.
11. Learning algorithms entangle complex systems.
12. Examples illustrate integration problems with ML.
13. ML in software: challenges remain.
14. ML mixes science and engineering aspects.
15. ML lacks specifications and relies on data.
16. ML relies on a single experimental paradigm.
17. This single paradigm contrasts with other sciences.
18. Datasets carry bias and can't be fully curated.
19. Training data never covers all cases.
20. Models fail on unseen edge cases.
21. Computer vision isn't purely statistical.
22. Evaluating AI-like tasks is difficult.
23. Rethink the experimental paradigm for ML progress.
24. ML challenges are about process.
25. ML engineering may prioritize productivity.
26. Targeted experiments could reveal model reasoning.
27. Diverse experiments, discussing limits openly.
28. Contracts could make ML more robust (sketched below).
29. Reusing ML work remains challenging.
30. Key challenges shape ML's future impact.
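Point 28 can be pictured as design-by-contract checks wrapped around a trained model. A minimal sketch: the envelope bounds, the class set, and the stand-in predict function are hypothetical, illustrating the idea rather than any established API.

import numpy as np

class ContractViolation(Exception):
    pass

def with_contract(predict, in_low, in_high, classes):
    """Wrap a trained model with input/output contracts (hypothetical example)."""
    def checked(x):
        x = np.asarray(x)
        # Precondition: inputs must stay inside the range seen in training;
        # outside it the model's behavior is unspecified.
        if np.any(x < in_low) or np.any(x > in_high):
            raise ContractViolation("input outside the training envelope")
        y = predict(x)
        # Postcondition: the output must be one of the declared classes.
        if y not in classes:
            raise ContractViolation(f"unexpected prediction {y!r}")
        return y
    return checked

# Usage with a stand-in model:
model = with_contract(lambda x: "cat" if x.mean() > 0 else "dog",
                      in_low=-3.0, in_high=3.0, classes={"cat", "dog"})
print(model(np.array([0.2, -0.1, 0.5])))   # -> cat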
Knowledge Vault built by David Vivancos 2024