Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px
classDef policysearch fill:#d4d4f9, font-weight:bold, font-size:14px
classDef gradients fill:#f9f9d4, font-weight:bold, font-size:14px
classDef em fill:#f9d4f9, font-weight:bold, font-size:14px
classDef advanced fill:#d4f9f9, font-weight:bold, font-size:14px
Main[Two high stakes challenges in machine learning]
Main --> A[Introduction and Motivation]
A --> A1[Autonomous robots need complex skill learning 1]
A --> A2[Challenges: high-dimensional spaces, data costs, safety 2]
A --> A3[Value-based RL: unstable, extensive exploration 3]
A --> A4[Policy search: parameterized, correlated, local updates 4]
A --> A5[Taxonomy: model-free vs model-based methods 5]
A --> A6[Outline: taxonomy, methods, extensions, model-based 6]
Main --> B[Policy Search Fundamentals]
B --> B1[Policy representations: trajectories, controllers, networks 7]
B --> B2[Model-free vs model-based: samples vs learning 8]
B --> B3[Step vs episode-based exploration: action/parameter space 9]
B --> B4[Policy update: direct optimization or EM 10]
B --> B5[Exploration: balance smoothness and variability 11]
B --> B6[Correlated parameter exploration yields smoother trajectories 12]
Main --> C[Policy Gradient Methods]
C --> C1[Conservative vs greedy updates: exploration-exploitation tradeoff 13]
C --> C2[Policy gradients: log-likelihood trick estimates gradient 14]
C --> C3[Baseline subtraction reduces variance without bias 15]
C --> C4[Step-based gradients use state-action value function 16]
C --> C5[State-dependent baseline further reduces variance 17]
C --> C6[Metric choice impacts update step size 18]
Main --> D[Advanced Policy Gradient Techniques]
D --> D1[Natural gradients: Fisher information normalizes gradient 19]
D --> D2[Natural actor-critic: gradients with function approximation 20]
D --> D3[State-value function reduces advantage function variance 21]
D --> D4[Policy gradients learn motor skills slowly 22]
Main --> E[Expectation-Maximization Methods]
E --> E1[EM-based search: reward-weighted maximum likelihood 23]
E --> E2[EM works for step/episode-based settings 24]
E --> E3[Reward weighting: baseline subtraction, rescaling 25]
E --> E4[Moment projection: KL minimization, closed-form updates 26]
Main --> F[Advanced Topics and Applications]
F --> F1[Applications: complex robot skills forthcoming 27]
F --> F2[Contextual search learns generalizable, adaptable skills 28]
F --> F3[Hierarchical search: high-level sequencing, low-level primitives 29]
F --> F4[Model-based search: PILCO, guided policy search 30]
class Main main
class A,A1,A2,A3,A4,A5,A6 intro
class B,B1,B2,B3,B4,B5,B6 policysearch
class C,C1,C2,C3,C4,C5,C6,D,D1,D2,D3,D4 gradients
class E,E1,E2,E3,E4 em
class F,F1,F2,F3,F4 advanced
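The log-likelihood trick (concept 14) and baseline subtraction (concept 15) can be made concrete in a few lines. The sketch below is a generic episode-based REINFORCE update, not an algorithm quoted from the source; the Gaussian policy, toy linear dynamics, quadratic reward, and all constants (sigma, alpha, horizon, batch size) are assumptions for illustration.

import numpy as np

# Minimal episode-based REINFORCE with a baseline.
# Policy: Gaussian over one action, pi(a|s) = N(theta . s, sigma^2), so
# grad_theta log pi(a|s) = (a - theta . s) * s / sigma^2  (log-likelihood trick).
rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters
sigma = 0.5           # fixed exploration noise
alpha = 0.01          # learning rate

def rollout(theta, horizon=20):
    """One episode in a toy 2-D linear system; returns (sum of grad log pi, return)."""
    s = rng.normal(size=2)
    grad_sum, ret = np.zeros(2), 0.0
    for _ in range(horizon):
        a = theta @ s + sigma * rng.normal()
        grad_sum += (a - theta @ s) * s / sigma**2
        ret += -(s @ s) - 0.1 * a**2              # quadratic cost as reward
        s = 0.9 * s + np.array([0.1, 0.05]) * a + 0.01 * rng.normal(size=2)
    return grad_sum, ret

for _ in range(200):
    grads, rets = zip(*(rollout(theta) for _ in range(10)))
    b = np.mean(rets)  # baseline: mean return over the batch
    # Subtracting b leaves the gradient estimate unbiased (E[grad log pi] = 0)
    # while typically reducing its variance.
    g = np.mean([gi * (ri - b) for gi, ri in zip(grads, rets)], axis=0)
    theta += alpha * g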
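Concept 19 (natural gradients) has a compact mathematical core worth writing out; the equations below are the standard definitions, restated here rather than quoted from the source:

\tilde{\nabla}_\theta J = F_\theta^{-1} \nabla_\theta J, \qquad F_\theta = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right]

Normalizing by the Fisher information F_\theta makes each update an approximately fixed-size step in KL divergence between successive policies rather than in raw parameter space, which is also why the choice of metric (concept 18) governs the effective step size.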
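Concepts 23-26 (EM-based policy search as reward-weighted maximum likelihood) reduce, for an episode-based Gaussian search distribution, to a closed-form weighted fit. A minimal sketch under stated assumptions: the black-box objective, the temperature beta, and the sample size are invented for the example; only the exponential reweighting and the moment-projection update reflect the technique itself.

import numpy as np

rng = np.random.default_rng(1)
mu, cov = np.zeros(3), np.eye(3)   # Gaussian search distribution over parameters
beta = 5.0                         # inverse temperature for reward rescaling

def episode_return(w):
    """Toy black-box return with its peak at w = (1, -1, 0.5)."""
    return -np.sum((w - np.array([1.0, -1.0, 0.5]))**2)

for _ in range(100):
    W = rng.multivariate_normal(mu, cov, size=50)      # episode-based exploration
    R = np.array([episode_return(w) for w in W])
    # E-step: rescale returns into positive weights (max-subtracted, then
    # exponentiated) so that high-return samples dominate.
    d = np.exp(beta * (R - R.max()))
    d /= d.sum()
    # M-step: moment projection, i.e. the closed-form weighted maximum-likelihood
    # Gaussian fit that minimizes the KL divergence to the reweighted samples.
    mu = d @ W
    diff = W - mu
    cov = (diff * d[:, None]).T @ diff + 1e-6 * np.eye(3)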
Summary:
1. Challenges: software engineering and experimentation.
2. Abstraction helps manage engineering complexity.
3. Abstractions can leak, requiring deeper understanding.
4. Mathematical abstractions don't leak, aiding design.
5. Software is built on clean abstractions.
6. Programming vs. learning: different approaches to computing.
7. Perceptrons initially lost out to programming.
8. Humans excel where specifications are elusive.
9. ML needs software to have impact.
10. Trained models make weak software components.
11. Learning algorithms entangle complex systems.
12. Examples illustrate integration problems with ML.
13. ML in software: challenges remain.
14. ML mixes science and engineering aspects.
15. ML lacks specifications and relies on data.
16. ML relies on a single experimental paradigm.
17. This single paradigm contrasts with other sciences.
18. Datasets carry bias and can't be fully curated.
19. Training data never covers all cases.
20. Models fail on unseen edge cases.
21. Computer vision isn't purely statistical.
22. Evaluating AI-like tasks is difficult.
23. Rethink the experimental paradigm for ML progress.
24. ML challenges are about process.
25. ML engineering may prioritize productivity.
26. Targeted experiments could reveal model reasoning.
27. Diverse experiments, discussing limits openly.
28. Contracts could make ML more robust (sketched below).
29. Reusing ML work remains challenging.
30. Key challenges shape ML's future impact.
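Point 28 can be pictured as design-by-contract checks wrapped around a trained model. A minimal sketch: the envelope bounds, the class set, and the stand-in predict function are hypothetical, illustrating the idea rather than any established API.

import numpy as np

class ContractViolation(Exception):
    pass

def with_contract(predict, in_low, in_high, classes):
    """Wrap a trained model with input/output contracts (hypothetical example)."""
    def checked(x):
        x = np.asarray(x)
        # Precondition: inputs must stay inside the range seen in training;
        # outside it the model's behavior is unspecified.
        if np.any(x < in_low) or np.any(x > in_high):
            raise ContractViolation("input outside the training envelope")
        y = predict(x)
        # Postcondition: the output must be one of the declared classes.
        if y not in classes:
            raise ContractViolation(f"unexpected prediction {y!r}")
        return y
    return checked

# Usage with a stand-in model:
model = with_contract(lambda x: "cat" if x.mean() > 0 else "dog",
                      in_low=-3.0, in_high=3.0, classes={"cat", "dog"})
print(model(np.array([0.2, -0.1, 0.5])))   # -> cat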
Knowledge Vault built by David Vivancos 2024