Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- The tutorial covers deep reinforcement learning, decision making, and control. Slides are available online.
2.- Sequential decision making is needed when an agent's actions affect future states and decisions. Applications include robotics, autonomous driving, finance.
3.- Deep reinforcement learning combines deep learning for rich sensory inputs with reinforcement learning for actions that affect outcomes.
4.- Reinforcement learning involves generating samples, fitting a model/estimator to evaluate returns, and using it to improve the policy in a cycle.
5.- In the policy gradient method, the policy is directly differentiated to enable gradient ascent, formalizing trial-and-error learning (a minimal REINFORCE sketch follows this list).
6.- Variance of policy gradients can be reduced by exploiting causality (using only rewards-to-go) and subtracting a baseline. Natural gradients improve convergence.
7.- Actor-critic algorithms have an actor that selects actions and a critic that evaluates them. The critic is used to estimate the advantage (sketched after this list).
8.- In direct value-function methods like Q-learning, the policy is implicit: it maximizes the learned Q-function (sketched after this list). With suitable extensions, these methods can also handle continuous actions.
9.- Reinforcement learning can be viewed as probabilistic inference. Value functions and Q-functions emerge from inference in a graphical model.
10.- Soft optimality emerges from a graphical model over trajectories, values, and rewards. The policy maximizes entropy along with expected reward.
11.- Soft Q-learning replaces the hard max over actions with a soft (log-sum-exp) maximum in the Q-function backup. This helps with exploration and compositionality (see the soft backup sketched after this list).
12.- Inverse reinforcement learning aims to infer the reward function from expert demonstrations. The problem is ambiguous, and classical formulations require solving the forward RL problem in an inner loop.
13.- Maximum entropy inverse reinforcement learning handles the ambiguity with a probabilistic model. It is equivalent to a GAN with a specially structured discriminator.
14.- Guided cost learning and generative adversarial imitation learning are sampling-based inverse RL algorithms that work without solving the forward problem (a discriminator-based sketch follows this list).
15.- Model-based RL aims to learn a dynamics model and optimize the policy using that model. Typically more sample-efficient than model-free RL.
16.- Ways to use a learned model include back-propagating gradients through it, model-predictive control, and learning local models (an MPC sketch follows this list).
17.- Guided policy search learns local models and policies for multiple initial states and distills them into a global policy.
18.- With high-dimensional observations, the dynamics model can be learned in a low-dimensional latent space or directly in observation space.
19.- Model-based RL can be more efficient and generalizable than model-free RL, but is limited by model accuracy.
20.- Open challenges in deep RL include improving sample efficiency, safe exploration, reward specification, and transfer learning.
21.- Sample efficiency can potentially be improved through curiosity, hierarchy, stochastic policies, and transfer across tasks.
22.- Safe exploration may involve uncertainty estimation, learning from off-policy data, human oversight, or learning first in simulation.
23.- Reward specification can leverage human preferences, inverse RL, goal images, object motions, or language instructions.
24.- Agents should learn to quickly solve new tasks by building on knowledge from previous tasks, rather than learning tabula rasa.
25.- Automatically generating tasks and curricula is an important problem for building more capable agents.
26.- Incorporating uncertainty into policies can help agents explore safely by avoiding actions with highly uncertain outcomes.
27.- Learning from off-policy data, such as human demonstrations, can allow agents to learn without risky trial-and-error.
28.- Human intervention when an agent is about to make an unsafe decision can keep the agent within safe boundaries during learning.
29.- Simulation-to-real transfer allows agents to learn in a safe virtual environment before deploying those skills in the real world.
30.- The paradigm of learning from rewards with minimal supervision may help in pursuit of human-level artificial intelligence.
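The sketches below are minimal illustrations of several of the points above; all environments, hyperparameters, and variable names are assumptions made for the examples, not details taken from the tutorial. First, a REINFORCE-style policy gradient (items 5-6) on a toy tabular MDP with a softmax policy, using reward-to-go for causality and a constant baseline for variance reduction:

```python
# Minimal REINFORCE sketch (items 5-6). The toy MDP, learning rate, and batch
# size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 3, 2, 10
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition probs
R = rng.normal(size=(n_states, n_actions))                        # reward table
theta = np.zeros((n_states, n_actions))                           # softmax policy parameters

def policy(s):
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def rollout():
    s, traj = 0, []
    for _ in range(horizon):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj

for it in range(200):
    trajs = [rollout() for _ in range(20)]
    # Constant baseline; any action-independent baseline keeps the estimator unbiased.
    baseline = np.mean([sum(r for _, _, r in t) for t in trajs])
    grad = np.zeros_like(theta)
    for t in trajs:
        rewards = [r for _, _, r in t]
        for k, (s, a, _) in enumerate(t):
            reward_to_go = sum(rewards[k:])      # causality: only future rewards count
            p = policy(s)
            dlogpi = -p
            dlogpi[a] += 1.0                     # grad of log-softmax wrt the logits
            grad[s] += dlogpi * (reward_to_go - baseline)
    theta += 0.01 * grad / len(trajs)            # gradient ascent on expected return

print("learned greedy actions per state:", theta.argmax(axis=1))
```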
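A tabular actor-critic sketch for item 7, again on an assumed toy MDP: the critic V is learned by TD(0), and the one-step advantage it produces weights the actor's policy-gradient update.

```python
# Actor-critic sketch (item 7): critic V via TD(0), actor updated with the
# one-step advantage r + gamma * V(s') - V(s). Toy MDP is an assumption.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
theta = np.zeros((nS, nA))   # actor parameters (softmax logits)
V = np.zeros(nS)             # critic: state-value estimates

def pi(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for step in range(20000):
    p = pi(s)
    a = rng.choice(nA, p=p)
    s_next = rng.choice(nS, p=P[s, a])
    r = R[s, a]
    advantage = r + gamma * V[s_next] - V[s]   # critic evaluates the chosen action
    V[s] += 0.05 * advantage                   # TD(0) update for the critic
    dlogpi = -p
    dlogpi[a] += 1.0
    theta[s] += 0.01 * dlogpi * advantage      # actor ascends advantage-weighted gradient
    s = s_next

print("greedy policy:", theta.argmax(axis=1))
```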
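A tabular Q-learning sketch for item 8; the policy is implicit, the argmax of the learned Q-function (the continuous-action extensions mentioned above are not shown).

```python
# Q-learning sketch (item 8): implicit greedy policy argmax_a Q(s, a).
# The toy MDP and epsilon-greedy exploration are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma, eps = 4, 2, 0.95, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
Q = np.zeros((nS, nA))

s = 0
for step in range(50000):
    a = rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())  # epsilon-greedy
    s_next = rng.choice(nS, p=P[s, a])
    target = R[s, a] + gamma * Q[s_next].max()      # hard max over next actions
    Q[s, a] += 0.1 * (target - Q[s, a])
    s = s_next

print("implicit greedy policy:", Q.argmax(axis=1))
```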
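A soft Q-learning sketch for items 10-11, assuming a toy MDP and a temperature alpha: the backup uses a log-sum-exp soft maximum, and the resulting policy is a softmax over Q-values, trading off reward against entropy.

```python
# Soft Q-learning sketch (items 10-11): soft (log-sum-exp) maximum in the backup,
# maximum-entropy softmax policy over Q-values. Toy MDP and alpha are assumptions.
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, alpha = 4, 2, 0.95, 1.0   # alpha is the entropy temperature
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
Q = np.zeros((nS, nA))

def soft_max(q):
    # temperature-scaled log-sum-exp "soft" maximum, numerically stabilized
    m = q.max()
    return m + alpha * np.log(np.exp((q - m) / alpha).sum())

s = 0
for step in range(50000):
    # sample from the maximum-entropy policy: softmax over Q-values
    p = np.exp((Q[s] - soft_max(Q[s])) / alpha)
    p = p / p.sum()
    a = rng.choice(nA, p=p)
    s_next = rng.choice(nS, p=P[s, a])
    target = R[s, a] + gamma * soft_max(Q[s_next])   # soft rather than hard max
    Q[s, a] += 0.1 * (target - Q[s, a])
    s = s_next

print("soft-optimal action probabilities in state 0:",
      np.round(np.exp((Q[0] - soft_max(Q[0])) / alpha), 3))
```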
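A discriminator-based sketch in the spirit of items 13-14: a logistic-regression discriminator is trained to separate stand-in "expert" samples from "policy" samples, and its log-odds act as a learned reward, so no full forward RL solve is needed inside the cost-learning loop. The Gaussian data and the particular reward form are illustrative assumptions, not the tutorial's exact algorithms.

```python
# GAIL/guided-cost-learning-style sketch (items 13-14): discriminator log-odds as reward.
# Data, features, and learning rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
dim = 4
expert = rng.normal(loc=1.0, size=(500, dim))    # stand-in for expert (s, a) features
policy = rng.normal(loc=0.0, size=(500, dim))    # stand-in for current policy samples

w, b = np.zeros(dim), 0.0

def D(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))    # P(sample comes from the expert)

# Train the discriminator by logistic regression (gradient ascent on log-likelihood).
for _ in range(2000):
    grad_w = expert.T @ (1 - D(expert)) - policy.T @ D(policy)
    grad_b = np.sum(1 - D(expert)) - np.sum(D(policy))
    w += 1e-3 * grad_w / 500
    b += 1e-3 * grad_b / 500

def learned_reward(x):
    # Higher where samples look expert-like; used as the reward for the policy update.
    d = D(x)
    return np.log(d) - np.log(1 - d)

print("mean reward on expert data:", learned_reward(expert).mean())
print("mean reward on policy data:", learned_reward(policy).mean())
```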
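A random-shooting model-predictive control sketch for item 16, with a hand-written point-mass model standing in for a learned dynamics model: candidate action sequences are simulated through the model, and only the first action of the best sequence is executed before replanning. The dynamics, cost, and planning horizon are assumptions for illustration.

```python
# Model-based MPC sketch (item 16): random-shooting planning through a dynamics model.
import numpy as np

rng = np.random.default_rng(5)
goal = np.array([1.0, 0.0])

def model(s, a):
    # Stand-in for a learned dynamics model: 1-D position-velocity point mass.
    pos, vel = s[:1], s[1:]
    vel = vel + 0.1 * a
    pos = pos + 0.1 * vel
    return np.concatenate([pos, vel])

def cost(s, a):
    return (s[0] - goal[0]) ** 2 + 0.01 * a ** 2

def mpc_action(s, horizon=10, n_candidates=200):
    best_a, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)   # random shooting
        sim_s, total = s.copy(), 0.0
        for a in actions:
            total += cost(sim_s, a)
            sim_s = model(sim_s, a)                      # roll forward through the model
        if total < best_cost:
            best_cost, best_a = total, actions[0]        # keep only the first action
    return best_a

s = np.array([0.0, 0.0])
for t in range(50):
    a = mpc_action(s)
    s = model(s, a)          # in practice this would be a step in the real environment
print("final position (target 1.0):", round(float(s[0]), 3))
```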
Knowledge Vault built by David Vivancos 2024