Doina Precup ICLR 2022 - Invited Talk - From Reinforcement Learning to AI
1.-Doina Precup is a prominent AI researcher at McGill University, DeepMind Montreal, and Mila (the Quebec AI Institute), and an advocate for diversity in the field.

2.-Reinforcement learning (RL) is a microcosm of AI, involving learning from interaction with an environment through observations, actions, and reward signals.

3.-RL has achieved impressive results in games, control tasks, healthcare, education, finance, and computer system optimization.

4.-AlphaGo, an RL agent, learns to play Go through self-play and discovers strategies superior to those used by human players.

5.-The "reward is enough" hypothesis suggests intelligence and associated abilities can be understood as reward maximization in a complex environment.

6.-Examples such as a squirrel and a robot illustrate how maximizing a simple reward signal in a complex environment can lead to intelligent behavior.

7.-RL aims to build general AI agents that grow knowledge, learn efficiently, reason at multiple levels of abstraction, and adapt quickly.

8.-RL agents acquire procedural knowledge (policies) and predictive knowledge (value functions), which can be generalized to skills, goal-driven behavior, and general predictions.

9.-Agent knowledge should be expressive, learnable from data without supervision, and composable for fast planning in new situations.

10.-Options framework expresses procedural knowledge as temporally extended actions with initiation sets, internal policies, and termination conditions.

11.-General value functions estimate expected values of various cumulants (quantities of interest) over different time scales using continuation functions.

12.-Hierarchical RL with options and general value functions improves performance in complex environments like Android apps with high-dimensional action spaces.

13.-Option keyboard allows efficient policy composition and improvement by combining cumulants and quickly evaluating policies for combined cumulants.

14.-RL provides powerful tools for learning procedural (options) and predictive (general value functions) knowledge, which can be combined to solve new problems.

15.-RL emphasizes trade-offs between data efficiency, computational efficiency, and final performance while optimizing rewards.

16.-The frontier of RL research explores whether agents can discover the right abstractions and quickly learn in vast, open-ended environments.

17.-Goal conditioning and meta-learning are promising approaches for learning abstractions and adapting to new situations.

18.-Richer environments supporting learning of abstractions are needed beyond narrow tasks like individual games or simulations.

19.-The reward hypothesis suggests that intelligence emerges from reward maximization itself, without requiring a specific objective or a specific algorithm such as evolution.

20.-Reward signals for babies include basic needs, social interaction, and attention, driving learning and development.

21.-RL agents can learn from natural language task requests, with potential for combining language and RL research.

22.-Continual improvement relative to an agent's abilities, regardless of starting point, could be a measure of intelligence.

23.-Hierarchical RL agents may reward each other and pass information across layers while pursuing overall return optimization.

24.-Option keyboard allows combining cumulants (e.g., goals) linearly to generate new useful behaviors, with potential for nonlinear combinations.

25.-Inverse RL infers rewards from observed behavior, but it relies on expert data, and the agent still needs its own experimentation.

26.-Current research interests include never-ending RL without Markovian assumptions, formalizing benefits of hierarchical RL, and learning affordances.

27.-Sub-MDP analysis in hierarchical RL shows benefits when partitions are small, similar, and connected through a few states.

28.-Generalizing sub-MDP results to function approximation is an open question, potentially using complexity measures of function spaces.

29.-The talk emphasizes the potential of RL for building general AI agents and for understanding intelligence through the "reward is enough" hypothesis.

30.-Open problems and future directions include continual learning, hierarchical RL, combining language and RL, and developing rich environments for learning abstractions.
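
The options framework from point 10 can be made concrete with a small sketch. This is an illustration, not code from the talk: the `Option` class, the `step` corridor environment, and `run_option` are hypothetical names, and the six-state corridor is an assumed toy task. It shows the three components named in the summary (initiation set, internal policy, termination condition) and how an option runs as a single temporally extended action.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple

State = int
Action = int


@dataclass
class Option:
    """A temporally extended action (all names here are illustrative)."""
    initiation_set: Set[State]             # states in which the option may start
    policy: Callable[[State], Action]      # internal policy followed while it runs
    termination: Callable[[State], float]  # probability of stopping in a state


def step(state: State, action: Action) -> Tuple[State, float]:
    """Assumed toy corridor with states 0..5; reward 1 on entering state 5."""
    next_state = max(0, min(5, state + action))
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward


def run_option(option: Option, state: State, gamma: float = 0.9,
               rng: Optional[random.Random] = None,
               max_steps: int = 100) -> Tuple[State, float, int]:
    """Run the option until it terminates.

    Returns the final state, the discounted reward collected, and the duration.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    rng = rng or random.Random(0)
    total, discount, steps = 0.0, 1.0, 0
    while steps < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
        # Sample the termination condition in the state just reached.
        if rng.random() < option.termination(state):
            break
    return state, total, steps


# "Walk right until the end of the corridor" as one option.
go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

final_state, discounted_return, duration = run_option(go_right, 0)
```

From the agent's point of view, `go_right` behaves like one action that takes five primitive steps, which is what lets a higher-level policy plan over options rather than over individual moves.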

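General value functions (point 11) can be sketched in the same spirit. The code below is a minimal tabular TD(0) learner, offered as an assumption-laden illustration rather than the method presented in the talk: `learn_gvf`, the five-state corridor, the fixed "always move right" policy, and both example questions are hypothetical. Each GVF is defined by a cumulant (the quantity being accumulated) and a continuation function (how strongly the prediction carries on from the next state).

```python
from typing import Callable, Dict

N_STATES = 5             # assumed corridor with states 0..4
TERMINAL = N_STATES - 1  # reaching state 4 ends an episode


def policy(state: int) -> int:
    """Fixed policy whose consequences the GVFs predict: always move right."""
    return 1


def transition(state: int, action: int) -> int:
    """Deterministic corridor dynamics, clipped to the valid state range."""
    return max(0, min(TERMINAL, state + action))


def learn_gvf(cumulant: Callable[[int, int], float],
              continuation: Callable[[int], float],
              episodes: int = 200,
              alpha: float = 0.5) -> Dict[int, float]:
    """Tabular TD(0) estimate of the expected continuation-weighted sum of a cumulant."""
    value = {s: 0.0 for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            next_state = transition(state, policy(state))
            target = (cumulant(state, next_state)
                      + continuation(next_state) * value[next_state])
            value[state] += alpha * (target - value[state])
            state = next_state
    return value


# Question 1: "how many steps until the corridor ends?"
steps_to_end = learn_gvf(
    cumulant=lambda s, s_next: 1.0,
    continuation=lambda s_next: 0.0 if s_next == TERMINAL else 1.0,
)

# Question 2: "is the end close?", asked on a shorter time scale (continuation 0.5).
goal_soon = learn_gvf(
    cumulant=lambda s, s_next: 1.0 if s_next == TERMINAL else 0.0,
    continuation=lambda s_next: 0.0 if s_next == TERMINAL else 0.5,
)
```

Both predictions are learned from the same stream of experience; only the cumulant and the continuation function differ, which is what makes GVFs a general form of predictive knowledge.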

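The option keyboard idea in points 13 and 24 (linearly combining cumulants and quickly evaluating known policies on the combination) can be sketched as follows. This is a simplified illustration under stated assumptions: the Q-value table `Q`, the cumulants `food` and `water`, the policy names, and the functions `combined_q` and `gpi_action` are all made up for this example, and action selection uses generalized policy improvement (take the best action according to the best known policy), which is assumed here as the selection rule.

```python
from typing import Dict, List

# Q[policy][cumulant][state][action]: value of each base policy for each cumulant.
# All numbers are invented for illustration.
QTable = Dict[str, Dict[str, Dict[str, Dict[str, float]]]]

ACTIONS: List[str] = ["left", "right", "stay"]

Q: QTable = {
    "seek_food": {
        "food":  {"s": {"left": 1.0, "right": 0.0, "stay": 0.2}},
        "water": {"s": {"left": 0.0, "right": 0.1, "stay": 0.2}},
    },
    "seek_water": {
        "food":  {"s": {"left": 0.1, "right": 0.0, "stay": 0.2}},
        "water": {"s": {"left": 0.0, "right": 1.0, "stay": 0.2}},
    },
}


def combined_q(q: QTable, weights: Dict[str, float],
               policy: str, state: str, action: str) -> float:
    """Value of a base policy under a linear combination of cumulants.

    Because value functions are linear in the cumulant, the combined value is
    the same weighted sum of the per-cumulant values; no new learning is needed.
    """
    return sum(w * q[policy][c][state][action] for c, w in weights.items())


def gpi_action(q: QTable, weights: Dict[str, float],
               state: str, actions: List[str]) -> str:
    """Pick the action with the highest combined value across all base policies."""
    return max(
        actions,
        key=lambda a: max(combined_q(q, weights, p, state, a) for p in q),
    )


want_food = gpi_action(Q, {"food": 1.0, "water": 0.0}, "s", ACTIONS)
want_water = gpi_action(Q, {"food": 0.0, "water": 1.0}, "s", ACTIONS)
food_avoid_water = gpi_action(Q, {"food": 1.0, "water": -1.0}, "s", ACTIONS)
```

The weight vector acts like a chord on a keyboard: changing it yields a new behavior from already-learned values. With the invented numbers above, weighting only `food` selects `left`, and weighting only `water` selects `right`.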