Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- In 2007, computer Go programs reached only beginner level, relying on traditional search techniques like alpha-beta search.
2.- Monte Carlo search and reinforcement learning techniques were not mainstream in 2007 but showed potential for computer Go.
3.- Go has simple rules but a vast search space, making it challenging for traditional AI search techniques.
4.- Monte Carlo search estimates a position's value by playing random games and averaging the results.
5.- Monte Carlo tree search (MCTS) builds a search tree to improve the Monte Carlo evaluation over time.
6.- UCT (Upper Confidence bounds applied to Trees) balances exploration and exploitation in the MCTS search tree (a minimal rollout-plus-UCT sketch follows this list).
7.- MCTS has limitations: no generalization between positions and reliance on random playouts for value estimates.
8.- Rapid Action Value Estimation (RAVE) generalizes across positions to provide faster Monte Carlo value estimates (see the RAVE blending sketch after this list).
9.- Offline learning (supervised or reinforcement learning) can provide global value function and policy estimates.
10.- Mogo in 2007 combined offline reinforcement learning of a linear value function with online MCTS search.
11.- Using a strong policy for MCTS playouts surprisingly gave worse results than a weaker, more diverse policy.
12.- Crazy Stone and Zen used offline supervised learning of a policy to bias the MCTS search.
13.- AlphaGo used deep convolutional neural networks (CNNs) trained by supervised learning and reinforcement learning for policy/value estimates.
14.- AlphaGo's value network largely replaced random rollouts for evaluation, while the policy network focused the search on promising moves (see the network-guided sketch after this list).
15.- AlphaGo defeated the European champion, world champion, and world #1 ranked player in 2015-2017.
16.- Adding deep learning to MCTS led to a rapid performance increase, surpassing human level play in Go.
17.- Mogo in 2007 beat a professional at 9x9 Go and reached low dan level at 19x19 Go.
18.- AlphaGo's CNNs are challenging to interpret compared to search trees and traditional features.
19.- Adapting the search to the opponent's style is an interesting future direction not yet explored.
20.- More strategic rollout policies sometimes perform worse than simple ones because accurate Monte Carlo estimates need diverse playouts.
21.- Computation spent on smarter value estimates often provides more benefit than additional rollouts.
22.- Asymmetric rollout policies for each player need to adapt to the opponent's style to be effective.
23.- AlphaGo aims to maximize winning probability, with strategies like short-term sacrifices emerging implicitly.
24.- Starting from scratch without human data could allow learning novel strategies to exploit human weaknesses.
25.- Games like StarCraft are much harder than Go due to complex inputs and long time horizons.
26.- Progress on complex games is expected, but requires large amounts of training data/self-play.
27.- Transferring learning techniques to robotics is promising but challenging due to constraints on real-world training.
28.- Adapting learning to use less data or transfer from simulation to reality is an important direction.
29.- The 2007 paper spurred interest in MCTS and other search/learning techniques for games and beyond.
30.- The combination of deep learning and MCTS has led to rapid progress and superhuman performance in Go.
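For readers who want the mechanics behind items 4-6, here is a minimal sketch of Monte Carlo evaluation with UCT selection. The tiny Nim variant, the Node class, and the constant c=1.4 are illustrative assumptions chosen so the code runs standalone; this is not code from Mogo or any program mentioned above.

```python
# Minimal sketch: Monte Carlo evaluation (random playouts) guided by UCT selection.
import math
import random

class Nim:
    """Toy game: players alternately take 1-3 stones; taking the last stone wins."""
    def __init__(self, stones=15, player=1):
        self.stones, self.player = stones, player
    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))
    def play(self, move):
        return Nim(self.stones - move, -self.player)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return -self.player  # the player who just moved took the last stone

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

def rollout(state):
    """Monte Carlo evaluation: play random moves to the end and return the winner."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.winner()

def uct_child(node, c=1.4):
    """UCT: pick the child maximizing mean value plus an exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=5000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with UCT while the node is fully expanded.
        while node.children and len(node.children) == len(node.state.legal_moves()):
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if not node.state.is_terminal():
            tried = {ch.move for ch in node.children}
            move = random.choice([m for m in node.state.legal_moves() if m not in tried])
            node = Node(node.state.play(move), parent=node, move=move)
            node.parent.children.append(node)
        # 3. Simulation: random playout from the new node.
        winner = rollout(node.state)
        # 4. Backpropagation: credit wins to the player to move at each parent.
        while node:
            node.visits += 1
            if node.parent and winner == node.parent.state.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print("Suggested move:", mcts(Nim(15)))
```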
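Item 8's RAVE idea can be illustrated by its value-blending step alone. In this sketch the beta schedule sqrt(k / (3n + k)) follows the form published by Gelly and Silver for RAVE; the constant k=1000 and the example counts are arbitrary illustrative choices, not values from the talk.

```python
# Minimal sketch: blending the slow-but-unbiased Monte Carlo estimate with the
# fast all-moves-as-first (AMAF) estimate that RAVE maintains per move.
import math

def rave_value(mc_wins, mc_visits, amaf_wins, amaf_visits, k=1000):
    q_mc = mc_wins / mc_visits if mc_visits else 0.0
    q_amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
    beta = math.sqrt(k / (3 * mc_visits + k))  # weight shifts to MC as visits grow
    return beta * q_amaf + (1 - beta) * q_mc

# Early on (few MC visits) the AMAF estimate dominates; later the MC mean takes over.
print(rave_value(mc_wins=3, mc_visits=5, amaf_wins=40, amaf_visits=80))
print(rave_value(mc_wins=600, mc_visits=1000, amaf_wins=400, amaf_visits=800))
```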
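Items 13-14 describe how AlphaGo's two networks steer the search. The sketch below uses hypothetical policy_net and value_net stand-ins (returning fixed toy outputs) and a PUCT-style selection rule of the kind described in the AlphaGo paper; it is an illustration of the idea, not AlphaGo's implementation.

```python
# Minimal sketch: the policy network supplies priors that bias move selection,
# and the value network scores the leaf instead of running a random rollout.
import math

def policy_net(state):
    # Hypothetical stand-in: fixed priors over three dummy moves.
    return {"A": 0.5, "B": 0.3, "C": 0.2}

def value_net(state):
    # Hypothetical stand-in: a fixed scalar evaluation in [-1, 1].
    return 0.1

class NNode:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # move -> NNode
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """PUCT selection: Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total = math.sqrt(max(1, node.visits))
    return max(node.children.items(),
               key=lambda kv: kv[1].q()
               + c_puct * kv[1].prior * total / (1 + kv[1].visits))

def expand_and_evaluate(node, state):
    """Create children with policy-network priors; score the leaf with the value network."""
    for move, prior in policy_net(state).items():
        node.children[move] = NNode(prior)
    return value_net(state)  # replaces the Monte Carlo rollout

root = NNode(prior=1.0)
leaf_value = expand_and_evaluate(root, state=None)
root.visits += 1
root.value_sum += leaf_value
move, child = puct_select(root)
print("Prior-biased first choice:", move, "| leaf value:", leaf_value)
```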
Knowledge Vault built by David Vivancos 2024