Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- In 2007, computer Go programs reached only beginner level, relying on traditional search techniques like alpha-beta search.
2.- Monte Carlo search and reinforcement learning techniques were not mainstream in 2007 but showed potential for computer Go.
3.- Go has simple rules but a vast search space, making it challenging for traditional AI search techniques.
4.- Monte Carlo search estimates a position's value by playing random games and averaging the results.
5.- Monte Carlo tree search (MCTS) builds a search tree to improve the Monte Carlo evaluation over time.
6.- UCT (Upper Confidence bounds applied to Trees) balances exploration and exploitation in the MCTS search tree (a minimal rollout-plus-UCT sketch follows this list).
7.- MCTS has limitations: no generalization between positions and reliance on random playouts for value estimates.
8.- Rapid Action Value Estimation (RAVE) generalizes across positions to provide faster Monte Carlo value estimates (see the RAVE blending sketch after this list).
9.- Offline learning (supervised or reinforcement learning) can provide global value function and policy estimates.
10.- Mogo in 2007 combined offline reinforcement learning of a linear value function with online MCTS search.
11.- Using a strong policy for MCTS playouts surprisingly gave worse results than a weaker, more diverse policy.
12.- Crazy Stone and Zen used offline supervised learning of a policy to bias the MCTS search.
13.- AlphaGo used deep convolutional neural networks (CNNs) trained by supervised learning and reinforcement learning for policy/value estimates.
14.- AlphaGo's value network largely replaced random rollouts for evaluation, while the policy network focused the search on promising moves (see the network-guided sketch after this list).
15.- AlphaGo defeated the European champion, world champion, and world #1 ranked player in 2015-2017.
16.- Adding deep learning to MCTS led to a rapid performance increase, surpassing human level play in Go.
17.- Mogo in 2007 beat a professional at 9x9 Go and reached low dan level at 19x19 Go.
18.- AlphaGo's CNNs are challenging to interpret compared to search trees and traditional features.
19.- Adapting the search to the opponent's style is an interesting future direction not yet explored.
20.- More strategic rollout policies sometimes perform worse than simple ones because accurate Monte Carlo estimates need diverse playouts.
21.- Computation spent on smarter value estimates often provides more benefit than additional rollouts.
22.- Asymmetric rollout policies for each player need to adapt to the opponent's style to be effective.
23.- AlphaGo aims to maximize winning probability, with strategies like short-term sacrifices emerging implicitly.
24.- Starting from scratch without human data could allow learning novel strategies to exploit human weaknesses.
25.- Games like StarCraft are much harder than Go due to complex inputs and long time horizons.
26.- Progress on complex games is expected, but requires large amounts of training data/self-play.
27.- Transferring learning techniques to robotics is promising but challenging due to constraints on real-world training.
28.- Adapting learning to use less data or transfer from simulation to reality is an important direction.
29.- The 2007 paper spurred interest in MCTS and other search/learning techniques for games and beyond.
30.- The combination of deep learning and MCTS has led to rapid progress and superhuman performance in Go.
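For readers who want the mechanics behind items 4-6, here is a minimal sketch of Monte Carlo evaluation with UCT selection. The tiny Nim variant, the Node class, and the constant c=1.4 are illustrative assumptions chosen so the code runs standalone; this is not code from Mogo or any program mentioned above.

```python
# Minimal sketch: Monte Carlo evaluation (random playouts) guided by UCT selection.
import math
import random

class Nim:
    """Toy game: players alternately take 1-3 stones; taking the last stone wins."""
    def __init__(self, stones=15, player=1):
        self.stones, self.player = stones, player
    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))
    def play(self, move):
        return Nim(self.stones - move, -self.player)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return -self.player  # the player who just moved took the last stone

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

def rollout(state):
    """Monte Carlo evaluation: play random moves to the end and return the winner."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.winner()

def uct_child(node, c=1.4):
    """UCT: pick the child maximizing mean value plus an exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=5000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with UCT while the node is fully expanded.
        while node.children and len(node.children) == len(node.state.legal_moves()):
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if not node.state.is_terminal():
            tried = {ch.move for ch in node.children}
            move = random.choice([m for m in node.state.legal_moves() if m not in tried])
            node = Node(node.state.play(move), parent=node, move=move)
            node.parent.children.append(node)
        # 3. Simulation: random playout from the new node.
        winner = rollout(node.state)
        # 4. Backpropagation: credit wins to the player to move at each parent.
        while node:
            node.visits += 1
            if node.parent and winner == node.parent.state.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print("Suggested move:", mcts(Nim(15)))
```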
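Item 8's RAVE idea can be illustrated by its value-blending step alone. In this sketch the beta schedule sqrt(k / (3n + k)) follows the form published by Gelly and Silver for RAVE; the constant k=1000 and the example counts are arbitrary illustrative choices, not values from the talk.

```python
# Minimal sketch: blending the slow-but-unbiased Monte Carlo estimate with the
# fast all-moves-as-first (AMAF) estimate that RAVE maintains per move.
import math

def rave_value(mc_wins, mc_visits, amaf_wins, amaf_visits, k=1000):
    q_mc = mc_wins / mc_visits if mc_visits else 0.0
    q_amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
    beta = math.sqrt(k / (3 * mc_visits + k))  # weight shifts to MC as visits grow
    return beta * q_amaf + (1 - beta) * q_mc

# Early on (few MC visits) the AMAF estimate dominates; later the MC mean takes over.
print(rave_value(mc_wins=3, mc_visits=5, amaf_wins=40, amaf_visits=80))
print(rave_value(mc_wins=600, mc_visits=1000, amaf_wins=400, amaf_visits=800))
```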
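Items 13-14 describe how AlphaGo's two networks steer the search. The sketch below uses hypothetical policy_net and value_net stand-ins (returning fixed toy outputs) and a PUCT-style selection rule of the kind described in the AlphaGo paper; it is an illustration of the idea, not AlphaGo's implementation.

```python
# Minimal sketch: the policy network supplies priors that bias move selection,
# and the value network scores the leaf instead of running a random rollout.
import math

def policy_net(state):
    # Hypothetical stand-in: fixed priors over three dummy moves.
    return {"A": 0.5, "B": 0.3, "C": 0.2}

def value_net(state):
    # Hypothetical stand-in: a fixed scalar evaluation in [-1, 1].
    return 0.1

class NNode:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # move -> NNode
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """PUCT selection: Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total = math.sqrt(max(1, node.visits))
    return max(node.children.items(),
               key=lambda kv: kv[1].q()
               + c_puct * kv[1].prior * total / (1 + kv[1].visits))

def expand_and_evaluate(node, state):
    """Create children with policy-network priors; score the leaf with the value network."""
    for move, prior in policy_net(state).items():
        node.children[move] = NNode(prior)
    return value_net(state)  # replaces the Monte Carlo rollout

root = NNode(prior=1.0)
leaf_value = expand_and_evaluate(root, state=None)
root.visits += 1
root.value_sum += leaf_value
move, child = puct_select(root)
print("Prior-biased first choice:", move, "| leaf value:", leaf_value)
```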
Knowledge Vault built by David Vivancos 2024