graph LR
classDef keen fill:#ffd6a5, font-weight:bold, font-size:14px
classDef agi fill:#fdffb6, font-weight:bold, font-size:14px
classDef rl fill:#caffbf, font-weight:bold, font-size:14px
classDef robot fill:#9bf6ff, font-weight:bold, font-size:14px
classDef bench fill:#a0c4ff, font-weight:bold, font-size:14px
classDef infra fill:#bdb2ff, font-weight:bold, font-size:14px
classDef future fill:#ffc6ff, font-weight:bold, font-size:14px
Main[Vault7-334]
Main --> K1[Keen AGI beyond LLMs
via games & robotics 1]:::keen
K1 --> K2[Six researchers advised by
Sutton & Alberta Plan 2]:::keen
Main --> A1[Transformers miss core
animal learning dynamics 3]:::agi
A1 --> A2[Atari 100K benchmark
forces 2-hour learning 4]:::agi
Main --> R1[Sequential multitask avoids
catastrophic forgetting 5]:::rl
R1 --> R2[Real 150–200 ms latency
breaks lab RL 6]:::rl
R2 --> R3[Latency queues needed
in simulators 15]:::rl
R1 --> R4[Continuous learning harms weights
buffers not enough 12]:::rl
R4 --> R5[Offline RL risks
fantasy generalization 13]:::rl
R1 --> R6[Transfer learning fails
after extensive training 14]:::rl
Main --> B1[Eight-title Atari cycle
without task labels 8]:::bench
B1 --> B2[Sticky actions & full
action sets for testing 9]:::bench
B1 --> B3[Sparse Pitfall rewards
need curiosity not scores 10]:::bench
B3 --> B4[Replace epsilon-greedy
with better exploration 11]:::bench
Main --> P1[RoboTroller demo shows
reward & durability issues 7]:::robot
P1 --> P2[Joystick latency needs
past action history 16]:::robot
P1 --> P3[Score via camera
suffers lighting drift 17]:::robot
P3 --> P4[Dev box with
fiducials before hardware 18]:::robot
Main --> T1[CUDA graphs & cloud
replace hand-tuned kernels 19]:::infra
T1 --> T2[Keen open-sources
RoboTroller & framework 25]:::infra
Main --> C1[Atari unbiased pre-ML
challenge source 20]:::bench
C1 --> C2[Humans focus playfield
not score 21]:::bench
C1 --> C3[Frame skip four
causes buzzing motion 23]:::bench
C1 --> C4[Recurrence shows no
benefit on Atari 24]:::bench
Main --> F1[Host contrasts reward RL
with LLM emergence 26]:::future
F1 --> F2[Rational Investment ep2
300 k views Aug 27]:::future
F1 --> F3[Alicante meetup, Discord
growth, top guests 28]:::future
F1 --> F4[Apple TV app &
monetization explored 29]:::future
F1 --> F5[Discord feedback for
nightly streams, paper dives 30]:::future
Summary:
The host opens with thanks and casual greetings, then explains that the second part of the Rational Investment program was recorded in Madrid on Friday and will air in August. He notes his packed agenda, the upcoming Wednesday event, and the pleasure he takes in producing content despite an irregular schedule. The summer season, the fifth, is branded as X-Hab ahí and promises continuity through July, August and September.
He transitions to the day’s InsideX episode featuring John Carmack, legendary developer of Doom and Quake, now leading Keen Technologies toward artificial general intelligence. After a brief technical setup, the host presses play on Carmack’s fifty-minute talk titled “The Future of AI, Robots, Videogames and AI Agents.”
Carmack recounts his journey from pioneering first-person shooters and GPU adoption to aerospace and VR latency optimization at Oculus. Flattered by an early recruitment attempt from OpenAI’s founders, he pivoted full-time to AGI research, eventually forming Keen Technologies with six researchers and advisor Richard Sutton. He positions large language models as powerful yet insufficient for true intelligence, arguing that transformer architectures miss fundamental learning dynamics present even in cats and dogs.
The talk then drills into reinforcement learning within the Atari 100K benchmark, advocating for low-data, high-velocity experimentation. Carmack explains why the classic 200-million-frame regime is misleading, how sequential multitask learning catastrophically forgets, and why real-world latency obliterates lab-perfect scores. A physical demo shows a robotic arm learning to play real Atari hardware via camera and joystick, exposing latency, reward-detection and durability challenges the community rarely confronts.
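To make the latency point concrete, below is a minimal sketch of the kind of action-delay queue Carmack says simulators need (key ideas 6 and 15). It assumes the gymnasium wrapper API; the class name and the three-step delay are illustrative choices, not Keen's code.

import collections
import gymnasium as gym

class ActionDelayWrapper(gym.Wrapper):
    # Holds each chosen action in a queue so it only reaches the game a few
    # steps later, emulating real-world control latency inside the simulator.
    def __init__(self, env, delay_steps=3, noop_action=0):
        super().__init__(env)
        self.delay_steps = delay_steps
        self.noop_action = noop_action
        self.queue = collections.deque()

    def reset(self, **kwargs):
        # Pre-fill with no-ops so even the very first decisions arrive late.
        self.queue = collections.deque([self.noop_action] * self.delay_steps)
        return self.env.reset(**kwargs)

    def step(self, action):
        self.queue.append(action)       # the agent's latest intention
        delayed = self.queue.popleft()  # what actually reaches the game this step
        return self.env.step(delayed)

With frame skip four an Atari agent acts at roughly 15 Hz, so each queued step adds about 67 ms; a three-step queue approximates the 150–200 ms of real-world delay discussed in the talk.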
He proposes a new sequential benchmark cycling eight games three times with strict rules: no task IDs, sticky actions, full action sets, no separate evaluation, and explicit episode boundaries. The goal is to force agents to accumulate and transfer knowledge like humans, addressing reward sparsity, exploration, and continuous learning damage. Carmack plans open-source releases of both simulation and physical-robot code, hoping to shift research culture toward harder, more realistic problems.
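A rough sketch of how such a sequential protocol could be wired up with off-the-shelf tooling follows. The eight-game lineup, the per-visit step budget, and the agent interface (act/observe) are placeholders; the environment flags implement the stated rules, and the snippet assumes gymnasium with ale-py rather than Keen's forthcoming release.

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # recent gymnasium versions; older ale-py registers on import

GAMES = ["ALE/Breakout-v5", "ALE/Pitfall-v5", "ALE/MsPacman-v5",
         "ALE/Qbert-v5", "ALE/Asteroids-v5", "ALE/BattleZone-v5",
         "ALE/UpNDown-v5", "ALE/Krull-v5"]   # placeholder eight-title lineup
CYCLES = 3

def run_benchmark(agent, steps_per_visit=100_000):  # placeholder per-visit budget
    for cycle in range(CYCLES):
        for game in GAMES:
            env = gym.make(game,
                           repeat_action_probability=0.25,  # sticky actions
                           full_action_space=True,          # all 18 actions in every game
                           frameskip=4)
            obs, info = env.reset()
            for _ in range(steps_per_visit):
                action = agent.act(obs)                     # no task ID is ever passed
                obs, reward, terminated, truncated, info = env.step(action)
                agent.observe(obs, reward, terminated or truncated)
                if terminated or truncated:                 # explicit episode boundary
                    obs, info = env.reset()
            env.close()

Because there is no separate evaluation phase, the score the agent accumulates while it learns is the benchmark result, which is what forces knowledge to carry over from cycle to cycle.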
The host returns to reflect that Carmack’s bet on reward-only, game-based reinforcement learning is bold but leaves open questions about bridging to broader intelligence. He contrasts this path with LLMs, notes Sutton’s alignment with the Alberta Plan, and teases upcoming guests including Emilio Soria Olivas and Hector Moreno. He invites Discord feedback, hints at a possible late-night stream, and signs off reminding viewers to subscribe and support the community.
30 Key Ideas:
1.- Carmack’s new company Keen pursues AGI beyond LLMs via games and robotics.
2.- Keen employs six researchers advised by Sutton, creator of the Alberta Plan.
3.- Carmack believes transformers miss core learning dynamics present in animals.
4.- Atari 100K benchmark forces agents to learn games in only two hours of play.
5.- Sequential multitask learning must prevent catastrophic forgetting across games.
6.- Real-world latency of 150–200 ms breaks many lab-perfect RL algorithms.
7.- Physical robot joystick demo exposes reward detection and durability issues.
8.- New benchmark cycles eight Atari titles three times without task labels.
9.- Sticky actions and full action sets ensure robust, reproducible testing.
10.- Sparse-reward games like Pitfall demand intrinsic curiosity rather than score alone.
11.- Exploration strategies must replace naive epsilon-greedy random actions (a toy contrast of ideas 10 and 11 is sketched after this list).
12.- Continuous learning damages weights; replay buffers may not fully solve this.
13.- Offline RL risks fantasy generalization without live environment feedback.
14.- Transfer learning failure persists even after extensive prior game training.
15.- Latency queues should be added to simulators to match real-world delays.
16.- Agent needs past action history to handle joystick phantom commands.
17.- Score detection via camera suffers from lighting and tablecloth drift.
18.- Custom dev box with fiducials eases reward reading before hardware trials.
19.- CUDA graphs and cloud training replaced early hand-tuned CUDA kernels (see the PyTorch sketch after this list).
20.- Atari games provide unbiased challenges because they predate ML research.
21.- Humans focus on playfield, not score, guiding intrinsic reward design.
22.- Modern controllers’ million-action space dwarfs Atari’s discrete 18 actions.
23.- The frame-skip-four regime produces buzzing, jittery motion rather than human-like goal-directed movement.
24.- Recurrence shows no benefit on Atari; richer tasks are needed for RNNs.
25.- Keen open-sources 3D-printed RoboTroller and RL framework post-conference.
26.- Host contrasts Carmack’s reward-centric RL with emergent LLM capabilities.
27.- Upcoming Rational Investment episode two, airing in August, promises 300k views.
28.- Summer events include Alicante meetup, Discord growth, and top-tier guests.
29.- Platform independence and Apple TV app are explored for future monetization.
30.- Community feedback requested on Discord for nightly streams and paper deep dives.
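As a toy contrast for ideas 10 and 11, the sketch below pairs the naive epsilon-greedy baseline with a simple count-based novelty bonus; the bonus is a generic stand-in for the intrinsic-curiosity signals the talk calls for, not Keen's method.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.05):
    # Naive exploration: with probability epsilon take a uniformly random action.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

visit_counts = {}

def curiosity_bonus(state_key, beta=0.1):
    # Count-based novelty bonus: rarely visited states earn extra reward, so an
    # agent in a sparse game like Pitfall still gets a learning signal between scores.
    visit_counts[state_key] = visit_counts.get(state_key, 0) + 1
    return beta / np.sqrt(visit_counts[state_key])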
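For idea 19, here is a minimal PyTorch sketch of the capture-and-replay pattern behind CUDA graphs; the network shape, batch size, and 18-way output are arbitrary stand-ins, and the snippet illustrates the general technique rather than Keen's training stack.

import torch

assert torch.cuda.is_available()
policy = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU(),
                             torch.nn.Linear(256, 18)).cuda()  # 18 = full Atari action set
static_obs = torch.randn(64, 128, device="cuda")  # fixed buffer reused across replays

# Warm up on a side stream before capture, as the PyTorch docs recommend.
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    for _ in range(3):
        policy(static_obs)
torch.cuda.current_stream().wait_stream(side)

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_out = policy(static_obs)  # the whole forward pass is recorded once

static_obs.copy_(torch.randn(64, 128, device="cuda"))  # refill the capture buffer in place
graph.replay()  # relaunch all recorded kernels with almost no CPU overhead
print(static_out.shape)

The win is the same one hand-tuned kernels chase: per-step CPU launch overhead disappears, which matters when each RL update is small and frequent.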
Interviews by Plácido Doménech Espí & Guests - Knowledge Vault built by David Vivancos 2025