Knowledge Vault 7/300 - xHubAI 11/06/2025
🔴MEASURING AGI: Interactive Reasoning Benchmarks
Link to Interview | Original xHubAI Video

Concept Graph, Resume & Key Ideas using Moonshot Kimi K2:

graph LR
classDef arc fill:#d4f1f9, font-weight:bold, font-size:14px;
classDef intel fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef test fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef model fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef future fill:#e6d4f9, font-weight:bold, font-size:14px;
classDef comm fill:#f9e6d4, font-weight:bold, font-size:14px;
Main[Vault7-300] --> G1[ArcAGI Worlds<br/>Probe AGI 1]
Main --> G2[Kamradt: Skill<br/>Acquisition = Intelligence 2]
Main --> G3[Benchmark: Core<br/>Priors Only 3]
Main --> G4[Human Baseline<br/>Actions + Time 4]
Main --> G5[Private Set<br/>Blocks Leaks 5]
Main --> G6[Preview Games<br/>120 by 2026 6]
Main --> G7[ArcPrize Seeks<br/>Donors, Devs 7]
Main --> G8[Plácido Mocks<br/>Apple Claims 8]
Main --> G9[Discord Hub<br/>Open Papers 9]
Main --> G10[July Sequel<br/>Teased 10]
G2 --> G11[Energy + Data<br/>True Metrics 11]
G2 --> G12[Crow Cognition<br/>Challenges Anthropocentrism 12]
G2 --> G13[Gato Shows<br/>Cross-Domain Transfer 13]
G2 --> G14[Atari Era<br/>Flawed Rewards 14]
G2 --> G15[McCarthy: Solve<br/>Unseen Problems 15]
G2 --> G16[Chollet 2019<br/>Formalizes Skill 16]
G2 --> G17[o3 Beat<br/>ArcAGI-1 17]
G1 --> G18[Static Tasks<br/>Insufficient 18]
G1 --> G19[Interactive Reasoning<br/>Open Worlds 19]
G1 --> G20[Locksmith Needs<br/>No Manual 20]
G1 --> G21[Core Priors:<br/>Count, Geometry 21]
G1 --> G22[Objectness Unifies<br/>Pixel Bodies 22]
G1 --> G23[Efficiency: AI vs<br/>Human Actions 23]
G1 --> G24[San Diego<br/>400 Humans Validated 24]
G1 --> G25[Unity Ditched<br/>Python Framework 25]
G2 --> G26[Superhuman ≠ AGI<br/>Superintelligence 26]
G2 --> G27[Overton Window<br/>Gradual Acceptance 27]
Main --> G28[Daily Updates<br/>+ Deep Dives 28]
G28 --> G29[Multi-Platform<br/>Live Streams 29]
G28 --> G30[Support via<br/>Ko-fi, Shares 30]
class G1,G3,G4,G5,G6,G18,G19,G20,G21,G22,G23,G24,G25 arc
class G2,G11,G12,G13,G14,G15,G16,G17,G26,G27 intel
class G7,G9,G10,G28,G29,G30 comm
class G8 model
class G1 future

Resume:

The transcript captures an informal yet dense live-streamed discussion led by Plácido Doménech, host of the Spanish-language AI program InsideX. Opening with greetings and community updates, he frames the session as a relaxed dialogue rather than a formal debate, inviting viewers to share their thoughts on the recent controversy over Apple’s “LLMs don’t reason” paper and on the broader debate about benchmarking artificial general intelligence. Doménech argues that the Apple paper and its viral reception exemplify reductionist tendencies in AI commentary, and promises a future CDX episode dissecting the dogmatism and nihilism he perceives in today’s discourse.
Central to the stream is a detailed walk-through of a twenty-minute keynote by Greg Kamradt, president of ArcPrize, outlining the upcoming ArcAGI-3 benchmark. Kamradt argues that AGI measurement must target human-level generalization rather than narrow task mastery. He traces ArcAGI’s evolution from static, single-turn puzzles to an interactive, game-based environment in which agents must explore unknown worlds without developer knowledge or internet access. The new benchmark, scheduled for full release in Q1 2026, will feature 120 procedurally generated mini-games, collectively named “World’s Fair”, designed so that neither the AI nor its creators have prior exposure to them, so that the benchmark tests skill-acquisition efficiency rather than prior familiarity.
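The interactive setup described above, where an agent must act in an unfamiliar world with no instructions and its efficiency can be judged by how many actions it spends, can be illustrated with a toy loop. This is a minimal sketch under stated assumptions: `HiddenGoalGrid`, `sweep_explore`, the four movement actions and the hidden goal cell are all hypothetical and do not reflect the actual ArcAGI-3 games or ArcPrize's Python engine. The agent only receives its position and a `done` flag, so it has to discover the win condition by exploring.

```python
# Toy interactive-reasoning loop. HYPOTHETICAL: not the ArcAGI-3 API or games.
from __future__ import annotations

from dataclasses import dataclass

# Assumed action set for this toy world.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class HiddenGoalGrid:
    """A size x size world whose goal cell is never revealed to the agent."""

    size: int
    goal: tuple[int, int]
    pos: tuple[int, int] = (0, 0)
    actions_taken: int = 0

    def reset(self) -> tuple[tuple[int, int], bool]:
        """Start a new episode; returns (observation, done)."""
        self.pos, self.actions_taken = (0, 0), 0
        return self.pos, self.pos == self.goal

    def step(self, action: str) -> tuple[tuple[int, int], bool]:
        """Apply one action; moves into the border are blocked but still counted."""
        dr, dc = ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if 0 <= r < self.size and 0 <= c < self.size:
            self.pos = (r, c)
        self.actions_taken += 1
        return self.pos, self.pos == self.goal


def sweep_explore(env: HiddenGoalGrid) -> int | None:
    """Naive exploration: snake through every row until the episode ends.

    Uses only the (observation, done) signals, never env.goal.
    Returns the number of actions spent, or None if no cell triggered `done`.
    """
    _, done = env.reset()
    if done:
        return 0
    for row in range(env.size):
        sideways = "right" if row % 2 == 0 else "left"
        for _ in range(env.size - 1):
            _, done = env.step(sideways)
            if done:
                return env.actions_taken
        if row < env.size - 1:
            _, done = env.step("down")
            if done:
                return env.actions_taken
    return None


if __name__ == "__main__":
    world = HiddenGoalGrid(size=3, goal=(1, 0))
    print(sweep_explore(world))  # 5 actions: right, right, down, left, left
```

A systematic sweep is a deliberately naive policy; an agent that forms hypotheses about the rules from each observation would need fewer actions, which is what an action-count comparison rewards.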
The conversation repeatedly circles back to the philosophical question of what counts as intelligence. Doménech and chat participants challenge the anthropocentric yardstick, noting that humans may fail tasks that machines ace and vice versa. They invoke Gödelian limits, paradoxes of consciousness, and the sorites problem of gradual emergence to caution against binary labels such as “reasoning” versus “not reasoning.” Kamradt’s operational approach (measure first, define later) is praised but also criticized for sidestepping deeper epistemic issues. References to crows solving novel puzzles and to DeepMind’s Gato model underscore the tension between innate priors and learned capacities.
Practical concerns surface around energy efficiency, data priors, and benchmarking fairness. Kamradt advocates minimal core-knowledge assumptions (counting, geometry, object permanence, and theory of mind) while excluding language and cultural trivia. He invites open-source contributions, philanthropic funding, and adversarial testers, and reveals ArcPrize’s switch from Unity to a lightweight Python engine. Doménech echoes the call for community participation, promoting the growing Discord server and teasing a July sequel to his most-watched interview, whose guest remains unnamed but is hinted to be a pivotal figure in AI governance.
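Core priors such as counting and objectness can be made concrete on a pixel grid. The sketch below groups contiguous same-colored cells into objects with a breadth-first flood fill and then counts them; it works on any rectangular grid, including 64x64 worlds. It assumes cells hold integer color codes, that `0` is background, and that "contiguous" means 4-connected (diagonal neighbors do not join). `find_objects` is a hypothetical helper, not part of any ArcPrize code.

```python
from __future__ import annotations

from collections import deque

Cell = tuple[int, int]


def find_objects(grid: list[list[int]], background: int = 0) -> list[set[Cell]]:
    """Return one set of (row, col) cells per 4-connected, same-colored object."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen: set[Cell] = set()
    objects: list[set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            color = grid[r][c]
            if color == background or (r, c) in seen:
                continue
            # Flood-fill outward from this unseen cell, staying on the same color.
            component: set[Cell] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                component.add((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and (nr, nc) not in seen
                        and grid[nr][nc] == color
                    ):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            objects.append(component)
    return objects


grid = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [4, 0, 3, 3],
]
objects = find_objects(grid)
print(len(objects))               # 4 objects
print([len(o) for o in objects])  # [3, 2, 3, 1]
```

Switching the neighbor offsets to all eight directions would let diagonal pixels join the same object; which convention fits a given game is a design choice this sketch does not settle.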
Closing reflections contrast corporate narratives: Apple’s perceived stagnation versus Google’s latent potential, Anthropic’s focus on alignment, and OpenAI’s strategic alliances. Doménech warns that society, especially in Europe and Spain, must prepare for policy debates on augmented humans that are already incubating in elite circles. He urges viewers to avoid headline-driven fatalism, study primary sources, and engage politically, lest incremental revelations (the proverbial boiling frog of a shifting Overton window) leave them passive spectators to irreversible change.

30 Key Ideas:

1.- ArcAGI-3 introduces interactive game worlds to probe AGI without prior knowledge.

2.- Camrath defines intelligence as efficient skill acquisition on unseen tasks.

3.- The benchmark excludes language, culture, and symbols, testing only core priors.

4.- Humans provide baseline scores measured via actions and completion time.

5.- Private evaluation set bars internet access, preventing data leakage.

6.- Five preview games debut next month, with 120 slated by 2026.

7.- ArcPrize seeks donors, testers, Python developers for lightweight engine.

8.- Plácido mocks Apple paper’s claim that LLMs lack reasoning entirely.

9.- Community Discord hosts free papers, courses, comics, fostering open dialogue.

10.- July sequel teased for most-viewed interview, topic undisclosed yet anticipated.

11.- Energy and training data proposed as denominators for true intelligence metrics.

12.- Crow cognition cited to challenge anthropocentric intelligence assumptions.

13.- Gato model demonstrates transfer learning across diverse tasks and domains.

14.- Atari benchmark era criticized for developer knowledge injection and dense rewards.

15.- McCarthy’s definition emphasizes solving problems never seen during training.

16.- François Chollet’s 2019 paper formalizes skill-acquisition efficiency as a measure of intelligence.

17.- ArcAGI-1 puzzles surpassed by o3 model, prompting harder ArcAGI-2 release.

18.- Single-turn static tasks deemed insufficient for human-like intelligence assessment.

19.- Interactive reasoning requires agents to explore open worlds and infer goals.

20.- Locksmith game exemplifies need for discovery without instruction manuals.

21.- Core priors include counting up to ten, basic geometry, objectness, agentness.

22.- Objectness groups contiguous pixels into unified bodies for world modeling.

23.- Efficiency metric compares AI action counts against human baseline distributions.

24.- In-person testing of 400 humans in San Diego validated that every ArcAGI task is solvable.

25.- Unity engine abandoned for custom Python framework targeting 64x64 grid worlds.

26.- Superhuman tasks exceeding human capability labeled superintelligence, not AGI.

27.- Overton window analogy warns of gradual acceptance of augmented human policies.

28.- Plácido advocates combining fast daily updates with occasional deep-dive specials.

29.- Live stream broadcasts simultaneously across YouTube, Twitch, LinkedIn, Rumble, and Kick.

30.- Audience urged to support via Ko-fi, PayPal, or sharing content to sustain production.
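The efficiency comparison in key ideas 4 and 23 (AI action counts measured against a distribution of human action counts) can be sketched for a single game. The scoring below is a hypothetical illustration, not ArcPrize's published formula: `percentile` is the share of human players who needed at least as many actions as the agent, and `median_ratio` divides the median human action count by the agent's, so values above 1 mean the agent was more action-efficient than the typical human. `efficiency_vs_humans` and the example numbers are invented for illustration.

```python
from statistics import median


def efficiency_vs_humans(ai_actions: int, human_actions: list[int]) -> dict[str, float]:
    """Compare one agent run with a human baseline on the same game.

    HYPOTHETICAL scoring for illustration only:
      percentile   -- fraction of humans who used >= ai_actions actions
      median_ratio -- median human actions / ai_actions (> 1: agent more efficient)
    """
    if ai_actions <= 0 or not human_actions:
        raise ValueError("need a positive AI action count and a non-empty baseline")
    at_least_as_many = sum(1 for h in human_actions if h >= ai_actions)
    return {
        "percentile": at_least_as_many / len(human_actions),
        "median_ratio": median(human_actions) / ai_actions,
    }


# Invented example numbers, not real ArcAGI-3 data.
print(efficiency_vs_humans(30, [20, 25, 30, 40, 60]))
# {'percentile': 0.6, 'median_ratio': 1.0}
```

Completion time, the other human baseline signal in key idea 4, could be compared the same way by passing seconds instead of action counts.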

Interviews by Plácido Doménech Espí & Guests - Knowledge Vault built by David Vivancos 2025