Concept Graph using Moonshot Kimi K2:
graph LR
classDef open fill:#d4f9d4,font-weight:bold,font-size:14px
classDef cost fill:#f9d4d4,font-weight:bold,font-size:14px
classDef hw fill:#d4d4f9,font-weight:bold,font-size:14px
classDef geo fill:#f9f9d4,font-weight:bold,font-size:14px
classDef safety fill:#f9d4f9,font-weight:bold,font-size:14px
classDef future fill:#d4f9f9,font-weight:bold,font-size:14px
Main[DeepSeek Impact]
Main --> O1[Open rivals GPT-4 27× cheaper 1]
O1 -.-> G1[Open]
Main --> O2[MoE latent shrinks mem compute 2]
O2 -.-> G2[Cost]
Main --> O3[MIT weights foster global replication 3]
O3 -.-> G1
Main --> H1[2k H800 GPUs train 4]
H1 -.-> G3[HW]
Main --> H2[Export curbs choke H100 access 5]
H2 -.-> G4[Geo]
Main --> H3[TSMC Taiwan pivotal chip security 6]
H3 -.-> G4
Main --> C1[Stargate gigawatt future training 7]
C1 -.-> G3
Main --> C2[Post-train self-play tops pre-train 8]
C2 -.-> G2
Main --> S1[Safety culture slows Anthropic 9]
S1 -.-> G5[Safety]
Main --> S2[Smuggling cloud bypass bans 10]
S2 -.-> G4
Main --> S3[Distillation murky OpenAI copying 11]
S3 -.-> G1
Main --> G4a[Export curbs risk Taiwan conflict 12]
G4a -.-> G4
Main --> C3[1200× inference price drop 13]
C3 -.-> G2
Main --> T1[KV cache limits long context 14]
T1 -.-> G3
Main --> M1[NVIDIA dip panic not collapse 15]
M1 -.-> G3
Main --> G1a[Google TPU largest internal 16]
G1a -.-> G1
Main --> C4[AWS early cheap user win 17]
C4 -.-> G2
Main --> R1[RL verifiable rewards chain 18]
R1 -.-> G2
Main --> R2[Human prefs guide post-train 19]
R2 -.-> G5
Main --> F1[One million GPU gigawatt 20]
F1 -.-> G6[Future]
Main --> F2[Arizona Texas gas race 21]
F2 -.-> G3
Main --> F3[Liquid cooling mandatory Blackwell 22]
F3 -.-> G3
Main --> F4[China $160 bn 2025 plan 23]
F4 -.-> G4
Main --> O4[Open Tulu3 frontier public 24]
O4 -.-> G1
Main --> R3[RLHF system prompts post-train 25]
R3 -.-> G5
Main --> F5[Grid costs beat gen 26]
F5 -.-> G3
Main --> A1[Agent nines reliability self-drive 27]
A1 -.-> G6
Main --> A2[Agents slash dev SaaS cost 28]
A2 -.-> G6
Main --> S4[Cultural alignment hides backdoors 29]
S4 -.-> G5
Main --> F6[AGI timeline lags compute 30]
F6 -.-> G6
G1[Open] --> O1
G1 --> O3
G1 --> S3
G1 --> G1a
G1 --> O4
G2[Cost] --> O2
G2 --> C2
G2 --> C3
G2 --> C4
G2 --> R1
G3[HW] --> H1
G3 --> T1
G3 --> M1
G3 --> C1
G3 --> F2
G3 --> F3
G3 --> F5
G4[Geo] --> H2
G4 --> H3
G4 --> S2
G4 --> G4a
G4 --> F4
G5[Safety] --> S1
G5 --> R2
G5 --> R3
G5 --> S4
G6[Future] --> F1
G6 --> A1
G6 --> A2
G6 --> F6
class O1,O3,S3,G1a,O4 open
class O2,C2,C3,C4,R1 cost
class H1,T1,M1,C1,F2,F3,F5 hw
class H2,H3,S2,G4a,F4 geo
class S1,R2,R3,S4 safety
class F1,A1,A2,F6 future
Summary:
The conversation explores the seismic impact of DeepSeek’s V3 and R1 releases, unpacking how a Chinese hedge-fund spinoff trained competitive reasoning models on export-restricted H800 GPUs while publishing open weights under the MIT license. Dylan and Nathan detail the mixture-of-experts architecture, multi-head latent attention, custom communications scheduling in place of NVIDIA’s NCCL library, and extreme sparsity that cut the reported training cost to roughly $5 million and inference to about $2 per million tokens. They contrast this with OpenAI’s closed o3-mini, showing how open weights, permissive licensing, and detailed papers pressure Western labs to accelerate open-sourcing and rethink their safety culture.
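To make the sparsity point concrete, here is a minimal toy mixture-of-experts layer in Python (numpy), routing each token to the top 2 of 8 experts so only about 2/8 of the expert FLOPs run per token. All sizes are illustrative assumptions; this is a sketch of the general technique, not DeepSeek’s actual architecture, which pairs far larger expert counts with multi-head latent attention:

import numpy as np

# Toy mixture-of-experts layer: route each token to the top-k of E experts.
# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_tokens = 64, 8, 2, 16

W_gate = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
x = rng.standard_normal((n_tokens, d_model))

def moe_forward(x):
    logits = x @ W_gate                            # (tokens, experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' scores.
        w = np.exp(logits[t, top[t]]); w /= w.sum()
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])  # only k of E experts run
    return out

y = moe_forward(x)
# Active expert FLOPs per token scale with top_k / n_experts = 2/8,
# which is the sparsity lever the episode credits for V3's low training cost.
print(y.shape)  # (16, 64)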
Geopolitically, the episode frames DeepSeek as a Cold-War-style catalyst, illustrating how U.S. export controls on H100/H800/H20 GPUs and EUV lithography aim to slow China’s compute buildout, while China’s rumored 50k-GPU cluster and new trillion-RMB subsidy signal escalation. The speakers warn that cutting China off from Taiwan-based TSMC, whose foundry dominance makes the island pivotal to semiconductor security, risks provoking military action if Beijing feels cornered, yet they concede that America’s multi-gigawatt Stargate and Memphis clusters are racing to secure AI supremacy.
Looking forward, they foresee 500k-GPU clusters, post-training self-play, and agentic robotics consuming more flops than pre-training ever did. Nathan advocates open-source RL recipes like Tulu 3, while Dylan tracks supply-chain bottlenecks in power, water cooling, and optics. Both agree that super-intelligence will arrive gradually via cheaper reasoning, but warn of cultural backdoors and techno-authoritarianism if only a few control the models.
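To ground the gigawatt-cluster talk in arithmetic, a back-of-envelope sketch; the per-GPU wattage and datacenter overhead factor are assumptions for illustration, not figures from the episode:

# Back-of-envelope cluster power draw. The per-GPU wattage and overhead
# factor below are assumed, H100-class illustrative values.
gpus = 1_000_000          # "one million GPU" future cluster (key idea 20)
watts_per_gpu = 700       # assumed all-in accelerator draw
pue = 1.3                 # assumed datacenter overhead (cooling, networking)

total_gw = gpus * watts_per_gpu * pue / 1e9
print(f"{total_gw:.2f} GW")  # ~0.91 GW -> roughly the gigawatt scale discussed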
30 Key Ideas:
1.- DeepSeek V3/R1 open-weight reasoning models rival GPT-4 at 27× lower cost.
2.- Mixture-of-experts plus latent attention shrinks memory and compute needs.
3.- Open weights under MIT license foster global replication and innovation.
4.- Training reportedly used 2k H800 GPUs despite wider 50k cluster rumors.
5.- U.S. export controls throttle China access to H100/H800/H20 tiers.
6.- TSMC foundry dominance makes Taiwan central to semiconductor security.
7.- Stargate and Memphis clusters target multi-gigawatt power for future training.
8.- Post-training self-play will soon eclipse pre-training in compute demand.
9.- Safety culture divide slows Anthropic releases versus DeepSeek speed.
10.- Smuggling and cloud rentals still funnel GPUs into China despite bans.
11.- Distillation from OpenAI outputs is a common but legally murky practice.
12.- Export curbs risk forcing China toward military action over Taiwan chips.
13.- Cost curves show a ~1,200× inference price drop since GPT-3’s launch (see the arithmetic after this list).
14.- Memory-bound KV cache growth limits long-context reasoning scaling (see the sketch after this list).
15.- NVIDIA stock dip reflects panic, not long-term demand collapse.
16.- Google TPU clusters remain largest yet internally focused.
17.- AWS dominance stems from early, cheap, and user-friendly services.
18.- Reinforcement learning with verifiable rewards unlocks emergent chain-of-thought (see the sketch after this list).
19.- Human preference data still guides post-training safety and usability.
20.- Future clusters may reach one million GPUs and gigawatt power draw.
21.- Arizona, Texas, and Louisiana sites race to build natural-gas plants.
22.- Liquid cooling becomes mandatory for next-gen Blackwell GPUs.
23.- China’s 2025 AI subsidy plan totals roughly $160 billion (about 1 trillion RMB).
24.- Open-source recipes like Tulu 3 push frontier with public data and code.
25.- RLHF and system prompts shape model behavior after pre-training.
26.- Energy transmission costs now exceed generation costs in some U.S. regions.
27.- Agentic AI faces a reliability “nines” problem like self-driving cars (see the calculation after this list).
28.- Software engineering agents will slash development costs and SaaS reliance.
29.- Cultural alignment risks embedding hidden persuasion or backdoors.
30.- AGI timeline debated: capabilities may outpace deployable compute.
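For key idea 13, the implied annual cost decline, assuming the ~1,200× drop spans roughly the 4.5 years from GPT-3’s launch to early 2025 (the time window is an assumption; the episode gives only the ratio):

# Compound annual price decline implied by a ~1,200x total drop.
years = 4.5                      # assumed window, mid-2020 to early 2025
factor = 1200 ** (1 / years)     # annual cheapening factor
print(f"~{factor:.1f}x cheaper per year")  # ~4.8x/year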
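For key idea 14, a sketch of why the KV cache makes long context memory-bound; the model dimensions are assumed, roughly 70B-dense-class values, and compressing exactly this cache is what DeepSeek’s multi-head latent attention targets:

# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes * tokens.
# All model dimensions below are illustrative assumptions.
layers, kv_heads, head_dim, bytes_per = 80, 8, 128, 2  # fp16
def kv_cache_gb(context_tokens, batch=1):
    return 2 * layers * kv_heads * head_dim * bytes_per * context_tokens * batch / 1e9

for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gb(ctx):6.1f} GB")
# The cache grows linearly with context, so long-context reasoning exhausts
# GPU memory well before compute becomes the bottleneck.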
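For key idea 18, a minimal sketch of the verifiable-rewards idea: the reward comes from a programmatic checker rather than a learned reward model. The sample_answer stub below is a hypothetical stand-in for a language model; real systems apply policy-gradient updates at this point (R1 uses GRPO at scale):

import random

def sample_answer(prompt, temperature):
    # Hypothetical model stub: answers a toy addition prompt, sometimes wrongly.
    a, b = map(int, prompt.split("+"))
    noise = random.choice([0, 0, 0, 1, -1]) if temperature > 0 else 0
    return a + b + noise

def verifiable_reward(prompt, answer):
    # Exact checker: no human labels or reward model needed.
    a, b = map(int, prompt.split("+"))
    return 1.0 if answer == a + b else 0.0

prompt = "17+25"
rewards = [verifiable_reward(prompt, sample_answer(prompt, 1.0)) for _ in range(100)]
print(sum(rewards) / len(rewards))  # success rate an RL step would push upward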
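For key idea 27, the “nines” problem in one calculation: an n-step agent task succeeds with probability p_step ** n, so per-step reliability must gain nines as tasks lengthen (the rates below are illustrative assumptions):

# End-to-end success of an n-step agent task, assuming independent steps.
for p_step in (0.99, 0.999, 0.9999):
    for n in (10, 100, 1000):
        print(f"p_step={p_step}, steps={n:>4}: task success = {p_step**n:.3f}")
# At 99% per step, a 100-step task succeeds only ~37% of the time,
# the same reliability wall the episode draws from self-driving.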
Interview by Lex Fridman | Custom GPT and Knowledge Vault built by David Vivancos 2025