graph LR
classDef release fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef perf fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef safety fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef policy fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef infra fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef future fill:#d4f9f9, font-weight:bold, font-size:14px;
Main[GPT-OSS Release]
Main --> P1[Apache weights open data hidden 1]
P1 -.-> G1[Release]
Main --> P2[Context 128 k lags QWEN 2]
P2 -.-> G2[Perf]
Main --> P3[Text-only lacks vision 3]
P3 -.-> G2
Main --> P4[120 B matches O4-Mini math 4]
P4 -.-> G2
Main --> P5[Slider trades speed accuracy 5]
P5 -.-> G2
Main --> P6[Groq 500 t/s 120 B 6]
P6 -.-> G3[Infra]
Main --> P7[MoE 5.1 % active 7]
P7 -.-> G2
Main --> P8[RoPE GQA speed 8]
P8 -.-> G2
Main --> P9[SFT RL from big 9]
P9 -.-> G2
Main --> P10[Safety blocks bio chem nuke 10]
P10 -.-> G4[Safety]
Main --> P11[Jailbreak via fine-tune 11]
P11 -.-> G4
Main --> P12[Red-team gain limited 12]
P12 -.-> G4
Main --> P13[Apache shields China 13]
P13 -.-> G5[Policy]
Main --> P14[Counter China OSS surge 14]
P14 -.-> G5
Main --> P15[QWEN 4 B beats 20 B 15]
P15 -.-> G2
Main --> P16[Synthetic data avoids suits 16]
P16 -.-> G1
Main --> P17[GPT-4O tokenizer reused 17]
P17 -.-> G1
Main --> P18[Ollama LM-Studio VLLM etc 18]
P18 -.-> G3
Main --> P19[Azure Bedrock host 19]
P19 -.-> G3
Main --> P20[Playground low med high 20]
P20 -.-> G3
Main --> P21[Tau-Bench near O4-Mini 21]
P21 -.-> G2
Main --> P22[Health-Bench 120 B tops O3 22]
P22 -.-> G2
Main --> P23[Humanity Exam 4-Turbo 23]
P23 -.-> G2
Main --> P24[20 B beats 120 B science 24]
P24 -.-> G2
Main --> P25[No vision OCR fail 25]
P25 -.-> G2
Main --> P26[Truncation hurts long summary 26]
P26 -.-> G2
Main --> P27[Red-team global challenge 27]
P27 -.-> G4
Main --> P28[Prep Framework updated 28]
P28 -.-> G5
Main --> P29[HF Rust Python TS 29]
P29 -.-> G3
Main --> P30[ONNX GPU local 30]
P30 -.-> G3
Main --> P31[Gemma-3N 4 B vision 31]
P31 -.-> G6[Future]
Main --> P32[LinkedIn AI populists lie 32]
P32 -.-> G5
Main --> P33[EU Act slows uptake 33]
P33 -.-> G5
Main --> P34[Three-tier education push 34]
P34 -.-> G5
Main --> P35[Discord 500 Sept course 35]
P35 -.-> G3
Main --> P36[GPT-5 mini regular nano 36]
P36 -.-> G6
Main --> P37[5 jump smaller 6 big 37]
P37 -.-> G6
Main --> P38[Anthropic Google quiet wins 38]
P38 -.-> G6
Main --> P39[Microsoft Azure exclusive 39]
P39 -.-> G3
Main --> P40[US counters China OSS 40]
P40 -.-> G5
Main --> P41[Altman shifts open stance 41]
P41 -.-> G5
Main --> P42[Ilya warns free weights 42]
P42 -.-> G4
Main --> P43[Code agents security channel 43]
P43 -.-> G6
Main --> P44[Agent Kit PDF free 44]
P44 -.-> G6
Main --> P45[SLMs edge multi-agent 45]
P45 -.-> G6
Main --> P46[Distill 4 B rivals 20 B 46]
P46 -.-> G6
Main --> P47[Local cuts privacy risk 47]
P47 -.-> G6
Main --> P48[Benchmark brands ignore hype 48]
P48 -.-> G6
Main --> P49[50 to 300 k views 49]
P49 -.-> G3
Main --> P50[First weights since GPT-2 50]
P50 -.-> G1
G1[Release] --> P1
G1 --> P16
G1 --> P17
G1 --> P50
G2[Perf] --> P2
G2 --> P3
G2 --> P4
G2 --> P5
G2 --> P7
G2 --> P8
G2 --> P9
G2 --> P15
G2 --> P21
G2 --> P22
G2 --> P23
G2 --> P24
G2 --> P25
G2 --> P26
G3[Infra] --> P6
G3 --> P18
G3 --> P19
G3 --> P20
G3 --> P29
G3 --> P30
G3 --> P35
G3 --> P39
G3 --> P49
G4[Safety] --> P10
G4 --> P11
G4 --> P12
G4 --> P27
G4 --> P42
G5[Policy] --> P13
G5 --> P14
G5 --> P28
G5 --> P32
G5 --> P33
G5 --> P34
G5 --> P40
G5 --> P41
G6[Future] --> P31
G6 --> P36
G6 --> P37
G6 --> P38
G6 --> P43
G6 --> P44
G6 --> P45
G6 --> P46
G6 --> P47
G6 --> P48
class P1,P16,P17,P50 release
class P2,P3,P4,P5,P7,P8,P9,P15,P21,P22,P23,P24,P25,P26 perf
class P10,P11,P12,P27,P42 safety
class P13,P14,P28,P32,P33,P34,P40,P41 policy
class P6,P18,P19,P20,P29,P30,P35,P39,P49 infra
class P31,P36,P37,P38,P43,P44,P45,P46,P47,P48 future
Resume:
The speaker opens by thanking the audience for the success of “Rational Investment Round 2,” underlining that 99.9 % of participants are still engaged. He announces a double session: first, an analysis of OpenAI’s new “GPT-OSS” open-weights models (20 B and 120 B parameters), then a live viewing of the GPT-5 launch. He warns that hype is drowning critical thought; AI chatter on LinkedIn is dominated by populists who parrot marketing slogans without understanding data pipelines, evaluation benchmarks or legal risk. Europe’s AI Act is already being mis-explained by self-proclaimed experts, and the speaker fears that mediocrity is becoming the accepted norm. OpenAI’s return to “open” is therefore less a philanthropic gesture than a strategic reply to Chinese labs such as DeepSeek, Kimi and QWEN that have released fully open models under permissive licences. The talk compares weights, licences, context length, reasoning modes, safety alignment and downstream tooling, and asks whether GPT-OSS really democratises AI or simply offers a sanitised, US-government-friendly version whose 128 k context and Apache licence still lag behind Asian rivals.
After dissecting architecture diagrams, the speaker concludes that GPT-OSS is engineered for safety first: a mixture-of-experts design with rotary positional embeddings and grouped-query attention, aligned through supervised fine-tuning on synthetic data produced by larger internal OpenAI models. Benchmarks show the 120 B model approaching O4-Mini on maths (AIME 2024) and tool-calling (Tau-Bench), while the 20 B variant matches O3-Mini on science and health tasks yet remains strictly text-only and limited to 128 k tokens, half of what QWEN 32 B already offers. The Apache licence is not fully “open source” because it carries litigation shields for OpenAI and export clauses that discourage Chinese re-use. More importantly, the weights are released but the training corpus, data-filtering recipes and alignment prompts are withheld, so malicious actors could still fine-tune the model on sensitive biology or cyber-security corpora; OpenAI’s own red-team report admits that even after extensive adversarial fine-tuning the model did not reach dangerous capability levels, but the speaker stresses that this is no guarantee once the weights circulate on torrents. He praises the adjustable “reasoning” slider (low, medium, high) that lets developers trade latency for accuracy, and notes that Groq already serves the 120 B model at 500 tokens/s and the 20 B at 1 000 tokens/s, making them attractive for local agent loops orchestrated via LM Studio, Ollama or ONNX.
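To make the latency/accuracy trade-off concrete, here is a minimal sketch of how one might drive the reasoning slider from code. It assumes an Ollama-hosted `gpt-oss:20b` behind the standard OpenAI-compatible local endpoint and that the effort level can be steered with a plain `Reasoning: low|medium|high` line in the system message; neither detail comes from the talk, so treat the snippet as illustrative rather than as the official API.

```python
# Minimal sketch: calling a locally served gpt-oss model through an
# OpenAI-compatible endpoint and nudging the reasoning effort up or down.
# Assumptions (not confirmed by the talk): Ollama runs on its default port,
# the model tag is "gpt-oss:20b", and the reasoning level can be set with a
# plain "Reasoning: low|medium|high" line in the system message.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

def ask(question: str, effort: str = "medium") -> str:
    """Send one question at the requested reasoning effort and report latency."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="gpt-oss:20b",
        messages=[
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": question},
        ],
    )
    print(f"effort={effort:<6} latency={time.perf_counter() - start:.1f}s")
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "A train leaves at 14:05 and arrives at 16:40. How long is the trip?"
    for effort in ("low", "high"):
        print(ask(question, effort))
```

Because the endpoint is OpenAI-compatible, the same loop can point at LM Studio or vLLM by changing only the `base_url` and model tag.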
Looking forward, the speaker positions GPT-OSS as a stop-gap: China’s 4 B “thinking” models distilled from larger teachers already outperform the 20 B OSS on several benchmarks, proving that size is no longer destiny. The real battlefield will be multi-agent frameworks where dozens of small, specialised models collaborate; OpenAI’s move is therefore defensive, aimed at regaining narrative control before GPT-5 arrives. He predicts GPT-5 will be a unified frontier model (mini, regular, nano) that swallows multimodal, tool-use and reasoning into one endpoint, but the leap will be incremental compared with the jump from GPT-3.5 to GPT-4; the truly disruptive shift will come from Google Gemini 2.5, Anthropic Claude-4 Opus and China’s next wave. Until then, developers should treat GPT-OSS as a reliable, safety-aligned, medium-cost option for narrow workflows, but continue benchmarking QWEN, Gemma-3N and Mistral for richer context, vision and full open-source transparency. The session ends with an invitation to the live GPT-5 viewing party and a reminder that critical thinking, not brand loyalty, will decide which ecosystem ultimately delivers human-centric AI.
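The claim that 4 B “thinking” models distilled from larger teachers can rival the 20 B release rests on standard knowledge distillation. The sketch below shows the usual soft-label objective (teacher logits softened by a temperature, student trained on a KL term plus ordinary cross-entropy); all names, sizes and hyper-parameters are placeholders, not details reported in the talk.

```python
# Illustrative knowledge-distillation loss: a small "student" model learns to
# match a larger "teacher", which is how 4 B models can approach 20 B quality.
# Hyper-parameters and shapes here are placeholders, not from the talk.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-label KL term against the teacher with hard-label cross-entropy."""
    # Soften both distributions so the student also learns the teacher's
    # "dark knowledge" about near-miss tokens, not just the argmax.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce

# Toy usage: a batch of 8 token positions over a 32k-entry vocabulary.
student_logits = torch.randn(8, 32_000, requires_grad=True)
teacher_logits = torch.randn(8, 32_000)
labels = torch.randint(0, 32_000, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```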
Key Ideas:
1.- GPT-OSS 20 B and 120 B released under Apache licence, weights open but training data withheld.
2.- 128 k context length lags behind QWEN 32 B 256 k, limiting long-doc tasks.
3.- Text-only models lack multimodal vision, unlike Gemma-3N or Gemini-Pro.
4.- 120 B reaches O4-Mini maths score on AIME 2024; 20 B equals O3-Mini.
5.- Adjustable reasoning slider trades latency for accuracy in production.
6.- Groq serves 120 B at 500 tokens/s, 20 B at 1 000 tokens/s via GroqFlow.
7.- Mixture-of-experts architecture activates only about 5.1 B of the 117 B parameters per token (see the routing sketch after this list).
8.- Rotary positional embedding and grouped-query attention optimise throughput.
9.- Supervised fine-tuning plus reinforcement learning from larger internal models.
10.- Safety alignment blocks bio/chem/nuclear prompts but can be jail-broken via fine-tuning.
11.- Red-team evaluation shows limited capability gain after malicious fine-tuning.
12.- Apache licence includes litigation shields discouraging Chinese commercial reuse.
13.- OpenAI's move responds to the open-source surge from Chinese labs (DeepSeek, Kimi, QWEN), with Mistral as the European counterpart.
14.- QWEN 4 B “thinking” distilled model reportedly beats GPT-OSS 20 B on benchmarks.
15.- GPT-OSS training relied heavily on synthetic data to avoid copyright lawsuits.
16.- Tokenizer reused from GPT-4o; full tokenizer release promised later.
17.- Models compatible with Ollama, LM Studio, vLLM, Azure, AWS, NVIDIA, AMD, Apple Metal.
18.- Azure AI Foundry and AWS Bedrock already host endpoints for enterprise use.
19.- Playground interface offers low/medium/high reasoning modes without code changes.
20.- Function-calling score on Tau-Bench nears O4-Mini, useful for agent workflows.
21.- Health-Bench evaluation shows 120 B surpasses O3 on medical question sets.
22.- Humanity's Last Exam places 120 B between GPT-4 and GPT-4-Turbo levels.
23.- 20 B model outperforms 120 B on some science tasks, showing size efficiency.
24.- Models lack vision, thus unsuitable for image-to-text or OCR-heavy pipelines.
25.- Context window truncation affects long-form summarisation and code-base queries.
26.- OpenAI hosts global red-team challenge to find residual safety risks.
27.- Preparedness Framework document updated alongside model release for policy makers.
28.- Weights released on Hugging Face with Rust, Python and TypeScript bindings.
29.- ONNX runtime on Windows enables local GPU inference without cloud calls.
30.- Gemma-3N 4 B multimodal model cited as superior for on-device vision tasks.
31.- Speaker criticises LinkedIn AI populists for spreading misinformation about AI Act.
32.- Europe’s AI Act compliance debated; legal uncertainty slows adoption.
33.- Speaker advocates three-tier education: user, business, technical levels.
34.- Community Discord nears 500 members; plans September AI engineering courses.
35.- Live GPT-5 viewing scheduled at 18:30 with five Spanish AI experts.
36.- GPT-5 expected in mini/regular/nano tiers, unifying modalities into one endpoint.
37.- Speaker predicts GPT-5 jump smaller than GPT-3.5 to GPT-4; GPT-6 will be disruptive.
38.- Anthropic and Google models quietly outperform OpenAI on several benchmarks.
39.- Microsoft retains exclusive cloud rights; OpenAI depends on Azure infrastructure.
40.- US political narrative frames open-source release as counter to Chinese dominance.
41.- Sam Altman’s congressional testimonies reveal shifting stance on open-source value.
42.- Ilya Sutskever’s Tel Aviv chat highlights safety concerns over unrestricted weights.
43.- Speaker plans new engineering channel focused on code, agents, security, frameworks.
44.- Agentic AI Kit PDF compiles free guides from Microsoft, Amazon, Anthropic for developers.
45.- Small language models seen as future for multi-agent systems at edge.
46.- Distillation techniques allow 4 B models to rival 20 B while cutting compute.
47.- Local execution reduces privacy risk but demands careful hardware optimisation.
48.- Speaker urges audience to benchmark multiple models rather than follow brand hype.
49.- Community growth: from 50 monthly listeners to 300 000 projected views per episode.
50.- Historical moment claimed: first time OpenAI releases weights since GPT-2 era.
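To illustrate key idea 7, here is a minimal sketch of top-k mixture-of-experts routing: a router picks a couple of experts per token, so only a small slice of the total parameters is touched on each forward pass. The dimensions, expert count and `top_k` value below are illustrative placeholders, not the actual gpt-oss configuration.

```python
# Minimal sketch of top-k mixture-of-experts routing (key idea 7): each token
# is sent to only a few experts, so most parameters stay idle per forward pass.
# Dimensions, expert count and top_k are illustrative, not gpt-oss's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)             # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:         # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
moe = TinyMoE()
print(moe(tokens).shape)   # torch.Size([16, 64]); only 2 of 8 experts run per token
```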
Interviews by Plácido Doménech Espà & Guests - Knowledge Vault built by David Vivancos 2025