Remember when DeepSeek dropped R1 for $6 million and Nvidia lost 17% in a day? They just did it again.
On December 1, 2025, the Monday of NeurIPS week, DeepSeek released V3.2 and V3.2-Speciale: 685-billion-parameter models that match or beat GPT-5 High and Gemini 3 Pro on math and coding benchmarks. Open source. Under MIT license. With a new sparse-attention technique that cuts long-context inference costs by roughly 70%.
Not iterating. Leapfrogging.
The Numbers That Matter
- AIME 2025: 96.0% (GPT-5 High: 94.6%, Gemini 3 Pro: 95.0%)
- HMMT 2025: 99.2% (Gemini 3 Pro: 97.5%)
- International Math Olympiad: 35/42 points—Gold medal status
- International Olympiad in Informatics: 492/600—Gold medal, ranked 10th
- ICPC World Finals: Solved 10/12 problems, placed 2nd
- Codeforces Rating: 2701 (International Grandmaster tier, top 1% of human competitors)
Let's be clear about what this means. The International Mathematical Olympiad is the most prestigious math competition in the world. High school students train for years. Countries send their six best. Getting gold requires solving extraordinarily difficult proofs under time pressure.
DeepSeek's model scored at gold-medal level. Not "approaching human performance." Exceeding most humans who dedicate their lives to competitive mathematics.
Sparse Attention: The Technical Breakthrough
But the benchmarks aren't even the most interesting part. The cost structure is.
DeepSeek introduced "Sparse Attention" (DSA)—a technique that fundamentally changes how the model processes long contexts. Here's what it does:
Standard transformer attention scales as O(L²)—quadratic with sequence length. Double your context, quadruple your compute. This is why long-context models are expensive.
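To see what quadratic scaling means in raw numbers, here's a quick back-of-the-envelope check (the context lengths are illustrative):

```python
# Dense attention computes one score per (query, key) pair: L**2 scores.
for L in (32_000, 64_000, 128_000):
    print(f"L = {L:>7,}: {L**2:.2e} attention scores")
# Doubling the context from 64K to 128K quadruples the work:
# 4.10e+09 -> 1.64e+10 scores.
```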
DeepSeek's Sparse Attention reduces this to O(Lk), where k is much smaller than L. The technique uses two components, sketched in code after this list:
- Lightning Indexer: Computes which tokens are most relevant to the current query
- Fine-grained Token Selection: Retrieves only the top-k most important key-value pairs (2,048 per query token)
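Here is a minimal, single-head sketch of the two-stage idea in plain numpy. To be clear, this is a toy reconstruction from the description above, not DeepSeek's kernel: the real lightning indexer is a small learned scorer, and a random projection stands in for it here.

```python
import numpy as np

def sparse_attention(q, K, V, idx_w, k=2048):
    """q: (d,) query; K, V: (L, d) cached keys/values;
    idx_w: (d, d) toy 'lightning indexer' projection (a stand-in
    for the real, much lighter learned scorer)."""
    # Stage 1: cheap relevance score for every cached token, O(L).
    index_scores = K @ (idx_w @ q)
    # Stage 2: keep only the k most relevant key/value pairs.
    top = np.argsort(index_scores)[-k:]
    Kk, Vk = K[top], V[top]
    # Standard softmax attention, but over k tokens instead of L.
    logits = Kk @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ Vk

rng = np.random.default_rng(0)
L, d = 8192, 64                 # toy sizes; the paper uses k=2,048 at 128K
q = rng.standard_normal(d)
K = rng.standard_normal((L, d))
V = rng.standard_normal((L, d))
out = sparse_attention(q, K, V, rng.standard_normal((d, d)), k=256)
print(out.shape)                # (64,)
```

The structure is the point: stage 1 touches all L tokens but only with a cheap dot product, while the expensive softmax attention runs over just k survivors.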
The result? For 128K context sequences:
- Prefill cost: $0.70 → $0.20 per million tokens
- Decoding cost: $2.40 → $0.80 per million tokens
- No significant performance degradation on long-context benchmarks
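The arithmetic behind those figures is easy to verify from the numbers quoted above:

```python
dense  = 128_000 ** 2        # O(L^2) score pairs at 128K context
sparse = 128_000 * 2_048     # O(L*k) with k = 2,048 selected tokens
print(f"{dense / sparse:.1f}x fewer attention scores")  # 62.5x
print(f"prefill cut: {1 - 0.20 / 0.70:.0%}")            # 71%
print(f"decode cut:  {1 - 0.80 / 2.40:.0%}")            # 67%
```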
Benchmark Breakdown: Who Wins Where
Let's get specific about the comparisons:
| Benchmark | DeepSeek V3.2 | GPT-5 High | Gemini 3 Pro |
|---|---|---|---|
| AIME 2025 (Math) | 96.0% | 94.6% | 95.0% |
| HMMT 2025 (Math) | 99.2% | — | 97.5% |
| IMO 2025 (Score/42) | 35 (Gold) | — | — |
| Codeforces Rating | 2701 | — | 2708 |
| Humanity's Last Exam | 30.6 | — | 37.7 |
| MMLU-Pro | 85.0 | — | 85.0 |
The pattern is clear: DeepSeek leads on math and competitive programming. Gemini leads on "Humanity's Last Exam" (HLE)—a benchmark specifically designed to be difficult for AI. They're roughly tied on general knowledge (MMLU-Pro).
The Cost Comparison That Matters
Here's where it gets uncomfortable for OpenAI:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| DeepSeek V3.2 | $0.14 | $0.28 | Open source |
| DeepSeek Reasoner | $0.55 | $2.19 | V3.2-Speciale |
| GPT-4o | $2.50 | $10.00 | API pricing |
| Claude Sonnet | $3.00 | $15.00 | API pricing |
| OpenAI o1 | $15.00 | $60.00 | Reasoning model |
DeepSeek's reasoning model costs $2.19 per million output tokens. OpenAI's o1 costs $60. That's a roughly 27x difference for comparable reasoning capability.
As one analysis put it: "OpenAI charges 10-25x more than DeepSeek for comparable performance, requiring customers to value factors beyond raw capability metrics."
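To make that gap concrete, here's a rough bill comparison using the table's prices. The workload below (500M input tokens, 100M output tokens a month) is hypothetical; plug in your own numbers:

```python
PRICES = {  # (input $, output $) per 1M tokens, from the table above
    "DeepSeek V3.2":     (0.14, 0.28),
    "DeepSeek Reasoner": (0.55, 2.19),
    "GPT-4o":            (2.50, 10.00),
    "Claude Sonnet":     (3.00, 15.00),
    "OpenAI o1":         (15.00, 60.00),
}
in_m, out_m = 500, 100  # hypothetical monthly volume, in millions of tokens
for model, (p_in, p_out) in PRICES.items():
    print(f"{model:<18} ${in_m * p_in + out_m * p_out:>10,.2f}/month")
```

On that workload: about $98/month for DeepSeek V3.2 versus $13,500/month for o1.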
What r/LocalLLaMA Is Saying
The AI community reaction has been... complicated.
The enthusiasts:
"DeepSeek V3.2 is the versatile generalist everyone's been waiting for. It handles coding, math, and general reasoning at a fraction of the cost."
The skeptics:
"Early user sentiment is mixed—some call it 'frontier at last' while others find the chat UI experience underwhelming compared to benchmarks."
The technical community: Impressed by the Sparse Attention innovation. One moderator of r/DeepSeek and r/LocalLLaMA discovered new special tokens enabling "thinking in tool-use"—the ability to reason while executing code, searching, and manipulating files.
The geopolitical observers: DeepSeek continues to demonstrate that US export controls haven't stopped China from producing frontier AI. If anything, constraints may have forced more efficient approaches.
The Caveats (Because There Are Always Caveats)
- Humanity's Last Exam: Gemini 3 Pro significantly outperforms (37.7 vs 30.6)
- Chat experience: Some users report benchmarks don't reflect actual use quality
- Temporary availability: deepseek-reasoner endpoint expires December 15, 2025
- Chinese regulations: Model operates under different content guidelines
- Benchmark gaming: Models may be optimized for specific tests
And the elephant in the room: DeepSeek is a Chinese company. For some use cases—government, defense, certain enterprise applications—that's a non-starter regardless of performance.
What This Means (Beyond the Benchmarks)
For the AI Industry
The trillion-dollar infrastructure narrative keeps getting harder to defend. If a Chinese lab under export restrictions can produce frontier models at 10-20x lower cost, what exactly is the massive spending buying?
For Developers
Open-source, MIT-licensed frontier AI is now available. You can download the weights, run them locally, fine-tune them, deploy them without API dependency. That changes the economics of AI development fundamentally.
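As a hedged sketch, local serving could look something like this with vLLM. The Hugging Face repo id is an assumption on my part, and a 685B MoE realistically requires a multi-GPU node, not a laptop:

```python
# Minimal local-inference sketch with vLLM.
# ASSUMPTIONS: weights published as "deepseek-ai/DeepSeek-V3.2" on
# Hugging Face (repo id unverified), and an 8-GPU node available.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2",  # assumed repo id
    tensor_parallel_size=8,             # shard the model across 8 GPUs
    trust_remote_code=True,             # DeepSeek's custom architecture code
)
params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)
print(outputs[0].outputs[0].text)
```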
For OpenAI/Anthropic
The "safety through closure" argument gets harder when open-source models match your performance. If DeepSeek can release weights and still maintain their position, what's the moat for closed models?
For Export Controls
DeepSeek built this using export-controlled H800 chips—the downgraded version of H100s. The strategy of starving China of compute hasn't stopped frontier AI development. It may have accelerated efficiency innovation.
The Bottom Line
DeepSeek V3.2 isn't just another model release. It's a data point that challenges fundamental assumptions about AI economics:
- Frontier AI doesn't require unlimited compute
- Efficiency innovations can compress costs by 70%+
- Open source can match closed models
- Export controls haven't stopped Chinese AI development
The $6 million model that crashed Nvidia's stock was just the beginning. DeepSeek is now matching GPT-5 at a fraction of the cost, releasing everything open source, and demonstrating efficiency techniques that make long-context AI economically viable.
Silicon Valley's trillion-dollar infrastructure narrative isn't just being questioned. It's being actively disproven, one benchmark at a time.
Frequently Asked Questions
How does DeepSeek V3.2 compare to GPT-5?
DeepSeek V3.2 scored 96.0% on AIME 2025, surpassing GPT-5 High's 94.6%. On the International Mathematical Olympiad, V3.2-Speciale achieved 35/42 points (gold medal). On Codeforces, it achieved a 2701 rating (International Grandmaster tier). GPT-5 and Gemini still lead on some benchmarks, such as Humanity's Last Exam.
What is DeepSeek Sparse Attention?
DeepSeek Sparse Attention (DSA) is a technique that reduces attention complexity from O(L²) to O(Lk) by selecting only the most relevant tokens. It uses a "lightning indexer" and fine-grained token selection to reduce 128K context inference costs by 70%, increase speed by 3.5x, and reduce memory usage by 70%.
Is DeepSeek V3.2 open source?
Yes, DeepSeek V3.2 is released under an MIT license as fully open source, including model weights and training methodology. This contrasts with GPT-5 and Claude, which are API-only. However, the deepseek-reasoner endpoint (V3.2-Speciale) has a temporary availability window ending December 15, 2025.
How much cheaper is DeepSeek V3.2 than GPT-5?
DeepSeek V3.2's API costs $0.14 per million input tokens and $0.28 per million output tokens, compared to GPT-4o's $2.50 and $10.00: roughly 18x cheaper on input and 35x on output. OpenAI's o1 reasoning model costs $60 per million output tokens, while DeepSeek's reasoner costs $2.19, a roughly 27x gap.
What are the limitations of DeepSeek V3.2?
On "Humanity's Last Exam" (HLE), Gemini 3.0 Pro significantly outperforms DeepSeek (37.7 vs 30.6). Some users report the chat UI experience is underwhelming compared to benchmarks. The model also operates under Chinese regulations, which may affect certain use cases.