DeepSeek V3: Training a SOTA AI Model for Just $5.5M
DeepSeek has released its V3 model - a 671B-parameter MoE architecture that activates 37B parameters per token and was trained on 14.8T high-quality tokens. The most striking aspect? It achieved this with just $5.576M in training costs.
Unprecedented Cost Efficiency
Key metrics:
Training compute: 2.788M H800 GPU hours (vs. roughly 30.8M GPU hours for Llama 3 405B)
Total cost: $5.576M at an assumed $2 per GPU-hour rental rate, versus an estimated $760K for the much smaller Llama 2 7B (a quick arithmetic check follows this list)
Performance: Competitive with GPT-4 and Claude 3.5 Sonnet
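The headline cost follows directly from the GPU-hour breakdown in DeepSeek's technical report and the $2 per H800 GPU-hour rental rate the report itself assumes; a quick back-of-the-envelope check in Python:

```python
# Back-of-the-envelope check of the reported training cost, using the
# GPU-hour breakdown and the $2/GPU-hour H800 rental assumption from
# DeepSeek's technical report.
PRETRAIN_HOURS = 2_664_000    # pre-training
CONTEXT_EXT_HOURS = 119_000   # long-context extension
POSTTRAIN_HOURS = 5_000       # post-training (SFT + RL)
RATE_USD = 2.0                # assumed H800 rental price per GPU-hour

total_hours = PRETRAIN_HOURS + CONTEXT_EXT_HOURS + POSTTRAIN_HOURS
print(f"{total_hours:,} GPU hours -> ${total_hours * RATE_USD:,.0f}")
# 2,788,000 GPU hours -> $5,576,000
```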
Former OpenAI researcher Andrej Karpathy noted that this level of capability is usually assumed to require clusters of around 16,000 GPUs, with the largest clusters being brought up today closer to 100,000 GPUs; DeepSeek reports training V3 on just 2,048 H800s.
Technical Innovation
DeepSeek V3's efficiency stems from several architectural and training innovations (a simplified routing sketch follows this list):
256 routed experts plus 1 shared expert per MoE layer
8 routed experts activated per token
Each token dispatched to at most 4 nodes, capping cross-node communication
Auxiliary-loss-free load balancing strategy
FP8 mixed precision training framework
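To make the routing idea concrete, here is a minimal Python/NumPy sketch of top-k expert selection with a shared expert and a selection-only balancing bias. The toy dimensions, expert functions, and bias handling are illustrative assumptions, not DeepSeek's implementation:

```python
# Minimal sketch of top-k MoE routing with a shared expert and a
# bias-adjusted ("auxiliary-loss-free") selection step. Toy sizes only.
import numpy as np

NUM_ROUTED, TOP_K, DIM = 256, 8, 16   # 256 routed experts, 8 active per token

rng = np.random.default_rng(0)
router_w = rng.normal(size=(DIM, NUM_ROUTED))  # router projection (toy size)
expert_bias = np.zeros(NUM_ROUTED)             # balancing bias, selection only

def route(token):
    """Select TOP_K routed experts for one token; return indices and gates."""
    scores = 1.0 / (1.0 + np.exp(-(token @ router_w)))  # sigmoid affinities
    chosen = np.argsort(scores + expert_bias)[-TOP_K:]  # bias steers selection
    gates = scores[chosen] / scores[chosen].sum()       # gates use raw scores
    return chosen, gates

def moe_layer(token, routed_experts, shared_expert):
    chosen, gates = route(token)
    routed_out = sum(g * routed_experts[i](token) for i, g in zip(chosen, gates))
    return shared_expert(token) + routed_out  # shared expert always contributes

# Toy experts so the sketch runs end to end.
experts = [lambda x, w=rng.normal(scale=0.1, size=(DIM, DIM)): x @ w
           for _ in range(NUM_ROUTED)]
shared_expert = lambda x: x
print(moe_layer(rng.normal(size=DIM), experts, shared_expert).shape)  # (16,)
```

The key point is that the bias nudges which experts get selected but never enters the gate weights themselves, which is how load can be balanced without an auxiliary loss term.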
Real-World Performance
Practical advantages:
3x faster generation than DeepSeek V2.5 (60 tokens/second)
API pricing at roughly 1/53rd that of Claude 3.5 Sonnet (a worked example follows this list):
Input: ¥0.5 (cache hit) to ¥2 (cache miss) per million tokens
Output: ¥8 per million tokens
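As a rough illustration of what these prices mean in practice, the snippet below costs out a hypothetical monthly workload; the token volumes and the CNY-to-USD exchange rate are assumptions for illustration, not published figures:

```python
# Rough cost of a hypothetical workload at DeepSeek V3's listed CNY prices.
# Token volumes and the CNY->USD rate are assumptions for illustration.
CNY_PER_M_INPUT_MISS = 2.0   # input, cache miss
CNY_PER_M_INPUT_HIT = 0.5    # input, cache hit
CNY_PER_M_OUTPUT = 8.0       # output
CNY_TO_USD = 1 / 7.3         # assumed exchange rate

def monthly_cost_usd(in_miss_m, in_hit_m, out_m):
    """Cost in USD for token volumes given in millions of tokens."""
    cny = (in_miss_m * CNY_PER_M_INPUT_MISS
           + in_hit_m * CNY_PER_M_INPUT_HIT
           + out_m * CNY_PER_M_OUTPUT)
    return cny * CNY_TO_USD

# Example: 100M uncached input, 50M cached input, 30M output tokens.
print(f"${monthly_cost_usd(100, 50, 30):.2f}")  # ~$63.70
```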
Community Response
The AI community has embraced V3's accessibility:
Developers running it on clusters of M4 Mac minis
Creation of AI-powered games and applications
Emad Mostaque, Stability AI's former CEO, noted that it costs just $2/day to run continuously
Training Details
Notable optimizations:
DualPipe pipeline parallelism
Efficient cross-node all-to-all communication kernels overlapped with computation
Knowledge distillation of long chain-of-thought reasoning capability from the DeepSeek-R1 series (a generic loss sketch follows this list)
Redundant expert deployment during inference to balance load across GPUs
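As a generic illustration of the distillation idea only (not DeepSeek's training code), a standard distillation loss compares softened student and teacher token distributions; the temperature and tensor shapes below are arbitrary assumptions:

```python
# Generic knowledge-distillation loss sketch (teacher -> student logits).
# Not DeepSeek's implementation; temperature and shapes are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Both tensors have shape (batch, seq_len, vocab_size).
    """
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2, as is standard when distilling with a softmax temperature.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy usage with random logits
student = torch.randn(2, 4, 32)
teacher = torch.randn(2, 4, 32)
print(distillation_loss(student, teacher).item())
```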
The model is available at:
Chat interface: chat.deepseek.com
Technical documentation: github.com/deepseek-ai/DeepSeek-V3
Model download: huggingface.co/deepseek-ai/DeepSeek-V3
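DeepSeek's hosted API is OpenAI-compatible. The sketch below assumes the deepseek-chat model name, the api.deepseek.com base URL, and an API key supplied via an environment variable:

```python
# Minimal sketch of calling DeepSeek V3 through the OpenAI-compatible API.
# Assumes the openai Python package and a DEEPSEEK_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize DeepSeek V3 in one sentence."}],
)
print(response.choices[0].message.content)
```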