DeepSeek-V3 Capabilities

DeepSeek-V3 achieves a significant breakthrough in inference speed over previous models.

It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.

| | Benchmark (Metric) | DeepSeek-V3 | DeepSeek-V2.5-0905 | Qwen2.5-72B-Inst | Llama3.1-405B-Inst | Claude-3.5-Sonnet-1022 | GPT-4o-0513 |
|---|---|---|---|---|---|---|---|
| | Architecture | MoE | MoE | Dense | Dense | - | - |
| | # Activated Params | 37B | 21B | 72B | 405B | - | - |
| | # Total Params | 671B | 236B | 72B | 405B | - | - |
| English | MMLU (EM) | 88.5 | 80.6 | 85.3 | 88.6 | 88.3 | 87.2 |
| | MMLU-Redux (EM) | 89.1 | 80.3 | 85.6 | 86.2 | 88.9 | 88.0 |
| | MMLU-Pro (EM) | 75.9 | 66.2 | 71.6 | 73.3 | 78.0 | 72.6 |
| | DROP (3-shot F1) | 91.6 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 |
| | IF-Eval (Prompt Strict) | 86.1 | 80.6 | 84.1 | 86.0 | 86.5 | 84.3 |
| | GPQA-Diamond (Pass@1) | 59.1 | 41.3 | 49.0 | 51.1 | 65.0 | 49.9 |
| | SimpleQA (Correct) | 24.9 | 10.2 | 9.1 | 17.1 | 28.4 | 38.2 |
| | FRAMES (Acc.) | 73.3 | 65.4 | 69.8 | 70.0 | 72.5 | 80.5 |
| | LongBench v2 (Acc.) | 48.7 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 |
| Code | HumanEval-Mul (Pass@1) | 82.6 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 |
| | LiveCodeBench (Pass@1-COT) | 40.5 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 |
| | LiveCodeBench (Pass@1) | 37.6 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 |
| | Codeforces (Percentile) | 51.6 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 |
| | SWE Verified (Resolved) | 42.0 | 22.6 | 23.8 | 24.5 | 50.8 | 38.8 |
| | Aider-Edit (Acc.) | 79.7 | 71.6 | 65.4 | 63.9 | 84.2 | 72.9 |
| | Aider-Polyglot (Acc.) | 49.6 | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 |
| Math | AIME 2024 (Pass@1) | 39.2 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 |
| | MATH-500 (EM) | 90.2 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 |
| | CNMO 2024 (Pass@1) | 43.2 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 |
| Chinese | CLUEWSC (EM) | 90.9 | 90.4 | 91.4 | 84.7 | 85.4 | 87.9 |
| | C-Eval (EM) | 86.5 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 |
| | C-SimpleQA (Correct) | 64.1 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 |
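
To put the Architecture rows in perspective, the short sketch below (an illustration derived from the table, not code from the model repository) computes the fraction of DeepSeek-V3's total parameters that are activated per token under its MoE design, using the 37B activated and 671B total figures listed above.

```python
# Illustrative only: activation ratio implied by the table's parameter rows.
ACTIVATED_PARAMS = 37e9   # "# Activated Params" row for DeepSeek-V3
TOTAL_PARAMS = 671e9      # "# Total Params" row for DeepSeek-V3

activation_ratio = ACTIVATED_PARAMS / TOTAL_PARAMS
print(f"Fraction of parameters active per token: {activation_ratio:.1%}")  # ~5.5%
```

In other words, only about 5.5% of the 671B total parameters are active for any given token, which is the sparsity that lets an MoE model of this size remain competitive on inference cost with much smaller dense models.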