Anthropic

Claude Opus 4

Model ID
2025-05-22 · Proprietary Model

At its release, Claude Opus 4 was benchmarked as the world's best coding model, delivering sustained performance on complex, long-running tasks and agent workflows. It set new marks in software engineering, with leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended agentic workflows, handling thousands of task steps continuously for hours without degradation.

API
Knowledge Cutoff
2025-01-31
Input → Output Format
Context Window
200K tokens in · 32K tokens out
Cost / 1M Tokens
$15 in · $75 out
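The per-million-token rates above translate directly into a per-request cost estimate. A minimal sketch, using the $15/$75 rates listed on this page; the helper name and example token counts are illustrative, not part of any official SDK:

```python
# Rates from this page, in USD per 1M tokens (assumed current at time of writing).
INPUT_RATE_PER_M = 15.00
OUTPUT_RATE_PER_M = 75.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# e.g. a 20K-token prompt producing a 2K-token reply:
print(round(estimate_cost(20_000, 2_000), 4))  # → 0.45
```

Note that billing is per token, not per word; actual token counts depend on the tokenizer.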

AI Performance Evaluation

Arena Overall Score
1424 ± 4
As of 2026-04-07
Overall Rank
No.57
37,201 Votes
Arena by Ability
Hard Prompts
1456 ± 6 · No.44
Expert Knowledge
1448 ± 14 · No.55
Instruction Following
1442 ± 7 · No.28
Conversation Memory
1437 ± 8 · No.48
Creative
1431 ± 9 · No.26
Coding
1498 ± 8 · No.30
Math
1418 ± 12 · No.61
Arena by Occupation
Creative Writing
1429 ± 7 · No.32
Social Sciences
1440 ± 8 · No.61
Media
1420 ± 8 · No.33
Business
1412 ± 8 · No.71
Healthcare
1447 ± 13 · No.58
Legal
1435 ± 12 · No.57
Software
1466 ± 6 · No.45
Mathematics
1423 ± 13 · No.62
Overall
AA Intelligence Index
39% (↑1%)
ForecastBench
61% (↑1%)
Reasoning & Math
AA Math Index
73% (↑0%)
GPQA Diamond
80% (↓1%)
HLE
12% (↓5%)
MMLU-Pro
87% (↑6%)
AIME 2025
73% (↑0%)
MATH-500
98% (↑5%)
Coding
AA Coding Index
34% (↑0%)
LiveCodeBench
64% (↓1%)
TAU2
73% (↑2%)
TerminalBench
31% (↑0%)
SciCode
40% (↓1%)
Language & Instructions
IFBench
54% (↓2%)
AA-LCR
34% (↓28%)
Hallucination (HHEM)
12% (↑1%)
Factual (HHEM)
88% (↓1%)
Output Speed
Standard Mode
34 tok/s (↓45)
First Output: 1.33 s
Reasoning Mode
36 tok/s (↓95)
First Output: 7.18 s
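The throughput and time-to-first-token figures above combine into a rough end-to-end latency estimate: total time ≈ first-token delay + tokens ÷ throughput. A minimal sketch using the standard-mode numbers from this page (34 tok/s, 1.33 s first output); the function name and request size are illustrative:

```python
def generation_time(n_tokens: int, tok_per_s: float, ttft_s: float) -> float:
    """Seconds to stream n_tokens at tok_per_s after a ttft_s first-token delay."""
    return ttft_s + n_tokens / tok_per_s

# e.g. a 1,000-token answer in standard mode (34 tok/s, 1.33 s TTFT):
print(round(generation_time(1_000, 34.0, 1.33), 1))  # → 30.7
```

Reasoning mode trades a higher first-token delay (7.18 s) for slightly faster streaming (36 tok/s), so it only wins on long outputs.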