Anthropic

Claude Opus 4

Model ID
2025-05-22 · Proprietary Model

At its release, Claude Opus 4 was benchmarked as the world's best coding model, delivering sustained performance on complex, long-running tasks and agent workflows. It set new marks in software engineering, with leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended agentic workflows, handling thousands of task steps continuously for hours without degradation.

API
Knowledge Cutoff
2025-01-31
Input → Output Format
Context Window
200K tokens in · 32K tokens out
Cost / 1M Tokens
$15 in · $75 out
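The per-million-token rates above translate directly into a per-request cost estimate. A minimal sketch, using the $15/$75 rates listed on this page; the helper name and example token counts are illustrative, not part of any official SDK:

```python
# Rates from this page, in USD per 1M tokens (assumed current at time of writing).
INPUT_RATE_PER_M = 15.00
OUTPUT_RATE_PER_M = 75.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# e.g. a 20K-token prompt producing a 2K-token reply:
print(round(estimate_cost(20_000, 2_000), 4))  # → 0.45
```

Note that billing is per token, not per word; actual token counts depend on the tokenizer.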

AI Performance Evaluation

Arena Overall Score
1424 ± 4
As of 2026-04-07
Overall Rank
No.57
37,201 Votes
Arena by Ability
Hard Prompts
1456 ± 6 · No.44
Expert Knowledge
1448 ± 14 · No.55
Instruction Following
1442 ± 7 · No.28
Conversation Memory
1437 ± 8 · No.48
Creative
1431 ± 9 · No.26
Coding
1498 ± 8 · No.30
Math
1418 ± 12 · No.61
Arena by Occupation
Creative Writing
1429 ± 7 · No.32
Social Sciences
1440 ± 8 · No.61
Media
1420 ± 8 · No.33
Business
1412 ± 8 · No.71
Healthcare
1447 ± 13 · No.58
Legal
1435 ± 12 · No.57
Software
1466 ± 6 · No.45
Mathematics
1423 ± 13 · No.62
Overall
AA Intelligence Index
39% (↑1%)
ForecastBench
61% (↑1%)
Reasoning & Math
AA Math Index
73% (↑0%)
GPQA Diamond
80% (↓1%)
HLE
12% (↓5%)
MMLU-Pro
87% (↑6%)
AIME 2025
73% (↑0%)
MATH-500
98% (↑5%)
Coding
AA Coding Index
34% (↑0%)
LiveCodeBench
64% (↓1%)
TAU2
73% (↑2%)
TerminalBench
31% (↑0%)
SciCode
40% (↓1%)
Language & Instructions
IFBench
54% (↓2%)
AA-LCR
34% (↓28%)
Hallucination (HHEM)
12% (↑1%)
Factual (HHEM)
88% (↓1%)
Output Speed
Standard Mode
34 tok/s (↓45)
First Output: 1.33 s
Reasoning Mode
36 tok/s (↓95)
First Output: 7.18 s
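The throughput and time-to-first-token figures above combine into a rough end-to-end latency estimate: total time ≈ first-token delay + tokens ÷ throughput. A minimal sketch using the standard-mode numbers from this page (34 tok/s, 1.33 s first output); the function name and request size are illustrative:

```python
def generation_time(n_tokens: int, tok_per_s: float, ttft_s: float) -> float:
    """Seconds to stream n_tokens at tok_per_s after a ttft_s first-token delay."""
    return ttft_s + n_tokens / tok_per_s

# e.g. a 1,000-token answer in standard mode (34 tok/s, 1.33 s TTFT):
print(round(generation_time(1_000, 34.0, 1.33), 1))  # → 30.7
```

Reasoning mode trades a higher first-token delay (7.18 s) for slightly faster streaming (36 tok/s), so it only wins on long outputs.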