AI Comparison

Claude Opus 4 was billed at release as the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows. It posted leading results on software engineering benchmarks, including SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended agentic workflows, handling thousands of task steps continuously for hours without degradation.

Author

Anthropic

Release Date

2025-05-22

Knowledge Cutoff

2025-05-01

License

Proprietary

I/O Format

Context Length

1M / 128K

API Price, Input / Output (per 1M tokens)

$15 / $75

How to Use

API Access

Output Speed

34 tok/s

Arena Overall

1424

Intelligence Index

39.0

Coding Index

34.0

Math Index

73.3

LiveBench

—

ForecastBench

60.6

GPQA Diamond

79.6%

HLE

11.7%

MMLU-Pro

87.3%

AIME 2025

73.3%

MATH-500

98.2%

LB Reasoning

—

LB Math

—

LB Data Analysis

—

LiveCodeBench

63.6%

LB Coding

—

LB Agentic

—

TAU2

73.4%

TerminalBench

31.1%

SciCode

39.8%

IFBench

53.7%

AA-LCR

0.3

Hallucination (HHEM)

12.0%

Factual Consistency (HHEM)

88.0%

LB Language

—

LB Instruction Following

—
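
The listed price and output speed can be combined into a rough per-request estimate. The sketch below is illustrative only: it assumes "$15 / $75" means USD per 1M input and output tokens, treats 34 tok/s as a steady generation rate, and ignores caching, batch discounts, and time to first token. The function name and the token counts are hypothetical.

```python
# Values copied from the card above; how they are interpreted is an assumption.
INPUT_PRICE_PER_M = 15.0      # USD per 1M input tokens (assumed meaning of "$15")
OUTPUT_PRICE_PER_M = 75.0     # USD per 1M output tokens (assumed meaning of "$75")
OUTPUT_TOKENS_PER_SEC = 34.0  # listed output speed


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost_usd, generation_seconds) for one hypothetical request."""
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    seconds = output_tokens / OUTPUT_TOKENS_PER_SEC
    return cost, seconds


# Hypothetical request: 20,000 tokens in, 2,000 tokens out.
cost, seconds = estimate_request(input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.2f}, about {seconds:.0f} s")  # prints "$0.45, about 59 s"
```

Because output tokens cost five times as much as input tokens at these rates, long generations dominate the bill even when the prompt is much larger than the response.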
