Claude Sonnet 4.5

Anthropic

Rank #1 of 8 models

87.1%

+2.2 vs avg

Coverage: 85.0% (+6.2 vs avg)
Validity: 89.2% (-1.8 vs avg)
Local Score: 84.3% (-0.3 vs avg)
Cross-File: 90.7% (+5.0 vs avg)

Score Distribution

Performance by Language

Category Comparison

Local Logic: 84.3%
Cross-File: 90.7%

Judge Analysis (Sonnet vs GPT)

Latency (p50 / p90 / p99)

p50: 10ms
p90: 41.3s
p99: 51.2s

GLM-5: 6ms
Gemini 2.5 Pro: 8ms
Kimi K2.5: 8ms
Claude Haiku 4.5: 8ms
Gemini 3 Flash: 8ms
Claude Sonnet 4.5: 10ms
Gemini 3.1 Pro: 21ms
GPT-5.2: 19.3s
Pass Rate: 45.3%
Parse Rate: 45.3%
Tests: 75
Errors: 41

Sample Traces (10 of 34)