LiveBench Coding category score (0–100). Evaluates algorithm implementation, bug fixing, and code comprehension.
Source: LiveBench

| Rank | Model | Score |
|---|---|---|
| #1 | Gemini 3 Flash | 78.6 |
| #2 | Anthropic Claude Opus 4.5 | 78.2 |
| #3 | Anthropic Claude Opus 4.6 | 78.2 |
| #4 | Qwen Qwen3.6 Plus | 78.2 |
| #5 | Moonshot AI Kimi K2.5 | 77.9 |
| #6 | ChatGPT GPT-5.4 | 77.5 |
| #7 | Gemini 3.1 Pro | 76.5 |
| #8 | Anthropic Claude Sonnet 4.5 | 76.1 |
| #9 | ChatGPT GPT-5 Mini | 76.1 |
| #10 | DeepSeek DeepSeek V3.2 | 75.7 |
| #11 | Gemini 2.5 Pro | 75.7 |
| #12 | Z.ai GLM-5.1 | 75.4 |
| #13 | ChatGPT GPT-5.4 Mini | 74.7 |
| #14 | Anthropic Claude Sonnet 4.6 | 74.3 |
| #15 | Z.ai GLM-5 | 73.6 |
| #16 | Anthropic Claude Haiku 4.5 | 72.2 |
| #17 | MiniMax MiniMax M2.5 | 70.7 |
| #18 | Xiaomi MiMo-V2-Pro | 68.8 |
| #19 | Gemini 3.1 Flash Lite | 68.5 |
| #20 | ChatGPT GPT-5 Nano | 67.4 |
| #21 | Gemini 2.5 Flash Lite | 66.4 |
| #22 | Gemini 2.5 Flash | 66.0 |
| #23 | ChatGPT GPT-5.4 Nano | 61.9 |
| #24 | Gemma 4 31B | 60.3 |
| #25 | ChatGPT GPT OSS 120B | 60.2 |
| #26 | MiniMax MiniMax M2.7 | 54.9 |