lmarena.ai Leaderboard
Rankings last updated: 2025-01-23
| Rank (UB) | Rank (StyleCtrl) | Model | Score | Confidence interval | Votes | Organization |
|---|---|---|---|---|---|---|
| 1 | 3 | Gemini-2.0-Flash-Thinking-Exp-01-21 | 1382 | +8/-6 | 6,437 | |
| 1 | 1 | Gemini-Exp-1206 | 1374 | +5/-4 | 22,116 | |
| 3 | 1 | ChatGPT-4o-latest (2024-11-20) | 1365 | +4/-4 | 35,328 | OpenAI |
| 3 | 1 | DeepSeek-R1 | 1357 | +12/-13 | 1,883 | DeepSeek |
| 4 | 5 | Gemini-2.0-Flash-Exp | 1356 | +4/-4 | 20,939 | |
| 4 | 1 | o1-2024-12-17 | 1352 | +6/-6 | 9,230 | OpenAI |
| 7 | 4 | o1-preview | 1335 | +3/-3 | 33,186 | OpenAI |
| 8 | 9 | DeepSeek-V3 | 1317 | +6/-5 | 13,640 | DeepSeek |
| 8 | 11 | Step-2-16K-Exp | 1305 | +9/-7 | 4,533 | StepFun |
| 9 | 12 | o1-mini | 1305 | +2/-3 | 49,952 | OpenAI |
| 9 | 9 | Gemini-1.5-Pro-002 | 1302 | +3/-4 | 46,621 | |
| 12 | 14 | Grok-2-08-13 | 1288 | +3/-3 | 67,150 | xAI |
| 12 | 17 | Yi-Lightning | 1287 | +3/-4 | 28,955 | 01 AI |
| 12 | 10 | GPT-4o-2024-05-13 | 1285 | +2/-2 | 117,745 | OpenAI |
| 12 | 8 | Claude 3.5 Sonnet (20241022) | 1283 | +3/-3 | 48,847 | Anthropic |
| 12 | 22 | Qwen2.5-plus-1127 | 1283 | +5/-7 | 9,050 | Alibaba |
| 13 | 20 | Deepseek-v2.5-1210 | 1279 | +5/-6 | 7,261 | DeepSeek |
| 15 | 24 | Athene-v2-Chat-72B | 1276 | +4/-5 | 22,355 | NexusFlow |
| 16 | 22 | GLM-4-Plus | 1274 | +4/-4 | 27,771 | Zhipu AI |
| 16 | 23 | GPT-4o-mini-2024-07-18 | 1273 | +3/-3 | 61,233 | OpenAI |
| 17 | 24 | Gemini-1.5-Flash-002 | 1271 | +4/-3 | 35,199 | |
| 17 | 36 | Llama-3.1-Nemotron-70B-Instruct | 1269 | +6/-5 | 7,598 | Nvidia |
| 18 | 12 | Meta-Llama-3.1-405B-Instruct-bf16 | 1268 | +4/-4 | 22,703 | Meta |
| 19 | 11 | Claude 3.5 Sonnet (20240620) | 1268 | +2/-3 | 86,167 | Anthropic |
| 19 | 13 | Meta-Llama-3.1-405B-Instruct-fp8 | 1267 | +3/-3 | 63,187 | Meta |
| 21 | 12 | Gemini Advanced App (2024-05-14) | 1267 | +3/-2 | 52,145 | |
| 21 | 33 | Grok-2-Mini-08-13 | 1266 | +3/-3 | 55,507 | xAI |
| 21 | 14 | GPT-4o-2024-08-06 | 1265 | +3/-3 | 47,975 | OpenAI |
| 21 | 24 | Qwen-Max-0919 | 1263 | +6/-5 | 17,434 | Alibaba |
| 29 | 20 | Gemini-1.5-Pro-001 | 1260 | +3/-2 | 82,433 | |
| 29 | 29 | Deepseek-v2.5 | 1258 | +4/-4 | 26,345 | DeepSeek |
| 29 | 36 | Qwen2.5-72B-Instruct | 1257 | +4/-3 | 40,664 | Alibaba |
| 29 | 20 | GPT-4-Turbo-2024-04-09 | 1256 | +2/-2 | 102,125 | OpenAI |
| 29 | 21 | Llama-3.3-70B-Instruct | 1256 | +4/-6 | 16,905 | Meta |
| 34 | 24 | Mistral-Large-2407 | 1251 | +3/-3 | 48,205 | Mistral |
| 34 | 34 | Athene-70B | 1250 | +3/-5 | 20,609 | NexusFlow |
| 34 | 23 | GPT-4-1106-preview | 1250 | +2/-2 | 103,732 | OpenAI |
| 34 | 40 | Meta-Llama-3.1-70B-Instruct | 1248 | +3/-3 | 58,785 | Meta |
| 34 | 38 | Llama-3.1-Tulu-3-70B | 1244 | +9/-10 | 3,031 | Ai2 |
| 35 | 21 | Claude 3 Opus | 1247 | +2/-2 | 202,713 | Anthropic |
| 35 | 40 | Amazon Nova Pro 1.0 | 1243 | +6/-4 | 13,738 | Amazon |
| 36 | 36 | Mistral-Large-2411 | 1244 | +5/-5 | 12,008 | Mistral |
| 37 | 25 | GPT-4-0125-preview | 1245 | +2/-2 | 97,064 | OpenAI |
| 40 | 21 | Claude 3.5 Haiku (20241022) | 1239 | +6/-7 | 12,181 | Anthropic |
| 41 | 40 | Reka-Core-20240904 | 1235 | +6/-5 | 7,942 | Reka AI |
| 46 | 43 | Gemini-1.5-Flash-001 | 1227 | +3/-2 | 65,656 | |
| 46 | 40 | Jamba-1.5-Large | 1221 | +5/-5 | 9,125 | AI21 Labs |
| 46 | 48 | Qwen2.5-Coder-32B-Instruct | 1217 | +8/-8 | 5,730 | Alibaba |
| 47 | 43 | Gemma-2-27B-it | 1220 | +3/-3 | 73,168 | |
| 47 | 56 | Amazon Nova Lite 1.0 | 1219 | +5/-6 | 11,563 | Amazon |
| 47 | 44 | Command R+ (08-2024) | 1215 | +5/-5 | 10,541 | Cohere |
| 47 | 43 | Gemma-2-9B-it-SimPO | 1216 | +5/-6 | 10,551 | Princeton |
| 47 | 56 | Phi-4 | 1211 | +7/-10 | 4,039 | Microsoft |
| 47 | 40 | Llama-3.1-Nemotron-51B-Instruct | 1211 | +8/-10 | 3,898 | Nvidia |
| 49 | 58 | Gemini-1.5-Flash-8B-001 | 1212 | +4/-3 | 36,856 | |
| 49 | 47 | Nemotron-4-340B-Instruct | 1209 | +4/-3 | 20,604 | Nvidia |
| 49 | 56 | Aya-Expanse-32B | 1209 | +4/-4 | 27,764 | Cohere |
| 50 | 52 | GLM-4-0520 | 1207 | +5/-5 | 10,212 | Zhipu AI |
| 50 | 48 | Reka-Flash-20240904 | 1205 | +6/-7 | 8,131 | Reka AI |
| 53 | 49 | Llama-3-70B-Instruct | 1206 | +2/-2 | 163,809 | Meta |
| 57 | 48 | Claude 3 Sonnet | 1201 | +2/-3 | 113,011 | Anthropic |
| 58 | 70 | Amazon Nova Micro 1.0 | 1197 | +5/-5 | 11,643 | Amazon |
| 61 | 70 | Hunyuan-Standard-256K | 1189 | +10/-11 | 2,900 | Tencent |
| 62 | 58 | Gemma-2-9B-it | 1191 | +4/-3 | 50,951 | |
| 62 | 56 | Command R+ (04-2024) | 1190 | +3/-3 | 80,852 | Cohere |
| 62 | 71 | Llama-3.1-Tulu-3-8B | 1185 | +9/-8 | 3,079 | Ai2 |
| 63 | 57 | Qwen2-72B-Instruct | 1187 | +3/-3 | 38,880 | Alibaba |
| 63 | 44 | GPT-4-0314 | 1186 | +4/-3 | 55,956 | OpenAI |
| 63 | 70 | Ministral-8B-2410 | 1182 | +9/-6 | 5,114 | Mistral |
| 65 | 61 | Command R (08-2024) | 1180 | +5/-5 | 10,846 | Cohere |
| 65 | 71 | Aya-Expanse-8B | 1179 | +7/-8 | 9,571 | Cohere |
| 67 | 62 | Claude 3 Haiku | 1179 | +2/-2 | 122,299 | Anthropic |
| 67 | 56 | DeepSeek-Coder-V2-Instruct | 1178 | +4/-5 | 15,754 | DeepSeek AI |
| 67 | 70 | Jamba-1.5-Mini | 1176 | +6/-6 | 9,270 | AI21 Labs |
| 67 | 85 | Meta-Llama-3.1-8B-Instruct | 1176 | +3/-2 | 52,651 | Meta |
| 76 | 56 | GPT-4-0613 | 1163 | +3/-3 | 91,646 | OpenAI |
| 76 | 70 | Qwen1.5-110B-Chat | 1161 | +4/-3 | 27,466 | Alibaba |
| 76 | 85 | Yi-1.5-34B-Chat | 1157 | +4/-4 | 25,126 | 01 AI |
| 76 | 103 | QwQ-32B-Preview | 1153 | +11/-11 | 3,412 | Alibaba |
| 77 | 70 | Mistral-Large-2402 | 1157 | +2/-3 | 64,925 | Mistral |
| 77 | 71 | Reka-Flash-21B-online | 1156 | +4/-4 | 16,034 | Reka AI |
| 78 | 93 | InternLM2.5-20B-chat | 1149 | +6/-6 | 10,595 | InternLM |
| 79 | 78 | Llama-3-8B-Instruct | 1152 | +2/-2 | 109,237 | Meta |
| 80 | 79 | Granite-3.1-8B-Instruct | 1142 | +10/-12 | 3,064 | IBM |
| 81 | 75 | Command R (04-2024) | 1149 | +3/-3 | 56,375 | Cohere |
| 81 | 77 | Mistral Medium | 1148 | +4/-3 | 35,554 | Mistral |
| 81 | 76 | Reka-Flash-21B | 1148 | +3/-4 | 25,798 | Reka AI |
| 81 | 73 | Mixtral-8x22b-Instruct-v0.1 | 1148 | +3/-3 | 53,787 | Mistral |
| 81 | 75 | Qwen1.5-72B-Chat | 1147 | +3/-3 | 40,630 | Alibaba |
| 83 | 93 | Gemma-2-2b-it | 1143 | +3/-3 | 42,631 | |
| 90 | 73 | Gemini-1.0-Pro-001 | 1131 | +4/-4 | 18,792 | |
| 90 | 83 | Zephyr-ORPO-141b-A35b-v0.1 | 1127 | +6/-7 | 4,863 | HuggingFace |
| 91 | 87 | Qwen1.5-32B-Chat | 1125 | +5/-4 | 22,759 | Alibaba |
| 91 | 96 | Granite-3.1-2B-Instruct | 1118 | +10/-10 | 3,127 | IBM |
| 92 | 93 | Phi-3-Medium-4k-Instruct | 1123 | +3/-4 | 26,110 | Microsoft |
| 92 | 105 | Starling-LM-7B-beta | 1119 | +4/-5 | 16,672 | Nexusflow |
| 95 | 94 | Mixtral-8x7B-Instruct-v0.1 | 1114 | +0/-0 | 76,133 | Mistral |
| 95 | 98 | Yi-34B-Chat | 1111 | +5/-4 | 15,920 | 01 AI |
| 95 | 85 | Gemini Pro | 1110 | +7/-8 | 6,559 | |
| 97 | 96 | Qwen1.5-14B-Chat | 1109 | +5/-4 | 18,675 | Alibaba |
| 97 | 97 | WizardLM-70B-v1.0 | 1106 | +6/-6 | 8,379 | Microsoft |
| 97 | 83 | GPT-3.5-Turbo-0125 | 1106 | +3/-2 | 68,861 | OpenAI |
| 97 | 102 | Meta-Llama-3.2-3B-Instruct | 1103 | +6/-7 | 8,403 | Meta |
| 99 | 93 | DBRX-Instruct-Preview | 1103 | +4/-3 | 33,730 | Databricks |
| 99 | 100 | Phi-3-Small-8k-Instruct | 1102 | +4/-3 | 18,478 | Microsoft |
| 99 | 101 | Tulu-2-DPO-70B | 1099 | +7/-7 | 6,664 | AllenAI/UW |
| 103 | 93 | Granite-3.0-8B-Instruct | 1093 | +6/-7 | 7,002 | IBM |
| 105 | 98 | OpenChat-3.5-0106 | 1092 | +6/-6 | 12,984 | OpenChat |
| 106 | 112 | Llama-2-70B-chat | 1093 | +2/-2 | 39,634 | Meta |
| 106 | 105 | Vicuna-33B | 1091 | +4/-4 | 22,950 | LMSYS |
| 106 | 98 | Snowflake Arctic Instruct | 1090 | +3/-3 | 34,173 | Snowflake |
| 106 | 108 | Starling-LM-7B-alpha | 1088 | +6/-5 | 10,417 | UC Berkeley |
| 106 | 113 | Nous-Hermes-2-Mixtral-8x7B-DPO | 1084 | +9/-8 | 3,834 | NousResearch |
| 108 | 99 | Gemma-1.1-7B-it | 1084 | +4/-4 | 25,059 | |
| 108 | 113 | NV-Llama2-70B-SteerLM-Chat | 1081 | +9/-12 | 3,637 | Nvidia |
| 111 | 102 | DeepSeek-LLM-67B-Chat | 1077 | +9/-7 | 4,988 | DeepSeek AI |
| 113 | 100 | OpenChat-3.5 | 1076 | +7/-8 | 8,111 | OpenChat |
| 113 | 104 | OpenHermes-2.5-Mistral-7B | 1074 | +9/-7 | 5,091 | NousResearch |
| 113 | 108 | Granite-3.0-2B-Instruct | 1074 | +7/-7 | 7,195 | IBM |
| 114 | 119 | Mistral-7B-Instruct-v0.2 | 1072 | +5/-4 | 20,051 | Mistral |
| 114 | 119 | Qwen1.5-7B-Chat | 1070 | +8/-7 | 4,869 | Alibaba |
| 115 | 119 | Phi-3-Mini-4K-Instruct-June-24 | 1071 | +6/-4 | 12,818 | Microsoft |
| 115 | 94 | GPT-3.5-Turbo-1106 | 1068 | +5/-4 | 17,033 | OpenAI |
| 115 | 123 | Phi-3-Mini-4k-Instruct | 1066 | +4/-5 | 21,095 | Microsoft |
| 115 | 115 | Dolphin-2.2.1-Mistral-7B | 1062 | +13/-13 | 1,713 | Cognitive Computations |
| 115 | 118 | SOLAR-10.7B-Instruct-v1.0 | 1062 | +8/-9 | 4,287 | Upstage AI |
| 119 | 124 | Llama-2-13b-chat | 1063 | +4/-4 | 19,736 | Meta |
| 122 | 119 | WizardLM-13b-v1.2 | 1059 | +7/-7 | 7,177 | Microsoft |
| 125 | 127 | Meta-Llama-3.2-1B-Instruct | 1054 | +7/-6 | 8,531 | Meta |
| 126 | 125 | Zephyr-7B-beta | 1053 | +6/-6 | 11,332 | HuggingFace |
| 126 | 123 | SmolLM2-1.7B-Instruct | 1047 | +12/-14 | 2,370 | HuggingFace |
| 126 | 119 | MPT-30B-chat | 1046 | +11/-12 | 2,648 | MosaicML |
| 126 | 124 | Zephyr-7B-alpha | 1042 | +13/-16 | 1,815 | HuggingFace |
| 126 | 125 | CodeLlama-70B-instruct | 1041 | +14/-20 | 1,193 | Meta |
| 128 | 126 | CodeLlama-34B-instruct | 1043 | +7/-8 | 7,512 | Meta |
| 128 | 119 | falcon-180b-chat | 1034 | +16/-16 | 1,326 | TII |
| 131 | 120 | Vicuna-13B | 1042 | +4/-4 | 19,786 | LMSYS |
| 131 | 126 | Gemma-7B-it | 1037 | +7/-5 | 9,174 | |
| 131 | 127 | Phi-3-Mini-128k-Instruct | 1037 | +5/-4 | 21,627 | Microsoft |
| 131 | 120 | Qwen-14B-Chat | 1035 | +8/-7 | 5,072 | Alibaba |
| 131 | 143 | Llama-2-7B-chat | 1037 | +4/-5 | 14,553 | Meta |
| 131 | 129 | Guanaco-33B | 1033 | +10/-8 | 3,000 | UW |
| 140 | 132 | Gemma-1.1-2b-it | 1021 | +5/-7 | 11,346 | |
| 140 | 136 | StripedHyena-Nous-7B | 1017 | +7/-8 | 5,271 | Together AI |
| 141 | 150 | OLMo-7B-instruct | 1015 | +7/-7 | 6,508 | Allen AI |
| 143 | 141 | Mistral-7B-Instruct-v0.1 | 1008 | +5/-6 | 9,144 | Mistral |
| 144 | 143 | Vicuna-7B | 1005 | +6/-8 | 7,016 | LMSYS |
| 144 | 132 | PaLM-Chat-Bison-001 | 1004 | +6/-5 | 8,743 | |
| 148 | 147 | Gemma-2B-it | 989 | +9/-9 | 4,921 | |
| 149 | 147 | Qwen1.5-4B-Chat | 988 | +5/-6 | 7,814 | Alibaba |
| 151 | 151 | Koala-13B | 964 | +6/-7 | 7,036 | UC Berkeley |
| 151 | 151 | ChatGLM3-6B | 955 | +9/-7 | 4,764 | Tsinghua |
| 153 | 151 | GPT4All-13B-Snoozy | 932 | +11/-14 | 1,786 | Nomic AI |
| 153 | 151 | MPT-7B-Chat | 928 | +10/-11 | 4,013 | MosaicML |
| 153 | 156 | ChatGLM2-6B | 924 | +11/-10 | 2,707 | Tsinghua |
| 153 | 156 | RWKV-4-Raven-14B | 922 | +8/-10 | 4,934 | RWKV |
| 157 | 151 | Alpaca-13B | 902 | +7/-9 | 5,876 | Stanford |
| 157 | 157 | OpenAssistant-Pythia-12B | 894 | +6/-8 | 6,381 | OpenAssistant |
| 158 | 159 | ChatGLM-6B | 879 | +8/-11 | 4,988 | Tsinghua |
| 159 | 159 | FastChat-T5-3B | 868 | +7/-11 | 4,299 | LMSYS |
| 161 | 162 | StableLM-Tuned-Alpha-7B | 841 | +9/-12 | 3,341 | Stability AI |
| 161 | 159 | Dolly-V2-12B | 822 | +8/-11 | 3,485 | Databricks |
| 162 | 160 | LLaMA-13B | 800 | +14/-14 | 2,444 | Meta |
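Notes
Rank (UB): upper-bound rank based on the Bradley-Terry model
Rank (StyleCtrl): rank under style control, which accounts for conversational style
Confidence interval: the confidence interval around the model's score
Score: arena score based on model performance
Data source
Data from lmarena.ai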
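The notes above do not spell out how Rank (UB) follows from the score and its confidence interval. The sketch below is one reading, and it rests on assumptions that are not stated on this page: a model's rank is taken to be one plus the number of models whose lower bound (score minus the negative half of the interval) lies strictly above this model's upper bound (score plus the positive half). The `win_probability` helper further assumes the usual Elo-style scaling (base 10, 400 points) for arena scores. The names `Entry`, `rank_ub`, and `win_probability` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: int     # arena score
    ci_plus: int   # "+x" half of the confidence interval
    ci_minus: int  # "-y" half of the confidence interval

    @property
    def upper(self) -> int:
        return self.score + self.ci_plus

    @property
    def lower(self) -> int:
        return self.score - self.ci_minus


def rank_ub(entries):
    """Assumed rule: rank = 1 + number of models whose lower bound
    is strictly above this model's upper bound."""
    return {
        e.model: 1 + sum(other.lower > e.upper for other in entries)
        for e in entries
    }


def win_probability(score_a, score_b):
    """Assumed Elo-style scaling: expected probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))


# First seven rows of the leaderboard on this page.
TOP_ROWS = [
    Entry("Gemini-2.0-Flash-Thinking-Exp-01-21", 1382, 8, 6),
    Entry("Gemini-Exp-1206", 1374, 5, 4),
    Entry("ChatGPT-4o-latest (2024-11-20)", 1365, 4, 4),
    Entry("DeepSeek-R1", 1357, 12, 13),
    Entry("Gemini-2.0-Flash-Exp", 1356, 4, 4),
    Entry("o1-2024-12-17", 1352, 6, 6),
    Entry("o1-preview", 1335, 3, 3),
]

if __name__ == "__main__":
    for model, rank in rank_ub(TOP_ROWS).items():
        print(rank, model)
```

Run on these seven rows, the assumed rule yields ranks 1, 1, 3, 3, 4, 4, 7, which matches the Rank (UB) values listed for them. That agreement is consistent with the assumption but does not confirm it as the site's exact method, since the published scores and intervals are rounded.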