lmarena.ai Leaderboard

Rankings last updated: 2025-01-23

| Rank (UB) | Rank (StyleCtrl) | Model | Score | Confidence interval | Votes | Provider |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 | Gemini-2.0-Flash-Thinking-Exp-01-21 | 1382 | +8/-6 | 6,437 | Google |
| 1 | 1 | Gemini-Exp-1206 | 1374 | +5/-4 | 22,116 | Google |
| 3 | 1 | ChatGPT-4o-latest (2024-11-20) | 1365 | +4/-4 | 35,328 | OpenAI |
| 3 | 1 | DeepSeek-R1 | 1357 | +12/-13 | 1,883 | DeepSeek |
| 4 | 5 | Gemini-2.0-Flash-Exp | 1356 | +4/-4 | 20,939 | Google |
| 4 | 1 | o1-2024-12-17 | 1352 | +6/-6 | 9,230 | OpenAI |
| 7 | 4 | o1-preview | 1335 | +3/-3 | 33,186 | OpenAI |
| 8 | 9 | DeepSeek-V3 | 1317 | +6/-5 | 13,640 | DeepSeek |
| 8 | 11 | Step-2-16K-Exp | 1305 | +9/-7 | 4,533 | StepFun |
| 9 | 12 | o1-mini | 1305 | +2/-3 | 49,952 | OpenAI |
| 9 | 9 | Gemini-1.5-Pro-002 | 1302 | +3/-4 | 46,621 | Google |
| 12 | 14 | Grok-2-08-13 | 1288 | +3/-3 | 67,150 | xAI |
| 12 | 17 | Yi-Lightning | 1287 | +3/-4 | 28,955 | 01 AI |
| 12 | 10 | GPT-4o-2024-05-13 | 1285 | +2/-2 | 117,745 | OpenAI |
| 12 | 8 | Claude 3.5 Sonnet (20241022) | 1283 | +3/-3 | 48,847 | Anthropic |
| 12 | 22 | Qwen2.5-plus-1127 | 1283 | +5/-7 | 9,050 | Alibaba |
| 13 | 20 | Deepseek-v2.5-1210 | 1279 | +5/-6 | 7,261 | DeepSeek |
| 15 | 24 | Athene-v2-Chat-72B | 1276 | +4/-5 | 22,355 | NexusFlow |
| 16 | 22 | GLM-4-Plus | 1274 | +4/-4 | 27,771 | Zhipu AI |
| 16 | 23 | GPT-4o-mini-2024-07-18 | 1273 | +3/-3 | 61,233 | OpenAI |
| 17 | 24 | Gemini-1.5-Flash-002 | 1271 | +4/-3 | 35,199 | Google |
| 17 | 36 | Llama-3.1-Nemotron-70B-Instruct | 1269 | +6/-5 | 7,598 | Nvidia |
| 18 | 12 | Meta-Llama-3.1-405B-Instruct-bf16 | 1268 | +4/-4 | 22,703 | Meta |
| 19 | 11 | Claude 3.5 Sonnet (20240620) | 1268 | +2/-3 | 86,167 | Anthropic |
| 19 | 13 | Meta-Llama-3.1-405B-Instruct-fp8 | 1267 | +3/-3 | 63,187 | Meta |
| 21 | 12 | Gemini Advanced App (2024-05-14) | 1267 | +3/-2 | 52,145 | Google |
| 21 | 33 | Grok-2-Mini-08-13 | 1266 | +3/-3 | 55,507 | xAI |
| 21 | 14 | GPT-4o-2024-08-06 | 1265 | +3/-3 | 47,975 | OpenAI |
| 21 | 24 | Qwen-Max-0919 | 1263 | +6/-5 | 17,434 | Alibaba |
| 29 | 20 | Gemini-1.5-Pro-001 | 1260 | +3/-2 | 82,433 | Google |
| 29 | 29 | Deepseek-v2.5 | 1258 | +4/-4 | 26,345 | DeepSeek |
| 29 | 36 | Qwen2.5-72B-Instruct | 1257 | +4/-3 | 40,664 | Alibaba |
| 29 | 20 | GPT-4-Turbo-2024-04-09 | 1256 | +2/-2 | 102,125 | OpenAI |
| 29 | 21 | Llama-3.3-70B-Instruct | 1256 | +4/-6 | 16,905 | Meta |
| 34 | 24 | Mistral-Large-2407 | 1251 | +3/-3 | 48,205 | Mistral |
| 34 | 34 | Athene-70B | 1250 | +3/-5 | 20,609 | NexusFlow |
| 34 | 23 | GPT-4-1106-preview | 1250 | +2/-2 | 103,732 | OpenAI |
| 34 | 40 | Meta-Llama-3.1-70B-Instruct | 1248 | +3/-3 | 58,785 | Meta |
| 34 | 38 | Llama-3.1-Tulu-3-70B | 1244 | +9/-10 | 3,031 | Ai2 |
| 35 | 21 | Claude 3 Opus | 1247 | +2/-2 | 202,713 | Anthropic |
| 35 | 40 | Amazon Nova Pro 1.0 | 1243 | +6/-4 | 13,738 | Amazon |
| 36 | 36 | Mistral-Large-2411 | 1244 | +5/-5 | 12,008 | Mistral |
| 37 | 25 | GPT-4-0125-preview | 1245 | +2/-2 | 97,064 | OpenAI |
| 40 | 21 | Claude 3.5 Haiku (20241022) | 1239 | +6/-7 | 12,181 | Anthropic |
| 41 | 40 | Reka-Core-20240904 | 1235 | +6/-5 | 7,942 | Reka AI |
| 46 | 43 | Gemini-1.5-Flash-001 | 1227 | +3/-2 | 65,656 | Google |
| 46 | 40 | Jamba-1.5-Large | 1221 | +5/-5 | 9,125 | AI21 Labs |
| 46 | 48 | Qwen2.5-Coder-32B-Instruct | 1217 | +8/-8 | 5,730 | Alibaba |
| 47 | 43 | Gemma-2-27B-it | 1220 | +3/-3 | 73,168 | Google |
| 47 | 56 | Amazon Nova Lite 1.0 | 1219 | +5/-6 | 11,563 | Amazon |
| 47 | 44 | Command R+ (08-2024) | 1215 | +5/-5 | 10,541 | Cohere |
| 47 | 43 | Gemma-2-9B-it-SimPO | 1216 | +5/-6 | 10,551 | Princeton |
| 47 | 56 | Phi-4 | 1211 | +7/-10 | 4,039 | Microsoft |
| 47 | 40 | Llama-3.1-Nemotron-51B-Instruct | 1211 | +8/-10 | 3,898 | Nvidia |
| 49 | 58 | Gemini-1.5-Flash-8B-001 | 1212 | +4/-3 | 36,856 | Google |
| 49 | 47 | Nemotron-4-340B-Instruct | 1209 | +4/-3 | 20,604 | Nvidia |
| 49 | 56 | Aya-Expanse-32B | 1209 | +4/-4 | 27,764 | Cohere |
| 50 | 52 | GLM-4-0520 | 1207 | +5/-5 | 10,212 | Zhipu AI |
| 50 | 48 | Reka-Flash-20240904 | 1205 | +6/-7 | 8,131 | Reka AI |
| 53 | 49 | Llama-3-70B-Instruct | 1206 | +2/-2 | 163,809 | Meta |
| 57 | 48 | Claude 3 Sonnet | 1201 | +2/-3 | 113,011 | Anthropic |
| 58 | 70 | Amazon Nova Micro 1.0 | 1197 | +5/-5 | 11,643 | Amazon |
| 61 | 70 | Hunyuan-Standard-256K | 1189 | +10/-11 | 2,900 | Tencent |
| 62 | 58 | Gemma-2-9B-it | 1191 | +4/-3 | 50,951 | Google |
| 62 | 56 | Command R+ (04-2024) | 1190 | +3/-3 | 80,852 | Cohere |
| 62 | 71 | Llama-3.1-Tulu-3-8B | 1185 | +9/-8 | 3,079 | Ai2 |
| 63 | 57 | Qwen2-72B-Instruct | 1187 | +3/-3 | 38,880 | Alibaba |
| 63 | 44 | GPT-4-0314 | 1186 | +4/-3 | 55,956 | OpenAI |
| 63 | 70 | Ministral-8B-2410 | 1182 | +9/-6 | 5,114 | Mistral |
| 65 | 61 | Command R (08-2024) | 1180 | +5/-5 | 10,846 | Cohere |
| 65 | 71 | Aya-Expanse-8B | 1179 | +7/-8 | 9,571 | Cohere |
| 67 | 62 | Claude 3 Haiku | 1179 | +2/-2 | 122,299 | Anthropic |
| 67 | 56 | DeepSeek-Coder-V2-Instruct | 1178 | +4/-5 | 15,754 | DeepSeek AI |
| 67 | 70 | Jamba-1.5-Mini | 1176 | +6/-6 | 9,270 | AI21 Labs |
| 67 | 85 | Meta-Llama-3.1-8B-Instruct | 1176 | +3/-2 | 52,651 | Meta |
| 76 | 56 | GPT-4-0613 | 1163 | +3/-3 | 91,646 | OpenAI |
| 76 | 70 | Qwen1.5-110B-Chat | 1161 | +4/-3 | 27,466 | Alibaba |
| 76 | 85 | Yi-1.5-34B-Chat | 1157 | +4/-4 | 25,126 | 01 AI |
| 76 | 103 | QwQ-32B-Preview | 1153 | +11/-11 | 3,412 | Alibaba |
| 77 | 70 | Mistral-Large-2402 | 1157 | +2/-3 | 64,925 | Mistral |
| 77 | 71 | Reka-Flash-21B-online | 1156 | +4/-4 | 16,034 | Reka AI |
| 78 | 93 | InternLM2.5-20B-chat | 1149 | +6/-6 | 10,595 | InternLM |
| 79 | 78 | Llama-3-8B-Instruct | 1152 | +2/-2 | 109,237 | Meta |
| 80 | 79 | Granite-3.1-8B-Instruct | 1142 | +10/-12 | 3,064 | IBM |
| 81 | 75 | Command R (04-2024) | 1149 | +3/-3 | 56,375 | Cohere |
| 81 | 77 | Mistral Medium | 1148 | +4/-3 | 35,554 | Mistral |
| 81 | 76 | Reka-Flash-21B | 1148 | +3/-4 | 25,798 | Reka AI |
| 81 | 73 | Mixtral-8x22b-Instruct-v0.1 | 1148 | +3/-3 | 53,787 | Mistral |
| 81 | 75 | Qwen1.5-72B-Chat | 1147 | +3/-3 | 40,630 | Alibaba |
| 83 | 93 | Gemma-2-2b-it | 1143 | +3/-3 | 42,631 | Google |
| 90 | 73 | Gemini-1.0-Pro-001 | 1131 | +4/-4 | 18,792 | Google |
| 90 | 83 | Zephyr-ORPO-141b-A35b-v0.1 | 1127 | +6/-7 | 4,863 | HuggingFace |
| 91 | 87 | Qwen1.5-32B-Chat | 1125 | +5/-4 | 22,759 | Alibaba |
| 91 | 96 | Granite-3.1-2B-Instruct | 1118 | +10/-10 | 3,127 | IBM |
| 92 | 93 | Phi-3-Medium-4k-Instruct | 1123 | +3/-4 | 26,110 | Microsoft |
| 92 | 105 | Starling-LM-7B-beta | 1119 | +4/-5 | 16,672 | Nexusflow |
| 95 | 94 | Mixtral-8x7B-Instruct-v0.1 | 1114 | +0/-0 | 76,133 | Mistral |
| 95 | 98 | Yi-34B-Chat | 1111 | +5/-4 | 15,920 | 01 AI |
| 95 | 85 | Gemini Pro | 1110 | +7/-8 | 6,559 | Google |
| 97 | 96 | Qwen1.5-14B-Chat | 1109 | +5/-4 | 18,675 | Alibaba |
| 97 | 97 | WizardLM-70B-v1.0 | 1106 | +6/-6 | 8,379 | Microsoft |
| 97 | 83 | GPT-3.5-Turbo-0125 | 1106 | +3/-2 | 68,861 | OpenAI |
| 97 | 102 | Meta-Llama-3.2-3B-Instruct | 1103 | +6/-7 | 8,403 | Meta |
| 99 | 93 | DBRX-Instruct-Preview | 1103 | +4/-3 | 33,730 | Databricks |
| 99 | 100 | Phi-3-Small-8k-Instruct | 1102 | +4/-3 | 18,478 | Microsoft |
| 99 | 101 | Tulu-2-DPO-70B | 1099 | +7/-7 | 6,664 | AllenAI/UW |
| 103 | 93 | Granite-3.0-8B-Instruct | 1093 | +6/-7 | 7,002 | IBM |
| 105 | 98 | OpenChat-3.5-0106 | 1092 | +6/-6 | 12,984 | OpenChat |
| 106 | 112 | Llama-2-70B-chat | 1093 | +2/-2 | 39,634 | Meta |
| 106 | 105 | Vicuna-33B | 1091 | +4/-4 | 22,950 | LMSYS |
| 106 | 98 | Snowflake Arctic Instruct | 1090 | +3/-3 | 34,173 | Snowflake |
| 106 | 108 | Starling-LM-7B-alpha | 1088 | +6/-5 | 10,417 | UC Berkeley |
| 106 | 113 | Nous-Hermes-2-Mixtral-8x7B-DPO | 1084 | +9/-8 | 3,834 | NousResearch |
| 108 | 99 | Gemma-1.1-7B-it | 1084 | +4/-4 | 25,059 | Google |
| 108 | 113 | NV-Llama2-70B-SteerLM-Chat | 1081 | +9/-12 | 3,637 | Nvidia |
| 111 | 102 | DeepSeek-LLM-67B-Chat | 1077 | +9/-7 | 4,988 | DeepSeek AI |
| 113 | 100 | OpenChat-3.5 | 1076 | +7/-8 | 8,111 | OpenChat |
| 113 | 104 | OpenHermes-2.5-Mistral-7B | 1074 | +9/-7 | 5,091 | NousResearch |
| 113 | 108 | Granite-3.0-2B-Instruct | 1074 | +7/-7 | 7,195 | IBM |
| 114 | 119 | Mistral-7B-Instruct-v0.2 | 1072 | +5/-4 | 20,051 | Mistral |
| 114 | 119 | Qwen1.5-7B-Chat | 1070 | +8/-7 | 4,869 | Alibaba |
| 115 | 119 | Phi-3-Mini-4K-Instruct-June-24 | 1071 | +6/-4 | 12,818 | Microsoft |
| 115 | 94 | GPT-3.5-Turbo-1106 | 1068 | +5/-4 | 17,033 | OpenAI |
| 115 | 123 | Phi-3-Mini-4k-Instruct | 1066 | +4/-5 | 21,095 | Microsoft |
| 115 | 115 | Dolphin-2.2.1-Mistral-7B | 1062 | +13/-13 | 1,713 | Cognitive Computations |
| 115 | 118 | SOLAR-10.7B-Instruct-v1.0 | 1062 | +8/-9 | 4,287 | Upstage AI |
| 119 | 124 | Llama-2-13b-chat | 1063 | +4/-4 | 19,736 | Meta |
| 122 | 119 | WizardLM-13b-v1.2 | 1059 | +7/-7 | 7,177 | Microsoft |
| 125 | 127 | Meta-Llama-3.2-1B-Instruct | 1054 | +7/-6 | 8,531 | Meta |
| 126 | 125 | Zephyr-7B-beta | 1053 | +6/-6 | 11,332 | HuggingFace |
| 126 | 123 | SmolLM2-1.7B-Instruct | 1047 | +12/-14 | 2,370 | HuggingFace |
| 126 | 119 | MPT-30B-chat | 1046 | +11/-12 | 2,648 | MosaicML |
| 126 | 124 | Zephyr-7B-alpha | 1042 | +13/-16 | 1,815 | HuggingFace |
| 126 | 125 | CodeLlama-70B-instruct | 1041 | +14/-20 | 1,193 | Meta |
| 128 | 126 | CodeLlama-34B-instruct | 1043 | +7/-8 | 7,512 | Meta |
| 128 | 119 | falcon-180b-chat | 1034 | +16/-16 | 1,326 | TII |
| 131 | 120 | Vicuna-13B | 1042 | +4/-4 | 19,786 | LMSYS |
| 131 | 126 | Gemma-7B-it | 1037 | +7/-5 | 9,174 | Google |
| 131 | 127 | Phi-3-Mini-128k-Instruct | 1037 | +5/-4 | 21,627 | Microsoft |
| 131 | 120 | Qwen-14B-Chat | 1035 | +8/-7 | 5,072 | Alibaba |
| 131 | 143 | Llama-2-7B-chat | 1037 | +4/-5 | 14,553 | Meta |
| 131 | 129 | Guanaco-33B | 1033 | +10/-8 | 3,000 | UW |
| 140 | 132 | Gemma-1.1-2b-it | 1021 | +5/-7 | 11,346 | Google |
| 140 | 136 | StripedHyena-Nous-7B | 1017 | +7/-8 | 5,271 | Together AI |
| 141 | 150 | OLMo-7B-instruct | 1015 | +7/-7 | 6,508 | Allen AI |
| 143 | 141 | Mistral-7B-Instruct-v0.1 | 1008 | +5/-6 | 9,144 | Mistral |
| 144 | 143 | Vicuna-7B | 1005 | +6/-8 | 7,016 | LMSYS |
| 144 | 132 | PaLM-Chat-Bison-001 | 1004 | +6/-5 | 8,743 | Google |
| 148 | 147 | Gemma-2B-it | 989 | +9/-9 | 4,921 | Google |
| 149 | 147 | Qwen1.5-4B-Chat | 988 | +5/-6 | 7,814 | Alibaba |
| 151 | 151 | Koala-13B | 964 | +6/-7 | 7,036 | UC Berkeley |
| 151 | 151 | ChatGLM3-6B | 955 | +9/-7 | 4,764 | Tsinghua |
| 153 | 151 | GPT4All-13B-Snoozy | 932 | +11/-14 | 1,786 | Nomic AI |
| 153 | 151 | MPT-7B-Chat | 928 | +10/-11 | 4,013 | MosaicML |
| 153 | 156 | ChatGLM2-6B | 924 | +11/-10 | 2,707 | Tsinghua |
| 153 | 156 | RWKV-4-Raven-14B | 922 | +8/-10 | 4,934 | RWKV |
| 157 | 151 | Alpaca-13B | 902 | +7/-9 | 5,876 | Stanford |
| 157 | 157 | OpenAssistant-Pythia-12B | 894 | +6/-8 | 6,381 | OpenAssistant |
| 158 | 159 | ChatGLM-6B | 879 | +8/-11 | 4,988 | Tsinghua |
| 159 | 159 | FastChat-T5-3B | 868 | +7/-11 | 4,299 | LMSYS |
| 161 | 162 | StableLM-Tuned-Alpha-7B | 841 | +9/-12 | 3,341 | Stability AI |
| 161 | 159 | Dolly-V2-12B | 822 | +8/-11 | 3,485 | Databricks |
| 162 | 160 | LLaMA-13B | 800 | +14/-14 | 2,444 | Meta |

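Notes

  • Rank (UB): upper-bound rank derived from the Bradley-Terry model, i.e. one plus the number of models that are statistically better (those whose interval lower bound exceeds this model's upper bound)

  • Rank (StyleCtrl): rank under style control, which adjusts for the effect of response style such as length and formatting

  • Confidence interval: the uncertainty range of the score, shown as +upper/-lower offsets from it

  • Score: the Arena score, a rating of model performance estimated from head-to-head votes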
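As a rough illustration of how Rank (UB) relates to the score and confidence interval columns, the sketch below recomputes it for the first six rows of the table. It assumes the definition lmarena.ai describes for this column: one plus the number of models whose lower bound lies above the target model's upper bound. The helper names `parse_ci` and `rank_ub` are hypothetical and not part of any lmarena.ai tooling. Because the published scores and intervals are rounded, a recomputed rank may differ slightly from the official one when bounds nearly touch.

```python
import re

def parse_ci(score, ci):
    """Turn a score and a '+8/-6' style interval into (lower, upper) bounds."""
    match = re.fullmatch(r"\+(\d+)/-(\d+)", ci)
    if match is None:
        raise ValueError(f"unexpected interval format: {ci!r}")
    plus, minus = int(match.group(1)), int(match.group(2))
    return score - minus, score + plus

def rank_ub(rows):
    """rows: list of (model, score, ci). Returns {model: upper-bound rank}.

    Assumed definition: rank = 1 + number of models whose lower bound
    is strictly greater than this model's upper bound.
    """
    bounds = {name: parse_ci(score, ci) for name, score, ci in rows}
    ranks = {}
    for name, (_, upper) in bounds.items():
        better = sum(1 for lower, _ in bounds.values() if lower > upper)
        ranks[name] = 1 + better
    return ranks

# First six rows of the leaderboard above (model, score, confidence interval).
rows = [
    ("Gemini-2.0-Flash-Thinking-Exp-01-21", 1382, "+8/-6"),
    ("Gemini-Exp-1206", 1374, "+5/-4"),
    ("ChatGPT-4o-latest (2024-11-20)", 1365, "+4/-4"),
    ("DeepSeek-R1", 1357, "+12/-13"),
    ("Gemini-2.0-Flash-Exp", 1356, "+4/-4"),
    ("o1-2024-12-17", 1352, "+6/-6"),
]

ranks = rank_ub(rows)
for name, score, ci in rows:
    print(f"{ranks[name]:>2}  {name}  {score} {ci}")
```

For these six rows the recomputed ranks are 1, 1, 3, 3, 4, 4, matching the Rank (UB) column. This also shows why DeepSeek-R1, with a wide interval from only 1,883 votes, shares rank 3 with a model scored 8 points higher.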

Data source
