lmarena.ai Leaderboard

Rankings updated: 2025-01-19

| Rank (UB) | Rank (StyleCtrl) | Model | Score | Confidence interval | Votes | Provider |
|---|---|---|---|---|---|---|
| 1 | 1 | Gemini-Exp-1206 | 1374 | +5/-4 | 20,845 | Google |
| 2 | 1 | ChatGPT-4o-latest (2024-11-20) | 1365 | +4/-4 | 34,030 | OpenAI |
| 2 | 4 | Gemini-2.0-Flash-Thinking-Exp-1219 | 1364 | +5/-5 | 16,357 | Google |
| 2 | 4 | Gemini-2.0-Flash-Exp | 1357 | +5/-5 | 19,640 | Google |
| 4 | 1 | o1-2024-12-17 | 1352 | +7/-7 | 7,957 | OpenAI |
| 6 | 4 | o1-preview | 1335 | +5/-3 | 33,197 | OpenAI |
| 7 | 7 | DeepSeek-V3 | 1320 | +5/-6 | 11,591 | DeepSeek |
| 7 | 10 | Step-2-16K-Exp | 1306 | +8/-8 | 4,028 | StepFun |
| 8 | 11 | o1-mini | 1305 | +4/-3 | 48,654 | OpenAI |
| 8 | 8 | Gemini-1.5-Pro-002 | 1303 | +3/-3 | 45,203 | Google |
| 11 | 13 | Grok-2-08-13 | 1288 | +4/-3 | 66,281 | xAI |
| 11 | 15 | Yi-Lightning | 1287 | +5/-3 | 28,959 | 01 AI |
| 11 | 10 | GPT-4o-2024-05-13 | 1285 | +3/-2 | 117,760 | OpenAI |
| 11 | 7 | Claude 3.5 Sonnet (20241022) | 1284 | +3/-3 | 47,437 | Anthropic |
| 11 | 22 | Qwen2.5-plus-1127 | 1282 | +7/-6 | 7,680 | Alibaba |
| 11 | 18 | Deepseek-v2.5-1210 | 1279 | +7/-5 | 7,261 | DeepSeek |
| 14 | 22 | Athene-v2-Chat-72B | 1277 | +5/-4 | 21,014 | NexusFlow |
| 15 | 21 | GLM-4-Plus | 1274 | +5/-3 | 27,773 | Zhipu AI |
| 15 | 21 | GPT-4o-mini-2024-07-18 | 1273 | +3/-2 | 60,551 | OpenAI |
| 16 | 24 | Gemini-1.5-Flash-002 | 1271 | +3/-3 | 34,540 | Google |
| 16 | 35 | Llama-3.1-Nemotron-70B-Instruct | 1269 | +6/-6 | 7,596 | Nvidia |
| 18 | 11 | Meta-Llama-3.1-405B-Instruct-bf16 | 1268 | +4/-3 | 21,285 | Meta |
| 20 | 10 | Claude 3.5 Sonnet (20240620) | 1268 | +2/-3 | 86,177 | Anthropic |
| 20 | 12 | Meta-Llama-3.1-405B-Instruct-fp8 | 1267 | +3/-3 | 63,202 | Meta |
| 20 | 11 | Gemini Advanced App (2024-05-14) | 1266 | +3/-3 | 52,148 | Google |
| 20 | 31 | Grok-2-Mini-08-13 | 1266 | +3/-3 | 54,893 | xAI |
| 20 | 12 | GPT-4o-2024-08-06 | 1265 | +3/-2 | 47,981 | OpenAI |
| 21 | 23 | Qwen-Max-0919 | 1263 | +4/-4 | 17,436 | Alibaba |
| 28 | 19 | Gemini-1.5-Pro-001 | 1260 | +2/-2 | 82,432 | Google |
| 28 | 28 | Deepseek-v2.5 | 1258 | +3/-4 | 26,353 | DeepSeek |
| 28 | 33 | Qwen2.5-72B-Instruct | 1257 | +3/-4 | 39,984 | Alibaba |
| 28 | 20 | Llama-3.3-70B-Instruct | 1257 | +5/-4 | 15,516 | Meta |
| 29 | 18 | GPT-4-Turbo-2024-04-09 | 1256 | +2/-2 | 102,126 | OpenAI |
| 30 | 23 | Mistral-Large-2407 | 1252 | +3/-3 | 48,207 | Mistral |
| 30 | 30 | Athene-70B | 1250 | +5/-5 | 20,618 | NexusFlow |
| 30 | 38 | Llama-3.1-Tulu-3-70B | 1244 | +11/-9 | 3,026 | Ai2 |
| 33 | 22 | GPT-4-1106-preview | 1250 | +3/-2 | 103,732 | OpenAI |
| 34 | 39 | Meta-Llama-3.1-70B-Instruct | 1248 | +3/-3 | 58,786 | Meta |
| 34 | 20 | Claude 3 Opus | 1247 | +2/-2 | 202,735 | Anthropic |
| 34 | 39 | Amazon Nova Pro 1.0 | 1244 | +5/-5 | 13,076 | Amazon |
| 36 | 24 | GPT-4-0125-preview | 1245 | +3/-2 | 97,070 | OpenAI |
| 36 | 37 | Mistral-Large-2411 | 1243 | +5/-5 | 10,634 | Mistral |
| 39 | 20 | Claude 3.5 Haiku (20241022) | 1237 | +7/-4 | 10,773 | Anthropic |
| 40 | 39 | Reka-Core-20240904 | 1235 | +6/-6 | 7,937 | Reka AI |
| 44 | 42 | Gemini-1.5-Flash-001 | 1227 | +3/-2 | 65,650 | Google |
| 45 | 40 | Jamba-1.5-Large | 1221 | +5/-5 | 9,125 | AI21 Labs |
| 46 | 41 | Gemma-2-27B-it | 1220 | +3/-3 | 72,425 | Google |
| 46 | 55 | Amazon Nova Lite 1.0 | 1218 | +6/-5 | 10,868 | Amazon |
| 46 | 48 | Qwen2.5-Coder-32B-Instruct | 1217 | +7/-6 | 5,731 | Alibaba |
| 46 | 43 | Gemma-2-9B-it-SimPO | 1216 | +5/-6 | 10,557 | Princeton |
| 46 | 44 | Command R+ (08-2024) | 1215 | +5/-5 | 10,546 | Cohere |
| 46 | 39 | Llama-3.1-Nemotron-51B-Instruct | 1211 | +7/-8 | 3,897 | Nvidia |
| 46 | 54 | Phi-4 | 1210 | +9/-10 | 2,485 | Microsoft |
| 47 | 56 | Gemini-1.5-Flash-8B-001 | 1212 | +4/-4 | 36,162 | Google |
| 48 | 55 | Aya-Expanse-32B | 1209 | +4/-4 | 26,994 | Cohere |
| 48 | 52 | GLM-4-0520 | 1207 | +8/-5 | 10,214 | Zhipu AI |
| 49 | 46 | Nemotron-4-340B-Instruct | 1209 | +4/-3 | 20,609 | Nvidia |
| 49 | 47 | Reka-Flash-20240904 | 1205 | +7/-5 | 8,130 | Reka AI |
| 52 | 48 | Llama-3-70B-Instruct | 1206 | +2/-3 | 163,797 | Meta |
| 57 | 46 | Claude 3 Sonnet | 1201 | +2/-2 | 113,033 | Anthropic |
| 57 | 69 | Amazon Nova Micro 1.0 | 1197 | +5/-6 | 10,956 | Amazon |
| 61 | 57 | Gemma-2-9B-it | 1191 | +3/-3 | 50,239 | Google |
| 61 | 56 | Command R+ (04-2024) | 1190 | +2/-2 | 80,868 | Cohere |
| 61 | 69 | Hunyuan-Standard-256K | 1189 | +8/-9 | 2,899 | Tencent |
| 61 | 70 | Llama-3.1-Tulu-3-8B | 1185 | +11/-10 | 3,077 | Ai2 |
| 62 | 56 | Qwen2-72B-Instruct | 1187 | +3/-3 | 38,889 | Alibaba |
| 62 | 44 | GPT-4-0314 | 1186 | +3/-2 | 55,965 | OpenAI |
| 62 | 69 | Ministral-8B-2410 | 1182 | +7/-7 | 5,109 | Mistral |
| 64 | 70 | Aya-Expanse-8B | 1181 | +6/-6 | 8,797 | Cohere |
| 64 | 56 | Command R (08-2024) | 1180 | +5/-5 | 10,846 | Cohere |
| 66 | 60 | Claude 3 Haiku | 1179 | +2/-2 | 122,291 | Anthropic |
| 66 | 56 | DeepSeek-Coder-V2-Instruct | 1178 | +5/-5 | 15,752 | DeepSeek AI |
| 66 | 69 | Jamba-1.5-Mini | 1176 | +5/-6 | 9,269 | AI21 Labs |
| 67 | 84 | Meta-Llama-3.1-8B-Instruct | 1176 | +3/-3 | 52,646 | Meta |
| 75 | 55 | GPT-4-0613 | 1163 | +2/-3 | 91,642 | OpenAI |
| 75 | 69 | Qwen1.5-110B-Chat | 1161 | +4/-4 | 27,467 | Alibaba |
| 75 | 85 | Yi-1.5-34B-Chat | 1157 | +4/-4 | 25,126 | 01 AI |
| 75 | 69 | Mistral-Large-2402 | 1157 | +3/-3 | 64,912 | Mistral |
| 75 | 70 | Reka-Flash-21B-online | 1156 | +5/-4 | 16,024 | Reka AI |
| 75 | 103 | QwQ-32B-Preview | 1153 | +11/-11 | 3,415 | Alibaba |
| 78 | 78 | Llama-3-8B-Instruct | 1152 | +2/-2 | 109,211 | Meta |
| 78 | 90 | InternLM2.5-20B-chat | 1149 | +5/-5 | 10,599 | InternLM |
| 80 | 74 | Command R (04-2024) | 1149 | +2/-3 | 56,380 | Cohere |
| 80 | 78 | Mistral Medium | 1148 | +3/-3 | 35,554 | Mistral |
| 80 | 78 | Reka-Flash-21B | 1148 | +4/-3 | 25,803 | Reka AI |
| 80 | 72 | Mixtral-8x22b-Instruct-v0.1 | 1148 | +2/-3 | 53,788 | Mistral |
| 80 | 72 | Qwen1.5-72B-Chat | 1147 | +3/-3 | 40,638 | Alibaba |
| 80 | 81 | Granite-3.1-8B-Instruct | 1139 | +13/-12 | 2,468 | IBM |
| 82 | 91 | Gemma-2-2b-it | 1142 | +3/-4 | 41,868 | Google |
| 89 | 72 | Gemini-1.0-Pro-001 | 1131 | +4/-4 | 18,793 | Google |
| 89 | 82 | Zephyr-ORPO-141b-A35b-v0.1 | 1127 | +10/-8 | 4,860 | HuggingFace |
| 89 | 86 | Qwen1.5-32B-Chat | 1125 | +4/-4 | 22,772 | Alibaba |
| 89 | 93 | Granite-3.1-2B-Instruct | 1117 | +11/-11 | 2,476 | IBM |
| 90 | 90 | Phi-3-Medium-4k-Instruct | 1123 | +4/-3 | 26,112 | Microsoft |
| 91 | 103 | Starling-LM-7B-beta | 1119 | +5/-5 | 16,671 | Nexusflow |
| 94 | 93 | Mixtral-8x7B-Instruct-v0.1 | 1114 | +0/-0 | 76,138 | Mistral |
| 94 | 97 | Yi-34B-Chat | 1111 | +5/-4 | 15,922 | 01 AI |
| 94 | 82 | Gemini Pro | 1110 | +7/-8 | 6,559 | Google |
| 96 | 96 | Qwen1.5-14B-Chat | 1109 | +3/-5 | 18,678 | Alibaba |
| 96 | 95 | WizardLM-70B-v1.0 | 1106 | +7/-6 | 8,380 | Microsoft |
| 96 | 82 | GPT-3.5-Turbo-0125 | 1106 | +3/-3 | 68,873 | OpenAI |
| 96 | 101 | Meta-Llama-3.2-3B-Instruct | 1103 | +8/-7 | 8,411 | Meta |
| 97 | 92 | DBRX-Instruct-Preview | 1103 | +4/-3 | 33,737 | Databricks |
| 97 | 99 | Phi-3-Small-8k-Instruct | 1102 | +5/-6 | 18,477 | Microsoft |
| 98 | 102 | Tulu-2-DPO-70B | 1099 | +6/-6 | 6,663 | AllenAI/UW |
| 103 | 92 | Granite-3.0-8B-Instruct | 1093 | +6/-6 | 7,005 | IBM |
| 103 | 97 | OpenChat-3.5-0106 | 1091 | +5/-5 | 12,980 | OpenChat |
| 104 | 112 | Llama-2-70B-chat | 1093 | +3/-3 | 39,634 | Meta |
| 105 | 104 | Vicuna-33B | 1091 | +4/-4 | 22,950 | LMSYS |
| 105 | 107 | Starling-LM-7B-alpha | 1088 | +7/-4 | 10,416 | UC Berkeley |
| 105 | 116 | Nous-Hermes-2-Mixtral-8x7B-DPO | 1084 | +11/-10 | 3,835 | NousResearch |
| 106 | 97 | Snowflake Arctic Instruct | 1090 | +3/-3 | 34,172 | Snowflake |
| 106 | 112 | NV-Llama2-70B-SteerLM-Chat | 1081 | +10/-8 | 3,637 | Nvidia |
| 107 | 99 | Gemma-1.1-7B-it | 1084 | +4/-4 | 25,065 | Google |
| 111 | 101 | DeepSeek-LLM-67B-Chat | 1077 | +7/-7 | 4,987 | DeepSeek AI |
| 112 | 100 | OpenChat-3.5 | 1076 | +5/-8 | 8,112 | OpenChat |
| 112 | 100 | OpenHermes-2.5-Mistral-7B | 1074 | +8/-6 | 5,089 | NousResearch |
| 112 | 107 | Granite-3.0-2B-Instruct | 1074 | +7/-8 | 7,193 | IBM |
| 113 | 118 | Mistral-7B-Instruct-v0.2 | 1072 | +4/-4 | 20,058 | Mistral |
| 113 | 118 | Phi-3-Mini-4K-Instruct-June-24 | 1071 | +4/-4 | 12,818 | Microsoft |
| 113 | 118 | Qwen1.5-7B-Chat | 1070 | +6/-9 | 4,868 | Alibaba |
| 114 | 93 | GPT-3.5-Turbo-1106 | 1068 | +5/-5 | 17,032 | OpenAI |
| 115 | 122 | Phi-3-Mini-4k-Instruct | 1066 | +4/-4 | 21,085 | Microsoft |
| 115 | 116 | Dolphin-2.2.1-Mistral-7B | 1062 | +10/-13 | 1,713 | Cognitive Computations |
| 115 | 117 | SOLAR-10.7B-Instruct-v1.0 | 1062 | +10/-10 | 4,288 | Upstage AI |
| 119 | 123 | Llama-2-13b-chat | 1063 | +5/-5 | 19,738 | Meta |
| 121 | 118 | WizardLM-13b-v1.2 | 1059 | +7/-7 | 7,178 | Microsoft |
| 123 | 127 | CodeLlama-70B-instruct | 1041 | +21/-17 | 1,194 | Meta |
| 124 | 128 | Meta-Llama-3.2-1B-Instruct | 1054 | +6/-6 | 8,535 | Meta |
| 124 | 125 | Zephyr-7B-beta | 1053 | +6/-6 | 11,334 | HuggingFace |
| 124 | 118 | SmolLM2-1.7B-Instruct | 1047 | +13/-13 | 2,371 | HuggingFace |
| 124 | 118 | MPT-30B-chat | 1046 | +12/-10 | 2,648 | MosaicML |
| 125 | 121 | Zephyr-7B-alpha | 1042 | +11/-15 | 1,814 | HuggingFace |
| 127 | 126 | CodeLlama-34B-instruct | 1043 | +7/-5 | 7,515 | Meta |
| 127 | 118 | falcon-180b-chat | 1034 | +18/-17 | 1,327 | TII |
| 130 | 118 | Vicuna-13B | 1042 | +5/-5 | 19,790 | LMSYS |
| 130 | 126 | Gemma-7B-it | 1037 | +6/-5 | 9,176 | Google |
| 130 | 127 | Phi-3-Mini-128k-Instruct | 1037 | +4/-4 | 21,632 | Microsoft |
| 130 | 141 | Llama-2-7B-chat | 1037 | +6/-5 | 14,555 | Meta |
| 130 | 118 | Qwen-14B-Chat | 1035 | +7/-7 | 5,070 | Alibaba |
| 130 | 128 | Guanaco-33B | 1033 | +10/-12 | 2,999 | UW |
| 139 | 132 | Gemma-1.1-2b-it | 1021 | +6/-5 | 11,348 | Google |
| 139 | 135 | StripedHyena-Nous-7B | 1018 | +9/-7 | 5,273 | Together AI |
| 140 | 148 | OLMo-7B-instruct | 1016 | +6/-7 | 6,504 | Allen AI |
| 143 | 140 | Mistral-7B-Instruct-v0.1 | 1008 | +6/-6 | 9,144 | Mistral |
| 143 | 142 | Vicuna-7B | 1005 | +8/-7 | 7,015 | LMSYS |
| 143 | 129 | PaLM-Chat-Bison-001 | 1004 | +8/-5 | 8,744 | Google |
| 148 | 146 | Gemma-2B-it | 989 | +7/-9 | 4,922 | Google |
| 148 | 145 | Qwen1.5-4B-Chat | 988 | +6/-6 | 7,813 | Alibaba |
| 150 | 149 | Koala-13B | 964 | +8/-6 | 7,034 | UC Berkeley |
| 150 | 150 | ChatGLM3-6B | 955 | +8/-8 | 4,765 | Tsinghua |
| 152 | 149 | GPT4All-13B-Snoozy | 932 | +13/-15 | 1,786 | Nomic AI |
| 152 | 150 | MPT-7B-Chat | 928 | +11/-8 | 4,012 | MosaicML |
| 152 | 155 | ChatGLM2-6B | 924 | +13/-10 | 2,707 | Tsinghua |
| 152 | 155 | RWKV-4-Raven-14B | 922 | +9/-8 | 4,934 | RWKV |
| 156 | 150 | Alpaca-13B | 902 | +9/-9 | 5,876 | Stanford |
| 156 | 156 | OpenAssistant-Pythia-12B | 893 | +7/-8 | 6,380 | OpenAssistant |
| 157 | 158 | ChatGLM-6B | 879 | +10/-9 | 4,988 | Tsinghua |
| 158 | 158 | FastChat-T5-3B | 868 | +9/-8 | 4,302 | LMSYS |
| 160 | 160 | StableLM-Tuned-Alpha-7B | 840 | +11/-11 | 3,341 | Stability AI |
| 160 | 158 | Dolly-V2-12B | 822 | +11/-12 | 3,485 | Databricks |
| 161 | 160 | LLaMA-13B | 800 | +12/-14 | 2,444 | Meta |

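Notes

  • Rank (UB): upper-bound rank from the Bradley-Terry model, defined as one plus the number of models that are statistically better than the given model. Model A counts as statistically better than model B when the lower bound of A's score is above the upper bound of B's score.

  • Rank (StyleCtrl): rank after style control, which adjusts for response style (such as length and formatting) so that it does not drive the ranking.

  • Confidence interval: the 95% confidence interval around a model's score, written as +upper/-lower offsets in score points.

  • Score: the Arena score, estimated from pairwise human preference votes between models.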
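
The Rank (UB) and score definitions can be illustrated with a short Python sketch. It assumes that Rank (UB) is one plus the number of models whose lower bound exceeds the given model's upper bound, and that scores sit on an Elo-like scale (base 10, scale 400), so that a score gap maps to an expected win rate. The `Entry` class and both function names are hypothetical, and the four rows are copied from the top of the table. Because the table shows rounded values, a borderline comparison may land one place away from the published rank.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float     # Arena score
    ci_plus: float   # upper offset of the confidence interval
    ci_minus: float  # lower offset of the confidence interval


def rank_ub(entries):
    """Rank (UB) sketch: 1 + number of models whose lower bound
    is strictly above this model's upper bound (assumed definition)."""
    ranks = {}
    for e in entries:
        upper = e.score + e.ci_plus
        better = sum(1 for o in entries if o.score - o.ci_minus > upper)
        ranks[e.model] = 1 + better
    return ranks


def win_probability(score_a, score_b, scale=400.0, base=10.0):
    """Expected win rate of A over B under a Bradley-Terry model on an
    Elo-like scale (scale and base are assumptions, not taken from the page)."""
    return 1.0 / (1.0 + base ** ((score_b - score_a) / scale))


# First four rows of the table above.
entries = [
    Entry("Gemini-Exp-1206", 1374, 5, 4),
    Entry("ChatGPT-4o-latest (2024-11-20)", 1365, 4, 4),
    Entry("Gemini-2.0-Flash-Thinking-Exp-1219", 1364, 5, 5),
    Entry("Gemini-2.0-Flash-Exp", 1357, 5, 5),
]

print(rank_ub(entries))                        # ranks 1, 2, 2, 2, matching the table
print(round(win_probability(1374, 1365), 3))   # 0.513
```

Under these assumptions a 9-point gap between the top two models corresponds to roughly a 51% expected win rate, which is why overlapping intervals lead to shared ranks.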
