lmarena.ai Leaderboard

Rankings updated: 2025-01-19

Rank (UB)

Rank (StyleCtrl)

Model

Score

Confidence interval

Votes

Provider

Gemini-Exp-1206

1374

+5/-4

20,845

Google

ChatGPT-4o-latest (2024-11-20)

1365

+4/-4

34,030

OpenAI

Gemini-2.0-Flash-Thinking-Exp-1219

1364

+5/-5

16,357

Google

Gemini-2.0-Flash-Exp

1357

+5/-5

19,640

Google

o1-2024-12-17

1352

+7/-7

7,957

OpenAI

o1-preview

1335

+5/-3

33,197

OpenAI

DeepSeek-V3

1320

+5/-6

11,591

DeepSeek

Step-2-16K-Exp

1306

+8/-8

4,028

StepFun

o1-mini

1305

+4/-3

48,654

OpenAI

Gemini-1.5-Pro-002

1303

+3/-3

45,203

Google

Grok-2-08-13

1288

+4/-3

66,281

xAI

Yi-Lightning

1287

+5/-3

28,959

01 AI

GPT-4o-2024-05-13

1285

+3/-2

117,760

OpenAI

Claude 3.5 Sonnet (20241022)

1284

+3/-3

47,437

Anthropic

Qwen2.5-plus-1127

1282

+7/-6

7,680

Alibaba

Deepseek-v2.5-1210

1279

+7/-5

7,261

DeepSeek

Athene-v2-Chat-72B

1277

+5/-4

21,014

NexusFlow

GLM-4-Plus

1274

+5/-3

27,773

Zhipu AI

GPT-4o-mini-2024-07-18

1273

+3/-2

60,551

OpenAI

Gemini-1.5-Flash-002

1271

+3/-3

34,540

Google

Llama-3.1-Nemotron-70B-Instruct

1269

+6/-6

7,596

Nvidia

Meta-Llama-3.1-405B-Instruct-bf16

1268

+4/-3

21,285

Meta

Claude 3.5 Sonnet (20240620)

1268

+2/-3

86,177

Anthropic

Meta-Llama-3.1-405B-Instruct-fp8

1267

+3/-3

63,202

Meta

Gemini Advanced App (2024-05-14)

1266

+3/-3

52,148

Google

Grok-2-Mini-08-13

1266

+3/-3

54,893

xAI

GPT-4o-2024-08-06

1265

+3/-2

47,981

OpenAI

Qwen-Max-0919

1263

+4/-4

17,436

Alibaba

Gemini-1.5-Pro-001

1260

+2/-2

82,432

Google

Deepseek-v2.5

1258

+3/-4

26,353

DeepSeek

Qwen2.5-72B-Instruct

1257

+3/-4

39,984

Alibaba

Llama-3.3-70B-Instruct

1257

+5/-4

15,516

Meta

GPT-4-Turbo-2024-04-09

1256

+2/-2

102,126

OpenAI

Mistral-Large-2407

1252

+3/-3

48,207

Mistral

Athene-70B

1250

+5/-5

20,618

NexusFlow

Llama-3.1-Tulu-3-70B

1244

+11/-9

3,026

Ai2

GPT-4-1106-preview

1250

+3/-2

103,732

OpenAI

Meta-Llama-3.1-70B-Instruct

1248

+3/-3

58,786

Meta

Claude 3 Opus

1247

+2/-2

202,735

Anthropic

Amazon Nova Pro 1.0

1244

+5/-5

13,076

Amazon

GPT-4-0125-preview

1245

+3/-2

97,070

OpenAI

Mistral-Large-2411

1243

+5/-5

10,634

Mistral

Claude 3.5 Haiku (20241022)

1237

+7/-4

10,773

Anthropic

Reka-Core-20240904

1235

+6/-6

7,937

Reka AI

Gemini-1.5-Flash-001

1227

+3/-2

65,650

Google

Jamba-1.5-Large

1221

+5/-5

9,125

AI21 Labs

Gemma-2-27B-it

1220

+3/-3

72,425

Google

Amazon Nova Lite 1.0

1218

+6/-5

10,868

Amazon

Qwen2.5-Coder-32B-Instruct

1217

+7/-6

5,731

Alibaba

Gemma-2-9B-it-SimPO

1216

+5/-6

10,557

Princeton

Command R+ (08-2024)

1215

+5/-5

10,546

Cohere

Llama-3.1-Nemotron-51B-Instruct

1211

+7/-8

3,897

Nvidia

Phi-4

1210

+9/-10

2,485

Microsoft

Gemini-1.5-Flash-8B-001

1212

+4/-4

36,162

Google

Aya-Expanse-32B

1209

+4/-4

26,994

Cohere

GLM-4-0520

1207

+8/-5

10,214

Zhipu AI

Nemotron-4-340B-Instruct

1209

+4/-3

20,609

Nvidia

Reka-Flash-20240904

1205

+7/-5

8,130

Reka AI

Llama-3-70B-Instruct

1206

+2/-3

163,797

Meta

Claude 3 Sonnet

1201

+2/-2

113,033

Anthropic

Amazon Nova Micro 1.0

1197

+5/-6

10,956

Amazon

Gemma-2-9B-it

1191

+3/-3

50,239

Google

Command R+ (04-2024)

1190

+2/-2

80,868

Cohere

Hunyuan-Standard-256K

1189

+8/-9

2,899

Tencent

Llama-3.1-Tulu-3-8B

1185

+11/-10

3,077

Ai2

Qwen2-72B-Instruct

1187

+3/-3

38,889

Alibaba

GPT-4-0314

1186

+3/-2

55,965

OpenAI

Ministral-8B-2410

1182

+7/-7

5,109

Mistral

Aya-Expanse-8B

1181

+6/-6

8,797

Cohere

Command R (08-2024)

1180

+5/-5

10,846

Cohere

Claude 3 Haiku

1179

+2/-2

122,291

Anthropic

DeepSeek-Coder-V2-Instruct

1178

+5/-5

15,752

DeepSeek AI

Jamba-1.5-Mini

1176

+5/-6

9,269

AI21 Labs

Meta-Llama-3.1-8B-Instruct

1176

+3/-3

52,646

Meta

GPT-4-0613

1163

+2/-3

91,642

OpenAI

Qwen1.5-110B-Chat

1161

+4/-4

27,467

Alibaba

Yi-1.5-34B-Chat

1157

+4/-4

25,126

01 AI

Mistral-Large-2402

1157

+3/-3

64,912

Mistral

Reka-Flash-21B-online

1156

+5/-4

16,024

Reka AI

103

QwQ-32B-Preview

1153

+11/-11

3,415

Alibaba

Llama-3-8B-Instruct

1152

+2/-2

109,211

Meta

InternLM2.5-20B-chat

1149

+5/-5

10,599

InternLM

Command R (04-2024)

1149

+2/-3

56,380

Cohere

Mistral Medium

1148

+3/-3

35,554

Mistral

Reka-Flash-21B

1148

+4/-3

25,803

Reka AI

Mixtral-8x22b-Instruct-v0.1

1148

+2/-3

53,788

Mistral

Qwen1.5-72B-Chat

1147

+3/-3

40,638

Alibaba

Granite-3.1-8B-Instruct

1139

+13/-12

2,468

IBM

Gemma-2-2b-it

1142

+3/-4

41,868

Google

Gemini-1.0-Pro-001

1131

+4/-4

18,793

Google

Zephyr-ORPO-141b-A35b-v0.1

1127

+10/-8

4,860

HuggingFace

Qwen1.5-32B-Chat

1125

+4/-4

22,772

Alibaba

Granite-3.1-2B-Instruct

1117

+11/-11

2,476

IBM

Phi-3-Medium-4k-Instruct

1123

+4/-3

26,112

Microsoft

103

Starling-LM-7B-beta

1119

+5/-5

16,671

Nexusflow

Mixtral-8x7B-Instruct-v0.1

1114

+0/-0

76,138

Mistral

Yi-34B-Chat

1111

+5/-4

15,922

01 AI

Gemini Pro

1110

+7/-8

6,559

Google

Qwen1.5-14B-Chat

1109

+3/-5

18,678

Alibaba

WizardLM-70B-v1.0

1106

+7/-6

8,380

Microsoft

GPT-3.5-Turbo-0125

1106

+3/-3

68,873

OpenAI

101

Meta-Llama-3.2-3B-Instruct

1103

+8/-7

8,411

Meta

DBRX-Instruct-Preview

1103

+4/-3

33,737

Databricks

Phi-3-Small-8k-Instruct

1102

+5/-6

18,477

Microsoft

102

Tulu-2-DPO-70B

1099

+6/-6

6,663

AllenAI/UW

103

Granite-3.0-8B-Instruct

1093

+6/-6

7,005

IBM

103

OpenChat-3.5-0106

1091

+5/-5

12,980

OpenChat

104

112

Llama-2-70B-chat

1093

+3/-3

39,634

Meta

105

104

Vicuna-33B

1091

+4/-4

22,950

LMSYS

105

107

Starling-LM-7B-alpha

1088

+7/-4

10,416

UC Berkeley

105

116

Nous-Hermes-2-Mixtral-8x7B-DPO

1084

+11/-10

3,835

NousResearch

106

Snowflake Arctic Instruct

1090

+3/-3

34,172

Snowflake

106

112

NV-Llama2-70B-SteerLM-Chat

1081

+10/-8

3,637

Nvidia

107

Gemma-1.1-7B-it

1084

+4/-4

25,065

Google

111

101

DeepSeek-LLM-67B-Chat

1077

+7/-7

4,987

DeepSeek AI

112

100

OpenChat-3.5

1076

+5/-8

8,112

OpenChat

112

100

OpenHermes-2.5-Mistral-7B

1074

+8/-6

5,089

NousResearch

112

107

Granite-3.0-2B-Instruct

1074

+7/-8

7,193

IBM

113

118

Mistral-7B-Instruct-v0.2

1072

+4/-4

20,058

Mistral

113

118

Phi-3-Mini-4K-Instruct-June-24

1071

+4/-4

12,818

Microsoft

113

118

Qwen1.5-7B-Chat

1070

+6/-9

4,868

Alibaba

114

GPT-3.5-Turbo-1106

1068

+5/-5

17,032

OpenAI

115

122

Phi-3-Mini-4k-Instruct

1066

+4/-4

21,085

Microsoft

115

116

Dolphin-2.2.1-Mistral-7B

1062

+10/-13

1,713

Cognitive Computations

115

117

SOLAR-10.7B-Instruct-v1.0

1062

+10/-10

4,288

Upstage AI

119

123

Llama-2-13b-chat

1063

+5/-5

19,738

Meta

121

118

WizardLM-13b-v1.2

1059

+7/-7

7,178

Microsoft

123

127

CodeLlama-70B-instruct

1041

+21/-17

1,194

Meta

124

128

Meta-Llama-3.2-1B-Instruct

1054

+6/-6

8,535

Meta

124

125

Zephyr-7B-beta

1053

+6/-6

11,334

HuggingFace

124

118

SmolLM2-1.7B-Instruct

1047

+13/-13

2,371

HuggingFace

124

118

MPT-30B-chat

1046

+12/-10

2,648

MosaicML

125

121

Zephyr-7B-alpha

1042

+11/-15

1,814

HuggingFace

127

126

CodeLlama-34B-instruct

1043

+7/-5

7,515

Meta

127

118

falcon-180b-chat

1034

+18/-17

1,327

TII

130

118

Vicuna-13B

1042

+5/-5

19,790

LMSYS

130

126

Gemma-7B-it

1037

+6/-5

9,176

Google

130

127

Phi-3-Mini-128k-Instruct

1037

+4/-4

21,632

Microsoft

130

141

Llama-2-7B-chat

1037

+6/-5

14,555

Meta

130

118

Qwen-14B-Chat

1035

+7/-7

5,070

Alibaba

130

128

Guanaco-33B

1033

+10/-12

2,999

139

132

Gemma-1.1-2b-it

1021

+6/-5

11,348

Google

139

135

StripedHyena-Nous-7B

1018

+9/-7

5,273

Together AI

140

148

OLMo-7B-instruct

1016

+6/-7

6,504

Allen AI

143

140

Mistral-7B-Instruct-v0.1

1008

+6/-6

9,144

Mistral

143

142

Vicuna-7B

1005

+8/-7

7,015

LMSYS

143

129

PaLM-Chat-Bison-001

1004

+8/-5

8,744

Google

148

146

Gemma-2B-it

989

+7/-9

4,922

Google

148

145

Qwen1.5-4B-Chat

988

+6/-6

7,813

Alibaba

150

149

Koala-13B

964

+8/-6

7,034

UC Berkeley

150

ChatGLM3-6B

955

+8/-8

4,765

Tsinghua

152

149

GPT4All-13B-Snoozy

932

+13/-15

1,786

Nomic AI

152

150

MPT-7B-Chat

928

+11/-8

4,012

MosaicML

152

155

ChatGLM2-6B

924

+13/-10

2,707

Tsinghua

152

155

RWKV-4-Raven-14B

922

+9/-8

4,934

RWKV

156

150

Alpaca-13B

902

+9/-9

5,876

Stanford

156

OpenAssistant-Pythia-12B

893

+7/-8

6,380

OpenAssistant

157

158

ChatGLM-6B

879

+10/-9

4,988

Tsinghua

158

FastChat-T5-3B

868

+9/-8

4,302

LMSYS

160

StableLM-Tuned-Alpha-7B

840

+11/-11

3,341

Stability AI

160

158

Dolly-V2-12B

822

+11/-12

3,485

Databricks

161

160

LLaMA-13B

800

+12/-14

2,444

Meta

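Notes

Rank (UB): upper-bound rank derived from the Bradley-Terry model
Rank (StyleCtrl): rank after controlling for response style
Confidence interval: uncertainty range around the score, given as +upper/-lower offsets
Score: Arena score reflecting model performance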
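
The definitions above can be made concrete with a short sketch. It rests on two assumptions that are not stated in this document: that Rank (UB) equals one plus the number of models whose lower confidence bound lies strictly above this model's upper bound, and that scores sit on an Elo-like scale (base 10, 400 points), so a score gap maps to an expected win rate. The names `ROWS`, `parse_ci`, `rank_ub`, and `win_probability` are illustrative only. Because the sketch uses just the first six rows and the rounded values shown in the table, its ranks need not match the published ones.

```python
from typing import List, Sequence, Tuple

# (model, score, confidence interval) copied from the first six table rows.
ROWS = [
    ("Gemini-Exp-1206", 1374, "+5/-4"),
    ("ChatGPT-4o-latest (2024-11-20)", 1365, "+4/-4"),
    ("Gemini-2.0-Flash-Thinking-Exp-1219", 1364, "+5/-5"),
    ("Gemini-2.0-Flash-Exp", 1357, "+5/-5"),
    ("o1-2024-12-17", 1352, "+7/-7"),
    ("o1-preview", 1335, "+5/-3"),
]


def parse_ci(ci: str) -> Tuple[int, int]:
    """Split a '+5/-4' style interval into (upper_offset, lower_offset)."""
    plus, minus = ci.split("/")
    return int(plus.lstrip("+")), int(minus.lstrip("-"))


def rank_ub(rows: Sequence[Tuple[str, int, str]]) -> List[int]:
    """Assumed rule: 1 + count of models whose lower bound exceeds our upper bound."""
    bounds = []
    for _, score, ci in rows:
        up, down = parse_ci(ci)
        bounds.append((score - down, score + up))  # (lower, upper)
    ranks = []
    for _, upper_i in bounds:
        better = sum(1 for lower_j, _ in bounds if lower_j > upper_i)
        ranks.append(1 + better)
    return ranks


def win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected win rate of A over B under an assumed Elo-style scale."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


if __name__ == "__main__":
    for (name, _, _), rank in zip(ROWS, rank_ub(ROWS)):
        print(rank, name)
```

Under this rule, overlapping intervals share a rank: three of the six models tie at rank 2 because only Gemini-Exp-1206 has a lower bound above their upper bounds.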

Data source

Data from lmarena.ai
