Model Leaderboard
This document was translated from Chinese by AI and has not yet been reviewed.
LLM Arena Leaderboard (Live Updates)
This is a leaderboard based on data from Chatbot Arena (lmarena.ai), generated through an automated process.
Data Updated: 2025-06-12 11:42:10 UTC / 2025-06-12 19:42:10 CST (Beijing Time)
Leaderboard
Explanation
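Rank (UB): The upper-bound rank, derived from scores fitted with the Bradley-Terry model. It is defined as one plus the number of models that are statistically better than this model, where model A counts as statistically better than model B when the lower bound of A's 95% confidence interval is above the upper bound of B's. Models whose intervals overlap can therefore share the same rank.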
Rank (StyleCtrl): The ranking after applying style control. It reduces the preference bias caused by a model's response style (e.g., verbosity or conciseness) so that the ranking better reflects the model's core capabilities.
Model Name: The name of the Large Language Model (LLM). This column has embedded links to the models; click to navigate.
Score: The Elo-scale rating the model earned from user votes in the arena. Elo is a relative rating system: a higher score means the model is preferred more often in head-to-head comparisons. The score is dynamic and reflects the model's strength relative to the other models currently in the arena.
Confidence Interval: The 95% confidence interval for the model's Elo rating (e.g., +6/-6). A smaller interval indicates that the model's rating is more stable and reliable; conversely, a larger interval may suggest insufficient data or significant performance fluctuations. It provides a quantitative assessment of the rating's accuracy.
Votes: The total number of votes the model has received in the arena. A higher number of votes generally means higher statistical reliability of its rating.
Provider: The organization or company that provides the model.
License: The type of license for the model, such as Proprietary, Apache 2.0, MIT, etc.
Knowledge Cutoff: The cutoff date of the model's training data. A value of "No data" means the information was not provided or is unknown.
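The Score, Confidence Interval, and Rank (UB) columns described above can be related with a short Python sketch. It rests on assumptions that this page does not spell out: that the interval text always has the form `+a/-b`, that Rank (UB) is one plus the number of models whose lower bound exceeds this model's upper bound, and that scores follow the conventional Elo scale (base 10, divisor 400). The function names and the sample rows are hypothetical and are not taken from the real leaderboard.

```python
import re


def parse_ci(text):
    """Parse a '+6/-6' style interval into (upper_margin, lower_margin)."""
    match = re.fullmatch(r"\s*\+(\d+(?:\.\d+)?)\s*/\s*-(\d+(?:\.\d+)?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return float(match.group(1)), float(match.group(2))


def expected_win_probability(score_a, score_b):
    """Conventional Elo expectation that A is preferred over B (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / 400.0))


def rank_ub(rows):
    """Compute an upper-bound rank for each (name, score, ci_text) row."""
    bounds = {}
    for name, score, ci_text in rows:
        plus, minus = parse_ci(ci_text)
        bounds[name] = (score - minus, score + plus)  # (lower, upper)

    ranks = {}
    for name, (_, upper) in bounds.items():
        # Count models whose whole interval sits above this model's interval.
        better = sum(
            1
            for other, (lower, _) in bounds.items()
            if other != name and lower > upper
        )
        ranks[name] = 1 + better
    return ranks


# Made-up sample rows, for illustration only.
rows = [
    ("example-model-a", 1450, "+6/-6"),
    ("example-model-b", 1442, "+5/-7"),
    ("example-model-c", 1400, "+10/-10"),
]

print(rank_ub(rows))
# {'example-model-a': 1, 'example-model-b': 1, 'example-model-c': 3}
print(round(expected_win_probability(1450, 1400), 3))
```

In this sample the first two models share rank 1 because their intervals overlap, while the third is ranked 3 because both of the others are statistically better than it.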
Data Source and Update Frequency
The data for this leaderboard is automatically generated and provided by the fboulnois/llm-leaderboard-csv project, which sources and processes data from lmarena.ai. This leaderboard is updated daily via GitHub Actions.
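For readers who want to work with the CSV output themselves, the following is a minimal Python sketch of reading leaderboard rows and filtering them by vote count. The column names (`model`, `score`, `ci`, `votes`) and the sample rows are assumptions made for illustration; check the header of the actual file published by fboulnois/llm-leaderboard-csv before relying on them.

```python
import csv
import io

# Made-up sample data with hypothetical column names.
SAMPLE_CSV = """model,score,ci,votes
example-model-a,1450,+6/-6,12000
example-model-b,1442,+5/-7,9500
example-model-c,1400,+10/-10,3100
"""


def load_leaderboard(text):
    """Read CSV text into a list of dicts with typed score and votes."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "model": record["model"],
                "score": float(record["score"]),
                "ci": record["ci"],
                "votes": int(record["votes"]),
            }
        )
    return rows


def top_models(rows, min_votes=0, limit=10):
    """Return the highest-scoring rows that have at least min_votes votes."""
    eligible = [row for row in rows if row["votes"] >= min_votes]
    return sorted(eligible, key=lambda row: row["score"], reverse=True)[:limit]


leaderboard = load_leaderboard(SAMPLE_CSV)
for row in top_models(leaderboard, min_votes=5000):
    print(row["model"], row["score"], row["ci"], row["votes"])
```

Filtering on a minimum vote count is one simple way to set aside models whose ratings are still statistically unreliable.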
Disclaimer
This report is for reference only. The leaderboard data is dynamic and based on user preference votes on Chatbot Arena over a specific period. The completeness and accuracy of the data depend on the upstream data source and the updates and processing from the fboulnois/llm-leaderboard-csv project. Different models may have different license agreements; please refer to the official documentation from the model provider before use.