Model Leaderboard
This document was translated from Chinese by AI and has not yet been reviewed.
This is a leaderboard based on Chatbot Arena (lmarena.ai) data, generated through an automated process.
Data updated: 2025-06-25 11:42:53 UTC / 2025-06-25 19:42:53 CST (Beijing Time)
Leaderboard
Explanation
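Rank (UB): The upper-bound rank, derived from scores fitted with the Bradley-Terry model. A model's Rank (UB) is one plus the number of models that are statistically better than it, where a model counts as statistically better when the lower bound of its 95% confidence interval lies above the other model's upper bound. Models whose intervals overlap can therefore share the same rank.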
Rank (StyleCtrl): Ranking after controlling for response style. It aims to reduce preference bias caused by a model's response style (e.g., verbose or concise answers), giving a cleaner evaluation of the model's core capabilities.
Model Name: The name of the Large Language Model (LLM). This column embeds relevant model links, which can be clicked to navigate.
Score: The Elo-style rating the model earns from user votes in the arena. The rating is relative: a higher score means users preferred the model more often in head-to-head comparisons. The score is dynamic and reflects the model's strength relative to the other models currently in the arena.
Confidence Interval: The 95% confidence interval of the model's Elo score (e.g., +6/-6). A smaller interval indicates a more stable and reliable model score; conversely, a larger interval may indicate insufficient data or significant fluctuations in model performance. It provides a quantitative assessment of the score's accuracy.
Votes: The total number of votes received by the model in the arena. More votes generally mean higher statistical reliability of its score.
Provider: The organization or company that provides the model.
License: The license type of the model, such as Proprietary, Apache 2.0, MIT, etc.
Knowledge Cutoff Date: The date after which the model's training data contains no information. "No data available" means that this information has not been provided or is unknown.
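To make the Rank (UB), Score, and Confidence Interval columns more concrete, here is a minimal Python sketch. It assumes that Rank (UB) is one plus the number of models whose 95% interval lies entirely above the target model's interval, and that scores follow the conventional Elo scale (base 10, 400 points). The `Entry` class, the function names, and the three sample rows are hypothetical and are not taken from the leaderboard data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # hypothetical model label
    score: float     # arena score
    ci_plus: float   # upper half-width of the 95% interval (the "+6" in "+6/-6")
    ci_minus: float  # lower half-width (the "-6"), stored as a positive number


def rank_ub(entries):
    """Return {name: rank}, where rank is 1 plus the number of models whose
    interval lower bound exceeds this model's interval upper bound."""
    ranks = {}
    for e in entries:
        upper = e.score + e.ci_plus
        better = sum(
            1 for o in entries
            if o is not e and o.score - o.ci_minus > upper
        )
        ranks[e.name] = 1 + better
    return ranks


def expected_win_probability(score_a, score_b):
    """Elo-style expected win rate of A over B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


# Made-up rows for illustration only.
entries = [
    Entry("model-a", 1400, 6, 6),
    Entry("model-b", 1392, 5, 5),
    Entry("model-c", 1350, 8, 8),
]

print(rank_ub(entries))
print(expected_win_probability(1400, 1350))
```

In this sample, model-a and model-b share rank 1 because their intervals overlap, while model-c drops to rank 3 because both other intervals lie entirely above its own. Under the assumed scale, a 50-point score gap corresponds to an expected win rate of roughly 57% for the higher-rated model.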
Data Source and Update Frequency
The data for this leaderboard is automatically generated and provided by the fboulnois/llm-leaderboard-csv project, which obtains and processes data from lmarena.ai. This leaderboard is automatically updated daily by GitHub Actions.
Disclaimer
This report is for reference only. The leaderboard data is dynamic and based on user preference votes on Chatbot Arena during a specific period. The completeness and accuracy of the data depend on the updates and processing of the upstream data source and the fboulnois/llm-leaderboard-csv project. Different models may use different license agreements; please refer to the official documentation of the model provider when using them.