Model Rankings
This document was translated from Chinese by AI and has not yet been reviewed.
This is a leaderboard based on Chatbot Arena (lmarena.ai) data, generated through an automated process.
Data Update Time: 2025-08-07 11:57:06 UTC / 2025-08-07 19:57:06 CST (Beijing Time)
Leaderboard
Explanation
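Rank (UB): The upper-bound rank, derived from scores fitted with the Bradley-Terry model. A model's Rank (UB) is one plus the number of models that are statistically better than it, where model A counts as statistically better than model B when the lower bound of A's 95% confidence interval lies above the upper bound of B's. It is the best rank the model can plausibly claim given the uncertainty in the scores, not an upper bound on the score itself.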
Rank (StyleCtrl): Ranking after controlling for conversational style. It aims to reduce preference bias caused by a model's response style (e.g., verbose or concise answers), so that the ranking better reflects the model's core capabilities.
Model Name: The name of the Large Language Model (LLM). Each name links to the relevant model page.
Score: The Elo-style rating the model has earned through user votes in the arena. The rating is relative: a higher score means the model is preferred more often in head-to-head comparisons, and only differences between scores are meaningful. The score is dynamic and reflects the model's strength relative to the models currently in the arena.
Confidence Interval: The 95% confidence interval for the model's score (e.g., +6/-6). A narrower interval indicates a more stable and reliable rating; a wider interval may indicate insufficient data or greater variation in the model's performance. It quantifies the uncertainty of the rating.
Votes: The total number of votes received by the model in the arena. A higher vote count generally indicates higher statistical reliability of its rating.
Provider: The organization or company that provides the model.
License: The type of license for the model, such as Proprietary, Apache 2.0, MIT, etc.
Knowledge Cutoff Date: The cutoff date of the model's training data. "No data available" means that the information has not been provided or is unknown.
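To illustrate how the Rank (UB), Score, and Confidence Interval columns relate, here is a minimal Python sketch. It assumes Rank (UB) is one plus the number of models whose lower confidence bound exceeds the given model's upper bound, and that scores sit on the conventional Elo scale (base 10, divisor 400). The model names, scores, and intervals are made-up illustration values, and the `Entry`, `rank_ub`, and `expected_win_rate` names are hypothetical, not part of the leaderboard's own tooling.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # hypothetical model name
    score: float     # arena score
    ci_plus: float   # upper half-width of the 95% interval (the "+6" in "+6/-6")
    ci_minus: float  # lower half-width (the "-6"), stored as a positive number

    @property
    def upper(self) -> float:
        return self.score + self.ci_plus

    @property
    def lower(self) -> float:
        return self.score - self.ci_minus


def rank_ub(entries):
    """Return 1 + the number of models whose lower bound beats each model's upper bound."""
    return {
        e.name: 1 + sum(other.lower > e.upper for other in entries if other is not e)
        for e in entries
    }


def expected_win_rate(score_a, score_b, scale=400.0):
    """Elo-style probability that A is preferred over B (assumes base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Made-up rows: name, score, +CI, -CI
board = [
    Entry("model-a", 1450, 6, 6),
    Entry("model-b", 1445, 8, 8),
    Entry("model-c", 1400, 5, 5),
]

ranks = rank_ub(board)
print(ranks)
print(round(expected_win_rate(1450, 1400), 3))
```

In this sketch, model-a and model-b share Rank (UB) 1 because their intervals overlap, while model-c falls to rank 3 because both other models are statistically better. Under the assumed scale, a 50-point score gap corresponds to roughly a 57% expected preference rate.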
Data Source and Update Frequency
This leaderboard data is automatically generated and provided by the fboulnois/llm-leaderboard-csv project, which retrieves and processes data from lmarena.ai. This leaderboard is automatically updated daily by GitHub Actions.
Disclaimer
This report is for reference only. The leaderboard data is dynamic and based on user preference votes on Chatbot Arena within a specific period. The completeness and accuracy of the data depend on the updates and processing of the upstream data sources and the fboulnois/llm-leaderboard-csv project. Different models may adopt different license agreements; please refer to the official documentation from the model provider when using them.