Model Leaderboard
This document was translated from Chinese by AI and has not yet been reviewed.
This is a leaderboard based on Chatbot Arena (lmarena.ai) data, generated through an automated process.
Data last updated: 2025-09-22 11:35:55 UTC / 2025-09-22 19:35:55 CST (Beijing Time)
Leaderboard
Explanation
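Rank (UB): The upper-bound rank, derived from scores fitted with the Bradley-Terry model. It equals one plus the number of models that are statistically better than this model, where model A counts as statistically better than model B when the lower bound of A's 95% confidence interval exceeds the upper bound of B's. Models whose intervals overlap can therefore share the same rank.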
Rank (StyleCtrl): Ranking after controlling for conversational style. This ranking aims to reduce preference bias caused by model response styles (e.g., verbosity, conciseness), providing a purer evaluation of the model's core capabilities.
Model Name: The name of the Large Language Model (LLM). Each name links to the model's details page.
Score: The Elo-style rating the model has earned from user votes in the arena. The rating is relative: only differences between scores are meaningful, and a higher score indicates that users preferred the model's responses more often. The score is dynamic and reflects the model's strength relative to the other models currently in the arena.
Confidence Interval: The 95% confidence interval for the model's Elo rating (e.g., +6/-6). A smaller interval indicates a more stable and reliable score; conversely, a larger interval may suggest insufficient data or greater volatility in model performance. It provides a quantitative assessment of rating accuracy.
Votes: The total number of votes received by the model in the arena. More votes generally mean higher statistical reliability for its score.
Provider: The organization or company providing the model.
License: The type of license for the model, such as Proprietary, Apache 2.0, MIT, etc.
Knowledge Cutoff Date: The date after which the model's training data contains no information. "No data available" means the provider has not published the date or it is unknown.
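To make the Score, Confidence Interval, and Rank (UB) columns concrete, here is a minimal Python sketch. It rests on stated assumptions rather than on the leaderboard's actual code: that scores sit on the conventional Elo scale (base 10, divisor 400), that intervals are written in the "+6/-6" form shown above, and that Rank (UB) is one plus the number of models whose interval lower bound exceeds this model's upper bound. The function names and the sample models and numbers are hypothetical.

```python
import re


def parse_interval(text):
    """Parse a '+6/-6' style interval into (plus, minus) as positive floats."""
    pattern = r"\s*\+(\d+(?:\.\d+)?)\s*/\s*-(\d+(?:\.\d+)?)\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return float(match.group(1)), float(match.group(2))


def rank_upper_bound(models):
    """Compute an upper-bound rank for each model.

    models maps a model name to (score, interval_text).
    Assumed definition: rank = 1 + number of models whose lower bound
    is strictly greater than this model's upper bound.
    """
    bounds = {}
    for name, (score, interval) in models.items():
        plus, minus = parse_interval(interval)
        bounds[name] = (score - minus, score + plus)

    ranks = {}
    for name, (_, upper) in bounds.items():
        better = sum(
            1
            for other, (lower, _) in bounds.items()
            if other != name and lower > upper
        )
        ranks[name] = 1 + better
    return ranks


def expected_win_probability(score_a, score_b):
    """Probability that A is preferred over B, assuming the standard Elo scale."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


# Hypothetical models and numbers, for illustration only.
sample = {
    "model-a": (1450, "+6/-6"),
    "model-b": (1445, "+8/-7"),
    "model-c": (1400, "+5/-5"),
}

print(rank_upper_bound(sample))
# -> {'model-a': 1, 'model-b': 1, 'model-c': 3}
print(round(expected_win_probability(1500, 1400), 2))
# -> 0.64
```

In this sample, model-a and model-b share rank 1 because their intervals overlap, while model-c is ranked 3 because both other models have a lower bound above its upper bound. Under the assumed scale, a 100-point gap corresponds to roughly a 64% chance of being preferred.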
Data Source and Update Frequency
The data for this leaderboard is automatically generated and provided by the fboulnois/llm-leaderboard-csv project, which retrieves and processes data from lmarena.ai. This leaderboard is automatically updated daily by GitHub Actions.
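As a rough illustration of working with leaderboard data in CSV form, the sketch below parses a small inline sample with Python's standard csv module and sorts it by score. The column names (model, score, votes), the sample rows, and the load_leaderboard helper are all hypothetical; the actual layout of the files published by fboulnois/llm-leaderboard-csv is not described here and may differ.

```python
import csv
import io

# Hypothetical sample; real column names and values may differ.
SAMPLE_CSV = """model,score,votes
model-c,1400,30000
model-a,1450,12000
model-b,1445,8000
"""


def load_leaderboard(text):
    """Parse CSV text into a list of rows sorted by score, highest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "model": row["model"],
                "score": float(row["score"]),
                "votes": int(row["votes"]),
            }
        )
    rows.sort(key=lambda r: r["score"], reverse=True)
    return rows


top = load_leaderboard(SAMPLE_CSV)
for entry in top:
    print(entry["model"], entry["score"], entry["votes"])
```

Reading from a string keeps the sketch self-contained; to read a downloaded file instead, pass the file's contents to the same function.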
Disclaimer
This report is for reference only. Leaderboard data is dynamic and based on user preference votes on Chatbot Arena over a specific period. The completeness and accuracy of the data depend on the upstream data source and the updates and processing of the fboulnois/llm-leaderboard-csv project. Different models may use different license agreements; please refer to the official documentation of the model provider when using them.