
Client Download

Current latest stable version: v1.4.3

Direct Download

Windows Version

Note: Installing Cherry Studio is not supported on Windows 7.

Installer Version (Setup)

x64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】

ARM64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】

Portable Version (Portable)

x64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】

ARM64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】


macOS Version

Intel Chip Version (x64)

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】

Apple Silicon Version (ARM64, M-series chips)

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】


Linux Version

x86_64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】

ARM64 Version

Main Link:

【Cherry Studio Official Website】 【GitHub】

Alternate Links:

【Link 1】 【Link 2】 【Link 3】


Cloud Drive Download

Quark

Feature Introduction

Settings

Project Introduction

Follow our social accounts: Twitter(X), Xiaohongshu, Weibo, Bilibili, Douyin

Join our communities: QQ Group(575014769), Telegram, Discord, WeChat Group(click to view)


Cherry Studio is an all-in-one AI assistant platform integrating multi-model conversations, knowledge base management, AI painting, translation, and more. Cherry Studio's highly customizable design, powerful extensibility, and user-friendly experience make it an ideal choice for professional users and AI enthusiasts. Whether you are a beginner or a developer, you can find suitable AI functions in Cherry Studio to enhance your work efficiency and creativity.


Core Features & Highlights

1. Basic Chat Functionality

  • One Question, Multiple Answers: Supports generating replies from multiple models simultaneously for the same question, allowing users to compare the performance of different models. For details, see Chat Interface.

  • Automatic Grouping: Conversation records for each assistant are automatically grouped and managed, making it easy for users to quickly find historical conversations.

  • Conversation Export: Supports exporting full or partial conversations to various formats (e.g., Markdown, Word) for easy storage and sharing.

  • Highly Customizable Parameters: In addition to basic parameter adjustments, it also supports custom parameters to meet personalized needs.

  • Assistant Market: Comes with over a thousand built-in industry-specific assistants, covering fields like translation, programming, and writing, and also supports user-defined assistants.

  • Multiple Format Rendering: Supports Markdown rendering, formula rendering, real-time HTML preview, and other functions to enhance content display.

2. Integration of Various Special Features

  • AI Painting: Provides a dedicated painting panel where users can generate high-quality images through natural language descriptions.

  • AI Mini-programs: Integrates various free web-based AI tools, allowing direct use without switching browsers.

  • Translation Function: Supports a dedicated translation panel, in-conversation translation, prompt translation, and other translation scenarios.

  • File Management: Files from conversations, paintings, and knowledge bases are managed in a unified and classified manner, avoiding tedious searches.

  • Global Search: Supports quick location of historical records and knowledge base content, improving work efficiency.

3. Unified Management for Multiple Service Providers

  • Service Provider Model Aggregation: Supports unified calling of models from major service providers like OpenAI, Gemini, Anthropic, and Azure.

  • Automatic Model Fetching: One-click to get a complete list of models without manual configuration.

  • Multi-key Polling: Supports rotating multiple API keys to avoid rate limit issues.

  • Precise Avatar Matching: Automatically matches each model with an exclusive avatar for better recognition.

  • Custom Service Providers: Supports third-party service providers that comply with specifications like OpenAI, Gemini, and Anthropic, offering strong compatibility.

4. Highly Customizable Interface and Layout

  • Custom CSS: Supports global style customization to create a unique interface style.

  • Custom Chat Layout: Supports list or bubble style layouts and allows customization of message styles (e.g., code snippet styles).

  • Custom Avatars: Supports setting personalized avatars for the software and assistants.

  • Custom Sidebar Menu: Users can hide or reorder sidebar functions according to their needs to optimize the user experience.

5. Local Knowledge Base System

  • Multiple Format Support: Supports importing various file formats such as PDF, DOCX, PPTX, XLSX, TXT, and MD.

  • Multiple Data Source Support: Supports local files, URLs, sitemaps, and even manually entered content as knowledge base sources.

  • Knowledge Base Export: Supports exporting processed knowledge bases to share with others.

  • Search and Check Support: After importing a knowledge base, users can perform real-time retrieval tests to check the processing results and segmentation effects.

6. Special Focus Features

  • Quick Q&A: Summon a quick assistant in any context (e.g., WeChat, browser) to get answers quickly.

  • Quick Translation: Supports quick translation of words or text from other contexts.

  • Content Summarization: Quickly summarizes long text content to improve information extraction efficiency.

  • Explanation: Explains complex issues with one click, without needing complicated prompts.

7. Data Security

  • Multiple Backup Solutions: Supports local backup, WebDAV backup, and scheduled backups to ensure data safety.

  • Data Security: Supports fully local usage scenarios, combined with local large models, to avoid data leakage risks.


Project Advantages

  1. Beginner-Friendly: Cherry Studio is committed to lowering the technical barrier, allowing even users with no prior experience to get started quickly, focusing on their work, study, or creation.

  2. Comprehensive Documentation: Provides detailed user manuals and FAQs to help users solve problems quickly.

  3. Continuous Iteration: The project team actively responds to user feedback and continuously optimizes features to ensure the project's healthy development.

  4. Open Source and Extensibility: Supports customization and extension through open-source code to meet personalized needs.


Applicable Scenarios

  • Knowledge Management and Query: Quickly build and query exclusive knowledge bases using the local knowledge base feature, suitable for research, education, and other fields.

  • Multi-model Conversation and Creation: Supports simultaneous conversation with multiple models, helping users quickly obtain information or generate content.

  • Translation and Office Automation: Built-in translation assistants and file processing functions are suitable for users who need cross-lingual communication or document processing.

  • AI Painting and Design: Generate images from natural language descriptions to meet creative design needs.

Star History

Follow Our Social Accounts

Alibaba Cloud Bailian

  1. Log in to Alibaba Cloud Bailian. If you don't have an Alibaba Cloud account, you'll need to register one.

  2. Click the 创建我的 API-KEY (Create My API-KEY) button in the upper right corner.

  3. In the pop-up window, select the default business space (or you can customize it), and you can enter a description if you want.

  4. Click the 确定 (Confirm) button in the lower right corner.

  5. Afterward, you should see a new row added to the list. Click the 查看 (View) button on the right.

  6. Click the 复制 (Copy) button.

  7. Go to Cherry Studio, navigate to Settings → Model Providers → Alibaba Cloud Bailian, find API Key, and paste the copied API key here.

  8. You can adjust the relevant settings as described in Model Providers, and then you can start using it.

If you don't find any models from Alibaba Cloud Bailian in the model list, please ensure you have added the models and enabled this provider as described in Model Providers.

Display Settings

On this page, you can set the software's color theme and page layout, or use Custom CSS for personalized adjustments.

Theme Selection

Here you can set the default interface color mode (Light Mode, Dark Mode, or Follow System).

Topic Settings

This setting is for the layout of the conversation interface.

Topic Position

Auto-switch to Topic

When this setting is enabled, clicking on the assistant's name will automatically switch to the corresponding topic page.

Show Topic Time

When enabled, the creation time of the topic will be displayed below the topic.

Custom CSS


This setting allows for flexible and personalized changes to the interface. For specific methods, please refer to Custom CSS in the advanced tutorials.


Agents

The Agents page is a hub for assistants. Here, you can select or search for the model presets you want. Clicking on a card will add the assistant to the assistant list on the chat page.

You can also edit and create your own assistants on this page.

  • Click on My, then click on Create Agent to start creating your own assistant.

The button in the upper right corner of the prompt input box is for AI-optimizing the prompt. Clicking it will overwrite the original text. The model used is the Global Default Assistant Model.

Default Model Settings

Default Assistant Model

When an assistant does not have a default assistant model set, the model selected by default in a new conversation will be the one set here.

The model set here is also used for optimizing prompts and the pop-up text assistant.

Topic Naming Model

After each conversation, a model is called to generate a topic name for the conversation. The model set here is the one used for naming.

Translation Model

The translation function in input boxes for conversations, drawing, etc., and the translation model on the translation interface all use the model set here.

Quick Assistant Model

The model used by the Quick Assistant feature. For details, see Quick Assistant.

OneAPI and Its Forks

Installation Tutorial

Mini Programs

On the Mini Programs page, you can use the web versions of AI-related programs from major service providers within the client. Currently, custom adding and deleting are not yet supported.

Project Planning

To-Do List

Data Settings

General Settings

On this page, you can set the software's interface language, configure a proxy, etc.

Files

The Files interface displays all files related to conversations, paintings, knowledge bases, and more. You can manage and view them centrally on this page.

Personalization Settings

Contributing to the Documentation

Contact us via email at [email protected] to get editor access.

Title: Application for Cherry Studio Docs Editor Role

Body: State your reasons for applying

Free Web Search Mode

MCP Usage Tutorial

MCP (Model Context Protocol) is an open-source protocol designed to provide context information to Large Language Models (LLMs) in a standardized way.

Frequently Asked Questions

1. mcp-server-time

Solution

In the "Parameters" field, enter:

mcp-server-time
--local-timezone
<Your standard timezone, e.g., Asia/Shanghai>
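For reference, the complete server entry written out in JSON might look roughly like the sketch below. This is only an illustration that assumes the server is launched with uvx and follows the same field layout as the built-in configuration example shown later in this documentation; adjust the command, service ID, and timezone to your environment.

// illustrative sketch only; the service ID and description are placeholders
"mcp-server-time": {
  "name": "mcp-server-time",
  "description": "Time and timezone conversion MCP server",
  "isActive": true,
  "command": "uvx",
  "args": [
    "mcp-server-time",
    "--local-timezone",
    "Asia/Shanghai"
  ]
}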

Web Search Blacklist Configuration

Cherry Studio supports configuring the blacklist manually or by adding subscription sources. For configuration rules, please refer to ublacklist.

Manual Configuration

You can add rules for search results or click the toolbar icon to block specified websites. Rules can be specified using either: match patterns (example: *://*.example.com/*) or regular expressions (example: /example\.(net|org)/).

Subscription Configuration

You can also subscribe to public rule sets. This website lists some subscriptions: https://iorate.github.io/ublacklist/subscriptions

Here are some recommended subscription source links:

  • uBlacklist subscription compilation (Chinese): https://git.io/ublacklist

  • uBlockOrigin-HUGE-AI-Blocklist (AI-generated): https://raw.githubusercontent.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/main/list_uBlacklist.txt

Provider Settings

This page only introduces the interface functions. For configuration tutorials, please refer to the corresponding tutorial in the basic tutorials.

  • When using built-in providers, you only need to fill in the corresponding key.

  • Different providers may have different names for the key. Secret, Key, API Key, Token, etc., all refer to the same thing.

API Key

In Cherry Studio, a single provider supports using multiple keys in a round-robin fashion. The rotation method is a list loop from front to back.

  • Add multiple keys separated by English commas. For example: sk-xxxx1,sk-xxxx2,sk-xxxx3,sk-xxxx4

You must use English commas.

API Address

When using built-in providers, you generally do not need to fill in the API address. If you need to modify it, please strictly follow the address provided in the corresponding official documentation.

If the address provided by the provider is in the format https://xxx.xxx.com/v1/chat/completions, you only need to fill in the base URL part (https://xxx.xxx.com).

Cherry Studio will automatically append the remaining path (/v1/chat/completions). Failure to fill it in as required may result in it not working correctly.

Note: The large language model routes of most providers are standardized, so the following adjustments are generally unnecessary. If the provider's API path uses a different version, such as /v2/chat/completions or /v3/chat/completions, manually enter that version in the address field and end it with a / (for example, https://xxx.xxx.com/v2/). If the provider's request route is not the standard .../chat/completions format at all, use the complete address provided by the provider and end it with a #.

That is:

  • If the API address ends with /, only "chat/completions" will be appended.

  • If the API address ends with #, no appending operation is performed; only the entered address will be used.
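As a concrete illustration of these rules (api.example.com is a placeholder, not a real provider; for the # form the entered address is used as-is, with the # serving only as a marker):

https://api.example.com                                  → https://api.example.com/v1/chat/completions
https://api.example.com/v2/                              → https://api.example.com/v2/chat/completions
https://api.example.com/custom/route/chat/completions#   → https://api.example.com/custom/route/chat/completions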

Add Models

Usually, clicking the Manage button at the bottom left of the provider configuration page will automatically fetch all models supported by that provider. Click the + sign from the fetched list to add them to the model list.

Clicking the Manage button does not add every model in the pop-up list. You need to click the + to the right of a model to add it to the provider's model list on the configuration page; only then will it appear in the model selection list.

Connectivity Check

Click the check button after the API Key input box to test if the configuration is successful.

The model check uses the last chat model from the added model list by default. If the check fails, please verify that there are no incorrect or unsupported models in the model list.

After successful configuration, be sure to turn on the switch in the upper right corner. Otherwise, the provider will remain disabled, and you will not be able to find the corresponding models in the model list.

OneAPI

  • Log in and go to the token page

  • Create a new token (or you can directly use the default token ↑)

  • Copy the token

  • Open CherryStudio's provider settings and click Add at the bottom of the provider list.

  • Enter a note name, select OpenAI as the provider, and click OK.

  • Paste the key you just copied.

  • Go back to the page where you got the API Key and copy the root address from the browser's address bar. For example, you only need to copy https://xxx.xxx.com; the "/" and the content after it are not needed.

  • When the address is an IP + port, fill in http://IP:port, for example: http://127.0.0.1:3000

  • Strictly distinguish between http and https. If SSL is not enabled, do not use https.

  • Add models (click Manage to automatically fetch or manually enter them) and toggle the switch in the upper right corner to enable them.

Other OneAPI themes may have different interfaces, but the method for adding them is the same as the process described above.

OpenAI

Get API Key

  • On the official API Key page, click + Create new secret key.

  • Copy the generated key and open CherryStudio's Provider Settings.

  • Find the OpenAI provider and enter the key you just obtained.

  • Click Manage or Add at the bottom, add the supported models, and enable the provider switch in the upper right corner to start using it.

  • OpenAI services are not directly available in regions of China other than Taiwan; you will need to resolve the proxy issue yourself.

  • You must have a balance in your account.

Painting

The painting feature currently only supports the painting models from SiliconFlow. You can go to SiliconFlow to register an account and add it to the providers to use it.

For questions about the parameters, you can hover over the ? in the corresponding area to see the description.

More service providers will be added in the future. Please stay tuned.

Built-in MCP Configurations

@cherry/mcp-auto-install

Automatically installs MCP services (beta).

@cherry/memory

A basic implementation of persistent memory based on a local knowledge graph. This allows the model to remember relevant user information across different conversations. Environment variable: MEMORY_FILE_PATH=/path/to/your/file.json

@cherry/sequentialthinking

An MCP server implementation that provides tools for dynamic and reflective problem-solving through structured thought processes.

@cherry/brave-search

An MCP server implementation that integrates the Brave Search API, providing dual functionality for web and local search. Environment variable: BRAVE_API_KEY=YOUR_API_KEY

@cherry/fetch

An MCP server for fetching web page content from a URL.

@cherry/filesystem

A Node.js server that implements the Model Context Protocol (MCP) for file system operations.

Clear CSS Settings

Use this method to clear the CSS settings when you have entered incorrect CSS, or when an incorrect CSS setting prevents you from opening the settings interface.

  • Open the console: click on the CherryStudio window, then press the shortcut Ctrl+Shift+I (macOS: Command+Option+I).

  • In the console window that pops up, click Console.

  • Then, manually type document.getElementById('user-defined-custom-css').remove(). Copying and pasting will likely not execute.

  • After typing, press Enter to confirm and clear the CSS settings. Then, go back to CherryStudio's display settings and delete the problematic CSS code.

Feedback & Suggestions

Telegram Discussion Group

Group members will share their experiences and help you solve problems. Join the discussion at: https://t.me/CherryStudioAI

QQ Group

QQ group members can help each other and share download links. QQ Group: 1025067911

GitHub Issues

Suitable for recording issues to prevent developers from forgetting, or for participating in discussions: https://github.com/CherryHQ/cherry-studio/issues/new/choose

Email

If you can't find other feedback channels, you can contact the developer for help.

Contact the developer via email: [email protected]



Contributing Code

We welcome contributions to Cherry Studio! You can contribute in the following ways:

  1. Contribute Code: Develop new features or optimize existing code.

  2. Fix Bugs: Submit fixes for bugs you find.

  3. Maintain Issues: Help manage GitHub issues.

  4. Product Design: Participate in design discussions.

  5. Write Documentation: Improve user manuals and guides.

  6. Community Engagement: Join discussions and help users.

  7. Promote Usage: Spread the word about Cherry Studio.

How to Participate

Send an email to [email protected]

Email Subject: Apply to become a developer

Email Body: Reason for application

Privacy Policy

Welcome to Cherry Studio (hereinafter referred to as "this software" or "we"). We place a high value on protecting your privacy. This Privacy Policy explains how we handle and protect your personal information and data. Please read and understand this policy carefully before using this software:

I. Scope of Information We Collect

To optimize user experience and improve software quality, we may only collect the following anonymous, non-personal information:

• Software version information;

• Activity and usage frequency of software features;

• Anonymous crash and error log information.

The above information is completely anonymous, does not involve any personally identifiable data, and cannot be associated with your personal information.

II. Information We Do Not Collect

To maximize the protection of your privacy and security, we explicitly promise:

• We will not collect, save, transmit, or process the model service API Key information you enter into this software;

• We will not collect, save, transmit, or process any conversation data generated during your use of this software, including but not limited to chat content, command information, knowledge base information, vector data, and other custom content;

• We will not collect, save, transmit, or process any personally identifiable sensitive information.

III. Data Interaction Description

This software uses the API Key from a third-party model service provider that you apply for and configure yourself to perform model calls and conversation functions. The model services you use (e.g., large models, API interfaces, etc.) are provided by and are the sole responsibility of the third-party provider you choose. Cherry Studio only acts as a local tool to provide the interface calling function with third-party model services.

Therefore:

• All conversation data generated between you and the large model service is unrelated to Cherry Studio. We do not participate in data storage, nor do we conduct any form of data transmission or relay;

• You need to review and accept the privacy policies and related terms of the corresponding third-party model service providers. The privacy policies for these services can be found on the official websites of each provider.

IV. Third-Party Model Service Provider Privacy Policy Statement

You are solely responsible for any privacy risks that may arise from using third-party model service providers. For specific privacy policies, data security measures, and related liabilities, please refer to the relevant content on the official website of your chosen model service provider. We assume no responsibility for this.

V. Agreement Updates and Modifications

This policy may be adjusted appropriately with software version updates. Please check it regularly. When substantial changes to the policy occur, we will notify you in an appropriate manner.

VI. Contact Us

If you have any questions about the content of this policy or Cherry Studio's privacy protection measures, please feel free to contact us.

Thank you for choosing and trusting Cherry Studio. We will continue to provide you with a secure and reliable product experience.

Knowledge Base

For usage of the knowledge base, refer to the Knowledge Base Tutorial in the advanced tutorials.

Quick Assistant

Quick Assistant is a convenient tool provided by Cherry Studio that allows you to quickly access AI functions in any application, enabling instant questioning, translation, summarization, and explanation.

Enable Quick Assistant

  1. Open Settings: Navigate to Settings -> Shortcuts -> Quick Assistant.

  2. Enable the Switch: Find and turn on the switch for Quick Assistant.

  3. Set Shortcut (Optional):

    • The default shortcut for Windows is Ctrl + E.

    • The default shortcut for macOS is ⌘ + E.

    • You can customize the shortcut here to avoid conflicts or to better suit your usage habits.

Using Quick Assistant

  1. Invoke: In any application, press your set shortcut (or the default one) to open the Quick Assistant.

  2. Interact: In the Quick Assistant window, you can perform the following actions directly:

    • Quick Question: Ask the AI any question.

    • Text Translation: Enter the text you need to translate.

    • Content Summarization: Input long text for a summary.

    • Explanation: Enter concepts or terms that need clarification.

  3. Close: Press the ESC key or click anywhere outside the Quick Assistant window to close it.

The model used by the Quick Assistant is the Global Default Conversation Model.

Tips & Tricks

  • Shortcut Conflicts: If the default shortcut conflicts with other applications, please modify it.

  • Explore More Features: In addition to the functions mentioned in the documentation, the Quick Assistant may support other operations, such as code generation, style conversion, etc. It is recommended that you continue to explore during use.

  • Feedback & Improvement: If you encounter any problems or have any suggestions for improvement during use, please provide feedback to the Cherry Studio team in a timely manner.

macOS Installation Tutorial

  1. First, go to the official website's download page to download the Mac version, or click the direct link below.

Please make sure to download the correct chip version for your Mac.

If you don't know which chip version your Mac uses:

  • Click the Apple menu in the menu bar at the top-left corner of your Mac.

  • Click "About This Mac" in the dropdown menu.

  • Check the processor information in the pop-up window.

If it's an Intel chip, download the Intel version installer.

If it's an Apple M* chip, download the Apple chip installer.

  2. After the download is complete, open the downloaded installer.

  3. Drag the Cherry Studio icon into the Applications folder to install.

Go to Launchpad, find the Cherry Studio icon, and click it. If the Cherry Studio main interface opens, the installation is successful.

NewAPI

  • Log in and open the token page

  • Click "Add Token"

  • Enter a token name and click "Submit" (other settings can be configured as needed).

  • Open the provider settings in CherryStudio and click Add at the bottom of the provider list.

  • Enter a memo name, select OpenAI as the provider, and click OK.

  • Paste the key you just copied.

  • Go back to the page where you obtained the API Key and copy the base URL from your browser's address bar. For example, you only need to copy https://xxx.xxx.com; the "/" and everything that follows are not needed.

  • When the address is an IP + port, enter http://IP:port, for example: http://127.0.0.1:3000

  • Strictly distinguish between http and https. If SSL is not enabled, do not use https.

  • Add models (click "Manage" to fetch them automatically or enter them manually), then enable the switch in the top-right corner to start using them.

MCP Environment Setup

Using MCP in Cherry Studio

The following uses the fetch feature as an example to demonstrate how to use MCP in Cherry Studio. You can find more details in the documentation.

Preparation: Install uv, bun

Cherry Studio currently only uses the built-in uv and bun, and will not reuse any existing installations of uv and bun on the system.

In Settings → MCP Server, click the Install button to download and install them automatically. Because the downloads come directly from GitHub, they can be slow and have a relatively high chance of failing. Whether the installation succeeded depends on whether the files exist in the folder mentioned below.

Executable Installation Directory:

Windows: C:\Users\YourUsername\.cherrystudio\bin

macOS, Linux: ~/.cherrystudio/bin

If the installation fails:

You can create a symbolic link (soft link) from the corresponding system command to this directory. If the directory does not exist, you need to create it manually. Alternatively, you can manually download the executable files and place them in this directory:

Bun: https://github.com/oven-sh/bun/releases

UV: https://github.com/astral-sh/uv/releases
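On macOS or Linux, for example, a minimal sketch of the symbolic-link approach could look like the commands below, assuming uv, uvx, and bun are already installed and on your PATH; the exact set of binaries Cherry Studio expects may differ, so check the bin directory after a successful automatic install if unsure.

# create the directory Cherry Studio looks in, then link the system binaries into it
mkdir -p ~/.cherrystudio/bin
ln -s "$(which uv)" ~/.cherrystudio/bin/uv
ln -s "$(which uvx)" ~/.cherrystudio/bin/uvx
ln -s "$(which bun)" ~/.cherrystudio/bin/bun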

Configure and Use MCP

  1. Open Cherry Studio settings.

  2. Find the MCP Server option.

  3. Click Add Server.

  4. Fill in the relevant parameters for the MCP Server (reference link). The content you may need to fill in includes:

    • Name: Customize a name, for example, fetch-server

    • Type: Select STDIO

    • Command: Fill in uvx

    • Arguments: Fill in mcp-server-fetch

    • (There may be other parameters, depending on the specific Server)

  5. Click Save.

After completing the above configuration, Cherry Studio will automatically download the required MCP Server - fetch server. Once the download is complete, we can start using it! Note: If the mcp-server-fetch configuration is unsuccessful, you can try restarting your computer.
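Written out in JSON, and following the same field layout as the other configuration examples in this documentation, the server added above might look roughly like this sketch (fetch-server is just the custom name chosen in step 4, not a required identifier):

"fetch-server": {
  "name": "fetch-server",
  "isActive": true,
  "command": "uvx",
  "args": [
    "mcp-server-fetch"
  ]
}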

Enable MCP Service in the Chat Box

  • Successfully added an MCP server in the MCP Server settings

Usage Demonstration

By integrating MCP's fetch feature, Cherry Studio can better understand the user's query intent, retrieve relevant information from the web, and provide more accurate and comprehensive answers.

Business Cooperation

Contact Person: Mr. Wang

📮:[email protected]

📱:18954281942 (Not a customer service number)

For usage inquiries, you can join our user communication group at the bottom of the official website homepage, or email [email protected]

Or submit issues at: https://github.com/CherryHQ/cherry-studio/issues

If you need more guidance, you can join our Knowledge Planet

Commercial license details: https://docs.cherry-ai.com/contact-us/questions/cherrystudio-xu-ke-xie-yi

Windows Installation Tutorial

Open the Official Website

Note: Cherry Studio cannot be installed on Windows 7.

Click download and select the appropriate version

Wait for the Download to Complete

If the browser prompts that the file is not trusted, choose to keep it.

Choose Keep → Trust Cherry-Studio

Open the File

Install

SiYuan Note Configuration Tutorial

Supports exporting topics and messages to SiYuan Note.

Step 1

Open SiYuan Note and create a new notebook.

Step 2

Open the notebook settings and copy the Notebook ID.

Step 3

Paste the copied Notebook ID into the Cherry Studio settings.

Step 4

Enter the SiYuan Note address.

  • Local: usually http://127.0.0.1:6806

  • Self-hosted: your domain, e.g., http://note.domain.com

Step 5

Copy the SiYuan Note API Token.

Paste it into the Cherry Studio settings and check the connection.

Step 6

Congratulations, the SiYuan Note configuration is complete ✅ You can now export content from Cherry Studio to your SiYuan Note.

Configure Dify Knowledge Base

The Dify Knowledge Base MCP requires upgrading Cherry Studio to v1.2.9 or higher.

Add Dify Knowledge Base MCP Server

  1. Open Search MCP.

  2. Add the dify-knowledge server.

Configure Dify Knowledge Base

Parameters and environment variables need to be configured

  1. The Dify knowledge base key can be obtained as follows

Using the Dify Knowledge Base MCP

GitHub Copilot

To use GitHub Copilot, you first need a GitHub account and a subscription to the GitHub Copilot service. A free version subscription is also acceptable, but the free version does not support the latest Claude 3.7 model. For details, please refer to the official GitHub Copilot website.

Get Device Code

Click "Login with GitHub" to get the Device Code and copy it.

Enter the Device Code in the browser and authorize

After successfully obtaining the Device Code, click the link to open your browser. Log in to your GitHub account in the browser, enter the Device Code, and authorize.

After successful authorization, return to Cherry Studio and click "Connect to GitHub". Upon success, your GitHub username and avatar will be displayed.

Click "Manage" to get the model list

Click the "Manage" button below, and it will automatically connect to the internet to fetch the list of currently supported models.

Frequently Asked Questions

Failed to get Device Code, please retry

First, ensure your network connection is stable. Requests are currently built with Axios, which does not support SOCKS proxies, so please use a system proxy or an HTTP proxy, or do not set a proxy within CherryStudio and use a global proxy instead. This helps avoid failures when obtaining the Device Code.

Infini-AI

Does this sound familiar? You have 26 insightful articles saved in your WeChat Favorites that you never open again, more than 10 files scattered across a "study materials" folder on your computer, and when you try to find a theory you read six months ago, you can only remember a few keywords. When the daily amount of information exceeds your brain's processing limit, 90% of valuable knowledge is forgotten within 72 hours. Now, by building a personal knowledge base with the Infini-AI Large Model Service Platform API + Cherry Studio, you can transform those dust-gathering WeChat articles and fragmented course content into structured knowledge for precise retrieval.

I. Building a Personal Knowledge Base

1. Infini-AI API Service: The "Thinking Hub" of Your Knowledge Base, Easy-to-Use and Stable

As the "thinking hub" of the knowledge base, the Infini-AI Large Model Service Platform offers model versions like the full-power DeepSeek R1, providing stable API services. Currently, it's free to use with no barriers after registration. It supports mainstream embedding models like bge and jina for building knowledge bases. The platform also continuously updates with the latest, most powerful, and stable open-source model services, including various modalities such as images, videos, and voice.

2. Cherry Studio: Build a Knowledge Base with Zero Code

Cherry Studio is an easy-to-use AI tool. Compared to the 1-2 month deployment cycle required for RAG knowledge base development, this tool's advantage is its support for zero-code operation. You can import multiple formats like Markdown/PDF/webpages with one click. A 40MB file can be parsed in 1 minute. Additionally, you can add local computer folders, article URLs from WeChat Favorites, and course notes.

II. Build Your Exclusive Knowledge Butler in 3 Steps

Step 1: Basic Preparation

  1. Visit the official Cherry Studio website to download the appropriate version (https://cherry-ai.com/)

  2. Register an account: Log in to the Infini-AI Large Model Service Platform (https://cloud.infini-ai.com/genstudio/model?cherrystudio)

  • Get an API key: In the "Model Square," select deepseek-r1, click Create to get the API key, and copy the model name.

Step 2: Open Cherry Studio settings, select Infini-AI in the Model Service, fill in the API Key, and enable the Infini-AI model service.

After completing the steps above, you can use Infini-AI's API service in Cherry Studio by selecting the desired large model during interaction. For convenience, you can also set a "Default Model" here.

Step 3: Add a Knowledge Base

Select any version of the bge series or jina series embedding models from the Infini-AI Large Model Service Platform.

III. Real User Scenario Test

  • After importing study materials, enter "Summarize the core formula derivations in Chapter 3 of 'Machine Learning'"

Generated result shown below

SiliconFlow

1. Configure SiliconCloud's Model Service

1.1 Click on Settings in the bottom-left corner and select 【SiliconFlow】 under Model Service

1.2 Click the link to get the SiliconCloud API key

  1. Log in to SiliconCloud (if you haven't registered, an account will be automatically created on your first login)

  2. Visit API Keys to create a new key or copy an existing one

1.3 Click Manage to add a model

2. Using the Model Service

  1. Click the "Chat" button in the left menu bar

  2. Enter text in the input box to start chatting

  3. You can switch models by selecting the model name in the top menu

Knowledge Base Data

All data added to the Cherry Studio knowledge base is stored locally. During the addition process, a copy of the document will be placed in the Cherry Studio data storage directory.

Vector Database: https://turso.tech/libsql

After a document is added to the Cherry Studio knowledge base, the file will be split into several chunks, and then these chunks will be processed by an embedding model.

When using a large model for Q&A, text chunks related to the question will be retrieved and sent to the large language model for processing.

If you have data privacy requirements, it is recommended to use a local embedding database and a local large language model.

Automatic MCP Installation

Automatic MCP installation requires upgrading Cherry Studio to v1.1.18 or a higher version.

Feature Introduction

In addition to manual installation, Cherry Studio has a built-in tool, @mcpmarket/mcp-auto-install, which provides a more convenient way to install MCP servers. You just need to input the corresponding command in a large model conversation that supports MCP services.

Beta Phase Reminder:

  • @mcpmarket/mcp-auto-install is still in its beta phase.

  • The effectiveness depends on the "intelligence" of the large model. Some configurations will be added automatically, while others may still require manual changes to certain parameters in the MCP settings.

  • Currently, the search source is @modelcontextprotocol, which you can configure yourself (explained below).

Usage Instructions

For example, you can enter:

Install a filesystem mcp server for me

The system will automatically recognize your request and complete the installation via @mcpmarket/mcp-auto-install. This tool supports various types of MCP servers, including but not limited to:

  • filesystem

  • fetch

  • sqlite

  • and more...

The MCP_PACKAGE_SCOPES environment variable allows you to customize the search source for MCP services. The default value is @modelcontextprotocol, and it can be configured to point elsewhere.
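As a hedged sketch, overriding the search source would go in the server's env block along these lines, where @your-org is a purely hypothetical scope used for illustration:

"env": {
  "MCP_PACKAGE_SCOPES": "@your-org"
}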

Introduction to the @mcpmarket/mcp-auto-install Library

@mcpmarket/mcp-auto-install is an open-source npm package. You can view its detailed information and documentation on the official npm repository. @mcpmarket is the official collection of MCP services for Cherry Studio.

Default Configuration Reference:

// `axun-uUpaWEdMEMU8C61K` is the service ID, which can be customized
"axun-uUpaWEdMEMU8C61K": {
  "name": "mcp-auto-install",
  "description": "Automatically install MCP services (Beta version)",
  "isActive": false,
  "registryUrl": "https://registry.npmmirror.com",
  "command": "npx",
  "args": [
    "-y",
    "@mcpmarket/mcp-auto-install",
    "connect",
    "--json"
  ],
  "env": {
    "MCP_REGISTRY_PATH": "For details, see https://www.npmjs.com/package/@mcpmarket/mcp-auto-install"
  },
  "disabledTools": []
}

ByteDance (Doubao)

Get API Key

  • Log in to Volcano Engine and click API Key Management at the bottom of the sidebar.

  • Create an API Key.

  • After successful creation, click the eye icon next to the newly created API Key to reveal and copy it.

  • Open Cherry Studio's model service settings, find Volcano Engine, paste the copied API Key, and turn on the provider switch.

Activate and Add Models

  • In Activation Management at the bottom of the Ark console sidebar, activate the models you need, such as the Doubao series and DeepSeek.

  • In the Model List Document, find the Model ID corresponding to the desired model.

  • In Cherry Studio, click Add and paste the previously obtained Model ID into the Model ID text box.

  • Follow this process to add models one by one.

API Address

There are two ways to write the API address:

  • The first is the client default: https://ark.cn-beijing.volces.com/api/v3/

  • The second way is: https://ark.cn-beijing.volces.com/api/v3/chat/completions#

There is no difference between the two formats; you can keep the default without modification. For the difference between addresses ending with / and #, refer to the API Address section in the provider settings documentation.
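Applying the / and # rules described in the provider settings documentation, and assuming the trailing # simply means "use the address as entered", both forms end up sending requests to the same URL:

https://ark.cn-beijing.volces.com/api/v3/                  → https://ark.cn-beijing.volces.com/api/v3/chat/completions
https://ark.cn-beijing.volces.com/api/v3/chat/completions# → https://ark.cn-beijing.volces.com/api/v3/chat/completions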

Notion Configuration Tutorial

Cherry Studio supports importing topics into a Notion database.

Step 1

Go to the Notion Integrations website to create a new integration.

Step 2

Create an integration.

Name: Cherry Studio

Type: Select the first one

Icon: You can save this image

Step 3

Copy the secret token and paste it into the Cherry Studio settings.

Step 4

Go to Notion and create a new page. Select the database type, name it Cherry Studio, and follow the illustration to connect.

Step 5

If your Notion database URL looks like this:

https://www.notion.so/<long_hash_1>?v=<long_hash_2>

Then the Notion database ID is the <long_hash_1> part.
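For example, with a purely hypothetical URL (the hashes below are placeholders, not real IDs):

https://www.notion.so/a1b2c3d4e5f67890a1b2c3d4e5f67890?v=0123456789abcdef0123456789abcdef
→ Notion database ID: a1b2c3d4e5f67890a1b2c3d4e5f67890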

Step 6

Fill in the Page Title Field Name:

If your web page is in English, enter Name. If your web page is in Chinese, enter 名称.

Step 7

Congratulations, your Notion configuration is complete ✅ You can now export content from Cherry Studio to your Notion database.

Add ModelScope MCP Server

ModelScope MCP Server requires upgrading Cherry Studio to v1.2.9 or higher.

In version v1.2.9, Cherry Studio officially partnered with ModelScope, significantly simplifying the process of adding MCP servers. This helps avoid configuration errors and allows you to discover a vast number of MCP servers within the ModelScope community. Follow the steps below to learn how to sync ModelScope's MCP servers in Cherry Studio.

Steps

Sync Entry Point:

Click on MCP Server Settings in the settings, and select Sync Server.

Discover MCP Services:

Select ModelScope and browse to discover MCP services.

View MCP Server Details

Register and log in to ModelScope, and view the MCP service details.

Connect to Server

In the MCP service details, select "Connect Service".

Apply for and Paste API Token

Click "Get API Token" in Cherry Studio, which will redirect you to the official ModelScope website. Copy the API token and paste it back into Cherry Studio.

Successful Sync

In the MCP server list in Cherry Studio, you can see the MCP service connected from ModelScope and call it in conversations.

Incremental Update

For new MCP servers connected on the ModelScope webpage later, simply click Sync Server to add them incrementally.

By following the steps above, you have successfully learned how to easily sync MCP servers from ModelScope in Cherry Studio. The entire configuration process is not only greatly simplified, effectively avoiding the hassle and potential errors of manual configuration, but it also allows you to easily access the vast MCP server resources provided by the ModelScope community.

Start exploring and using these powerful MCP services to bring more convenience and possibilities to your Cherry Studio experience!


How to Ask Questions Effectively

Cherry Studio is a free and open-source project. As the project grows, the workload for the project team has also increased. To reduce communication costs and resolve your issues quickly and efficiently, we hope that you will follow the steps and methods below to handle problems before asking questions. This will allow the project team more time to focus on project maintenance and development. Thank you for your cooperation!

1. Check and Search the Documentation

Most basic questions can be solved by carefully reading the documentation.

  • For questions about the software's features and usage, you can check the Feature Introduction documentation.

  • Frequently asked questions are collected on the FAQ page. You can check there first for solutions.

  • For more complex issues, you can try to solve them by searching the documentation directly or asking your question in the search box.

  • Be sure to carefully read the content in the hint boxes within each document, as this can help you avoid many problems.

  • Check or search the GitHub Issues page for similar problems and solutions.

2. Search Online, Ask an AI

For issues unrelated to the client's functionality (such as model errors, unexpected responses, or parameter settings), it is recommended to first search online for relevant solutions or describe the error message and problem to an AI to find a solution.

3. Ask in Official Communities or Create a GitHub Issue

If the first two steps did not provide an answer or solve your problem, you can describe your issue in detail and seek help in our official Telegram channel, Discord channel, or QQ group (click to join).

  1. If it's a model error, please provide a complete screenshot of the interface and the console error message. You can censor sensitive information, but the model name, parameter settings, and error content must be visible in the screenshot. To learn how to view console error messages, click here.

  2. If it's a software bug, please provide a specific error description and detailed steps to help developers debug and fix it. If it's an intermittent issue that cannot be reproduced, please describe the relevant scenarios, context, and configuration parameters when the problem occurred in as much detail as possible. In addition, you also need to include platform information (Windows, Mac, or Linux) and the software version number in your problem description.

Requesting Documentation or Providing Suggestions

You can contact @Wangmouuu on our Telegram channel or QQ (1355873789), or send an email to: [email protected].

Shortcut Key Settings

On this interface, you can enable (or disable) and set shortcut keys for some functions. Please follow the instructions on the interface for setup.

Model Provider Configuration

Knowledge Sharing

PPIO Cloud

Connecting Cherry Studio to PPIO LLM API

Tutorial Overview

Cherry Studio is a multi-model desktop client that currently supports installation packages for Windows, Linux, and macOS. It aggregates mainstream LLM models and provides multi-scenario assistance. Users can improve their work efficiency through intelligent session management, open-source customization, and multi-themed interfaces.

Cherry Studio is now deeply integrated with the PPIO high-performance API channel—ensuring high-speed responses for DeepSeek-R1/V3 and 99.9% service availability through enterprise-grade computing power, bringing you a fast and smooth experience.

The tutorial below provides a complete integration plan (including API key configuration), allowing you to enable the advanced mode of "Cherry Studio Intelligent Scheduling + PPIO High-Performance API" in just 3 minutes.

1. Open CherryStudio and add "PPIO" as a model provider

First, go to the official website to download Cherry Studio: https://cherry-ai.com/download (if you can't access it, use the following Quark Drive link to download the version you need: https://pan.quark.cn/s/c8533a1ec63e#/list/share).

(1) First, click on Settings in the bottom left corner, set the custom provider name to: PPIO, and click "OK"

(2) Go to PPIO Compute Cloud API Key Management, click on your [User Avatar] — [API Key Management] to enter the console

Click the [+ Create] button to create a new API key. Give it a custom name. The generated key is only displayed at the time of creation. Be sure to copy and save it to a document to avoid affecting future use.

(3) In CherryStudio, enter the API key. Click Settings, select [PPIO Cloud], enter the API key generated on the official website, and finally click [Check].

(4) Select the model: For example, deepseek/deepseek-r1/community. If you need to switch to another model, you can do so directly.

The community versions of DeepSeek R1 and V3 are for trial purposes. They are full-parameter models with no difference in stability or performance. For high-volume usage, you must top up your account and switch to a non-community version.

​2. Model Usage Configuration

(1) Click [Check]. Once it shows "Connection successful," you can start using it normally.

(2) Finally, click on [@] and select the DeepSeek R1 model you just added under the PPIO provider to start chatting successfully~

[Some materials sourced from: 陈恩]

​3. PPIO×Cherry Studio Video Tutorial

If you prefer visual learning, we have prepared a video tutorial on Bilibili. This step-by-step guide will help you quickly master the configuration of "PPIO API + Cherry Studio". Click the link below to go directly to the video and start your smooth development experience → 【Still frustrated by DeepSeek's endless loading?】PPIO Cloud + Full-power DeepSeek =? No more congestion, take off now!

[Video material sourced from: sola]

Web Search Mode

How to Use Web Search Mode in Cherry Studio

Examples of scenarios that require web access:

  • Time-sensitive information: For example, the price of gold futures today/this week/just now.

  • Real-time data: For example, weather, exchange rates, and other dynamic values.

  • Emerging knowledge: For example, new things, new concepts, new technologies, etc...

1. How to Enable Web Search

In the Cherry Studio question window, click the [Little Globe] icon to enable web search.

2. Special Note: There Are Two Web Search Modes

Mode 1: The model provider's large model has a built-in web search function

In this case, after enabling web search, you can use the service directly. It's very simple.

You can quickly determine if a model supports web search by checking for a small globe icon next to the model's name at the top of the chat interface.

On the model management page, this method also allows you to quickly distinguish which models support web search and which do not.

Cherry Studio currently supports the following model providers with web search capabilities:

  • Google Gemini

  • OpenRouter (all models support web search)

  • Tencent Hunyuan

  • Zhipu AI

  • Alibaba Cloud Bailian, etc.

Special Note:

There is a special case where a model can access the web even without the small globe icon, as explained in the tutorial below.


Mode 2: The model does not have a built-in web search function; use the Tavily service to enable it

When we use a large model without a built-in web search function (no small globe icon next to its name), but we need it to retrieve real-time information for processing, we need to use the Tavily web search service.

When using the Tavily service for the first time, a pop-up will prompt you to configure some settings. Please follow the instructions—it's very simple!

After clicking to get the API key, you will be automatically redirected to the official Tavily website's login/registration page. After registering and logging in, create an API key, then copy the key and paste it into Cherry Studio.

If you don't know how to register, refer to the Tavily web search login and registration tutorial in the same directory as this document.

Tavily registration reference document:

The interface below indicates that the registration was successful.

Let's try again to see the effect. The result shows that the web search is now working correctly, and the number of search results is our default setting: 5.

Note: Tavily has a monthly free usage limit. You will need to pay if you exceed it~~

PS: If you find any errors, please feel free to contact us.

Huawei Cloud

This document was translated from Chinese by AI and has not yet been reviewed.

Huawei Cloud

  1. Go to Huawei Cloud to create an account and log in.

  2. Click this link to enter the MaaS console.

  3. Authorization

  1. Click on Authentication Management in the sidebar, create an API Key (secret key), and copy it.

Then, create a new provider in CherryStudio.

After creation, fill in the secret key.

  4. Click on Model Deployment in the sidebar and claim all models.

  5. Click on Invoke.

Copy the address from ① and paste it into the Provider Address field in CherryStudio, and add a "#" symbol at the end (don't forget the "#").

Why add a "#" symbol? See here

Of course, you can also skip reading that and just follow the tutorial;

Alternatively, you can delete v1/chat/completions from the end of the address instead of adding "#". Either method works if you understand what you are doing; if you don't, just follow the tutorial exactly.
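For example, if the Invoke page shows an address ending in v1/chat/completions (the endpoint below is made up for illustration), either of the following forms works in the Provider Address field:

https://example-maas-endpoint.example.com/v1/chat/completions#
https://example-maas-endpoint.example.com/

The trailing "#" generally tells CherryStudio to use the address exactly as written instead of appending its default path; see the link above for the details.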

Then, copy the model name from ②, and in CherryStudio, click the "+Add" button to create a new model.

Enter the model name. Do not add anything extra or include quotes. Copy it exactly as it is written in the example.

Click the Add Model button to finish adding.

In Huawei Cloud, since the address for each model is different, you need to create a new provider for each model. Just repeat the steps above.

Knowledge Base Tutorial

This document was translated from Chinese by AI and has not yet been reviewed.

Knowledge Base Tutorial

In version 0.9.1, CherryStudio introduced the long-awaited knowledge base feature.

Below, we will provide detailed instructions for using CherryStudio step-by-step.

Add Embedding Model

  1. In the Model Management service, find a model. You can click "Embedding Model" to filter quickly;

  2. Find the model you need and add it to "My Models".

Create a Knowledge Base

  1. Knowledge Base Entry: On the left toolbar of CherryStudio, click the knowledge base icon to enter the management page;

  2. Add Knowledge Base: Click "Add" to start creating a knowledge base;

  3. Naming: Enter a name for the knowledge base and add an embedding model, for example, bge-m3, to complete the creation.

Add Files and Vectorize

  1. Add Files: Click the "Add Files" button to open the file selector;

  2. Select Files: Choose supported file formats like pdf, docx, pptx, xlsx, txt, md, mdx, etc., and open them;

  3. Vectorization: The system will automatically perform vectorization. When it shows "Completed" (green ✓), it means vectorization is finished.

Add Data from Multiple Sources

CherryStudio supports adding data in multiple ways:

  1. Folder Directory: You can add an entire folder directory. Files in supported formats within this directory will be automatically vectorized;

  2. URL Link: Supports website URLs, such as https://docs.siliconflow.cn/introduction;

  3. Sitemap: Supports XML-formatted sitemaps, such as https://docs.siliconflow.cn/sitemap.xml;

  4. Plain Text Note: Supports inputting custom content as plain text.

Tip:

  1. Illustrations in documents imported into the knowledge base are not yet supported for vector conversion and need to be manually converted to text;

  2. Using a website URL as a knowledge base source may not always be successful. Some websites have strict anti-scraping mechanisms (or require login, authorization, etc.), so this method may not retrieve accurate content. It is recommended to test by searching after creation.

  3. Most websites provide a sitemap, like CherryStudio's sitemap. Generally, you can get related information by adding /sitemap.xml after the root address (URL) of the website, e.g., aaa.com/sitemap.xml.

  4. If a website does not provide a sitemap or the URLs are complex, you can create your own sitemap.xml file. The file must be provided as a direct link that is publicly accessible on the internet; local file links will not be recognized. A minimal example is shown after this list.

  1. You can ask an AI to generate a sitemap file or write an HTML sitemap generator tool;

  2. Direct links can be generated using methods like OSS direct links or cloud storage direct links. If you don't have a ready-made tool, you can go to the ocoolAI official website, log in, and use the free file upload tool in the top bar to generate a direct link.
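If you write the sitemap yourself, a minimal sitemap.xml (with placeholder URLs) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/page-1</loc>
  </url>
  <url>
    <loc>https://example.com/docs/page-2</loc>
  </url>
</urlset>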

Search the Knowledge Base

Once files and other materials have been vectorized, you can start querying:

  1. Click the "Search Knowledge Base" button at the bottom of the page;

  2. Enter your query;

  3. The search results will be displayed;

  4. And the match score for each result will be shown.

Referencing the Knowledge Base to Generate Replies in a Conversation

  1. Create a new topic. In the conversation toolbar, click on the knowledge base icon. A list of created knowledge bases will expand. Select the one you want to reference;

  2. Enter and send your question. The model will return an answer generated from the search results;

  3. Additionally, the referenced data sources will be attached below the answer, allowing for quick access to the source files.

Data Settings

This document was translated from Chinese by AI and has not yet been reviewed.

Data Settings

This interface allows you to perform operations such as cloud and local backup of client data, querying the local data directory, and clearing the cache.

Data Backup

Currently, data backup only supports WebDAV. You can choose a service that supports WebDAV for cloud backup.

Taking Jianguoyun as an Example

  1. Log in to Jianguoyun, click on the username in the upper right corner, and select "Account Info";

  2. Select "Security Options" and click "Add Application";

  3. Enter the application name and generate a random password;

  4. Copy and save the password;

  5. Obtain the server address, account, and password;

  6. In Cherry Studio Settings -> Data Settings, fill in the WebDAV information;

  7. Choose to back up or restore data, and you can set the automatic backup time interval.

Generally, the easiest WebDAV services to get started with are cloud storage providers:

  • Jianguoyun

  • 123Pan (Requires membership)

  • Aliyun Drive (Requires purchase)

  • Box (Free space is 10GB, single file size limit is 250MB.)

  • Dropbox (Dropbox offers 2GB for free, and you can get up to 16GB by inviting friends.)

  • TeraCloud (Free space is 10GB, and an additional 5GB can be obtained through referrals.)

  • Yandex Disk (Provides 10GB of capacity for free users.)

Next are some services that you need to deploy yourself:

  • Alist

  • Cloudreve

  • sharelist

Tavily Web Search Login & Registration Tutorial

How do I register for Tavily?

This document was translated from Chinese by AI and has not yet been reviewed.

Tavily Web Login and Registration Tutorial

1. Tavily Official Website

https://app.tavily.com/home

Some users may experience slow access. If you have a proxy, you can use it.

2. Detailed Tavily Registration Steps

Visit the official website mentioned above, or go to Cherry Studio -> Settings -> Web Search and click "Get API Key". This will redirect you to the Tavily login/registration page.

If this is your first time, you need to Sign up for an account before you can Log in. Note that the page defaults to the login page.

  1. Click to sign up for an account to enter the following interface. Enter your commonly used email address, or use your Google/GitHub account. Then, enter your password in the next step. This is a standard procedure.

  2. 🚨🚨🚨[Crucial Step] After successful registration, there will be a dynamic verification code step. You need to scan a QR code to generate a one-time code to continue.

It's very simple. You have two options at this point.

  1. Download an authenticator app, like Microsoft Authenticator. [Slightly more complicated]

  2. Use the WeChat Mini Program: 腾讯身份验证器 (Tencent Authenticator). [Simple, anyone can do it, recommended]

  1. Open the WeChat Mini Program and search for: 腾讯身份验证器

3. 🎉Registration Successful🎉

After completing the steps above, you will see the interface below, which means your registration was successful. Copy the key to Cherry Studio, and you can start using it happily.

Custom Provider

This document was translated from Chinese by AI and has not yet been reviewed.

Custom Providers

Cherry Studio not only integrates mainstream AI model services but also gives you powerful customization capabilities. With the Custom AI Provider feature, you can easily connect to any AI model you need.

Why Do You Need Custom AI Providers?

  • Flexibility: No longer limited to the preset list of providers, you are free to choose the AI model that best suits your needs.

  • Diversity: Experiment with AI models from various platforms to discover their unique advantages.

  • Controllability: Directly manage your API keys and access addresses to ensure security and privacy.

  • Customization: Integrate privately deployed models to meet the needs of specific business scenarios.

How to Add a Custom AI Provider?

You can add your custom AI provider in Cherry Studio in just a few simple steps:

  1. Open Settings: In the left navigation bar of the Cherry Studio interface, click "Settings" (the gear icon).

  2. Go to Model Services: On the settings page, select the "Model Services" tab.

  3. Add Provider: On the "Model Services" page, you will see a list of existing providers. Click the "+ Add" button below the list to open the "Add Provider" pop-up window.

  4. Fill in Information: In the pop-up window, you need to fill in the following information:

    • Provider Name: Give your custom provider an easily recognizable name (e.g., MyCustomOpenAI).

    • Provider Type: Select your provider type from the drop-down list. Currently supported types are:

      • OpenAI

      • Gemini

      • Anthropic

      • Azure OpenAI

  5. Save Configuration: After filling in the information, click the "Add" button to save your configuration.

Configuring a Custom AI Provider

After adding a provider, you need to find it in the list and configure its details:

  1. Enable Status: On the far right of the custom provider list, there is an enable switch. Turning it on enables this custom service.

  2. API Key:

    • Fill in the API Key provided by your AI service provider.

    • Click the "Check" button on the right to verify the key's validity.

  3. API Address:

    • Fill in the API access address (Base URL) for the AI service.

    • Be sure to refer to the official documentation provided by your AI service provider to get the correct API address.

  4. Model Management:

    • Click the "+ Add" button to manually add the model IDs you want to use under this provider, such as gpt-3.5-turbo, gemini-pro, etc.

    • If you are unsure of the specific model names, please refer to the official documentation provided by your AI service provider.

    • Click the "Manage" button to edit or delete the models that have been added.

Start Using

After completing the above configuration, you can select your custom AI provider and model in the Cherry Studio chat interface and start conversing with the AI!
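If you would like to sanity-check an OpenAI-compatible provider outside Cherry Studio first, a minimal sketch (the endpoint and key below are placeholders) is:

# Minimal sketch: list the models exposed by an OpenAI-compatible provider.
# base_url and api_key are placeholders; use the values from your provider's documentation.
import requests

base_url = "https://api.example-provider.com/v1"   # placeholder
api_key = "sk-..."                                  # placeholder

resp = requests.get(
    f"{base_url}/models",
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))

If the endpoint and key are correct, this prints the model IDs you can then add under Model Management.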

Using vLLM as a Custom AI Provider

vLLM is a fast and easy-to-use LLM inference library, similar to Ollama. Here are the steps to integrate vLLM into Cherry Studio:

  1. Start the vLLM Service: Start the service using the OpenAI-compatible interface provided by vLLM. There are two main ways to do this:

    • Start using vllm.entrypoints.openai.api_server

    • Start using uvicorn

Ensure the service starts successfully and listens on the default port 8000. Of course, you can also specify the port number for the vLLM service using the --port parameter.

  1. Add vLLM Provider in Cherry Studio:

    • Follow the steps described earlier to add a new custom AI provider in Cherry Studio.

    • Provider Name: vLLM

    • Provider Type: Select OpenAI.

  2. Configure vLLM Provider:

    • API Key: Since vLLM does not require an API key, you can leave this field blank or fill in any content.

    • API Address: Fill in the API address of the vLLM service. By default, the address is: http://localhost:8000/ (if you use a different port, please modify it accordingly).

    • Model Management: Add the model name you loaded in vLLM. If you started the server with python -m vllm.entrypoints.openai.api_server --model gpt2, you should enter gpt2 here.

  3. Start Chatting: Now, you can select the vLLM provider and the gpt2 model in Cherry Studio and start chatting with the vLLM-powered LLM!
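Before (or after) wiring vLLM into Cherry Studio, you can confirm the service is reachable by calling its OpenAI-compatible endpoint directly. A minimal sketch, assuming the server was started with the gpt2 example on the default port 8000:

# Minimal sketch: send one completion request to a locally running vLLM server.
# Assumes the OpenAI-compatible server is listening on the default port 8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "gpt2",               # the model name vLLM was started with
        "prompt": "Hello, my name is",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])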

Tips and Tricks

  • Read the Documentation Carefully: Before adding a custom provider, be sure to carefully read the official documentation of the AI service provider you are using to understand key information such as API keys, access addresses, and model names.

  • Check the API Key: Use the "Check" button to quickly verify the validity of the API key to avoid issues caused by an incorrect key.

  • Pay Attention to the API Address: The API address may vary for different AI service providers and models. Be sure to fill in the correct address.

  • Add Models On-Demand: Please only add the models you will actually use to avoid adding too many unnecessary models.

Change Storage Location

This document was translated from Chinese by AI and has not yet been reviewed.

Default Storage Location

Cherry Studio's data storage follows system specifications, and data is automatically placed in the user's directory. The specific directory locations are as follows:

macOS: /Users/username/Library/Application Support/CherryStudioDev

Windows: C:\Users\username\AppData\Roaming\CherryStudio

Linux: /home/username/.config/CherryStudio

You can also check the location here:

Change Storage Location (for reference)

Method 1:

This can be achieved by creating a symbolic link. Exit the application, move the data to your desired location, and then create a link at the original location pointing to the new location.

Method 2: Based on the characteristics of Electron applications, you can modify the storage location by configuring launch parameters.

Add the --user-data-dir launch parameter, e.g.: Cherry-Studio-*-x64-portable.exe --user-data-dir="%user_data_dir%"

Example:

init_cherry_studio.bat (encoding: ANSI)

Directory structure of user-data-dir after initialization:

Custom CSS

This document was translated from Chinese by AI and has not yet been reviewed.

Custom CSS

With custom CSS, you can modify the software's appearance to better suit your preferences, like this:

Built-in Variables

Related Recommendations

Knowledge Sharing

This document was translated from Chinese by AI and has not yet been reviewed.

AI Concepts

What are tokens?

Tokens are the basic units that AI models use to process text. You can think of them as the smallest unit of "thought" for the model. They are not exactly equivalent to characters or words as we understand them, but rather a special way the model segments text.

1. Chinese Tokenization

  • A Chinese character is usually encoded as 1-2 tokens.

  • For example: "你好" ≈ 2-4 tokens

2. English Tokenization

  • Common words are usually 1 token.

  • Longer or less common words are broken down into multiple tokens.

  • For example:

    • "hello" = 1 token

    • "indescribable" = 4 tokens

3. Special Characters

  • Spaces, punctuation marks, etc., also consume tokens.

  • A newline character is usually 1 token.

The tokenizers of different service providers are not the same, and even the tokenizers of different models from the same provider can vary. This information is only intended to clarify the concept of a token.
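If you want to see tokenization in action, here is a minimal sketch using OpenAI's tiktoken library (other providers use different tokenizers, so their counts will differ):

# Minimal sketch: count tokens with OpenAI's tiktoken tokenizer.
# Other providers' tokenizers will produce different counts for the same text.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models

for text in ["hello", "indescribable", "你好", "Hello, world!"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(tokens)} tokens -> {tokens}")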


What is a Tokenizer?

A Tokenizer is the tool an AI model uses to convert text into tokens. It determines how to split the input text into the smallest units that the model can understand.

Why do different models have different Tokenizers?

1. Different Training Data

  • Different corpora lead to different optimization directions.

  • Varying degrees of multilingual support.

  • Specialized optimizations for specific domains (e.g., medical, legal).

2. Different Tokenization Algorithms

  • BPE (Byte Pair Encoding) - OpenAI GPT series

  • WordPiece - Google BERT

  • SentencePiece - Suitable for multilingual scenarios

3. Different Optimization Goals

  • Some focus on compression efficiency.

  • Some focus on semantic preservation.

  • Some focus on processing speed.

Practical Impact

The same text may have a different number of tokens in different models. For example, the input "Hello, world!" is 4 tokens for GPT-3 but 3 tokens for BERT and Claude.


What is an Embedding Model?

Basic Concept: An embedding model is a technique that converts high-dimensional discrete data (text, images, etc.) into low-dimensional continuous vectors. This transformation allows machines to better understand and process complex data. Imagine it as simplifying a complex puzzle into a simple coordinate point that still retains the key features of the puzzle. In the large model ecosystem, it acts as a "translator," converting human-understandable information into a numerical form that AI can compute.

How it Works: Taking natural language processing as an example, an embedding model can map words to specific positions in a vector space. In this space, words with similar meanings will automatically cluster together. For example:

  • The vectors for "king" and "queen" will be very close.

  • Pet-related words like "cat" and "dog" will also be near each other.

  • Words with unrelated meanings, like "car" and "bread," will be far apart.
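A toy sketch of this idea (the 3-dimensional vectors below are invented purely for illustration; real embedding models output hundreds or thousands of dimensions):

# Toy sketch: semantically similar words end up with similar vectors.
# The 3-dimensional vectors are made up for illustration only.
import math

vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.12],
    "bread": [0.10, 0.05, 0.95],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1
print(cosine_similarity(vectors["king"], vectors["bread"]))  # much smaller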

Main Application Scenarios:

  • Text analysis: document classification, sentiment analysis

  • Recommendation systems: personalized content recommendations

  • Image processing: similar image retrieval

  • Search engines: semantic search optimization

Core Advantages:

  1. Dimensionality Reduction: Simplifies complex data into easy-to-process vector form.

  2. Semantic Preservation: Retains key semantic information from the original data.

  3. Computational Efficiency: Significantly improves the training and inference efficiency of machine learning models.

Technical Value: Embedding models are fundamental components of modern AI systems. They provide high-quality data representations for machine learning tasks and are a key technology driving progress in fields like natural language processing and computer vision.


How Embedding Models Work in Knowledge Retrieval

Basic Workflow:

  1. Knowledge Base Preprocessing Stage

    • Split documents into appropriately sized chunks.

    • Use an embedding model to convert each chunk into a vector.

    • Store the vectors and the original text in a vector database.

  2. Query Processing Stage

    • Convert the user's question into a vector.

    • Retrieve similar content from the vector database.

    • Provide the retrieved relevant content to the LLM as context.
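A schematic sketch of this workflow (this is not Cherry Studio's actual implementation; embed() is a stand-in for a real embedding model and the vector "database" is just a Python list):

# Schematic sketch of knowledge-base retrieval, not Cherry Studio's real code.
# embed() stands in for a real embedding model; the vector "database" is a list.
import math

def embed(text):
    # Stand-in embedding: a character-frequency vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

# 1. Preprocessing: split documents into chunks, embed each chunk, store the pairs.
chunks = ["Cherry Studio supports WebDAV backup.", "Ollama runs models locally."]
vector_db = [(embed(c), c) for c in chunks]

# 2. Query: embed the question, retrieve the most similar chunk,
#    then pass it to the LLM as context (the LLM call itself is omitted here).
question = "How do I back up my data?"
q_vec = embed(question)
best = max(vector_db, key=lambda item: cosine(q_vec, item[0]))
print("Most relevant chunk:", best[1])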


What is MCP (Model Context Protocol)?

MCP is an open-source protocol designed to provide contextual information to Large Language Models (LLMs) in a standardized way.

  • Analogy: You can think of MCP as the "USB drive" of the AI world. We know that a USB drive can store various files and be used directly after being plugged into a computer. Similarly, various "plugins" that provide context can be "plugged" into an MCP Server. An LLM can request these plugins from the MCP Server as needed to obtain richer contextual information and enhance its capabilities.

  • Comparison with Function Tools: Traditional Function Tools can also provide external functionalities for LLMs, but MCP is more like a higher-dimensional abstraction. A Function Tool is more of a tool for specific tasks, whereas MCP provides a more general, modular mechanism for acquiring context.

Core Advantages of MCP

  1. Standardization: MCP provides a unified interface and data format, allowing different LLMs and context providers to collaborate seamlessly.

  2. Modularity: MCP allows developers to break down contextual information into independent modules (plugins), making them easier to manage and reuse.

  3. Flexibility: LLMs can dynamically select the required context plugins based on their needs, enabling more intelligent and personalized interactions.

  4. Extensibility: MCP's design supports the future addition of more types of context plugins, offering limitless possibilities for expanding the capabilities of LLMs.


Knowledge Base Document Preprocessing

This document was translated from Chinese by AI and has not yet been reviewed.

Knowledge Base Document Preprocessing

Knowledge base document preprocessing requires upgrading Cherry Studio to v1.5.0 or higher.

Configure OCR Service Provider

After clicking 'Get API KEY', the application URL will open in your browser. Click 'Apply Now', fill out the form to get the API KEY, and then enter it into the API KEY field.

Configure Knowledge Base Document Preprocessing

Configure the created knowledge base as shown above to complete the knowledge base document preprocessing setup.

Upload Documents

You can check the knowledge base results by using the search in the upper right corner.

Use in Conversation

Knowledge Base Tips: When using a more capable model, you can change the knowledge base search mode to intent recognition. Intent recognition can describe your questions more accurately and broadly.

Enable Knowledge Base Intent Recognition

Authorization Steps (skip if already authorized)
  1. After entering the link page from step (2), follow the prompts to go to the authorization page (Click IAM Sub-user → New Delegation → Normal User).

  2. After clicking create, return to the link page from step (2).

  3. You will be prompted with 'Insufficient access permissions'. Click the "Click here" in the prompt.

  4. Append existing authorizations and confirm.

Note: This method is suitable for beginners. You don't need to read too much content, just click according to the prompts. If you can successfully authorize in one go using your own method, feel free to do so.

You can also synchronize data across multiple devices by following the process: Computer A → (Backup) → WebDAV → (Restore) → Computer B.

Recap of the verification steps shown in the screenshots: sign up for an account (many users get stuck at this step; don't panic); open the WeChat Mini Program and search for the authenticator; use it to scan the QR code shown on the Tavily page; enter the string of numbers it generates back on the Tavily page; when Tavily prompts you to copy a backup code to a safe place, it's best to follow the advice, although you might not use it often.

Install vLLM: follow the official vLLM documentation ().

For specific steps, please refer to:

For more theme variables, please refer to the source code:

Cherry Studio Theme Library:

Share some Chinese-style Cherry Studio theme skins:

VolcEngine Web Search Integration
Tavily Web Search Login & Registration Tutorial
pip install vllm # If you use pip
uv pip install vllm # If you use uv
python -m vllm.entrypoints.openai.api_server --model gpt2
vllm --model gpt2 --served-model-name gpt2
PS D:\CherryStudio> dir


    Directory: D:\CherryStudio


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         2025/4/18     14:05                user-data-dir
-a----         2025/4/14     23:05       94987175 Cherry-Studio-1.2.4-x64-portable.exe
-a----         2025/4/18     14:05            701 init_cherry_studio.bat
@title CherryStudio Initialization
@echo off

set current_path_dir=%~dp0
@echo Current Path:%current_path_dir%
set user_data_dir=%current_path_dir%user-data-dir
@echo CherryStudio Data Path:%user_data_dir%

@echo Searching for Cherry-Studio-*-portable.exe in the current path
setlocal enabledelayedexpansion

for /f "delims=" %%F in ('dir /b /a-d "Cherry-Studio-*-portable*.exe" 2^>nul') do ( #This code is adapted for versions downloaded from GitHub and the official website. Please modify it for other versions.
    set "target_file=!cd!\%%F"
    goto :break
)
:break
if defined target_file (
    echo Found file: %target_file%
) else (
    echo No matching file found, exiting the script
    pause
    exit
)

@echo Press any key to continue
pause

@echo Starting CherryStudio
start "" "%target_file%" --user-data-dir="%user_data_dir%"

@echo Operation finished
@echo on
exit
PS D:\CherryStudio> dir .\user-data-dir\


    Directory: D:\CherryStudio\user-data-dir


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         2025/4/18     14:29                blob_storage
d-----         2025/4/18     14:07                Cache
d-----         2025/4/18     14:07                Code Cache
d-----         2025/4/18     14:07                Data
d-----         2025/4/18     14:07                DawnGraphiteCache
d-----         2025/4/18     14:07                DawnWebGPUCache
d-----         2025/4/18     14:07                Dictionaries
d-----         2025/4/18     14:07                GPUCache
d-----         2025/4/18     14:07                IndexedDB
d-----         2025/4/18     14:07                Local Storage
d-----         2025/4/18     14:07                logs
d-----         2025/4/18     14:30                Network
d-----         2025/4/18     14:07                Partitions
d-----         2025/4/18     14:29                Session Storage
d-----         2025/4/18     14:07                Shared Dictionary
d-----         2025/4/18     14:07                WebStorage
-a----         2025/4/18     14:07             36 .updaterId
-a----         2025/4/18     14:29             20 config.json
-a----         2025/4/18     14:07            434 Local State
-a----         2025/4/18     14:29             57 Preferences
-a----         2025/4/18     14:09           4096 SharedStorage
-a----         2025/4/18     14:30            140 window-state.json
:root {
  --color-background: #1a462788;
  --color-background-soft: #1a4627aa;
  --color-background-mute: #1a462766;
  --navbar-background: #1a4627;
  --chat-background: #1a4627;
  --chat-background-user: #28b561;
  --chat-background-assistant: #1a462722;
}

#content-container {
  background-color: #2e5d3a !important;
}
:root {
  font-family: "HanYiTangMeiRen" !important; /* Font */
}

/* Font color for expanded deep thinking section */
.ant-collapse-content-box .markdown {
  color: red;
}

/* Theme Variables */
:root {
  --color-black-soft: #2a2b2a; /* Dark background color */
  --color-white-soft: #f8f7f2; /* Light background color */
}

/* Dark Theme */
body[theme-mode="dark"] {
  /* Colors */
  --color-background: #2b2b2b; /* Dark background color */
  --color-background-soft: #303030; /* Light background color */
  --color-background-mute: #282c34; /* Neutral background color */
  --navbar-background: var(--color-black-soft); /* Navbar background color */
  --chat-background: var(--color-black-soft); /* Chat background color */
  --chat-background-user: #323332; /* User chat background color */
  --chat-background-assistant: #2d2e2d; /* Assistant chat background color */
}

/* Dark Theme Specific Styles */
body[theme-mode="dark"] {
  #content-container {
    background-color: var(--chat-background-assistant) !important; /* Content container background color */
  }

  #content-container #messages {
    background-color: var(--chat-background-assistant); /* Messages background color */
  }

  .inputbar-container {
    background-color: #3d3d3a; /* Input bar background color */
    border: 1px solid #5e5d5940; /* Input bar border color */
    border-radius: 8px; /* Input bar border radius */
  }

  /* Code Style */
  code {
    background-color: #e5e5e20d; /* Code background color */
    color: #ea928a; /* Code text color */
  }

  pre code {
    color: #abb2bf; /* Preformatted code text color */
  }
}

/* Light Theme */
body[theme-mode="light"] {
  /* Colors */
  --color-white: #ffffff; /* White */
  --color-background: #ebe8e2; /* Light background color */
  --color-background-soft: #cbc7be; /* Light background color */
  --color-background-mute: #e4e1d7; /* Neutral background color  */
  --navbar-background: var(--color-white-soft); /* Navbar background color */
  --chat-background: var(--color-white-soft); /* Chat background color */
  --chat-background-user: #f8f7f2; /* User chat background color */
  --chat-background-assistant: #f6f4ec; /* Assistant chat background color */
}

/* Light Theme Specific Styles */
body[theme-mode="light"] {
  #content-container {
    background-color: var(--chat-background-assistant) !important; /* Content container background color */
  }

  #content-container #messages {
    background-color: var(--chat-background-assistant); /* Messages background color */
  }

  .inputbar-container {
    background-color: #ffffff; /* Input bar background color */
    border: 1px solid #87867f40; /* Input bar border color */
    border-radius: 8px; /* Input bar border radius, change to your preferred size */
  }

  /* Code Style */
  code {
    background-color: #3d39290d; /* Code background color */
    color: #7c1b13; /* Code text color */
  }

  pre code {
    color: #000000; /* Preformatted code text color */
  }
}
Input: "Hello, world!"
GPT-3: 4 tokens
BERT: 3 tokens
Claude: 3 tokens
https://docs.vllm.ai/en/latest/getting_started/quickstart.html
https://github.com/CherryHQ/cherry-studio/issues/621#issuecomment-2588652880
https://github.com/CherryHQ/cherry-studio/tree/main/src/renderer/src/assets/styles
https://github.com/boilcy/cherrycss
https://linux.do/t/topic/325119/129

Cherry Studio Commercial License Agreement

This document was translated from Chinese by AI and has not yet been reviewed.

Cherry Studio License Agreement

By using or distributing any part or element of the Cherry Studio Materials, you will be deemed to have acknowledged and accepted the content of this Agreement, which shall become effective immediately.

I. Definitions

  1. This Cherry Studio License Agreement (hereinafter referred to as the “Agreement”) shall mean the terms and conditions for use, reproduction, distribution, and modification of the Materials as defined by this Agreement.

  2. “We” (or “Us”) shall mean Shanghai Qianhui Technology Co., Ltd.

  3. “You” (or “Your”) shall mean a natural person or legal entity exercising the rights granted by this Agreement, and/or using the Materials for any purpose and in any field of use.

  4. “Third Party” shall mean an individual or legal entity that does not have common control with either Us or You.

  5. “Cherry Studio” shall mean this software suite, including but not limited to [e.g., core libraries, editors, plugins, sample projects], as well as source code, documentation, sample code, and other elements of the foregoing distributed by Us. (Please describe in detail according to the actual composition of Cherry Studio)

  6. “Materials” shall collectively refer to the proprietary Cherry Studio and documentation (and any part thereof) of Shanghai Qianhui Technology Co., Ltd., provided under this Agreement.

  7. “Source” form shall mean the preferred form for making modifications, including but not limited to source code, documentation source files, and configuration files.

  8. “Object” form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

  9. “Commercial Use” means for the purpose of direct or indirect commercial gain or commercial advantage, including but not limited to sales, licensing, subscriptions, advertising, marketing, training, consulting services, etc.

  10. “Modification” means any change, adjustment, derivation, or secondary development of the Source form of the Materials, including but not limited to modifying the application name, logo, code, functionality, interface, etc.

II. Grant of Rights

  1. Free Commercial Use (Limited to Unmodified Code): We hereby grant You a non-exclusive, worldwide, non-transferable, royalty-free license, under the intellectual property or other rights owned by Us or embodied in the Materials, to use, reproduce, distribute, copy, and distribute the unmodified Materials, including for Commercial Use, subject to the terms and conditions of this Agreement.

  2. Commercial License (When Required): When the conditions described in Section III “Commercial License” are met, you must obtain an explicit written commercial license from Us to exercise the rights under this Agreement.

III. Commercial License

In any of the following situations, you must contact Us and obtain an explicit written commercial license before you can continue to use the Cherry Studio Materials:

  1. Modification and Derivation: You modify the Cherry Studio Materials or develop derivatives based on them (including but not limited to modifying the application name, logo, code, functionality, interface, etc.).

  2. Enterprise Services: Providing services based on Cherry Studio within your enterprise or to enterprise customers, where such service supports 10 or more cumulative users.

  3. Hardware Bundling: You pre-install or integrate Cherry Studio into hardware devices or products for bundled sales.

  4. Large-Scale Procurement by Government or Educational Institutions: Your use case is part of a large-scale procurement project by a government or educational institution, especially when it involves sensitive requirements such as security and data privacy.

  5. Public-Facing Cloud Services: Providing public-facing cloud services based on Cherry Studio.

IV. Redistribution

You may distribute copies of the unmodified Materials, or provide them as part of a product or service that includes the unmodified Materials, in Source or Object form, provided that You meet the following conditions:

  1. You must provide a copy of this Agreement to any other recipient of the Materials;

  2. You must, in all copies of the Materials that you distribute, retain the following attribution notice and place it in a “NOTICE” or similar text file distributed as part of such copies: `"Cherry Studio is licensed under the Cherry Studio LICENSE AGREEMENT, Copyright (c) 上海千彗科技有限公司. All Rights Reserved."` (Cherry Studio is licensed under the Cherry Studio License Agreement, Copyright (c) 上海千彗科技有限公司. All rights reserved.)

V. Rules of Use

  1. The Materials may be subject to export controls or restrictions. You shall comply with applicable laws and regulations when using the Materials.

  2. If You use the Materials or any of their outputs or results to create, train, fine-tune, or improve software or models that will be distributed or provided, We encourage You to prominently display “Built with Cherry Studio” or “Powered by Cherry Studio” in the relevant product documentation.

VI. Intellectual Property

  1. We retain all intellectual property rights in and to the Materials and derivative works made by or for Us. Subject to the terms and conditions of this Agreement, the ownership of intellectual property rights for modifications and derivative works of the Materials made by You will be stipulated in a specific commercial license agreement. Without obtaining a commercial license, You do not own the rights to your modifications and derivative works of the Materials, and their intellectual property rights remain with Us.

  2. No trademark license is granted to use Our trade names, trademarks, service marks, or product names, except as required for reasonable and customary use in describing and redistributing the Materials or as required to fulfill the notice obligations under this Agreement.

  3. If You initiate a lawsuit or other legal proceeding (including a counterclaim or cross-claim in a lawsuit) against Us or any entity, alleging that the Materials or any of its outputs, or any portion of the foregoing, infringes any intellectual property or other rights owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other legal proceeding is initiated or filed.

VII. Disclaimer and Limitation of Liability

  1. We have no obligation to support, update, provide training for, or develop any further versions of the Cherry Studio Materials, nor to grant any related licenses.

  2. The Materials are provided "as is" without any warranty of any kind, either express or implied, including warranties of merchantability, non-infringement, or fitness for a particular purpose. We make no warranty and assume no responsibility for the security or stability of the Materials and their outputs.

  3. In no event shall We be liable to You for any damages, including but not limited to any direct, indirect, special, or consequential damages, arising out of your use or inability to use the Materials or any of their outputs, however caused.

  4. You will defend, indemnify, and hold Us harmless from any claims by any third party arising out of or related to your use or distribution of the Materials.

VIII. Survival and Termination

  1. The term of this Agreement shall commence upon your acceptance of this Agreement or your access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions of this Agreement.

  2. We may terminate this Agreement if You breach any of its terms or conditions. Upon termination of this Agreement, You must cease using the Materials. Section VII, Section IX, and "II. Contributor Agreement" shall survive the termination of this Agreement.

IX. Applicable Law and Jurisdiction

  1. This Agreement and any dispute arising from or related to this Agreement shall be governed by the laws of China.

  2. The Shanghai People's Court shall have exclusive jurisdiction over any dispute arising from this Agreement.

Obsidian Configuration Tutorial

Data Settings → Obsidian Configuration

This document was translated from Chinese by AI and has not yet been reviewed.

Obsidian Configuration Tutorial

Cherry Studio supports integration with Obsidian, allowing you to export entire conversations or single messages to your Obsidian vault.

This process does not require installing any additional Obsidian plugins. However, since Cherry Studio's import mechanism is similar to the Obsidian Web Clipper, it's recommended to upgrade Obsidian to the latest version (at least greater than 1.7.2) to avoid import failures with long conversations.

Latest Tutorial

Compared to the old export feature, the new version can automatically select the vault path, so you no longer need to manually enter the vault name and folder name.

Step 1: Configure Cherry Studio

Open Cherry Studio's Settings → Data Settings → Obsidian Configuration menu. The dropdown will automatically list the Obsidian vaults that have been opened on your machine. Select your target Obsidian vault:

Step 2: Export Conversation

Exporting an Entire Conversation

Go back to the conversation interface in Cherry Studio, right-click on the conversation, select Export, and click Export to Obsidian:

A window will pop up, allowing you to adjust the Properties, the folder location in Obsidian, and the handling method for the exported note:

  • Vault: Click the dropdown menu to select other Obsidian vaults

  • Path: Click the dropdown menu to select the folder where the exported note will be stored

  • As Obsidian note properties (Properties):

    • Tags (tags)

    • Creation time (created)

    • Source (source)

  • There are three handling methods for exporting to Obsidian:

    • Create new (overwrite if exists): Creates a new note in the folder specified in the Path. If a note with the same name already exists, it will be overwritten.

    • Prepend: If a note with the same name exists, the selected conversation content will be prepended to the beginning of that note.

    • Append: If a note with the same name exists, the selected conversation content will be appended to the end of that note.

Only the first method will include Properties; the other two methods will not.

After selecting all options, click OK to export the entire conversation to the corresponding folder in the specified Obsidian vault.

Exporting a Single Message

To export a single message, click the three-bar menu below the message, select Export, and click Export to Obsidian:

A window similar to the one for exporting an entire conversation will appear, asking you to configure the note properties and handling method. Follow the tutorial above to complete the process.

Export Successful

🎉 Congratulations! You have now completed all the configurations for integrating Cherry Studio with Obsidian and have gone through the entire export process. Enjoy!


Old Tutorial (for Cherry Studio < v1.1.13)

Step 1: Prepare Obsidian

Open your Obsidian vault and create a folder to save the exported conversations (the example in the image uses a folder named "Cherry Studio"):

Take note of the text in the bottom-left corner; this is your vault name.

Step 2: Configure Cherry Studio

In Cherry Studio's Settings → Data Settings → Obsidian Configuration menu, enter the vault name and folder name you noted in Step 1:

The Global Tags field is optional. You can set tags that will be applied to all exported conversations in Obsidian. Fill it in as needed.

Step 3: Export Conversation

Exporting an Entire Conversation

Go back to the conversation interface in Cherry Studio, right-click on the conversation, select Export, and click Export to Obsidian.

A window will pop up, allowing you to adjust the Properties for the exported note and the handling method. There are three handling methods for exporting to Obsidian:

  • Create new (overwrite if exists): Creates a new note in the folder you specified in Step 2. If a note with the same name already exists, it will be overwritten.

  • Prepend: If a note with the same name exists, the selected conversation content will be prepended to the beginning of that note.

  • Append: If a note with the same name exists, the selected conversation content will be appended to the end of that note.

Only the first method will include Properties; the other two methods will not.

Exporting a Single Message

To export a single message, click the three-bar menu below the message, select Export, and click Export to Obsidian.

A window similar to the one for exporting an entire conversation will appear, asking you to configure the note properties and handling method. Follow the tutorial above to complete the process.

Export Successful

🎉 Congratulations! You have now completed all the configurations for integrating Cherry Studio with Obsidian and have gone through the entire export process. Enjoy!

Google Gemini

This document was translated from Chinese by AI and has not yet been reviewed.

Google Gemini

Get API Key

  • Before obtaining a Gemini API key, you need to have a Google Cloud project (if you already have one, you can skip this process).

  • Go to Google Cloud to create a project, fill in the project name, and click Create Project.

  • On the official API Key page, click Create API key.

  • Copy the generated key and open the Provider Settings in CherryStudio.

  • Find the Gemini provider and paste the key you just obtained.

  • Click Manage or Add at the bottom, add the supported models, and enable the provider switch in the top right corner to start using it.

  • Google Gemini services cannot be used directly from mainland China; you will need to resolve the proxy issue on your own.

Font Recommendations

This document was translated from Chinese by AI and has not yet been reviewed.

Recommended Fonts


Monaspace

English Font Commercial Use

GitHub has launched an open-source font family called Monaspace, which offers five styles: Neon (modern), Argon (humanist), Xenon (serif), Radon (handwriting), and Krypton (mechanical).

MiSans Global

Multilingual Commercial Use

MiSans Global is a global font customization project led by Xiaomi, in collaboration with Monotype and Hanyi Fonts.

This is a vast font family, covering over 20 writing systems and supporting more than 600 languages.


Translation

This document was translated from Chinese by AI and has not yet been reviewed.

Translation

Cherry Studio's translation feature provides you with fast and accurate text translation services, supporting mutual translation between multiple languages.

Interface Overview

The translation interface mainly consists of the following parts:

  1. Source Language Selection Area:

    • Any Language: Cherry Studio will automatically detect the source language and translate it.

  2. Target Language Selection Area:

    • Dropdown Menu: Select the language you want to translate the text into.

  3. Settings Button:

    • Clicking it will take you to the Default Model Settings.

  4. Scroll Sync:

    • Click to toggle scroll sync (scrolling on one side will also scroll the other side).

  5. Text Input Box (Left):

    • Enter or paste the text you need to translate.

  6. Translation Result Box (Right):

    • Displays the translated text.

    • Copy Button: Click the button to copy the translation result to the clipboard.

  7. Translate Button:

    • Click this button to start the translation.

  8. Translation History (Top Left):

    • Click to view the translation history.

How to Use

  1. Select the Target Language:

    • In the target language selection area, choose the language you want to translate into.

  2. Enter or Paste Text:

    • Enter or paste the text you want to translate into the text input box on the left.

  3. Start Translation:

    • Click the Translate button.

  4. View and Copy the Result:

    • The translation result will be displayed in the result box on the right.

    • Click the copy button to copy the translation result to the clipboard.

Frequently Asked Questions (FAQ)

  • Q: What should I do if the translation is inaccurate?

    • A: While AI translation is powerful, it is not perfect. For professional fields or texts with complex context, manual proofreading is recommended. You can also try switching to different models.

  • Q: Which languages are supported?

    • A: The Cherry Studio translation feature supports various major languages. For a specific list of supported languages, please refer to the official Cherry Studio website or in-app instructions.

  • Q: Can I translate an entire file?

    • A: The current interface is primarily for text translation. For file translation, you may need to go to the Cherry Studio chat page and add the file to translate it.

  • Q: What if the translation speed is slow?

    • A: Translation speed can be affected by factors such as network connection, text length, and server load. Please ensure your network connection is stable and wait patiently.

FAQ

This document was translated from Chinese by AI and has not yet been reviewed.

Frequently Asked Questions

Common Error Codes

  • 4xx (Client Error Status Codes): Generally indicate that the request cannot be completed due to a syntax error, authorization failure, or authentication failure.

  • 5xx (Server Error Status Codes): Generally indicate a server-side error, such as the server being down, or the request processing timing out.


How to Check Console Errors

  • After clicking the Cherry Studio client window, press the shortcut key Ctrl + Shift + I (for Mac: Command + Option + I)

  • The currently active window must be the Cherry Studio client window to open the console;

  • You need to open the console first, and then click test or initiate a conversation or other requests to collect request information.

  • In the pop-up console window, click Network → click to view the last item in section ② marked with a red ×, which will be completions (for errors in conversations, translations, model connectivity checks, etc.) or generations (for errors in painting) → click Response to view the full returned content (area ④ in the figure).

This inspection method can be used not only to obtain error information during conversations, but also during model testing, adding knowledge bases, painting, etc. In any case, you need to open the debugging window first, and then perform the request operation to obtain the request information.

The name in the Name column (② in the image above) will vary depending on the scenario

Conversation, Translation, Model Check: completions

Painting: generations

Knowledge Base Creation: embeddings


Formulas Not Rendered / Formula Rendering Errors

  • If the formula code is displayed directly instead of being rendered, check if the formula has delimiters.

Delimiter Usage

Inline formulas

  • Use single dollar signs: $formula$

  • Or use \( and \), like: \(formula\)

Block formulas

  • Use double dollar signs: $$formula$$

  • Or use \[formula\]

  • Formula rendering errors/garbled text are common when the formula contains Chinese content. Try switching the formula engine to KaTeX.
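For example, the same expression written with both kinds of delimiters:

Inline: the sum $\sum_{i=1}^n x_i$ sits inside the sentence.

Block:

$$\sum_{i=1}^n x_i$$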


Unable to Create Knowledge Base / "Failed to get embedding dimensions" prompt

  1. Model status is unavailable

Confirm whether the service provider supports the model or if the model's service status is normal.

  2. A non-embedding model was used.


Model Cannot Recognize Images / Unable to Upload or Select Images

First, you need to confirm if the model supports image recognition. Cherry Studio categorizes popular models; those with a small eye icon next to their name support image recognition.

Image recognition models support uploading image files. If the model's functionality is not correctly matched, you can find the model in the corresponding service provider's model list, click the settings button after its name, and check the image option.

For specific model information, you can check the details from the corresponding service provider. Similar to embedding models, models that do not support vision do not need to have the image function forced on; checking the image option will have no effect.

Ollama

This document was translated from Chinese by AI and has not yet been reviewed.

Ollama

Ollama is an excellent open-source tool that allows you to easily run and manage various Large Language Models (LLMs) locally. Cherry Studio now supports Ollama integration, enabling you to interact directly with locally deployed LLMs in a familiar interface, without relying on cloud services!

What is Ollama?

Ollama is a tool that simplifies the deployment and use of Large Language Models (LLMs). It has the following features:

  • Local Execution: Models run entirely on your local computer, without needing an internet connection, protecting your privacy and data security.

  • Easy to Use: Download, run, and manage various LLMs with simple command-line instructions.

  • Rich Model Library: Supports many popular open-source models like Llama 2, Deepseek, Mistral, and Gemma.

  • Cross-Platform: Supports macOS, Windows, and Linux systems.

  • Open API: Supports an OpenAI-compatible interface, allowing integration with other tools.

Why Use Ollama in Cherry Studio?

  • No Cloud Services Needed: No longer limited by cloud API quotas and fees. Enjoy the full power of local LLMs.

  • Data Privacy: All your conversation data remains on your local machine, eliminating concerns about privacy leaks.

  • Offline Availability: Continue interacting with LLMs even without an internet connection.

  • Customization: Choose and configure the LLMs that best suit your needs.

Configuring Ollama in Cherry Studio

1. Install and Run Ollama

First, you need to install and run Ollama on your computer. Follow these steps:

  • Install Ollama: Follow the installer's instructions to complete the installation.

  • Download a Model: Open your terminal (or command prompt) and use the ollama run command to download the model you want to use. For example, to download the Llama 3.2 model, run ollama run llama3.2.

    Ollama will automatically download and run the model.

  • Keep Ollama Running: Ensure that Ollama remains running while you are interacting with Ollama models through Cherry Studio.

2. Add the Ollama Provider in Cherry Studio

Next, add Ollama as a custom AI provider in Cherry Studio:

  • Open Settings: In the left navigation bar of the Cherry Studio interface, click on "Settings" (the gear icon).

  • Go to Model Services: On the settings page, select the "Model Services" tab.

  • Add Provider: Click on Ollama in the list.

3. Configure the Ollama Provider

Find the newly added Ollama in the provider list and configure its details:

  1. Enable Status:

    • Ensure the switch on the far right of the Ollama provider is turned on, indicating it is enabled.

  2. API Key:

    • Ollama does not require an API key by default. You can leave this field blank or fill it with any content.

  3. API Endpoint:

    • Enter the local API address provided by Ollama. Typically, the address is http://localhost:11434/.

      If you have changed the port, please modify it accordingly.

  4. Keep-Alive Time: This option sets the session keep-alive duration in minutes. If there are no new conversations within the set time, Cherry Studio will automatically disconnect from Ollama to release resources.

  5. Model Management:

    • Click the "+ Add" button to manually add the names of the models you have already downloaded in Ollama.

    • For example, if you have already downloaded the llama3.2 model using ollama run llama3.2, you can enter llama3.2 here.

    • Click the "Manage" button to edit or delete the added models.

Getting Started

Once the configuration is complete, you can select the Ollama provider and your downloaded model in the Cherry Studio chat interface to start conversing with your local LLM!
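If you want to confirm the connection outside Cherry Studio, a minimal sketch against Ollama's OpenAI-compatible endpoint (assuming the llama3.2 model mentioned above has already been downloaded):

# Minimal sketch: one chat request to a local Ollama instance via its
# OpenAI-compatible endpoint. Assumes `ollama run llama3.2` has been executed before.
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])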

Tips and Tricks

  • First Model Run: When running a model for the first time, Ollama needs to download the model file, which may take some time. Please be patient.

  • View Available Models: Run the ollama list command in the terminal to see a list of the Ollama models you have downloaded.

  • Hardware Requirements: Running large language models requires certain computing resources (CPU, memory, GPU). Please ensure your computer's configuration meets the model's requirements.

  • Ollama Documentation: You can click the View Ollama documentation and models link on the configuration page to quickly navigate to the official Ollama documentation.

Common error codes, their possible causes, and solutions:

If you cannot determine the cause of the error, please send a screenshot of this interface to the official communication group for help.

Example: $$\sum_{i=1}^n x_i$$

Download Ollama: Visit the official Ollama website () and download the appropriate installer for your operating system. On Linux, you can install Ollama directly with the following command:

400
Possible cause: Incorrect request body format, etc.
Solution: Check the error message returned in the conversation or view the error content in the console, and follow the prompts.
  • Common Case 1: If it is a Gemini model, you may need to link a credit card.
  • Common Case 2: The data size exceeds the limit; this is common with vision models, where the image size exceeds the upstream provider's per-request traffic limit.
  • Common Case 3: Unsupported parameters were added or parameters were filled in incorrectly; try creating a new, clean assistant to test whether it works normally.
  • Common Case 4: The context exceeds the limit; clear the context, start a new conversation, or reduce the number of context messages.

401
Possible cause: Authentication failed; the model is not supported, or the server-side account has been banned, etc.
Solution: Contact the corresponding service provider or check the status of your account with that provider.

403
Possible cause: No permission for the requested operation.
Solution: Follow the error message returned in the conversation or the error message in the console.

404
Possible cause: The requested resource cannot be found.
Solution: Check the request path, etc.

422
Possible cause: The request format is correct, but there is a semantic error.
Solution: The server can parse this type of request but cannot process it. It commonly occurs with JSON semantic errors (e.g., null values, or a value that must be a string written as a number or boolean).

429
Possible cause: The request rate has reached the limit.
Solution: The request rate (TPM or RPM) has reached the limit. Take a break and try again later.

500
Possible cause: Internal server error; the request could not be completed.
Solution: If it persists, contact the upstream service provider.

501
Possible cause: The server does not support the functionality required to fulfill the request.

502
Possible cause: The server, while acting as a gateway or proxy, received an invalid response from an inbound server it accessed while attempting to fulfill the request.

503
Possible cause: The server is temporarily unable to handle the client's request due to overload or system maintenance. The length of the delay may be included in the server's Retry-After header.

504
Possible cause: The server, acting as a gateway or proxy, did not receive a timely response from the upstream server.


Embedding Models Reference Information

This document was translated from Chinese by AI and has not yet been reviewed.

Embedding Model Reference Information

To prevent errors, the 'max input' values for some models in this document are not set to their absolute limits. For instance, when the official documentation states a maximum input of 8k (without specifying the exact number), this document may list the reference value as 8191 or 8000. (Feel free to ignore this note if it's unclear; simply use the reference values provided in the document.)

Volcengine-Doubao

Official Model Information Reference

  • Doubao-embedding: max input 4095
  • Doubao-embedding-vision: max input 8191
  • Doubao-embedding-large: max input 4095

Alibaba

Official Model Information Reference

  • text-embedding-v3: max input 8192
  • text-embedding-v2: max input 2048
  • text-embedding-v1: max input 2048
  • text-embedding-async-v2: max input 2048
  • text-embedding-async-v1: max input 2048

OpenAI

Official Model Information Reference

  • text-embedding-3-small: max input 8191
  • text-embedding-3-large: max input 8191
  • text-embedding-ada-002: max input 8191

Baidu

Official Model Information Reference

  • Embedding-V1: max input 384
  • tao-8k: max input 8192

Zhipu

Official Model Information Reference

  • embedding-2: max input 1024
  • embedding-3: max input 2048

Hunyuan

Official Model Information Reference

  • hunyuan-embedding: max input 1024

Baichuan

Official Model Information Reference

  • Baichuan-Text-Embedding: max input 512

together

Official Model Information Reference

  • M2-BERT-80M-2K-Retrieval: max input 2048
  • M2-BERT-80M-8K-Retrieval: max input 8192
  • M2-BERT-80M-32K-Retrieval: max input 32768
  • UAE-Large-v1: max input 512
  • BGE-Large-EN-v1.5: max input 512
  • BGE-Base-EN-v1.5: max input 512

Jina

Official Model Information Reference

  • jina-embedding-b-en-v1: max input 512
  • jina-embeddings-v2-base-en: max input 8191
  • jina-embeddings-v2-base-zh: max input 8191
  • jina-embeddings-v2-base-de: max input 8191
  • jina-embeddings-v2-base-code: max input 8191
  • jina-embeddings-v2-base-es: max input 8191
  • jina-colbert-v1-en: max input 8191
  • jina-reranker-v1-base-en: max input 8191
  • jina-reranker-v1-turbo-en: max input 8191
  • jina-reranker-v1-tiny-en: max input 8191
  • jina-clip-v1: max input 8191
  • jina-reranker-v2-base-multilingual: max input 8191
  • reader-lm-1.5b: max input 256000
  • reader-lm-0.5b: max input 256000
  • jina-colbert-v2: max input 8191
  • jina-embeddings-v3: max input 8191

SiliconFlow

Official Model Information Reference

  • BAAI/bge-m3: max input 8191
  • netease-youdao/bce-embedding-base_v1: max input 512
  • BAAI/bge-large-zh-v1.5: max input 512
  • BAAI/bge-large-en-v1.5: max input 512
  • Pro/BAAI/bge-m3: max input 8191

Gemini

Official Model Information Reference

  • text-embedding-004: max input 2048

nomic

Official Model Information Reference

  • nomic-embed-text-v1: max input 8192
  • nomic-embed-text-v1.5: max input 8192
  • gte-multilingual-base: max input 8192

console

Official Model Information Reference

  • embedding-query: max input 4000
  • embedding-passage: max input 4000

cohere

Official Model Information Reference

  • embed-english-v3.0: max input 512
  • embed-english-light-v3.0: max input 512
  • embed-multilingual-v3.0: max input 512
  • embed-multilingual-light-v3.0: max input 512
  • embed-english-v2.0: max input 512
  • embed-english-light-v2.0: max input 512
  • embed-multilingual-v2.0: max input 256


Chat Interface

This document was translated from Chinese by AI and has not yet been reviewed.

Chat Interface

Assistants and Topics

Assistant

An Assistant is a personalized configuration of a selected model, including settings like prompt presets and parameter presets. These settings allow the selected model to better meet your expected work requirements.

The System Default Assistant has a fairly general parameter preset (no prompt), which you can use directly or go to the Agent Page to find a preset that suits your needs.

Topic

An Assistant is the parent of Topics: multiple topics (i.e., conversations) can be created under a single assistant. All topics share the assistant's parameter settings, prompt presets, and other model settings.

Buttons in the Chat Box

  • New Topic: Creates a new topic within the current assistant.

  • Upload Image or Document: Uploading images requires model support. Uploaded documents will be automatically parsed into text and provided to the model as context.

  • Web Search: Requires configuring web search-related information in the settings. The search results are returned to the large model as context. For details, see Web Search Mode.

  • Knowledge Base: Enables the knowledge base. For details, see Knowledge Base Tutorial.

  • MCP Server: Enables the MCP server function. For details, see MCP Usage Tutorial.

  • Generate Image: Not displayed by default. For models that support image generation (like Gemini), you need to manually activate it to generate images.

  • Select Model: Switches to the specified model for the subsequent conversation while retaining the context.

  • Quick Phrases: You need to preset common phrases in the settings first. They can be invoked here and entered directly, with support for variables.

  • Clear Messages: Deletes all content under the current topic.

  • Expand: Makes the chat box larger for entering long texts.

  • Clear Context: Truncates the context available to the model without deleting the content, meaning the model will "forget" the previous conversation.

  • Estimate Token Count: Displays the estimated token count. The four data points are Current Context Count, Max Context Count (∞ means infinite context), Character Count in Current Input Box, and Estimated Token Count.

  • Translate: Translates the content in the current input box into English.

For technical reasons, the Generate Image button must be activated manually. It will be removed once this feature is optimized.

The Estimate Token Count feature only provides an estimate; the actual token count varies for each model. Please refer to the data provided by the model provider.

Conversation Settings

Model Settings

Model settings are synchronized with the Model Settings parameters in the assistant settings. For details, see Assistant Settings.

In the conversation settings, only the model settings apply to the current assistant; other settings are global. For example, if you set the message style to bubbles, it will be the bubble style in any topic of any assistant.

Message Settings

Message Divider:

Separates the message body from the action bar with a divider.

Use Serif Font:

Switches the font style. You can now also change the font via custom CSS.

Show Line Numbers in Code:

Displays line numbers in code blocks when the model outputs code snippets.

Collapsible Code Blocks:

When enabled, long code snippets will be automatically collapsed.

Wrap Lines in Code Blocks:

When enabled, long single lines of code (exceeding the window) will automatically wrap.

Auto-collapse Thinking Process:

When enabled, models that support showing their thinking process will automatically collapse it after completion.

Message Style:

You can switch the chat interface to either bubble style or list style.

Code Style:

You can switch the display style of code snippets.

Math Formula Engine:

  • KaTeX renders faster as it is specifically designed for performance optimization.

  • MathJax renders slower but is more feature-complete, supporting more mathematical symbols and commands.

Example: $$\sum_{i=1}^n x_i$$

Message Font Size:

Adjusts the font size of the chat interface.

Input Settings

Show Estimated Token Count:

Displays the estimated token consumption of the input text in the input box (not the actual context token consumption, for reference only).

Paste Long Text as File:

When pasting a long text from another source into the input box, it will automatically be displayed as a file style to reduce interference with subsequent input.

Render Input Messages with Markdown:

When off, only the model's reply messages are rendered, not the messages you send.

Triple-press Space to Translate:

After typing a message in the chat input box, pressing the spacebar three times consecutively will translate the input into English.

Note: This action will overwrite the original text.

Target Language:

Sets the target language for the translate button in the input box and the triple-press space translation feature.

Assistant Settings

In the assistant interface, select the assistant name you want to configure → choose the corresponding setting from the right-click context menu.

Edit Assistant

Assistant settings apply to all topics under that assistant.

Prompt Settings

Name:

You can customize the assistant's name for easy identification.

Prompt:

This is the prompt. You can refer to the prompt writing style on the agent page to edit the content.

Model Settings

Default Model:

You can set a fixed default model for this assistant. When adding from the agent page or copying an assistant, the initial model will be this one. If this is not set, the initial model will be the global initial model (i.e., the Default Assistant Model).

There are two kinds of default models for an assistant: the global default chat model and the assistant's own default model. The assistant's default model takes priority over the global default chat model; when the assistant's default model is not set, it falls back to the global default chat model.

Auto Reset Model:

When on: if you switch to another model during a conversation in a topic, any new topic you create will reset to the assistant's default model. When off: the model for a new topic will follow the model used in the previous topic.

For example, if the assistant's default model is gpt-3.5-turbo, and I create Topic 1 under this assistant, and during the conversation in Topic 1, I switch to gpt-4o, then:

If auto-reset is on: When creating Topic 2, the default model for Topic 2 will be gpt-3.5-turbo.

If auto-reset is off: When creating Topic 2, the default model for Topic 2 will be gpt-4o.

Temperature:

The temperature parameter controls the degree of randomness and creativity in the text generated by the model (default is 0.7). Specifically:

  • Low temperature value (0-0.3):

    • Output is more deterministic and focused

    • Suitable for scenarios requiring accuracy, like code generation and data analysis

    • Tends to select the most likely words

  • Medium temperature value (0.4-0.7):

    • Balances creativity and coherence

    • Suitable for daily conversations and general writing

    • Recommended for chatbot conversations (around 0.5)

  • High temperature value (0.8-1.0):

    • Produces more creative and diverse output

    • Suitable for creative writing, brainstorming, etc.

    • May reduce the coherence of the text

Top P (Nucleus Sampling):

The default value is 1. The smaller the value, the more monotonous and easier to understand the AI-generated content is. The larger the value, the wider and more diverse the vocabulary of the AI's response.

Nucleus sampling affects the output by controlling the probability threshold for vocabulary selection:

  • Smaller value (0.1-0.3):

    • Considers only the highest probability words

    • Output is more conservative and controllable

    • Suitable for code comments, technical documentation, etc.

  • Medium value (0.4-0.6):

    • Balances vocabulary diversity and accuracy

    • Suitable for general conversation and writing tasks

  • Larger value (0.7-1.0):

    • Considers a wider range of vocabulary choices

    • Produces richer and more diverse content

    • Suitable for creative writing and other scenarios requiring diverse expression

  • These two parameters can be used independently or in combination.

  • Choose appropriate parameter values based on the specific task type.

  • It is recommended to experiment to find the best parameter combination for a specific application scenario.

  • The above content is for reference and conceptual understanding only. The given parameter ranges may not be suitable for all models. Please refer to the parameter recommendations in the relevant model documentation.
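
To make the two parameters above concrete, here is a minimal sketch of how temperature and top_p appear in an OpenAI-compatible chat request; the endpoint, API key, and model name are placeholders, not values specific to Cherry Studio or any particular provider:

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "temperature": 0.5,
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "Write a short product description."}]
  }'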

Context Window

The number of messages to keep in the context. The larger the value, the longer the context and the more tokens are consumed:

  • 5-10: Suitable for normal conversations

  • >10: For complex tasks requiring longer memory (e.g., generating a long article step-by-step according to an outline, which requires ensuring the generated context is logically coherent)

  • Note: The more messages, the greater the token consumption

Enable Message Length Limit (MaxToken)

The maximum number of Tokens for a single response. In large language models, max tokens is a key parameter that directly affects the quality and length of the generated response.

For example: When testing if a model is connected after filling in the key in CherryStudio, you only need to know if the model returns a message correctly without specific content. In this case, setting MaxToken to 1 is sufficient.
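
Such a connectivity test can also be reproduced by hand with an OpenAI-compatible request (a sketch; the endpoint, key, and model name are placeholders). Limiting max_tokens to 1 returns almost immediately and is enough to confirm that the key and model respond:

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "max_tokens": 1, "messages": [{"role": "user", "content": "hi"}]}'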

The MaxToken limit for most models is 32k Tokens, but some have 64k or even more. You need to check the corresponding introduction page for specifics.

The specific setting depends on your needs, but you can also refer to the following suggestions.

Suggestions:

  • Normal chat: 500-800

  • Short article generation: 800-2000

  • Code generation: 2000-3600

  • Long article generation: 4000 and above (requires model support)

Generally, the model's response will be limited within the MaxToken range. However, it might be truncated (e.g., when writing long code) or the expression may be incomplete. In special cases, you need to adjust it flexibly according to the actual situation.

Streaming Output (Stream)

Streaming output is a data processing method that allows data to be transmitted and processed as a continuous stream, rather than sending all data at once. This method allows data to be processed and output immediately after it is generated, greatly improving real-time performance and efficiency.

In an environment like the CherryStudio client, it's simply a typewriter effect.

When off (non-streaming): The model outputs the entire message at once after generating it (imagine receiving a message on WeChat).

When on: Word-by-word output. You can think of it as the large model sending you each word as it generates it, until the entire message is sent.

Some models do not support streaming output and require this switch to be turned off, for example o1-mini, which initially supported only non-streaming responses.
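
A rough illustration of the difference at the request level (placeholders again for the endpoint, key, and model; curl's -N flag only disables output buffering so the chunks print as they arrive):

# Streaming on: the reply arrives as a sequence of small "data: {...}" chunks
curl -N https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "stream": true, "messages": [{"role": "user", "content": "hi"}]}'

# Streaming off: the same request with "stream": false returns the full reply in one JSON response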

Custom Parameters

Adds extra request parameters to the request body, such as presence_penalty, which are generally not needed by most users.

Parameters mentioned above such as top_p, max_tokens, and stream are examples of such request parameters.

How to fill: Parameter Name—Parameter Type (text, number, etc.)—Value. Refer to the documentation: Click to go

Each model provider has more or less its own unique parameters. You need to find their usage methods in the provider's documentation.

  • Custom parameters have a higher priority than built-in parameters. That is, if a custom parameter conflicts with a built-in parameter, the custom parameter will override the built-in one.

For example: If you set model to gpt-4o in the custom parameters, the gpt-4o model will be used in the conversation regardless of which model is selected.

  • Setting a parameter's value to undefined excludes that parameter from the request.
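
As a rough sketch of what this means at the request level (placeholder endpoint, key, and model; presence_penalty is simply the example parameter named above): a custom parameter is merged into the request body sent to the provider, and it overrides any built-in field with the same name.

curl https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "presence_penalty": 0.6,
    "messages": [{"role": "user", "content": "hi"}]
  }'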


SearXNG Local Deployment & Configuration

This document was translated from Chinese by AI and has not yet been reviewed.

SearXNG Deployment and Configuration

CherryStudio supports web searches through SearXNG. SearXNG is an open-source project that can be deployed locally or on a server, so its configuration is slightly different from other methods that require an API provider.

SearXNG Project Link: SearXNG

Advantages of SearXNG

  • Open-source and free, no API required

  • Relatively high privacy

  • Highly customizable

Local Deployment

1. Direct Deployment with Docker

Since SearXNG does not require a complex environment setup, you can deploy it without using docker compose. Simply providing an available port is sufficient. Therefore, the quickest method is to directly pull the image and deploy it using Docker.

1. Download, install, and configure docker

After installation, select a path to store images:

2. Search for and pull the SearXNG image

Enter searxng in the search bar:

Pull the image:

3. Run the image

After the pull is successful, go to the images page:

Select the pulled image and click Run:

Open the settings to configure:

Using port 8085 as an example:

After it starts successfully, click the link to open the SearXNG frontend interface:

This page indicates a successful deployment:
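
If you prefer the command line to the Docker Desktop interface, the same deployment can be done with two commands (a sketch; the host port 8085 matches the example above and can be changed, while 8080 is the port SearXNG listens on inside the container):

docker pull docker.io/searxng/searxng:latest
docker run -d --name searxng -p 8085:8080 docker.io/searxng/searxng:latest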

Server Deployment

Given that installing Docker on Windows can be quite troublesome, users can deploy SearXNG on a server, which also allows sharing it with others. Unfortunately, SearXNG itself does not currently support authentication, meaning others could scan for and abuse your deployed instance through technical means.

To address this, Cherry Studio now supports configuring HTTP Basic Authentication (RFC7617). If you plan to expose your self-deployed SearXNG to the public internet, you must configure HTTP Basic Authentication using a reverse proxy software like Nginx. The following is a brief tutorial that requires basic Linux system administration knowledge.

Deploying SearXNG

Similarly, we will still use Docker for deployment. Assuming you have already installed the latest version of Docker CE on your server following the official tutorial, here is a one-stop command for a fresh installation on a Debian system:

sudo apt update
sudo apt install git -y

# Clone the official repository
cd /opt
git clone https://github.com/searxng/searxng-docker.git
cd /opt/searxng-docker

# If your server has low bandwidth, you can set this to false
export IMAGE_PROXY=true

# Modify the configuration file
cat <<EOF > /opt/searxng-docker/searxng/settings.yml
# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: $(openssl rand -hex 32)
  limiter: false  # can be disabled for a private instance
  image_proxy: $IMAGE_PROXY
ui:
  static_use_hash: true
redis:
  url: redis://redis:6379/0
search:
  formats:
    - html
    - json
EOF

If you need to change the local listening port or reuse an existing local nginx, you can edit the docker-compose.yaml file. Refer to the following example:

version: "3.7"

services:
# If you don't need Caddy and want to reuse an existing local Nginx, remove the section below. We don't need Caddy by default.
  caddy:
    container_name: caddy
    image: docker.io/library/caddy:2-alpine
    network_mode: host
    restart: unless-stopped
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data:rw
      - caddy-config:/config:rw
    environment:
      - SEARXNG_HOSTNAME=${SEARXNG_HOSTNAME:-http://localhost}
      - SEARXNG_TLS=${LETSENCRYPT_EMAIL:-internal}
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"
# If you don't need Caddy and want to reuse an existing local Nginx, remove the section above. We don't need Caddy by default.
  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    networks:
      - searxng
    volumes:
      - valkey-data2:/data
    cap_drop:
      - ALL
    cap_add:
      - SETGID
      - SETUID
      - DAC_OVERRIDE
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    networks:
      - searxng
    # By default, it maps to port 8080 on the host. If you want to listen on port 8000, change it to "127.0.0.1:8000:8080"
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/
      - UWSGI_WORKERS=${SEARXNG_UWSGI_WORKERS:-4}
      - UWSGI_THREADS=${SEARXNG_UWSGI_THREADS:-4}
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

networks:
  searxng:

volumes:
# If you don't need Caddy and want to reuse an existing local Nginx, remove the section below
  caddy-data:
  caddy-config:
# If you don't need Caddy and want to reuse an existing local Nginx, remove the section above
  valkey-data2:

Run docker compose up -d to start. Run docker compose logs -f searxng to view the logs.

Deploying Nginx Reverse Proxy and HTTP Basic Authentication

If you are using a server control panel like Baota Panel or 1Panel, please refer to their documentation to add a website and configure the nginx reverse proxy. Then, find where to modify the nginx configuration file and make changes based on the example below:

server
{
    listen 443 ssl;

    # This line is your hostname
    server_name search.example.com;

    # index index.html;
    # root /data/www/default;

    # If you have configured SSL, you should have these two lines
    ssl_certificate    /path/to/your/cert/fullchain.pem;
    ssl_certificate_key    /path/to/your/cert/privkey.pem;

    # HSTS
    # add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload";

    # When configuring a reverse proxy through a panel, the default location block looks like this
    location / {
        # Just add the two lines below to the location block, leaving everything else as is.
        # This example assumes your configuration file is saved in the /etc/nginx/conf.d/ directory.
        # For Baota, it would likely be saved in a directory like /www, so be aware of that.
        auth_basic "Please enter your username and password";
        auth_basic_user_file /etc/nginx/conf.d/search.htpasswd;

        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_protocol_addr;
        proxy_pass http://127.0.0.1:8000;
        client_max_body_size 0;
    }

    # access_log  ...;
    # error_log  ...;
}

Assuming the Nginx configuration file is saved in /etc/nginx/conf.d, we will save the password file in the same directory.

Execute the command (replace example_name and example_password with the username and password you intend to set):

echo "example_name:$(openssl passwd -5 'example_password')" > /etc/nginx/conf.d/search.htpasswd

Restart Nginx (reloading the configuration also works).

Now, try opening the webpage. You should be prompted to enter a username and password. Enter the credentials you set earlier to see if you can successfully access the SearXNG search page, thereby checking if the configuration is correct.
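
You can also verify it from a terminal (a sketch using the example hostname and credentials from above; the second request additionally checks that the JSON format is enabled):

# Without credentials this should return 401 Unauthorized
curl -I https://search.example.com/

# With credentials it should return search results as JSON
curl -u example_name:example_password "https://search.example.com/search?q=test&format=json"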

Cherry Studio Related Configuration

After successfully deploying SearXNG locally or on a server, the next step is to configure it in CherryStudio.

Go to the Web Search settings page and select Searxng:

If you enter the link for the local deployment directly and validation fails, don't worry:

This is because a direct deployment does not have the json return type configured by default, so data cannot be retrieved. You need to modify the configuration file.

Go back to Docker, and in the Files tab, find the tagged folder within the image:

After expanding it, scroll down further, and you will find another tagged folder:

Expand it again and find the settings.yml configuration file:

Click to open the file editor:

Find line 78. You will see that the only format type listed is html.

Add the json type, save, and restart the container.

Return to Cherry Studio to validate again. Validation successful:

The address can be either local: http://localhost:<port_number> or the Docker address: http://host.docker.internal:<port_number>

If you followed the previous example to deploy on a server and correctly configured the reverse proxy, the json return type will already be enabled. After entering the address and validating, since HTTP Basic Authentication has been configured for the reverse proxy, the validation should now return a 401 error code:

Configure HTTP Basic Authentication in the client, entering the username and password you just set:

Validate, and it should succeed.

Other Configurations

At this point, SearXNG has default web search capabilities. If you need to customize the search engines, you need to configure it yourself.

Note that preferences set in the SearXNG web interface do not affect the engine configuration used when the large model calls the search API.

To configure the search engines that the large model will use, you need to set them in the configuration file:

Language configuration reference:

If the content is too long and inconvenient to edit directly, you can copy it to a local IDE, modify it, and then paste it back into the configuration file.

Common Reasons for Validation Failure

JSON format not added to return formats

Add json to the return formats in the configuration file:
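
For reference, the relevant fragment of settings.yml should end up looking like the search block from the server-deployment example earlier on this page:

search:
  formats:
    - html
    - json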

Search engine not configured correctly

Cherry Studio defaults to selecting engines whose categories include both "web" and "general" for searching. By default, engines like Google are selected, and these fail in mainland China due to access restrictions. Adding the following configuration to force SearXNG to use the Baidu engine can solve the problem:

use_default_settings:
  engines:
    keep_only:
      - baidu
engines:
  - name: baidu
    engine: baidu 
    categories: 
      - web
      - general
    disabled: false

Access rate is too fast

The limiter setting in SearXNG is blocking API access. Try setting limiter to false in settings.yml (the server-deployment example above already sets it to false):

Common Models Reference Information

This document was translated from Chinese by AI and has not yet been reviewed.

Common Model Reference Information

  • The following information is for reference only. If there are any errors, please contact us for correction. The context size and model information may vary for different providers of some models;

  • When inputting data in the client, "k" needs to be converted to its actual numerical value (theoretically 1k=1024 tokens; 1m=1024k tokens), e.g., 8k is 8×1024=8192 tokens. It is recommended to multiply by 1000 in actual use to prevent errors, e.g., 8k as 8×1000=8000, and 1m as 1×1000000=1000000;

  • A max output of "-" indicates that no clear maximum output information for the model was found from official sources.

Each entry below lists, in order: Model Name, Max Input, Max Output, Function Calling, Model Capabilities, Provider, and Introduction.

360gpt-pro

8k

-

Not Supported

Conversation

360AI_360gpt

The flagship hundred-billion-parameter large model in the 360 AI Brain series, with the best performance, widely applicable to complex task scenarios in various fields.

360gpt-turbo

7k

-

Not Supported

Conversation

360AI_360gpt

A ten-billion-parameter large model that balances performance and effectiveness, suitable for scenarios with high requirements for performance/cost.

360gpt-turbo-responsibility-8k

8k

-

Not Supported

Conversation

360AI_360gpt

A ten-billion-parameter large model that balances performance and effectiveness, suitable for scenarios with high requirements for performance/cost.

360gpt2-pro

8k

-

Not Supported

Conversation

360AI_360gpt

The flagship hundred-billion-parameter large model in the 360 AI Brain series, with the best performance, widely applicable to complex task scenarios in various fields.

claude-3-5-sonnet-20240620

200k

16k

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on June 20, 2024. Claude 3.5 Sonnet is a model that balances performance and speed, offering top-tier performance while maintaining high speed, and supports multimodal input.

claude-3-5-haiku-20241022

200k

16k

Not Supported

Conversation

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved across various skills, including coding, tool use, and reasoning. As the fastest model in the Anthropic family, it provides rapid response times, suitable for applications requiring high interactivity and low latency, such as user-facing chatbots and instant code completion. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for wide application across industries. It does not support image input.

claude-3-5-sonnet-20241022

200k

8K

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Sonnet offers capabilities surpassing Opus and faster speeds than Sonnet, while maintaining the same price as Sonnet. Sonnet is particularly adept at programming, data science, visual processing, and agentic tasks.

claude-3-5-sonnet-latest

200K

8k

Not Supported

Conversation, Vision

Anthropic_claude

Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities surpassing Opus and faster speeds than Sonnet, while maintaining the same price as Sonnet. Sonnet is particularly adept at programming, data science, visual processing, and agentic tasks. This model points to the latest version.

claude-3-haiku-20240307

200k

4k

Not Supported

Conversation, Vision

Anthropic_claude

Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instantaneous responses. It features fast and accurate targeted performance.

claude-3-opus-20240229

200k

4k

Not Supported

Conversation, Vision

Anthropic_claude

Claude 3 Opus is Anthropic's most powerful model for handling highly complex tasks. It excels in performance, intelligence, fluency, and comprehension.

claude-3-sonnet-20240229

200k

8k

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on February 29, 2024. Sonnet is particularly adept at: - Coding: Can autonomously write, edit, and run code, with reasoning and troubleshooting capabilities - Data Science: Enhances human data science expertise; can process unstructured data when using multiple tools to gain insights - Visual Processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to extract insights beyond the text itself - Agentic Tasks: Excellent tool use, making it ideal for handling agentic tasks (i.e., complex, multi-step problem-solving that requires interaction with other systems)

google/gemma-2-27b-it

8k

-

Not Supported

Conversation

Google_gamma

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are decoder-only large language models that support English and come with open weights, pre-trained, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

google/gemma-2-9b-it

8k

-

Not Supported

Conversation

Google_gamma

Gemma is one of the lightweight, state-of-the-art open model series developed by Google. It is a decoder-only large language model that supports English, with open weights, pre-trained, and instruction-tuned variants available. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning. This 9B model was trained on 8 trillion tokens.

gemini-1.5-pro

2m

8k

Not Supported

Conversation

Google_gemini

The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.

gemini-1.0-pro-001

33k

8k

Not Supported

Conversation

Google_gemini

This is a stable version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-002

32k

8k

Not Supported

Conversation

Google_gemini

This is a stable version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-latest

33k

8k

Not Supported

Conversation, Deprecated or soon to be deprecated

Google_gemini

This is the latest version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-001

16k

2k

Not Supported

Conversation

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-latest

16k

2k

Not Supported

Vision

Google_gemini

This is the latest vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.5-flash

1m

8k

Not Supported

Conversation, Vision

Google_gemini

This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, image, video, and text inputs.

gemini-1.5-flash-001

1m

8k

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.

gemini-1.5-flash-002

1m

8k

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.

gemini-1.5-flash-8b

1m

8k

Not Supported

Conversation, Vision

Google_gemini

Gemini 1.5 Flash-8B is Google's latest multimodal AI model, designed for efficient handling of large-scale tasks. With 8 billion parameters, the model supports text, image, audio, and video inputs, making it suitable for various application scenarios such as chat, transcription, and translation. Compared to other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, especially for cost-sensitive users. Its rate limit is doubled, allowing developers to handle large-scale tasks more efficiently. Additionally, Flash-8B uses "knowledge distillation" technology to extract key knowledge from larger models, ensuring it is lightweight and efficient while retaining core capabilities.

gemini-1.5-flash-exp-0827

1m

8k

Not Supported

Conversation, Vision

Google_gemini

This is an experimental version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.

gemini-1.5-flash-latest

1m

8k

Not Supported

Conversation, Vision

Google_gemini

This is the cutting-edge version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.

gemini-1.5-pro-001

2m

8k

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.

gemini-1.5-pro-002

2m

8k

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.

gemini-1.5-pro-exp-0801

2m

8k

Not Supported

Conversation, Vision

Google_gemini

An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.

gemini-1.5-pro-exp-0827

2m

8k

Not Supported

Conversation, Vision

Google_gemini

An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.

gemini-1.5-pro-latest

2m

8k

Not Supported

Conversation, Vision

Google_gemini

This is the latest version of Gemini 1.5 Pro, dynamically pointing to the most recent snapshot version.

gemini-2.0-flash

1m

8k

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash is Google's latest model, featuring a faster Time to First Token (TTFT) compared to the 1.5 version, while maintaining a quality level comparable to Gemini Pro 1.5. This model shows significant improvements in multimodal understanding, coding ability, complex instruction following, and function calling, thereby providing a smoother and more powerful intelligent experience.

gemini-2.0-flash-exp

100k

8k

Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash introduces a real-time multimodal API, improved speed and performance, enhanced quality, stronger agent capabilities, and adds image generation and voice conversion functions.

gemini-2.0-flash-lite-preview-02-05

1M

8k

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash-Lite is Google's latest cost-effective AI model, offering better quality at the same speed as 1.5 Flash. It supports a 1 million token context window and can handle multimodal tasks involving images, audio, and code. As Google's most cost-effective model currently, it uses a simplified single pricing strategy, making it particularly suitable for large-scale application scenarios that require cost control.

gemini-2.0-flash-thinking-exp

40k

8k

Not Supported

Conversation, Reasoning

Google_gemini

gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-flash-thinking-exp-01-21

1m

64k

Not Supported

Conversation, Reasoning

Google_gemini

Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focusing on enhancing reasoning abilities and user interaction experience. The model has strong reasoning capabilities, especially in math and programming, and supports a context window of up to 1 million tokens, suitable for complex tasks and in-depth analysis scenarios. Its unique feature is the ability to generate its thinking process, improving the comprehensibility of AI thinking. It also supports native code execution, enhancing the flexibility and practicality of interactions. By optimizing algorithms, the model reduces logical contradictions, further improving the accuracy and consistency of its answers.

gemini-2.0-flash-thinking-exp-1219

40k

8k

Not Supported

Conversation, Reasoning, Vision

Google_gemini

gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-pro-exp-01-28

2m

64k

Not Supported

Conversation, Vision

Google_gemini

Pre-announced model, not yet online.

gemini-2.0-pro-exp-02-05

2m

8k

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model released in February 2025, excelling in world knowledge, code generation, and long-text understanding. The model supports an ultra-long context window of 2 million tokens, capable of processing content equivalent to 2 hours of video, 22 hours of audio, over 60,000 lines of code, and more than 1.4 million words. As part of the Gemini 2.0 series, this model adopts a new Flash Thinking training strategy, significantly improving its performance and ranking high on several LLM leaderboards, demonstrating strong comprehensive capabilities.

gemini-exp-1114

8k

4k

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model released on November 14, 2024, primarily focusing on quality improvements.

gemini-exp-1121

8k

4k

Not Supported

Conversation, Vision, Code

Google_gemini

This is an experimental model released on November 21, 2024, with improvements in coding, reasoning, and visual capabilities.

gemini-exp-1206

8k

4k

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model released on December 6, 2024, with improvements in coding, reasoning, and visual capabilities.

gemini-exp-latest

8k

4k

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model, dynamically pointing to the latest version.

gemini-pro

33k

8k

Not Supported

Conversation

Google_gemini

Same as gemini-1.0-pro, it is an alias for gemini-1.0-pro.

gemini-pro-vision

16k

2k

Not Supported

Conversation, Vision

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

grok-2

128k

-

Not Supported

Conversation

Grok_grok

A new version of the grok model released by X.ai on December 12, 2024.

grok-2-1212

128k

-

Not Supported

Conversation

Grok_grok

A new version of the grok model released by X.ai on December 12, 2024.

grok-2-latest

128k

-

Not Supported

Conversation

Grok_grok

A new version of the grok model released by X.ai on December 12, 2024.

grok-2-vision-1212

32k

-

Not Supported

Conversation, Vision

Grok_grok

The grok vision version model released by X.ai on December 12, 2024.

grok-beta

100k

-

Not Supported

Conversation

Grok_grok

Performance comparable to Grok 2, but with improved efficiency, speed, and functionality.

grok-vision-beta

8k

-

Not Supported

Conversation, Vision

Grok_grok

The latest image understanding model can process various visual information, including documents, charts, screenshots, and photos.

internlm/internlm2_5-20b-chat

32k

-

Supported

Conversation

internlm

InternLM2.5-20B-Chat is an open-source large-scale conversational model developed based on the InternLM2 architecture. With 20 billion parameters, this model excels in mathematical reasoning, surpassing comparable models like Llama3 and Gemma2-27B. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, and possessing stronger instruction understanding, tool selection, and result reflection abilities.

meta-llama/Llama-3.2-11B-Vision-Instruct

8k

-

Not Supported

Conversation, Vision

Meta_llama

The current Llama series models can not only process text data but also image data. Some models in Llama 3.2 have added visual understanding functions. This model supports simultaneous input of text and image data, understands the image, and outputs text information.

meta-llama/Llama-3.2-3B-Instruct

32k

-

Not Supported

Conversation

Meta_llama

Meta Llama 3.2 multilingual Large Language Models (LLMs), where 1B and 3B are lightweight models that can run on edge and mobile devices. This model is the 3B version.

meta-llama/Llama-3.2-90B-Vision-Instruct

8k

-

Not Supported

Conversation, Vision

Meta_llama

The current Llama series models can not only process text data but also image data. Some models in Llama 3.2 have added visual understanding functions. This model supports simultaneous input of text and image data, understands the image, and outputs text information.

meta-llama/Llama-3.3-70B-Instruct

131k

-

Not Supported

Conversation

Meta_llama

Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B.

meta-llama/Meta-Llama-3.1-405B-Instruct

32k

-

Not Supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

meta-llama/Meta-Llama-3.1-70B-Instruct

32k

-

Not Supported

Conversation

Meta_llama

Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants in 8B, 70B, and 405B parameter sizes. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs excellently on several industry benchmarks. The model was trained on over 15 trillion tokens of public data and uses techniques like supervised fine-tuning and reinforcement learning with human feedback to enhance its usefulness and safety.

meta-llama/Meta-Llama-3.1-8B-Instruct

32k

-

Not Supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

abab5.5-chat

16k

-

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

abab5.5s-chat

8k

-

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

abab6.5g-chat

8k

-

Supported

Conversation

Minimax_abab

Persona conversation scenarios in English and other languages.

abab6.5s-chat

245k

-

Supported

Conversation

Minimax_abab

General scenarios.

abab6.5t-chat

8k

-

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

chatgpt-4o-latest

128k

16k

Not Supported

Conversation, Vision

OpenAI

The chatgpt-4o-latest model version continuously points to the GPT-4o version used in ChatGPT and is updated the fastest when there are significant changes.

gpt-4o-2024-11-20

128k

16k

Supported

Conversation

OpenAI

The latest gpt-4o snapshot version from November 20, 2024.

gpt-4o-audio-preview

128k

16k

Not Supported

Conversation

OpenAI

OpenAI's real-time voice conversation model.

gpt-4o-audio-preview-2024-10-01

128k

16k

Supported

Conversation

OpenAI

OpenAI's real-time voice conversation model.

o1

128k

32k

Not Supported

Conversation, Reasoning, Vision

OpenAI

OpenAI's new reasoning model for complex tasks that require extensive common sense. The model has a 200k context, is currently the most powerful model in the world, and supports image recognition.

o1-mini-2024-09-12

128k

64k

Not Supported

Conversation, Reasoning

OpenAI

A fixed snapshot version of o1-mini. It is smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.

o1-preview-2024-09-12

128k

32k

Not Supported

Conversation, Reasoning

OpenAI

A fixed snapshot version of o1-preview.

gpt-3.5-turbo

16k

4k

Supported

Conversation

OpenAI_gpt-3

Based on GPT-3.5: GPT-3.5 Turbo is an improved version built on the GPT-3.5 model, developed by OpenAI. Performance Goals: Designed to improve model inference speed, processing efficiency, and resource utilization through optimized model structure and algorithms. Increased Inference Speed: Compared to GPT-3.5, GPT-3.5 Turbo typically offers faster inference speeds on the same hardware, which is particularly beneficial for applications requiring large-scale text processing. Higher Throughput: When processing a large number of requests or data, GPT-3.5 Turbo can achieve higher concurrent processing capabilities, thereby increasing overall system throughput. Optimized Resource Consumption: While maintaining performance, it may have reduced demand for hardware resources (such as memory and computing resources), which helps lower operating costs and improve system scalability. Wide Range of NLP Tasks: GPT-3.5 Turbo is suitable for a variety of natural language processing tasks, including but not limited to text generation, semantic understanding, dialogue systems, and machine translation. Developer Tools and API Support: Provides API interfaces that are easy for developers to integrate and use, supporting rapid application development and deployment.

gpt-3.5-turbo-0125

16k

4k

Supported

Conversation

OpenAI_gpt-3

An updated GPT 3.5 Turbo model with higher accuracy in responding to requested formats and a fix for a bug that caused text encoding issues for non-English language function calls. Returns a maximum of 4,096 output tokens.

gpt-3.5-turbo-0613

16k

4k

Supported

Conversation

OpenAI_gpt-3

Updated fixed snapshot version of GPT 3.5 Turbo. Now deprecated.

gpt-3.5-turbo-1106

16k

4k

Supported

Conversation

OpenAI_gpt-3

Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens.

gpt-3.5-turbo-16k

16k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

(Deprecated)

gpt-3.5-turbo-16k-0613

16k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

A snapshot of gpt-3.5-turbo from June 13, 2023. (Deprecated)

gpt-3.5-turbo-instruct

4k

4k

Supported

Conversation

OpenAI_gpt-3

Capabilities similar to GPT-3 era models. Compatible with the legacy Completions endpoint, not for Chat Completions.

gpt-3.5o

16k

4k

Not Supported

Conversation

OpenAI_gpt-3

Same as gpt-4o-lite.

gpt-4

8k

8k

Supported

Conversation

OpenAI_gpt-4

Currently points to gpt-4-0613.

gpt-4-0125-preview

128k

4k

Supported

Conversation

OpenAI_gpt-4

The latest GPT-4 model, designed to reduce "laziness" where the model does not complete tasks. Returns a maximum of 4,096 output tokens.

gpt-4-0314

8k

8k

Supported

Conversation

OpenAI_gpt-4

A snapshot of gpt-4 from March 14, 2023.

gpt-4-0613

8k

8k

Supported

Conversation

OpenAI_gpt-4

A snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.

gpt-4-1106-preview

128k

4k

Supported

Conversation

OpenAI_gpt-4

A GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns a maximum of 4,096 output tokens. This is a preview model.

gpt-4-32k

32k

4k

Supported

Conversation

OpenAI_gpt-4

gpt-4-32k will be deprecated on 2025-06-06.

gpt-4-32k-0613

32k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-4

Will be deprecated on 2025-06-06.

gpt-4-turbo

128k

4k

Supported

Conversation

OpenAI_gpt-4

The latest version of the GPT-4 Turbo model adds vision capabilities, supporting visual requests via JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.

gpt-4-turbo-2024-04-09

128k

4k

Supported

Conversation

OpenAI_gpt-4

GPT-4 Turbo model with vision capabilities. Vision requests can now be made via JSON mode and function calling. gpt-4-turbo currently points to this version.

gpt-4-turbo-preview

128k

4k

Supported

Conversation, Vision

OpenAI_gpt-4

Currently points to gpt-4-0125-preview.

gpt-4o

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

OpenAI's highly intelligent flagship model, suitable for complex, multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.

gpt-4o-2024-05-13

128k

4k

Supported

Conversation, Vision

OpenAI_gpt-4

The original gpt-4o snapshot from May 13, 2024.

gpt-4o-2024-08-06

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

The first snapshot to support structured outputs. gpt-4o currently points to this version.

gpt-4o-mini

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

OpenAI's affordable version of gpt-4o, suitable for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18.

gpt-4o-mini-2024-07-18

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

A fixed snapshot version of gpt-4o-mini.

gpt-4o-realtime-preview

128k

4k

Supported

Conversation, Real-time Voice

OpenAI_gpt-4

OpenAI's real-time voice conversation model.

gpt-4o-realtime-preview-2024-10-01

128k

4k

Supported

Conversation, Real-time Voice, Vision

OpenAI_gpt-4

gpt-4o-realtime-preview currently points to this snapshot version.

o1-mini

128k

64k

Not Supported

Conversation, Reasoning

OpenAI_o1

Smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.

o1-preview

128k

32k

Not Supported

Conversation, Reasoning

OpenAI_o1

o1-preview is a new reasoning model for complex tasks that require extensive common sense. The model has a 128K context and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks. It is ideal for applications requiring deep contextual understanding and autonomous workflows.

o3-mini

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, math, and coding tasks, supports developer features like structured output, function calling, and batch API, with a knowledge cutoff of October 2023, demonstrating a significant balance in reasoning capability and cost-effectiveness.

o3-mini-2025-01-31

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, math, and coding tasks, supports developer features like structured output, function calling, and batch API, with a knowledge cutoff of October 2023, demonstrating a significant balance in reasoning capability and cost-effectiveness.

Baichuan2-Turbo

32k

-

Not Supported

Conversation

Baichuan_baichuan

Compared to similarly sized models in the industry, this model maintains a leading performance while significantly reducing the price.

Baichuan3-Turbo

32k

-

Not Supported

Conversation

Baichuan_baichuan

Compared to similarly sized models in the industry, this model maintains a leading performance while significantly reducing the price.

Baichuan3-Turbo-128k

128k

-

Not Supported

Conversation

Baichuan_baichuan

The Baichuan model processes complex text with a 128k ultra-long context window, is specifically optimized for industries like finance, and significantly reduces costs while maintaining high performance, providing a cost-effective solution for enterprises.

Baichuan4

32k

-

Not Supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Air

32k

-

Not Supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Turbo

32k

-

Not Supported

Conversation

Baichuan_baichuan

Trained on massive high-quality scenario data, usability in high-frequency enterprise scenarios is improved by 10%+ compared to Baichuan4, information summarization by 50%, multilingual capabilities by 31%, and content generation by 13%. Specially optimized for inference performance, the first token response speed is increased by 51% and token stream speed by 73% compared to Baichuan4.

ERNIE-3.5-128K

128k

4k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-3.5-8K

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-3.5-8K-Preview

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-4.0-8K

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship ultra-large-scale language model. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-4.0-8K-Latest

8k

2k

Supported

Conversation

Baidu_ernie

ERNIE-4.0-8K-Latest has fully improved capabilities compared to ERNIE-4.0-8K, with significant enhancements in role-playing and instruction-following abilities. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information, and supports 5K tokens input + 2K tokens output.

ERNIE-4.0-8K-Preview

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship ultra-large-scale language model. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-4.0-Turbo-128K

128k

4k

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It has better performance compared to ERNIE 4.0. ERNIE-4.0-Turbo-128K is the version with better overall performance on long documents than ERNIE-3.5-128K.

ERNIE-4.0-Turbo-8K

8k

2k

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It has better performance compared to ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of this model.

ERNIE-4.0-Turbo-8K-Latest

8k

2k

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It has better performance compared to ERNIE 4.0. ERNIE-4.0-Turbo-8K is a version of the model.

ERNIE-4.0-Turbo-8K-Preview

8k

2k

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. ERNIE-4.0-Turbo-8K-Preview is a version of the model.

ERNIE-Character-8K

8k

1k

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed vertical large language model, suitable for application scenarios such as game NPCs, customer service dialogues, and dialogue role-playing. It has a more distinct and consistent persona style, stronger instruction-following ability, and better inference performance.

ERNIE-Lite-8K

8k

4k

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed lightweight large language model, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards.

ERNIE-Lite-Pro-128K

128k

2k

Supported

Conversation

Baidu_ernie

Baidu's self-developed lightweight large language model, with better performance than ERNIE Lite, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and has better performance than ERNIE-Lite-128K.

ERNIE-Novel-8K

8k

2k

Not Supported

Conversation

Baidu_ernie

ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a significant advantage in novel continuation capabilities. It can also be used in scenarios like short dramas and movies.

ERNIE-Speed-128K

128k

4k

Not Supported

Conversation

Baidu_ernie

Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance.

ERNIE-Speed-8K

8k

1k

Not Supported

Conversation

Baidu_ernie

Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance.

ERNIE-Speed-Pro-128K

128k

4k

Not Supported

Conversation

Baidu_ernie

ERNIE Speed Pro is Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supporting a 128K context length and having better performance than ERNIE-Speed-128K.

ERNIE-Tiny-8K

8k

1k

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs in the ERNIE series.

Doubao-1.5-lite-32k

32k

12k

Supported

Conversation

Doubao_doubao

Doubao1.5-lite is also among the world's top-tier lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative evaluation benchmarks for general knowledge (MMLU_pro), reasoning (BBH), math (MATH), and professional knowledge (GPQA).

Doubao-1.5-pro-256k

256k

12k

Supported

Conversation

Doubao_doubao

Doubao-1.5-Pro-256k, a fully upgraded version based on Doubao-1.5-Pro. Compared to Doubao-pro-256k/241115, the overall performance is significantly improved by 10%. The output length is greatly increased, supporting up to 12k tokens.

Doubao-1.5-pro-32k

32k

12k

Supported

Conversation

Doubao_doubao

Doubao-1.5-pro, a new generation flagship model with comprehensive performance upgrades, excelling in knowledge, code, reasoning, and more. It achieves world-leading performance on multiple public evaluation benchmarks, with the best scores on authoritative knowledge, code, reasoning, and Chinese benchmarks, and a composite score superior to top industry models such as GPT-4o and Claude 3.5 Sonnet.

Doubao-1.5-vision-pro

32k

12k

Not Supported

Conversation, Vision

Doubao_doubao

Doubao-1.5-vision-pro, a newly upgraded multimodal large model, supports image recognition of any resolution and extreme aspect ratios, enhancing visual reasoning, document recognition, detailed information understanding, and instruction-following capabilities.

Doubao-embedding

4k

-

Supported

Embedding

Doubao_doubao

Doubao-embedding is a semantic vectorization model developed by ByteDance, primarily for vector retrieval scenarios. It supports Chinese and English, with a maximum context length of 4K. The following versions are currently available: text-240715: Maximum vector dimension of 2560, supports dimensionality reduction to 512, 1024, and 2048. Chinese and English retrieval performance is significantly improved compared to the text-240515 version, and this version is recommended. text-240515: Maximum vector dimension of 2048, supports dimensionality reduction to 512 and 1024.

Doubao-embedding-large

4k

-

Not Supported

Embedding

Doubao_doubao

Chinese and English retrieval performance is significantly improved compared to the Doubao-embedding/text-240715 version.

Doubao-embedding-vision

8k

-

Not Supported

Embedding

Doubao_doubao

Doubao-embedding-vision, a newly upgraded image-text multimodal vectorization model, is primarily for image-text multi-vector retrieval scenarios. It supports image input and Chinese/English text input, with a maximum context length of 8K.

Doubao-lite-128k

128k

4k

Supported

Conversation

Doubao_doubao

Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-lite-32k

32k

4k

Supported

Conversation

Doubao_doubao

Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 32k context window.

Doubao-lite-4k

4k

4k

Supported

Conversation

Doubao_doubao

Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 4k context window.

Doubao-pro-128k

128k

4k

Supported

Conversation

Doubao_doubao

The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-pro-32k

32k

4k

Supported

Conversation

Doubao_doubao

The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 32k context window.

Doubao-pro-4k

4k

4k

Supported

Conversation

Doubao_doubao

The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 4k context window.

step-1-128k

128k

-

Supported

Conversation

StepFun

The step-1-128k model is an ultra-large-scale language model capable of processing inputs of up to 128,000 tokens. This capability gives it a significant advantage in generating long-form content and performing complex reasoning, making it suitable for applications that require rich context, such as writing novels and scripts.

step-1-256k

256k

-

Supported

Conversation

StepFun

The step-1-256k model is one of the largest language models available, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality output in various domains.

step-1-32k

32k

-

Supported

Conversation

StepFun

The step-1-32k model extends the context window to support 32,000 tokens of input. This makes it perform excellently when handling long articles and complex conversations, suitable for tasks that require deep understanding and analysis, such as legal documents and academic research.

step-1-8k

8k

-

Supported

Conversation

StepFun

The step-1-8k model is an efficient language model designed for processing shorter texts. It can perform reasoning within a context of 8,000 tokens, making it suitable for application scenarios that require quick responses, such as chatbots and real-time translation.

step-1-flash

8k

-

Supported

Conversation

StepFun

The step-1-flash model focuses on rapid response and efficient processing, suitable for real-time applications. Its design allows it to provide high-quality language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.

step-1.5v-mini

32k

-

Supported

Conversation, Vision

StepFun

The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Despite its small size, it still retains good language processing capabilities, making it suitable for embedded systems and low-power devices.

step-1v-32k

32k

-

Supported

Conversation, Vision

StepFun

The step-1v-32k model supports inputs of 32,000 tokens, suitable for applications requiring longer context. It performs excellently in handling complex dialogues and long texts, making it suitable for fields such as customer service and content creation.

step-1v-8k

8k

-

Supported

Conversation, Vision

StepFun

The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for fast generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.

step-2-16k

16k

-

Supported

Conversation

StepFun

The step-2-16k model is a medium-sized language model supporting 16,000 tokens of input. It performs well in various tasks and is suitable for application scenarios such as education, training, and knowledge management.

yi-lightning

16k

-

Supported

Conversation

01.AI_yi

The latest high-performance model, ensuring high-quality output while significantly increasing inference speed. Suitable for real-time interaction and highly complex reasoning scenarios, its extremely high cost-effectiveness can provide excellent support for commercial products.

yi-vision-v2

16K

-

Supported

Conversation, Vision

01.AI_yi

Suitable for scenarios that require analyzing and interpreting images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.

qwen-14b-chat

8k

2k

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-72b-chat

32k

2k

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-7b-chat

7.5k

1.5k

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-coder-plus

128k

8k

Supported

Conversation, Code

Qwen_qwen

Qwen-Coder-Plus is a programming-specific model in the Qwen series, designed to enhance code generation and understanding capabilities. Trained on a large scale of programming data, this model can handle multiple programming languages and supports functions like code completion, error detection, and code refactoring. Its design goal is to provide developers with more efficient programming assistance and improve development efficiency.

qwen-coder-plus-latest

128k

8k

Supported

Conversation, Code

Qwen_qwen

Qwen-Coder-Plus-Latest is the newest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. This model shows significant performance improvements, enabling it to understand context more accurately and generate code that better meets developers' needs. It also introduces support for more programming languages, enhancing its multilingual programming capabilities.

qwen-coder-turbo

128k

8k

Supported

Conversation, Code

Qwen_qwen

The Tongyi Qianwen series of code and programming models are language models specifically for programming and code generation, featuring fast inference speed and low cost. This version always points to the latest stable snapshot.

qwen-coder-turbo-latest

128k

8k

Supported

Conversation, Code

Qwen_qwen

The Tongyi Qianwen series of code and programming models are language models specifically for programming and code generation, featuring fast inference speed and low cost. This version always points to the latest snapshot.

qwen-long

10m

6k

Supported

Conversation

Qwen_qwen

Qwen-Long is a large language model from Tongyi Qianwen for ultra-long context processing scenarios. It supports input in different languages such as Chinese and English, and supports ultra-long context dialogues of up to 10 million tokens (about 15 million words or 15,000 pages of documents). Combined with the synchronously launched document service, it can parse and have dialogues on various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: For requests submitted directly via HTTP, it supports a length of 1M tokens. For lengths exceeding this, it is recommended to submit via file.

qwen-math-plus

4k

3k

Supported

Conversation

Qwen_qwen

Qwen-Math-Plus is a model focused on solving mathematical problems, designed to provide efficient mathematical reasoning and calculation capabilities. Trained on a large number of math problems, this model can handle complex mathematical expressions and problems, supporting a variety of calculation needs from basic arithmetic to higher mathematics. Its application scenarios include education, scientific research, and engineering.

qwen-math-plus-latest

4k

3k

Supported

Conversation

Qwen_qwen

Qwen-Math-Plus-Latest is the newest version of Qwen-Math-Plus, integrating the latest mathematical reasoning techniques and algorithm improvements. This model performs better in handling complex mathematical problems, providing more accurate solutions and reasoning processes. It also expands its understanding of mathematical symbols and formulas, making it suitable for a wider range of mathematical applications.

qwen-math-turbo

4k

3k

Supported

Conversation

Qwen_qwen

Qwen-Math-Turbo is a high-performance mathematical model designed for fast calculation and real-time inference. This model optimizes calculation speed, enabling it to process a large number of mathematical problems in a very short time, suitable for application scenarios that require quick feedback, such as online education and real-time data analysis. Its efficient algorithms allow users to get instant results in complex calculations.

qwen-math-turbo-latest

4k

3k

Supported

Conversation

Qwen_qwen

Qwen-Math-Turbo-Latest is the newest version of Qwen-Math-Turbo, further improving calculation efficiency and accuracy. This model has undergone multiple algorithmic optimizations, enabling it to handle more complex mathematical problems and maintain high efficiency in real-time inference. It is suitable for mathematical applications that require rapid response, such as financial analysis and scientific computing.

qwen-max

32k

8k

Supported

Conversation

Qwen_qwen

The Tongyi Qianwen 2.5 series hundred-billion-level ultra-large-scale language model supports input in different languages such as Chinese and English. As the model is upgraded, qwen-max will be updated on a rolling basis.

qwen-max-latest

32k

8k

Supported

Conversation

Qwen_qwen

The best-performing model in the Tongyi Qianwen series. This model is a dynamically updated version, and model updates will not be announced in advance. It is suitable for complex, multi-step tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved. It also has enhanced understanding and generation capabilities for structured data like tables and JSON.

qwen-plus

128k

8k

Supported

Conversation

Qwen_qwen

A well-balanced model in the Tongyi Qianwen series, with inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved.

qwen-plus-latest

128k

8k

Supported

Conversation

Qwen_qwen

The latest version of Qwen-Plus, the well-balanced model in the Tongyi Qianwen series, with inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks.

qwen-turbo

128k

8k

Supported

Conversation

Qwen_qwen

The fastest and most cost-effective model in the Tongyi Qianwen series, suitable for simple tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved.

qwen-turbo-latest

1m

8k

Supported

Conversation

Qwen_qwen

The latest version of Qwen-Turbo, the fastest and most cost-effective model in the Tongyi Qianwen series, suitable for simple tasks and applications with strict response-time requirements, such as real-time Q&A systems.

qwen-vl-max

32k

2k

Supported

Conversation

Qwen_qwen

Tongyi Qianwen VL-Max (qwen-vl-max), the ultra-large-scale visual language model from Tongyi Qianwen. Compared to the enhanced version, it further improves visual reasoning and instruction-following capabilities, providing a higher level of visual perception and cognition. It offers the best performance on more complex tasks.

qwen-vl-max-latest

32k

2k

Supported

Conversation, Vision

Qwen_qwen

Qwen-VL-Max is the most advanced version in the Qwen-VL series, designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, capable of understanding and analyzing high-resolution images with extremely strong reasoning abilities, suitable for applications requiring deep understanding and complex reasoning.

qwen-vl-ocr

34k

4k

Supported

Conversation, Vision

Qwen_qwen

Only supports OCR, not conversation.

qwen-vl-ocr-latest

34k

4k

Supported

Conversation, Vision

Qwen_qwen

Only supports OCR, not conversation.

qwen-vl-plus

8k

2k

Supported

Conversation, Vision

Qwen_qwen

Tongyi Qianwen VL-Plus (qwen-vl-plus), the enhanced version of the Tongyi Qianwen large-scale visual language model. It significantly improves detail recognition and text recognition capabilities, supports images with resolutions over one million pixels and any aspect ratio. It provides excellent performance on a wide range of visual tasks.

qwen-vl-plus-latest

32k

2k

Supported

Conversation, Vision

Qwen_qwen

Qwen-VL-Plus-Latest is the newest version of Qwen-VL-Plus, enhancing the model's multimodal understanding capabilities. It excels in the combined processing of images and text, making it suitable for applications that need to efficiently handle multiple input formats, such as intelligent customer service and content generation.

Qwen/Qwen2-1.5B-Instruct

32k

6k

Not Supported

Conversation

Qwen_qwen

Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 1.5B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-72B-Instruct

128k

6k

Not Supported

Conversation

Qwen_qwen

Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 72B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It can handle large-scale inputs. The model excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-7B-Instruct

128k

6k

Not Supported

Conversation

Qwen_qwen

Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 7B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It can handle large-scale inputs. The model excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-VL-72B-Instruct

32k

2k

Not Supported

Conversation

Qwen_qwen

Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos over 20 minutes long for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.

Qwen/Qwen2-VL-7B-Instruct

32k

-

Not Supported

Conversation

Qwen_qwen

Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.

Qwen/Qwen2.5-72B-Instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long texts of over 8K tokens.

Qwen/Qwen2.5-72B-Instruct-128K

128k

8k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long texts of over 8K tokens.

Qwen/Qwen2.5-7B-Instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering over 29 languages, including Chinese and English. The model has significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).

Qwen/Qwen2.5-Coder-32B-Instruct

128k

8k

Not Supported

Conversation, Code

Qwen_qwen

Qwen2.5-Coder-32B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 32B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering over 29 languages, including Chinese and English. The model has significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).

Qwen/Qwen2.5-Coder-7B-Instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-Coder-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering over 29 languages, including Chinese and English. The model has significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).

Qwen/QwQ-32B-Preview

32k

16k

Not Supported

Conversation, Reasoning

Qwen_qwen

QwQ-32B-Preview is an experimental research model developed by the Qwen team, aimed at enhancing the reasoning capabilities of artificial intelligence. As a preview version, it demonstrates excellent analytical abilities, but also has some important limitations: 1. Language mixing and code-switching: The model may mix languages or switch between languages unexpectedly, affecting the clarity of the response. 2. Recursive reasoning loops: The model may enter a cyclic reasoning mode, leading to lengthy answers without a clear conclusion. 3. Safety and ethical considerations: The model requires strengthened safety measures to ensure reliable and safe performance, and users should exercise caution when using it. 4. Performance and benchmark limitations: The model performs excellently in mathematics and programming, but there is still room for improvement in other areas such as common sense reasoning and nuanced language understanding.

qwen1.5-110b-chat

32k

8k

Not Supported

Conversation

Qwen_qwen

-

qwen1.5-14b-chat

8k

2k

Not Supported

Conversation

Qwen_qwen

-

qwen1.5-32b-chat

32k

2k

Not Supported

Conversation

Qwen_qwen

-

qwen1.5-72b-chat

32k

2k

Not Supported

Conversation

Qwen_qwen

-

qwen1.5-7b-chat

8k

2k

Not Supported

Conversation

Qwen_qwen

-

qwen2-57b-a14b-instruct

65k

6k

Not Supported

Conversation

Qwen_qwen

-

Qwen2-72B-Instruct

-

-

Not Supported

Conversation

Qwen_qwen

-

qwen2-7b-instruct

128k

6k

Not Supported

Conversation

Qwen_qwen

-

qwen2-math-72b-instruct

4k

3k

Not Supported

Conversation

Qwen_qwen

-

qwen2-math-7b-instruct

4k

3k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-14b-instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-32b-instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-72b-instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-7b-instruct

128k

8k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-coder-14b-instruct

128k

8k

Not Supported

Conversation, Code

Qwen_qwen

-

qwen2.5-coder-32b-instruct

128k

8k

Not Supported

Conversation, Code

Qwen_qwen

-

qwen2.5-coder-7b-instruct

128k

8k

Not Supported

Conversation, Code

Qwen_qwen

-

qwen2.5-math-72b-instruct

4k

3k

Not Supported

Conversation

Qwen_qwen

-

qwen2.5-math-7b-instruct

4k

3k

Not Supported

Conversation

Qwen_qwen

-

deepseek-ai/DeepSeek-R1

64k

-

Not Supported

Conversation, Reasoning

DeepSeek_deepseek

The DeepSeek-R1 model is an open-source reasoning model based purely on reinforcement learning. It excels in tasks such as mathematics, code, and natural language reasoning, with performance comparable to OpenAI's o1 model and achieving excellent results in several benchmark tests.

deepseek-ai/DeepSeek-V2-Chat

128k

-

Not Supported

Conversation

DeepSeek_deepseek

DeepSeek-V2 is a powerful, cost-effective Mixture-of-Experts (MoE) language model. It was pre-trained on a high-quality corpus of 8.1 trillion tokens and further enhanced with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times.

deepseek-ai/DeepSeek-V2.5

32k

-

Supported

Conversation

DeepSeek_deepseek

DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. This model has been optimized in several aspects, including writing and instruction-following abilities, to better align with human preferences.

deepseek-ai/DeepSeek-V3

128k

4k

Not Supported

Conversation

DeepSeek_deepseek

The open-source version of DeepSeek-V3. Compared to the official version, it offers a longer context and does not suffer from sensitive-word refusals.

deepseek-chat

64k

8k

Supported

Conversation

DeepSeek_deepseek

236B parameters, 64K context (API), top-ranked on the open-source leaderboard for Chinese comprehensive ability (AlignBench), and in the same tier as closed-source models like GPT-4-Turbo and ERNIE 4.0 in evaluations.

deepseek-coder

64k

8k

Supported

Conversation, Code

DeepSeek_deepseek

236B parameters, 64K context (API), top-ranked on the open-source leaderboard for Chinese comprehensive ability (AlignBench), and in the same tier as closed-source models like GPT-4-Turbo and ERNIE 4.0 in evaluations.

deepseek-reasoner

64k

8k

Supported

Conversation, Reasoning

DeepSeek_deepseek

DeepSeek-Reasoner (DeepSeek-R1) is the latest reasoning model from DeepSeek, designed to enhance reasoning capabilities through reinforcement learning training. The model's reasoning process involves a large amount of reflection and validation, enabling it to handle complex logical reasoning tasks, with a chain-of-thought length that can reach tens of thousands of words. DeepSeek-R1 excels in solving mathematical, coding, and other complex problems and has been widely applied in various scenarios, demonstrating its powerful reasoning ability and flexibility. Compared to other models, DeepSeek-R1's reasoning performance is close to that of top-tier closed-source models, showcasing the potential and competitiveness of open-source models in the field of reasoning.

hunyuan-code

4k

4k

Not Supported

Conversation, Code

Tencent_hunyuan

Hunyuan's latest code generation model. The base model was augmented with 200B high-quality code data and trained with high-quality SFT data for half a year. The context window length has been increased to 8K. It ranks at the top in automatic evaluation metrics for code generation in five major languages. In high-quality manual evaluations of 10 comprehensive code tasks across five major languages, its performance is in the top tier.

hunyuan-functioncall

28k

4k

Supported

Conversation

Tencent_hunyuan

Hunyuan's latest MOE architecture FunctionCall model, trained with high-quality FunctionCall data, with a context window of up to 32K, leading in evaluation metrics across multiple dimensions.

hunyuan-large

28k

4k

Not Supported

Conversation

Tencent_hunyuan

The Hunyuan-large model has a total of about 389B parameters, with about 52B activated parameters, making it the open-source MoE model with the largest parameter scale and best performance in the industry.

hunyuan-large-longcontext

128k

6k

Not Supported

Conversation

Tencent_hunyuan

Excels at handling long-text tasks such as document summarization and document Q&A, while also being capable of handling general text generation tasks. It performs excellently in the analysis and generation of long texts, effectively handling complex and detailed long-form content processing needs.

hunyuan-lite

250k

6k

Not Supported

Conversation

Tencent_hunyuan

Upgraded to an MOE structure with a 256k context window, leading many open-source models in NLP, code, math, and industry-specific evaluation sets.

hunyuan-pro

28k

4k

Supported

Conversation

Tencent_hunyuan

A trillion-parameter scale MOE-32K long-text model. It achieves an absolute leading level on various benchmarks, with complex instruction and reasoning capabilities, complex mathematical abilities, and supports functioncall. It is specially optimized for applications in multilingual translation, finance, law, and medicine.

hunyuan-role

28k

4k

Not Supported

Conversation

Tencent_hunyuan

Hunyuan's latest role-playing model. This is a role-playing model officially fine-tuned and launched by Hunyuan, based on the Hunyuan model and augmented with role-playing scenario datasets, providing better foundational performance in role-playing scenarios.

hunyuan-standard

30k

2k

Not Supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. MOE-32K has a relatively higher cost-performance ratio and can handle long text inputs while balancing performance and price.

hunyuan-standard-256K

250k

6k

Not Supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. For long texts, the "needle in a haystack" metric reaches 99.9%. MOE-256K further breaks through in length and performance, greatly expanding the input length.

hunyuan-translation-lite

4k

4k

Not Supported

Conversation

Tencent_hunyuan

The Hunyuan translation model supports natural language conversational translation; it supports mutual translation between Chinese and 15 languages including English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.

hunyuan-turbo

28k

4k

Supported

Conversation

Tencent_hunyuan

The default version of the Hunyuan-turbo model, which uses a new Mixture-of-Experts (MoE) structure, delivering faster inference and stronger performance than hunyuan-pro.

hunyuan-turbo-latest

28k

4k

Supported

Conversation

Tencent_hunyuan

The dynamically updated version of the Hunyuan-turbo model. It is the best-performing version in the Hunyuan model series, consistent with the consumer-facing version used in Tencent Yuanbao.

hunyuan-turbo-vision

8k

2k

Supported

Vision, Conversation

Tencent_hunyuan

Hunyuan's new generation flagship visual language model, using a new Mixture-of-Experts (MoE) structure. Its capabilities in basic recognition, content creation, knowledge Q&A, and analysis/reasoning related to image-text understanding are comprehensively improved compared to the previous generation model. Max input 6k, max output 2k.

hunyuan-vision

8k

2k

Supported

Conversation, Vision

Tencent_hunyuan

Hunyuan's latest multimodal model, supporting image + text input to generate text content. Basic Image Recognition: Recognizes subjects, elements, scenes, etc., in images. Image Content Creation: Summarizes images, creates advertising copy, social media posts, poems, etc. Multi-turn Image Dialogue: Engages in multi-turn interactive Q&A about a single image. Image Analysis and Reasoning: Performs statistical analysis on logical relationships, math problems, code, and charts in images. Image Knowledge Q&A: Answers questions about knowledge points contained in images, such as historical events, movie posters. Image OCR: Recognizes text in images from natural life scenes and non-natural scenes.

SparkDesk-Lite

4k

-

Not Supported

Conversation

Spark_SparkDesk

Supports online web search function, with fast and convenient responses, suitable for low-power inference and model fine-tuning and other customized scenarios.

SparkDesk-Max

128k

-

Supported

Conversation

Spark_SparkDesk

Quantized from the latest Spark Large Model Engine 4.0 Turbo. It supports multiple built-in plugins such as web search, weather, and date. Core capabilities are fully upgraded, with universal improvements in application effects across various scenarios. Supports System role persona and FunctionCall.

SparkDesk-Max-32k

32k

-

Supported

Conversation

Spark_SparkDesk

Stronger reasoning: Enhanced context understanding and logical reasoning abilities. Longer input: Supports 32K tokens of text input, suitable for long document reading, private knowledge Q&A, and other scenarios.

SparkDesk-Pro

128k

-

Not Supported

Conversation

Spark_SparkDesk

Specially optimized for scenarios such as math, code, medicine, and education. Supports multiple built-in plugins like web search, weather, and date, covering most knowledge Q&A, language understanding, and text creation scenarios.

SparkDesk-Pro-128K

128k

-

Not Supported

Conversation

Spark_SparkDesk

Professional-grade large language model with tens of billions of parameters. It has been specially optimized for scenarios in medicine, education, and code, with lower latency in search scenarios. Suitable for business scenarios that have higher requirements for performance and response speed, such as text and intelligent Q&A.

moonshot-v1-128k

128k

4k

Supported

Conversation

Moonshot AI_moonshot

A model with a 128k context length, suitable for generating ultra-long text.

moonshot-v1-32k

32k

4k

Supported

Conversation

Moonshot AI_moonshot

A model with a 32k context length, suitable for generating long text.

moonshot-v1-8k

8k

4k

Supported

Conversation

Moonshot AI_moonshot

A model with an 8k context length, suitable for generating short text.

codegeex-4

128k

4k

Not Supported

Conversation, Code

Zhipu_codegeex

Zhipu's code model: suitable for automatic code completion tasks.

charglm-3

4k

2k

Not Supported

Conversation

Zhipu_glm

Persona model.

emohaa

8k

4k

Not Supported

Conversation

Zhipu_glm

Psychology model: possesses professional counseling abilities to help users understand emotions and cope with emotional problems.

glm-3-turbo

128k

4k

Not Supported

Conversation

Zhipu_glm

To be deprecated (June 30, 2025).

glm-4

128k

4k

Supported

Conversation

Zhipu_glm

Old flagship: released on January 16, 2024, now replaced by GLM-4-0520.

glm-4-0520

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence model: suitable for handling highly complex and diverse tasks.

glm-4-air

128k

4k

Supported

Conversation

Zhipu_glm

High cost-performance: the most balanced model between inference capability and price.

glm-4-airx

8k

4k

Supported

Conversation

Zhipu_glm

Extremely fast inference: has ultra-fast inference speed and powerful inference effects.

glm-4-flash

128k

4k

Supported

Conversation

Zhipu_glm

High speed, low price: ultra-fast inference speed.

glm-4-flashx

128k

4k

Supported

Conversation

Zhipu_glm

High speed, low price: Enhanced version of Flash, ultra-fast inference speed.

glm-4-long

1m

4k

Supported

Conversation

Zhipu_glm

Ultra-long input: specially designed for handling ultra-long text and memory-intensive tasks.

glm-4-plus

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence flagship: comprehensive performance improvement, with significantly enhanced long-text and complex task capabilities.

glm-4v

2k

-

Not Supported

Conversation, Vision

Zhipu_glm

Image understanding: possesses image understanding and reasoning capabilities.

glm-4v-flash

2k

1k

Not Supported

Conversation, Vision

Zhipu_glm

Free model: possesses powerful image understanding capabilities.
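
The "Supported / Not Supported" column in the catalog above appears to indicate whether a model accepts function calling (tool calls). As a rough, hedged illustration of what such a request looks like against an OpenAI-compatible chat-completions endpoint, here is a minimal Python sketch; the base URL, API key, model name, and the `get_weather` tool are placeholders rather than values taken from this document.

```python
# Minimal sketch of a function-calling request against an OpenAI-compatible
# endpoint. The base URL, API key, model name, and tool are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                      # placeholder key
)

# Describe one callable tool. Models marked as supporting function calling
# may answer with a structured tool call instead of plain text.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # substitute any model listed as "Supported" above
    messages=[{"role": "user", "content": "What's the weather in Shanghai?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call the tool; arguments arrive as a JSON string.
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    # The model answered directly without using the tool.
    print(message.content)
```

Whether a given provider actually honors the `tools` parameter depends on how complete its OpenAI compatibility is, so treat this as a sketch rather than a guarantee.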

Model Leaderboard

This document was translated from Chinese by AI and has not yet been reviewed.

LLM Arena Leaderboard (Live Updates)

This is a leaderboard based on data from Chatbot Arena (lmarena.ai), generated through an automated process.

Data Updated: 2025-06-12 11:42:10 UTC / 2025-06-12 19:42:10 CST (Beijing Time)

Click on the Model Name in the leaderboard to go to its detailed information or trial page.

Leaderboard

Rank (UB)
Rank (StyleCtrl)
Model Name
Score
Confidence Interval
Votes
Provider
License
Knowledge Cutoff

1

1

1478

+6/-7

7,343

Google

Proprietary

No data

2

2

1446

+6/-7

12,351

Google

Proprietary

No data

3

2

1425

+4/-5

15,210

OpenAI

Proprietary

No data

3

4

1423

+4/-4

19,762

OpenAI

Proprietary

No data

3

6

1420

+5/-5

12,614

Google

Proprietary

No data

3

8

1417

+4/-4

21,879

xAI

Proprietary

No data

5

4

1411

+4/-5

15,271

OpenAI

Proprietary

No data

8

8

1396

+5/-6

14,148

Google

Proprietary

No data

9

7

1384

+4/-5

13,830

OpenAI

Proprietary

No data

9

11

1382

+4/-4

16,550

DeepSeek

MIT

No data

11

5

1373

+4/-4

13,850

Anthropic

Proprietary

No data

11

16

1372

+6/-7

5,944

Tencent

Proprietary

No data

11

11

1371

+3/-4

19,430

DeepSeek

MIT

No data

11

16

1363

+6/-5

12,003

Mistral

Proprietary

No data

12

44

1361

+8/-6

6,636

xAI

Proprietary

No data

13

11

1363

+4/-3

29,038

OpenAI

Proprietary

No data

13

21

1362

+3/-3

34,240

Google

Proprietary

No data

13

10

1361

+6/-5

13,554

OpenAI

Proprietary

No data

14

28

1360

+4/-5

10,677

Alibaba

Apache 2.0

No data

14

19

1358

+3/-3

29,484

Alibaba

Proprietary

No data

15

21

1355

+4/-5

20,295

Google

Gemma

No data

21

16

1348

+3/-2

33,177

OpenAI

Proprietary

2023/10

21

9

1345

+5/-6

10,740

Anthropic

Proprietary

No data

23

20

1338

+4/-6

19,404

OpenAI

Proprietary

No data

23

16

1336

+5/-5

12,702

OpenAI

Proprietary

No data

23

28

1334

+7/-8

3,976

Google

Gemma

No data

23

38

1330

+11/-11

2,595

Amazon

Proprietary

No data

24

26

1332

+3/-4

22,841

DeepSeek

DeepSeek

No data

24

47

1330

+5/-5

15,930

Alibaba

Apache 2.0

No data

25

28

1324

+8/-7

6,055

Alibaba

Proprietary

No data

26

28

1326

+3/-4

26,104

Google

Proprietary

No data

26

32

1324

+6/-7

6,028

Zhipu

Proprietary

No data

27

28

1323

+4/-3

20,084

Cohere

CC-BY-NC-4.0

No data

27

27

1316

+10/-8

2,452

Tencent

Proprietary

No data

28

35

1318

+7/-6

5,126

StepFun

Proprietary

No data

30

28

1319

+3/-3

32,421

OpenAI

Proprietary

No data

30

35

1310

+11/-8

2,371

Nvidia

Nvidia

No data

30

28

1310

+11/-11

2,510

Tencent

Proprietary

No data

31

35

1317

+2/-2

54,951

OpenAI

Proprietary

2023/10

32

28

1316

+2/-2

58,645

Google

Proprietary

No data

32

18

1313

+4/-4

21,310

Anthropic

Proprietary

No data

38

52

1301

+8/-8

3,913

Google

Gemma

No data

39

17

1306

+3/-3

25,983

Anthropic

Proprietary

No data

40

38

1301

+2/-2

67,084

xAI

Proprietary

2024/3

40

43

1301

+4/-3

28,968

01 AI

Proprietary

No data

42

31

1298

+2/-2

117,747

OpenAI

Proprietary

2023/10

42

53

1296

+5/-6

10,715

Alibaba

Proprietary

No data

43

21

1297

+2/-2

73,327

Anthropic

Proprietary

2024/4

43

46

1293

+7/-5

7,243

DeepSeek

DeepSeek

No data

46

69

1289

+8/-10

4,321

Google

Gemma

No data

47

43

1285

+9/-7

3,856

Tencent

Proprietary

No data

48

55

1289

+3/-3

26,074

NexusFlow

NexusFlow

No data

48

51

1287

+4/-3

27,788

Zhipu AI

Proprietary

No data

48

37

1287

+4/-5

13,750

Meta

Llama 4

No data

48

44

1284

+8/-7

6,302

OpenAI

Proprietary

No data

49

52

1285

+3/-2

72,536

OpenAI

Proprietary

2023/10

49

62

1285

+3/-3

37,021

Google

Proprietary

No data

49

71

1282

+6/-6

7,577

Nvidia

Llama 3.1

2023/12

51

35

1282

+2/-3

43,788

Meta

Llama 3.1 Community

2023/12

52

33

1282

+2/-2

86,159

Anthropic

Proprietary

2024/4

52

51

1274

+10/-9

4,014

Tencent

Proprietary

No data

53

35

1281

+2/-2

63,038

Meta

Llama 3.1 Community

2023/12

53

35

1280

+3/-2

52,144

Google

Proprietary

Online

54

69

1280

+2/-3

55,442

xAI

Proprietary

2024/3

55

37

1279

+3/-3

47,973

OpenAI

Proprietary

2023/10

55

53

1277

+4/-4

17,432

Alibaba

Qwen

No data

65

49

1273

+2/-2

82,435

Google

Proprietary

2023/11

65

64

1272

+2/-5

26,344

DeepSeek

DeepSeek

No data

65

70

1271

+2/-3

41,519

Alibaba

Qwen

2024/9

65

51

1270

+3/-3

44,800

Meta

Llama-3.3

No data

65

69

1263

+10/-11

2,484

Mistral

Apache 2.0

No data

66

47

1270

+2/-2

102,133

OpenAI

Proprietary

2023/12

69

54

1265

+2/-3

48,217

Mistral

Mistral Research

2024/7

69

67

1264

+4/-3

20,580

NexusFlow

CC-BY-NC-4.0

2024/7

71

74

1258

+8/-8

3,010

Ai2

Llama 3.1

No data

72

52

1263

+1/-2

103,748

OpenAI

Proprietary

2023/4

72

70

1262

+2/-2

29,633

Mistral

MRL

No data

72

77

1261

+3/-2

58,637

Meta

Llama 3.1 Community

2023/12

73

49

1261

+1/-2

202,641

Anthropic

Proprietary

2023/8

74

78

1258

+3/-4

26,371

Amazon

Proprietary

No data

75

57

1258

+2/-2

97,079

OpenAI

Proprietary

2023/12

80

52

1251

+3/-3

44,893

Anthropic

Proprietary

No data

80

77

1249

+6/-5

7,948

Reka AI

Proprietary

No data

84

80

1240

+2/-2

65,661

Google

Proprietary

2023/11

84

78

1235

+6/-5

9,125

AI21 Labs

Jamba Open

2024/3

84

88

1231

+7/-6

5,730

Alibaba

Apache 2.0

No data

85

80

1233

+2/-2

79,538

Google

Gemma license

2024/6

85

88

1231

+5/-4

15,321

Mistral

Apache 2.0

No data

85

95

1230

+3/-4

20,646

Amazon

Proprietary

No data

85

82

1230

+6/-5

10,548

Princeton

MIT

2024/7

85

83

1229

+4/-6

10,535

Cohere

CC-BY-NC-4.0

2024/8

85

77

1225

+8/-7

3,889

Nvidia

Llama 3.1

2023/12

87

99

1226

+3/-2

37,697

Google

Proprietary

No data

87

96

1219

+9/-10

3,460

Allen AI

Apache-2.0

No data

89

94

1223

+3/-3

28,768

Cohere

CC-BY-NC-4.0

No data

89

86

1223

+4/-5

20,608

Nvidia

NVIDIA Open Model

2023/6

89

91

1220

+5/-6

10,221

Zhipu AI

Proprietary

No data

89

85

1219

+6/-6

8,132

Reka AI

Proprietary

No data

93

88

1220

+2/-2

163,629

Meta

Llama 3 Community

2023/12

93

101

1219

+3/-3

25,213

Microsoft

MIT

No data

97

86

1214

+2/-2

113,067

Anthropic

Proprietary

2023/8

98

109

1211

+4/-3

20,654

Amazon

Proprietary

No data

99

109

1202

+12/-11

2,901

Tencent

Proprietary

No data

103

98

1206

+3/-2

57,197

Google

Gemma license

2024/6

103

96

1203

+2/-2

80,846

Cohere

CC-BY-NC-4.0

2024/3

103

111

1199

+9/-9

3,074

Ai2

Llama 3.1

No data

104

98

1201

+3/-3

38,872

Alibaba

Qianwen LICENSE

2024/6

104

83

1200

+3/-3

55,962

OpenAI

Proprietary

2021/9

104

109

1196

+7/-7

5,111

Mistral

MRL

No data

105

111

1193

+5/-5

10,391

Cohere

CC-BY-NC-4.0

No data

105

100

1193

+7/-4

10,851

Cohere

CC-BY-NC-4.0

2024/8

107

101

1193

+2/-2

122,309

Anthropic

Proprietary

2023/8

107

95

1192

+4/-4

15,753

DeepSeek AI

DeepSeek License

2024/6

107

109

1189

+5/-6

9,274

AI21 Labs

Jamba Open

2024/3

108

126

1189

+2/-3

52,578

Meta

Llama 3.1 Community

2023/12

116

94

1177

+2/-2

91,614

OpenAI

Proprietary

2021/9

116

111

1175

+3/-3

27,430

Alibaba

Qianwen LICENSE

2024/4

116

144

1166

+11/-9

3,410

Alibaba

Apache 2.0

No data

117

126

1171

+4/-3

25,135

01 AI

Apache-2.0

2024/5

117

111

1171

+2/-3

64,926

Mistral

Proprietary

No data

117

111

1169

+4/-4

16,027

Reka AI

Proprietary

Online

| Rank (UB) | Rank (StyleCtrl) | Model Name | Score | Confidence Interval | Votes | Provider | License | Knowledge Cutoff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 119 | 120 | Llama-3-8B-Instruct | 1165 | +2/-2 | 109,056 | Meta | Llama 3 Community | 2023/3 |
| 120 | 133 | InternLM2.5-20B-chat | 1162 | +5/-7 | 10,599 | InternLM | Other | 2024/8 |
| 121 | 115 | Command R (04-2024) | 1162 | +2/-3 | 56,398 | Cohere | CC-BY-NC-4.0 | 2024/3 |
| 121 | 120 | Mistral Medium | 1161 | +3/-3 | 35,556 | Mistral | Proprietary | No data |
| 121 | 114 | Mixtral-8x22b-Instruct-v0.1 | 1161 | +2/-2 | 53,751 | Mistral | Apache 2.0 | 2024/4 |
| 121 | 118 | Reka-Flash-21B | 1161 | +3/-3 | 25,803 | Reka AI | Proprietary | 2023/11 |
| 121 | 115 | Qwen1.5-72B-Chat | 1161 | +3/-3 | 40,658 | Alibaba | Qianwen LICENSE | 2024/2 |
| 121 | 121 | Granite-3.1-8B-Instruct | 1156 | +8/-9 | 3,289 | IBM | Apache 2.0 | No data |
| 122 | 133 | Gemma-2-2b-it | 1157 | +2/-3 | 48,892 | Google | Gemma license | 2024/7 |
| 130 | 115 | Gemini-1.0-Pro-001 | 1145 | +4/-4 | 18,800 | Google | Proprietary | 2023/4 |
| 130 | 126 | Zephyr-ORPO-141b-A35b-v0.1 | 1141 | +8/-8 | 4,854 | HuggingFace | Apache 2.0 | 2024/4 |
| 131 | 129 | Qwen1.5-32B-Chat | 1139 | +4/-4 | 22,765 | Alibaba | Qianwen LICENSE | 2024/2 |
| 131 | 135 | Granite-3.1-2B-Instruct | 1133 | +8/-7 | 3,380 | IBM | Apache 2.0 | No data |
| 132 | 133 | Phi-3-Medium-4k-Instruct | 1136 | +3/-4 | 26,105 | Microsoft | MIT | 2023/10 |
| 132 | 143 | Starling-LM-7B-beta | 1132 | +4/-4 | 16,676 | Nexusflow | Apache-2.0 | 2024/3 |
| 135 | 133 | Mixtral-8x7B-Instruct-v0.1 | 1128 | +3/-2 | 76,126 | Mistral | Apache 2.0 | 2023/12 |
| 135 | 138 | Yi-34B-Chat | 1125 | +4/-4 | 15,917 | 01 AI | Yi License | 2023/6 |
| 135 | 125 | Gemini Pro | 1124 | +7/-7 | 6,557 | Google | Proprietary | 2023/4 |
| 136 | 136 | Qwen1.5-14B-Chat | 1122 | +5/-3 | 18,687 | Alibaba | Qianwen LICENSE | 2024/2 |
| 136 | 135 | WizardLM-70B-v1.0 | 1120 | +6/-4 | 8,383 | Microsoft | Llama 2 Community | 2023/8 |
| 138 | 123 | GPT-3.5-Turbo-0125 | 1119 | +3/-2 | 68,867 | OpenAI | Proprietary | 2021/9 |
| 138 | 143 | Meta-Llama-3.2-3B-Instruct | 1116 | +7/-6 | 8,390 | Meta | Llama 3.2 | 2023/12 |
| 139 | 133 | DBRX-Instruct-Preview | 1117 | +3/-3 | 33,743 | Databricks | DBRX LICENSE | 2023/12 |
| 139 | 140 | Phi-3-Small-8k-Instruct | 1116 | +4/-4 | 18,476 | Microsoft | MIT | 2023/10 |
| 139 | 143 | Tulu-2-DPO-70B | 1113 | +7/-5 | 6,658 | AllenAI/UW | AI2 ImpACT Low-risk | 2023/11 |
| 143 | 133 | Granite-3.0-8B-Instruct | 1107 | +8/-7 | 7,002 | IBM | Apache 2.0 | No data |
| 145 | 138 | OpenChat-3.5-0106 | 1105 | +5/-4 | 12,990 | OpenChat | Apache-2.0 | 2024/1 |
| 146 | 152 | Llama-2-70B-chat | 1106 | +3/-3 | 39,595 | Meta | Llama 2 Community | 2023/7 |
| 146 | 144 | Vicuna-33B | 1104 | +3/-4 | 22,936 | LMSYS | Non-commercial | 2023/8 |
| 146 | 148 | Starling-LM-7B-alpha | 1102 | +6/-5 | 10,415 | UC Berkeley | CC-BY-NC-4.0 | 2023/11 |
| 147 | 138 | Snowflake Arctic Instruct | 1103 | +2/-3 | 34,173 | Snowflake | Apache 2.0 | 2024/4 |
| 147 | 156 | Nous-Hermes-2-Mixtral-8x7B-DPO | 1098 | +7/-8 | 3,836 | NousResearch | Apache-2.0 | 2024/1 |
| 147 | 154 | NV-Llama2-70B-SteerLM-Chat | 1094 | +10/-9 | 3,636 | Nvidia | Llama 2 Community | 2023/11 |
| 151 | 140 | Gemma-1.1-7B-it | 1097 | +3/-4 | 25,070 | Google | Gemma license | 2024/2 |
| 152 | 143 | DeepSeek-LLM-67B-Chat | 1090 | +9/-8 | 4,988 | DeepSeek AI | DeepSeek License | 2023/11 |
| 153 | 141 | OpenChat-3.5 | 1090 | +6/-5 | 8,106 | OpenChat | Apache-2.0 | 2023/11 |
| 153 | 143 | OpenHermes-2.5-Mistral-7B | 1088 | +7/-11 | 5,088 | NousResearch | Apache-2.0 | 2023/11 |
| 153 | 148 | Granite-3.0-2B-Instruct | 1087 | +6/-7 | 7,191 | IBM | Apache 2.0 | No data |
| 154 | 159 | Qwen1.5-7B-Chat | 1083 | +8/-10 | 4,872 | Alibaba | Qianwen LICENSE | 2024/2 |
| 155 | 159 | Mistral-7B-Instruct-v0.2 | 1086 | +4/-5 | 20,067 | Mistral | Apache-2.0 | 2023/12 |
| 155 | 159 | Phi-3-Mini-4K-Instruct-June-24 | 1084 | +5/-5 | 12,808 | Microsoft | MIT | 2023/10 |
| 155 | 135 | GPT-3.5-Turbo-1106 | 1081 | +5/-4 | 17,036 | OpenAI | Proprietary | 2021/9 |
| 155 | 154 | Dolphin-2.2.1-Mistral-7B | 1076 | +11/-13 | 1,714 | Cognitive Computations | Apache-2.0 | 2023/10 |
| 157 | 163 | Phi-3-Mini-4k-Instruct | 1080 | +4/-3 | 21,097 | Microsoft | MIT | 2023/10 |
| 157 | 159 | SOLAR-10.7B-Instruct-v1.0 | 1076 | +8/-10 | 4,286 | Upstage AI | CC-BY-NC-4.0 | 2023/11 |
| 160 | 163 | Llama-2-13b-chat | 1077 | +3/-4 | 19,722 | Meta | Llama 2 Community | 2023/7 |
| 161 | 159 | WizardLM-13b-v1.2 | 1072 | +7/-7 | 7,176 | Microsoft | Llama 2 Community | 2023/7 |
| 165 | 169 | Meta-Llama-3.2-1B-Instruct | 1067 | +6/-6 | 8,523 | Meta | Llama 3.2 | 2023/12 |
| 166 | 167 | Zephyr-7B-beta | 1067 | +4/-5 | 11,321 | HuggingFace | MIT | 2023/10 |
| 166 | 162 | SmolLM2-1.7B-Instruct | 1060 | +12/-10 | 2,375 | HuggingFace | Apache 2.0 | No data |
| 166 | 159 | MPT-30B-chat | 1059 | +9/-12 | 2,644 | MosaicML | CC-BY-NC-SA-4.0 | 2023/6 |
| 166 | 168 | CodeLlama-34B-instruct | 1056 | +9/-6 | 7,509 | Meta | Llama 2 Community | 2023/7 |
| 166 | 168 | CodeLlama-70B-instruct | 1055 | +16/-17 | 1,192 | Meta | Llama 2 Community | 2024/1 |
| 166 | 163 | Zephyr-7B-alpha | 1054 | +15/-14 | 1,811 | HuggingFace | MIT | 2023/10 |
| 169 | 159 | falcon-180b-chat | 1048 | +16/-16 | 1,327 | TII | Falcon-180B TII License | 2023/9 |
| 171 | 162 | Vicuna-13B | 1055 | +4/-4 | 19,775 | LMSYS | Llama 2 Community | 2023/7 |
| 171 | 169 | Gemma-7B-it | 1051 | +6/-5 | 9,176 | Google | Gemma license | 2024/2 |
| 171 | 168 | Phi-3-Mini-128k-Instruct | 1050 | +3/-4 | 21,622 | Microsoft | MIT | 2023/10 |
| 171 | 183 | Llama-2-7B-chat | 1050 | +4/-6 | 14,532 | Meta | Llama 2 Community | 2023/7 |
| 171 | 162 | Qwen-14B-Chat | 1048 | +9/-7 | 5,065 | Alibaba | Qianwen LICENSE | 2023/8 |
| 171 | 169 | Guanaco-33B | 1046 | +11/-12 | 2,996 | UW | Non-commercial | 2023/5 |
| 180 | 174 | Gemma-1.1-2b-it | 1034 | +5/-4 | 11,351 | Google | Gemma license | 2024/2 |
| 180 | 176 | StripedHyena-Nous-7B | 1031 | +8/-8 | 5,276 | Together AI | Apache 2.0 | 2023/12 |
| 181 | 189 | OLMo-7B-instruct | 1029 | +6/-8 | 6,503 | Allen AI | Apache-2.0 | 2024/2 |
| 184 | 181 | Mistral-7B-Instruct-v0.1 | 1021 | +7/-6 | 9,142 | Mistral | Apache 2.0 | 2023/9 |
| 184 | 183 | Vicuna-7B | 1018 | +6/-6 | 7,017 | LMSYS | Llama 2 Community | 2023/7 |
| 184 | 172 | PaLM-Chat-Bison-001 | 1017 | +7/-6 | 8,713 | Google | Proprietary | 2021/6 |
| 188 | 187 | Gemma-2B-it | 1003 | +9/-9 | 4,918 | Google | Gemma license | 2024/2 |
| 189 | 185 | Qwen1.5-4B-Chat | 1002 | +5/-6 | 7,816 | Alibaba | Qianwen LICENSE | 2024/2 |
| 191 | 190 | Koala-13B | 978 | +6/-8 | 7,020 | UC Berkeley | Non-commercial | 2023/4 |
| 191 | 191 | ChatGLM3-6B | 968 | +8/-8 | 4,763 | Tsinghua | Apache-2.0 | 2023/10 |
| 193 | 190 | GPT4All-13B-Snoozy | 946 | +14/-15 | 1,788 | Nomic AI | Non-commercial | 2023/3 |
| 193 | 191 | MPT-7B-Chat | 942 | +9/-9 | 3,997 | MosaicML | CC-BY-NC-SA-4.0 | 2023/5 |
| 193 | 196 | ChatGLM2-6B | 938 | +13/-14 | 2,713 | Tsinghua | Apache-2.0 | 2023/6 |
| 193 | 196 | RWKV-4-Raven-14B | 935 | +9/-8 | 4,920 | RWKV | Apache 2.0 | 2023/4 |
| 197 | 191 | Alpaca-13B | 915 | +6/-9 | 5,864 | Stanford | Non-commercial | 2023/3 |
| 197 | 196 | OpenAssistant-Pythia-12B | 906 | +8/-9 | 6,368 | OpenAssistant | Apache 2.0 | 2023/4 |
| 198 | 199 | ChatGLM-6B | 892 | +9/-10 | 4,983 | Tsinghua | Non-commercial | 2023/3 |
| 199 | 199 | FastChat-T5-3B | 881 | +8/-9 | 4,288 | LMSYS | Apache 2.0 | 2023/4 |
| 201 | 201 | StableLM-Tuned-Alpha-7B | 853 | +10/-10 | 3,336 | Stability AI | CC-BY-NC-SA-4.0 | 2023/4 |
| 201 | 199 | Dolly-V2-12B | 836 | +12/-12 | 3,480 | Databricks | MIT | 2023/4 |
| 202 | 200 | LLaMA-13B | 813 | +14/-12 | 2,446 | Meta | Non-commercial | 2023/2 |

Explanation

  • Rank (UB): A ranking calculated from the Bradley-Terry model. It reflects the model's overall performance in the arena and provides an upper-bound estimate of its Elo score, which helps gauge the model's potential competitiveness (an illustrative fit is sketched after this list).

  • Rank (StyleCtrl): The ranking after applying dialogue style control. This ranking aims to reduce preference bias caused by the model's response style (e.g., verbosity, conciseness) to more purely evaluate its core capabilities.

  • Model Name: The name of the Large Language Model (LLM). This column has embedded links to the models; click to navigate.

  • Score: The Elo rating the model received from user votes in the arena. The Elo rating is a relative ranking system where a higher score indicates better performance. This score is dynamic and reflects the model's relative strength in the current competitive environment.

  • Confidence Interval: The 95% confidence interval for the model's Elo rating (e.g., +6/-6). A smaller interval indicates that the model's rating is more stable and reliable; conversely, a larger interval may suggest insufficient data or significant performance fluctuations. It provides a quantitative assessment of the rating's accuracy.

  • Votes: The total number of votes the model has received in the arena. A higher number of votes generally means higher statistical reliability of its rating.

  • Provider: The organization or company that provides the model.

  • License: The type of license for the model, such as Proprietary, Apache 2.0, MIT, etc.

  • Knowledge Cutoff: The knowledge cutoff date of the model's training data. "No data" means this information is not provided or is unknown.
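
The Rank (UB) and Score columns come from fitting a Bradley-Terry model to pairwise user votes and reporting the fitted strengths on an Elo-like scale. The sketch below is illustrative only and is not the actual Chatbot Arena pipeline: the model names, vote counts, iteration count, and the 1000-point anchor are all made-up placeholders.

```python
# Illustrative only: a minimal Bradley-Terry fit on a toy set of pairwise votes,
# reported on an Elo-like scale. This is NOT the exact computation used by
# Chatbot Arena or fboulnois/llm-leaderboard-csv; all data below is made up.
import math

# wins[a][b] = number of head-to-head votes in which model a beat model b (toy data)
wins = {
    "model-A": {"model-B": 60, "model-C": 45},
    "model-B": {"model-A": 40, "model-C": 55},
    "model-C": {"model-A": 30, "model-B": 35},
}

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths p_i with the standard MM update:
    p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j),
    where W_i is the total wins of i and n_ij the total comparisons of i and j."""
    p = {m: 1.0 for m in wins}
    for _ in range(iters):
        new_p = {}
        for i in wins:
            w_i = sum(wins[i].values())  # total wins of model i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in wins if j != i)
            new_p[i] = w_i / denom
        # normalize so the geometric mean stays at 1 (strengths are scale-free)
        g = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        p = {m: v / g for m, v in new_p.items()}
    return p

def to_elo_scale(p, anchor=1000.0):
    """Map strengths to an Elo-like scale: anchor + 400 * log10(strength)."""
    return {m: anchor + 400.0 * math.log10(v) for m, v in p.items()}

scores = to_elo_scale(bradley_terry(wins))
for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {score:.0f}")
```

The production leaderboard applies the same kind of fit over the full vote history; the confidence intervals shown in the table quantify the uncertainty of those fitted scores.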

Data Source and Update Frequency

The data for this leaderboard is automatically generated and provided by the fboulnois/llm-leaderboard-csv project, which sources and processes data from lmarena.ai. This leaderboard is updated daily via GitHub Actions.
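
If you want to work with the raw numbers rather than this snapshot, the CSVs published by that project can be loaded and re-sorted locally. The snippet below is a minimal sketch: the file name and the column names ("Model", "Score", "Votes") are assumptions, so check the headers of the CSV you actually download before running it.

```python
# Minimal sketch of loading and re-sorting a leaderboard CSV locally.
# Assumptions: "llm_leaderboard.csv" and the column names "Model", "Score",
# and "Votes" are placeholders; verify them against the real file first.
import pandas as pd

df = pd.read_csv("llm_leaderboard.csv")         # hypothetical local file name
df = df.sort_values("Score", ascending=False)   # highest Elo-style score first
print(df[["Model", "Score", "Votes"]].head(20).to_string(index=False))
```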

Disclaimer

This report is for reference only. The leaderboard data is dynamic and based on user preference votes on Chatbot Arena over a specific period. The completeness and accuracy of the data depend on the upstream data source and the updates and processing from the fboulnois/llm-leaderboard-csv project. Different models may have different license agreements; please refer to the official documentation from the model provider before use.

Gemini-2.5-Pro-Preview-06-05
Gemini-2.5-Pro-Preview-05-06
o3-2025-04-16
ChatGPT-4o-latest (2025-03-26)
Gemini-2.5-Flash-Preview-05-20
Grok-3-Preview-02-24
GPT-4.5-Preview
Gemini-2.5-Flash-Preview-04-17
GPT-4.1-2025-04-14
DeepSeek-V3-0324
Claude Opus 4 (20250514)
Hunyuan-Turbos-20250416
DeepSeek-R1
Mistral Medium 3
Grok-3-Mini-beta
o1-2024-12-17
Gemini-2.0-Flash-001
o4-mini-2025-04-16
Qwen3-235B-A22B
Qwen2.5-Max
Gemma-3-27B-it
o1-preview
Claude Sonnet 4 (20250514)
o3-mini-high
GPT-4.1-mini-2025-04-14
Gemma-3-12B-it
Amazon-Nova-Experimental-Chat-05-14
DeepSeek-V3
QwQ-32B
Qwen-Plus-0125
Gemini-2.0-Flash-Lite
GLM-4-Plus-0111
Command A (03-2025)
Hunyuan-TurboS-20250226
Step-2-16K-Exp
o3-mini
Llama-3.3-Nemotron-Super-49B-v1
Hunyuan-Turbo-0110
o1-mini
Gemini-1.5-Pro-002
Claude 3.7 Sonnet (thinking-32k)
Gemma-3n-e4b-it
Claude 3.7 Sonnet
Grok-2-08-13
Yi-Lightning
GPT-4o-2024-05-13
Qwen2.5-plus-1127
Claude 3.5 Sonnet (20241022)
Deepseek-v2.5-1210
Gemma-3-4B-it
Hunyuan-Large-2025-02-10
Athene-v2-Chat-72B
GLM-4-Plus
Llama-4-Maverick-17B-128E-Instruct
GPT-4.1-nano-2025-04-14
GPT-4o-mini-2024-07-18
Gemini-1.5-Flash-002
Llama-3.1-Nemotron-70B-Instruct
Meta-Llama-3.1-405B-Instruct-bf16
Claude 3.5 Sonnet (20240620)
Hunyuan-Standard-2025-02-10
Meta-Llama-3.1-405B-Instruct-fp8
Gemini Advanced App (2024-05-14)
Grok-2-Mini-08-13
GPT-4o-2024-08-06
Qwen-Max-0919
Gemini-1.5-Pro-001
Deepseek-v2.5
Qwen2.5-72B-Instruct
Llama-3.3-70B-Instruct
Mistral-Small-3.1-24B-Instruct-2503
GPT-4-Turbo-2024-04-09
Mistral-Large-2407
Athene-70B
Llama-3.1-Tulu-3-70B
GPT-4-1106-preview
Mistral-Large-2411
Meta-Llama-3.1-70B-Instruct
Claude 3 Opus
Amazon Nova Pro 1.0
GPT-4-0125-preview
Claude 3.5 Haiku (20241022)
Reka-Core-20240904
Gemini-1.5-Flash-001
Jamba-1.5-Large
Qwen2.5-Coder-32B-Instruct
Gemma-2-27B-it
Mistral-Small-24B-Instruct-2501
Amazon Nova Lite 1.0
Gemma-2-9B-it-SimPO
Command R+ (08-2024)
Llama-3.1-Nemotron-51B-Instruct
Gemini-1.5-Flash-8B-001
OLMo-2-0325-32B-Instruct
Aya-Expanse-32B
Nemotron-4-340B-Instruct
GLM-4-0520
Reka-Flash-20240904
Llama-3-70B-Instruct
Phi-4
Claude 3 Sonnet
Amazon Nova Micro 1.0
Hunyuan-Standard-256K
Gemma-2-9B-it
Command R+ (04-2024)
Llama-3.1-Tulu-3-8B
Qwen2-72B-Instruct
GPT-4-0314
Ministral-8B-2410
Aya-Expanse-8B
Command R (08-2024)
Claude 3 Haiku
DeepSeek-Coder-V2-Instruct
Jamba-1.5-Mini
Meta-Llama-3.1-8B-Instruct
GPT-4-0613
Qwen1.5-110B-Chat
QwQ-32B-Preview
Yi-1.5-34B-Chat
Mistral-Large-2402
Reka-Flash-21B-online