
# Model Data

{% hint style="info" %}

* The following information is for reference only. If you find any errors, please contact us so we can correct them. Some models are offered by several service providers, so context sizes and other model details may vary between providers;
* When entering values in the client, convert “k” to an actual token count. In theory, 1k = 1024 tokens and 1m = 1024k tokens, so 8k is 8 × 1024 = 8192 tokens. In practice, we recommend multiplying by 1000 instead (e.g. 8k = 8 × 1000 = 8000, 1m = 1 × 1000000 = 1000000): the result is slightly smaller, so it is less likely to exceed the model’s actual limit and cause errors;
* A maximum output of “-” means that no explicit maximum output value for this model was found in official sources.
  {% endhint %}
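The conversion described in the note above can be sketched in a few lines of Python. This is a minimal, hypothetical helper, not part of the client: the names `to_tokens` and `MULTIPLIERS` are illustrative assumptions. It parses table values such as `8k`, `200K`, or `2m` (the table mixes upper and lower case) and returns `None` for `-`, where no official maximum was found.

```python
import re
from typing import Optional

# Hypothetical helper (not part of the client) that turns table values such
# as "8k", "200K" or "2m" into the integer token counts the client expects.
# "decimal": 1k = 1000, 1m = 1,000,000 (recommended, slightly conservative)
# "binary":  1k = 1024, 1m = 1024k = 1,048,576 (the theoretical value)
MULTIPLIERS = {
    "decimal": {"k": 1000, "m": 1000 * 1000},
    "binary": {"k": 1024, "m": 1024 * 1024},
}

# A whole number followed by a k/m suffix, matched case-insensitively.
_SIZE_PATTERN = re.compile(r"(\d+)([km])", re.IGNORECASE)


def to_tokens(size: str, convention: str = "decimal") -> Optional[int]:
    """Convert a table value like '8k' to a token count.

    Returns None for '-', which the table uses when no official
    maximum was found.
    """
    text = size.strip()
    if text == "-":
        return None
    match = _SIZE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS[convention][suffix.lower()]


if __name__ == "__main__":
    print(to_tokens("8k"))            # 8000
    print(to_tokens("8k", "binary"))  # 8192
    print(to_tokens("1m"))            # 1000000
    print(to_tokens("1m", "binary"))  # 1048576
    print(to_tokens("-"))             # None
```

With the decimal convention the configured value always stays at or below the binary one (for example, 128k becomes 128000 rather than 131072), which is the conservative choice the note recommends.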

<table><thead><tr><th width="313">Model name</th><th width="158">Max input</th><th width="72">Max output</th><th width="95">Function calling</th><th width="142">Model capabilities</th><th width="540">Provider</th><th width="257">Description</th></tr></thead><tbody><tr><td>360gpt-pro</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>The flagship trillion-parameter model in the 360 Zhinao series with the best performance, widely suitable for complex task scenarios across various fields.</td></tr><tr><td>360gpt-turbo</td><td>7k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>A tens-of-billions-parameter model balancing performance and quality, suitable for scenarios with high performance/cost requirements.</td></tr><tr><td>360gpt-turbo-responsibility-8k</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>A tens-of-billions-parameter model balancing performance and quality, suitable for scenarios with high performance/cost requirements.</td></tr><tr><td>360gpt2-pro</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>The flagship trillion-parameter model in the 360 Zhinao series with the best performance, widely suitable for complex task scenarios across various fields.</td></tr><tr><td>claude-3-5-sonnet-20240620</td><td>200k</td><td>16k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>A snapshot version released on June 20, 2024. Claude 3.5 Sonnet is a model that balances performance and speed, delivering top-tier performance while maintaining high speed, and supports multimodal input.</td></tr><tr><td>claude-3-5-haiku-20241022</td><td>200k</td><td>16k</td><td>Not supported</td><td>Chat</td><td>Anthropic_claude</td><td>A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved across all skills, including coding, tool use, and reasoning. 
As the fastest model in the Anthropic family, it offers quick response times and is well suited for applications requiring high interactivity and low latency, such as user-facing chatbots and real-time code completion. It also excels at specialized tasks such as data extraction and real-time content moderation, making it a versatile tool across industries. It does not support image input.</td></tr><tr><td>claude-3-5-sonnet-20241022</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>A snapshot version released on October 22, 2024. Claude 3.5 Sonnet surpasses Claude 3 Opus in capability and runs faster than Claude 3 Sonnet, at the same price as Claude 3 Sonnet. It is especially strong at programming, data science, visual processing, and agent tasks.</td></tr><tr><td>claude-3-5-sonnet-latest</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet surpasses Claude 3 Opus in capability and runs faster than Claude 3 Sonnet, at the same price as Claude 3 Sonnet. It is especially strong at programming, data science, visual processing, and agent tasks.</td></tr><tr><td>claude-3-haiku-20240307</td><td>200k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>Claude 3 Haiku is Anthropic’s fastest and most compact model, designed for near-instant responses. It offers fast and accurate targeted performance.</td></tr><tr><td>claude-3-opus-20240229</td><td>200k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>Claude 3 Opus is Anthropic’s most powerful model for highly complex tasks.
It excels in performance, intelligence, fluency, and comprehension.</td></tr><tr><td>claude-3-sonnet-20240229</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Anthropic_claude</td><td>A snapshot version released on February 29, 2024. Claude 3 Sonnet is especially good at:<br><br>- Coding: can independently write, edit, and run code, with reasoning and troubleshooting capabilities<br>- Data science: augments human data science expertise and can work with unstructured data while using multiple tools to gain insights<br>- Visual processing: excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond the text itself<br>- Agent tasks: excellent tool use, well suited for agent tasks (i.e., complex multi-step problem-solving tasks that require interacting with other systems)</td></tr><tr><td>google/gemma-2-27b-it</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Google_gemma</td><td>Gemma is a lightweight, state-of-the-art open model family developed by Google, built using the same research and technology as the Gemini models. These models are decoder-only large language models that support English and provide open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for a variety of text generation tasks, including question answering, summarization, and reasoning.</td></tr><tr><td>google/gemma-2-9b-it</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Google_gemma</td><td>Gemma is one of Google’s lightweight, state-of-the-art open model families. It is a decoder-only large language model that supports English and provides open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for a variety of text generation tasks, including question answering, summarization, and reasoning.
This 9B model was trained on 8 trillion tokens.</td></tr><tr><td>gemini-1.5-pro</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.0-pro-001</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.0 Pro. As an NLP model, it is specialized in tasks such as multi-turn text and code chat as well as code generation. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-002</td><td>32k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.0 Pro. As an NLP model, it is specialized in tasks such as multi-turn text and code chat as well as code generation. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-latest</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat, deprecated or soon to be deprecated</td><td>Google_gemini</td><td>This is the latest version of Gemini 1.0 Pro. As an NLP model, it is specialized in tasks such as multi-turn text and code chat as well as code generation. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-vision-001</td><td>16k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the vision version of Gemini 1.0 Pro. 
This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-vision-latest</td><td>16k</td><td>2k</td><td>Not supported</td><td>Image understanding</td><td>Google_gemini</td><td>This is the latest vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.5-flash</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can handle audio, images, video, and text input.</td></tr><tr><td>gemini-1.5-flash-001</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Flash. It offers the same core functionality as gemini-1.5-flash, but with a fixed version, making it suitable for production use.</td></tr><tr><td>gemini-1.5-flash-002</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Flash. It offers the same core functionality as gemini-1.5-flash, but with a fixed version, making it suitable for production use.</td></tr><tr><td>gemini-1.5-flash-8b</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Gemini 1.5 Flash-8B is Google’s latest multimodal AI model, designed for efficient handling of large-scale tasks. With 8 billion parameters, it supports text, image, audio, and video input, making it suitable for a variety of use cases such as chat, transcription, and translation. Compared with other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, making it especially suitable for cost-sensitive users. 
Its rate limits have been doubled, enabling developers to handle large-scale tasks more efficiently. In addition, Flash-8B uses knowledge distillation to extract key knowledge from larger models, ensuring a lightweight and efficient design while preserving core capabilities.</td></tr><tr><td>gemini-1.5-flash-exp-0827</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is an experimental version of Gemini 1.5 Flash, updated regularly to include the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production use.</td></tr><tr><td>gemini-1.5-flash-latest</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the cutting-edge version of Gemini 1.5 Flash, updated regularly to include the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production use.</td></tr><tr><td>gemini-1.5-pro-001</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Pro, providing fixed model behavior and performance characteristics. It is suitable for production use where stability is required.</td></tr><tr><td>gemini-1.5-pro-002</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Pro, providing fixed model behavior and performance characteristics. It is suitable for production use where stability is required.</td></tr><tr><td>gemini-1.5-pro-exp-0801</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. 
It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.5-pro-exp-0827</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.5-pro-latest</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the latest version of Gemini 1.5 Pro, dynamically pointing to the latest snapshot version.</td></tr><tr><td>gemini-2.0-flash</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Gemini 2.0 Flash is Google’s latest model. Compared with Gemini 1.5, it has a faster time to first token (TTFT) while maintaining quality comparable to Gemini 1.5 Pro. The model shows significant improvements in multimodal understanding, coding, complex instruction following, and function calling, delivering a smoother and more capable experience.</td></tr><tr><td>gemini-2.0-flash-exp</td><td>100k</td><td>8k</td><td>Supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Gemini 2.0 Flash introduces a multimodal real-time API, improved speed and performance, better quality, and enhanced agent capabilities, and adds image generation and text-to-speech output.</td></tr><tr><td>gemini-2.0-flash-lite-preview-02-05</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Gemini 2.0 Flash-Lite is Google’s latest cost-effective AI model, offering better quality than 1.5 Flash at the same speed. It supports a 1 million token context window and can handle multimodal tasks involving images, audio, and code.
As Google’s most cost-effective model to date, it uses simplified single pricing and is especially suitable for large-scale applications that need cost control.</td></tr><tr><td>gemini-2.0-flash-thinking-exp</td><td>40k</td><td>8k</td><td>Not supported</td><td>Chat, reasoning</td><td>Google_gemini</td><td>gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thought process" it goes through when producing a response. As a result, responses in "thinking mode" show stronger reasoning than those of the base Gemini 2.0 Flash model.</td></tr><tr><td>gemini-2.0-flash-thinking-exp-01-21</td><td>1m</td><td>64k</td><td>Not supported</td><td>Chat, reasoning</td><td>Google_gemini</td><td>Gemini 2.0 Flash Thinking Exp 01-21 is Google’s latest AI model, focused on improving reasoning ability and user interaction experience. The model has strong reasoning capabilities, especially in mathematics and programming, and supports a context window of up to 1 million tokens, making it suitable for complex tasks and in-depth analysis. It can expose its thought process, making the model’s reasoning more interpretable, and supports native code execution for more flexible and practical interactions. Through algorithm optimization, the model reduces logical contradictions, further improving the accuracy and consistency of responses.</td></tr><tr><td>gemini-2.0-flash-thinking-exp-1219</td><td>40k</td><td>8k</td><td>Not supported</td><td>Chat, reasoning, image understanding</td><td>Google_gemini</td><td>gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thought process" it goes through when producing a response.
As a result, responses in "thinking mode" show stronger reasoning than those of the base Gemini 2.0 Flash model.</td></tr><tr><td>gemini-2.0-pro-exp-01-28</td><td>2m</td><td>64k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Preloaded model, not yet launched.</td></tr><tr><td>gemini-2.0-pro-exp-02-05</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>Gemini 2.0 Pro Exp 02-05 is Google’s latest experimental model, released in February 2025, excelling in world knowledge, code generation, and long-text understanding. The model supports an ultra-long 2 million token context window and can handle 2 hours of video, 22 hours of audio, more than 60,000 lines of code, and over 1.4 million words. As part of the Gemini 2.0 series, it adopts a new Flash Thinking training strategy, achieving significant performance improvements and ranking near the top of multiple LLM leaderboards.</td></tr><tr><td>gemini-exp-1114</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is an experimental model released on November 14, 2024, mainly focused on quality improvements.</td></tr><tr><td>gemini-exp-1121</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding, code</td><td>Google_gemini</td><td>This is an experimental model released on November 21, 2024, with improved coding, reasoning, and visual capabilities.</td></tr><tr><td>gemini-exp-1206</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is an experimental model released on December 6, 2024, with improved coding, reasoning, and visual capabilities.</td></tr><tr><td>gemini-exp-latest</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is an experimental model that dynamically points to the
latest version.</td></tr><tr><td>gemini-pro</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>An alias for gemini-1.0-pro.</td></tr><tr><td>gemini-pro-vision</td><td>16k</td><td>2k</td><td>Not supported</td><td>Chat, image understanding</td><td>Google_gemini</td><td>This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>grok-2</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by xAI on 2024-12-12.</td></tr><tr><td>grok-2-1212</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by xAI on 2024-12-12.</td></tr><tr><td>grok-2-latest</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by xAI on 2024-12-12.</td></tr><tr><td>grok-2-vision-1212</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat, image understanding</td><td>Grok_grok</td><td>The Grok vision model released by xAI on 2024-12-12.</td></tr><tr><td>grok-beta</td><td>100k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>Performance is comparable to Grok 2, but with improved efficiency, speed, and functionality.</td></tr><tr><td>grok-vision-beta</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, image understanding</td><td>Grok_grok</td><td>xAI’s latest image understanding model, able to handle a variety of visual information, including documents, charts, screenshots, and photos.</td></tr><tr><td>internlm/internlm2_5-20b-chat</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>internlm</td><td>InternLM2.5-20B-Chat is an open-source large-scale chat model built on the InternLM2 architecture.
The model has 20 billion parameters and excels at mathematical reasoning, outperforming Llama3 and Gemma2-27B models of comparable size. InternLM2.5-20B-Chat has significantly improved tool calling, can gather information from hundreds of web pages for analysis and reasoning, and is stronger at understanding instructions, selecting tools, and reflecting on results.</td></tr><tr><td>meta-llama/Llama-3.2-11B-Vision-Instruct</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, image understanding</td><td>Meta_llama</td><td>Llama models can now process images as well as text. Some Llama 3.2 models include visual understanding capabilities. This model supports inputting both text and image data at the same time, understanding images and outputting text information.</td></tr><tr><td>meta-llama/Llama-3.2-3B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>Part of the Meta Llama 3.2 multilingual large language model (LLM) family, whose 1B and 3B models are lightweight enough to run on edge and mobile devices. This is the 3B version.</td></tr><tr><td>meta-llama/Llama-3.2-90B-Vision-Instruct</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, image understanding</td><td>Meta_llama</td><td>Llama models can now process images as well as text. Some Llama 3.2 models include visual understanding capabilities.
This model supports inputting both text and image data at the same time, understanding images and outputting text information.</td></tr><tr><td>meta-llama/Llama-3.3-70B-Instruct</td><td>131k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>Meta’s latest 70B LLM, with performance comparable to Llama 3.1 405B.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-405B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversation and outperform many available open-source and closed-source chat models on common industry benchmarks.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-70B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants in three parameter sizes: 8B, 70B, and 405B. This 70B instruction-tuned model is optimized for multilingual conversational scenarios and performs well on multiple industry benchmarks. The model was trained on more than 15 trillion tokens of public data and uses techniques such as supervised fine-tuning and reinforcement learning from human feedback to improve usefulness and safety.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-8B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. 
The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversation and outperform many available open-source and closed-source chat models on common industry benchmarks.</td></tr><tr><td>abab5.5-chat</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>abab5.5s-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>abab6.5g-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Persona chat scenarios in English and other languages</td></tr><tr><td>abab6.5s-chat</td><td>245k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>General scenarios</td></tr><tr><td>abab6.5t-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>chatgpt-4o-latest</td><td>128k</td><td>16k</td><td>Not supported</td><td>Chat, image understanding</td><td>OpenAI</td><td>The chatgpt-4o-latest model continuously points to the GPT-4o version used in ChatGPT and is updated promptly when that version changes significantly.</td></tr><tr><td>gpt-4o-2024-11-20</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat</td><td>OpenAI</td><td>The latest gpt-4o snapshot, from November 20, 2024.</td></tr><tr><td>gpt-4o-audio-preview</td><td>128k</td><td>16k</td><td>Not supported</td><td>Chat</td><td>OpenAI</td><td>OpenAI’s real-time voice conversation model.</td></tr><tr><td>gpt-4o-audio-preview-2024-10-01</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat</td><td>OpenAI</td><td>OpenAI’s real-time voice conversation model.</td></tr><tr><td>o1</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, reasoning, image understanding</td><td>OpenAI</td><td>OpenAI’s new reasoning model for complex tasks that require broad general knowledge.
This model has a 200k context window, was among the strongest models available at the time of writing, and supports image understanding.</td></tr><tr><td>o1-mini-2024-09-12</td><td>128k</td><td>64k</td><td>Not supported</td><td>Chat, reasoning</td><td>OpenAI</td><td>A fixed snapshot version of o1-mini. Smaller, faster, and 80% cheaper than o1-preview, it performs well in code generation and small-context operations.</td></tr><tr><td>o1-preview-2024-09-12</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, reasoning</td><td>OpenAI</td><td>A fixed snapshot version of o1-preview.</td></tr><tr><td>gpt-3.5-turbo</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Based on GPT-3.5: GPT-3.5 Turbo is an improved version of the GPT-3.5 model, developed by OpenAI.<br>Performance goal: Designed to improve inference speed, processing efficiency, and resource utilization through model architecture and algorithm optimization.<br>Improved inference speed: Compared with GPT-3.5, GPT-3.5 Turbo can usually provide faster inference on the same hardware, which is especially beneficial for applications requiring large-scale text processing.<br>Higher throughput: When handling large volumes of requests or data, GPT-3.5 Turbo can achieve higher concurrent processing capacity, thereby improving overall system throughput.<br>Optimized resource consumption: While maintaining performance, it may reduce demands on hardware resources such as memory and compute, helping lower operating costs and improve system scalability.<br>Broad natural language processing tasks: GPT-3.5 Turbo is suitable for a variety of NLP tasks, including text generation, semantic understanding, dialogue systems, and machine translation.<br>Developer tools and API support: Provides APIs that are easy for developers to integrate and use, supporting rapid application development and
deployment.</td></tr><tr><td>gpt-3.5-turbo-0125</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>An updated GPT-3.5 Turbo with higher accuracy when responding in requested formats and a fix for a bug that caused a text encoding issue in non-English function calls. Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-3.5-turbo-0613</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>A fixed snapshot of an updated GPT-3.5 Turbo. Currently deprecated.</td></tr><tr><td>gpt-3.5-turbo-1106</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-3.5-turbo-16k</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat, deprecated or soon to be deprecated</td><td>OpenAI_gpt-3</td><td>(Deprecated)</td></tr><tr><td>gpt-3.5-turbo-16k-0613</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat, deprecated or soon to be deprecated</td><td>OpenAI_gpt-3</td><td>Snapshot of gpt-3.5-turbo-16k from June 13, 2023. (Deprecated)</td></tr><tr><td>gpt-3.5-turbo-instruct</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Capabilities similar to GPT-3-era models. Compatible with the legacy Completions endpoint; not for use with Chat Completions.</td></tr><tr><td>gpt-3.5o</td><td>16k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Same as gpt-4o-lite.</td></tr><tr><td>gpt-4</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Currently points to gpt-4-0613.</td></tr><tr><td>gpt-4-0125-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>The latest GPT-4 model, designed to reduce cases of "laziness," where the model fails to complete tasks.
Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-4-0314</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Snapshot of gpt-4 from March 14, 2023</td></tr><tr><td>gpt-4-0613</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.</td></tr><tr><td>gpt-4-1106-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns up to 4,096 output tokens. This is a preview model.</td></tr><tr><td>gpt-4-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>gpt-4-32k will be deprecated on 2025-06-06.</td></tr><tr><td>gpt-4-32k-0613</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat, deprecated or soon to be deprecated</td><td>OpenAI_gpt-4</td><td>Will be deprecated on 2025-06-06.</td></tr><tr><td>gpt-4-turbo</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>The latest GPT-4 Turbo model adds vision capabilities and supports processing visual requests through JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.</td></tr><tr><td>gpt-4-turbo-2024-04-09</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>GPT-4 Turbo model with vision capabilities. Now, visual requests can be handled through JSON mode and function calling. 
gpt-4-turbo currently points to this version.</td></tr><tr><td>gpt-4-turbo-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>Currently points to gpt-4-0125-preview.</td></tr><tr><td>gpt-4o</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>OpenAI’s highly intelligent flagship model, suitable for complex multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.</td></tr><tr><td>gpt-4o-2024-05-13</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>The original gpt-4o snapshot from May 13, 2024.</td></tr><tr><td>gpt-4o-2024-08-06</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>The first snapshot to support structured outputs. gpt-4o currently points to this version.</td></tr><tr><td>gpt-4o-mini</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>OpenAI’s affordable small version of gpt-4o, suitable for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo.
Currently points to gpt-4o-mini-2024-07-18.</td></tr><tr><td>gpt-4o-mini-2024-07-18</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, image understanding</td><td>OpenAI_gpt-4</td><td>A fixed snapshot version of gpt-4o-mini.</td></tr><tr><td>gpt-4o-realtime-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, real-time voice</td><td>OpenAI_gpt-4</td><td>OpenAI’s real-time voice conversation model.</td></tr><tr><td>gpt-4o-realtime-preview-2024-10-01</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, real-time voice, image understanding</td><td>OpenAI_gpt-4</td><td>gpt-4o-realtime-preview currently points to this snapshot version.</td></tr><tr><td>o1-mini</td><td>128k</td><td>64k</td><td>Not supported</td><td>Chat, reasoning</td><td>OpenAI_o1</td><td>Smaller, faster, and 80% cheaper than o1-preview, it performs well in code generation and small-context operations.</td></tr><tr><td>o1-preview</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, reasoning</td><td>OpenAI_o1</td><td>o1-preview is a new reasoning model for complex tasks that require broad general knowledge. This model has a 128k context window and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks. It is ideal for applications requiring deep contextual understanding and autonomous workflows.</td></tr><tr><td>o3-mini</td><td>200k</td><td>100k</td><td>Supported</td><td>Chat, reasoning</td><td>OpenAI_o1</td><td>o3-mini is OpenAI’s latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini.
It focuses on scientific, mathematical, and coding tasks, supports developer features such as structured outputs, function calling, and the Batch API, and has a knowledge cutoff of October 2023, striking a notable balance between reasoning ability and affordability.</td></tr><tr><td>o3-mini-2025-01-31</td><td>200k</td><td>100k</td><td>Supported</td><td>Chat, reasoning</td><td>OpenAI_o1</td><td>o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI’s latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on scientific, mathematical, and coding tasks, supports developer features such as structured outputs, function calling, and the Batch API, and has a knowledge cutoff of October 2023, striking a notable balance between reasoning ability and affordability.</td></tr><tr><td>Baichuan2-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>Compared with similarly sized models in the industry, this model maintains industry-leading performance while significantly reducing the price.</td></tr><tr><td>Baichuan3-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>Compared with similarly sized models in the industry, this model maintains industry-leading performance while significantly reducing the price.</td></tr><tr><td>Baichuan3-Turbo-128k</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>The Baichuan model processes complex text through a 128k ultra-long context window, is specially optimized for industries such as finance, and significantly reduces cost while maintaining high performance, providing enterprises with a cost-effective solution.</td></tr><tr><td>Baichuan4</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>Baichuan’s MoE model provides an efficient and cost-effective solution for enterprise applications through specialized
optimization, reduced cost, and improved performance.</td></tr><tr><td>Baichuan4-Air</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>Baichuan’s MoE model provides an efficient and cost-effective solution for enterprise applications through specialized optimization, reduced cost, and improved performance.</td></tr><tr><td>Baichuan4-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>baichuan_baichuan</td><td>Trained on massive amounts of high-quality scenario data, it improves usability in common enterprise scenarios by more than 10% compared with Baichuan4, and improves information summarization by 50%, multilingual performance by 31%, and content generation by 13%.<br>Specially optimized for inference performance, with first-token response speed improved by 51% and token throughput improved by 73% compared with Baichuan4.</td></tr><tr><td>ERNIE-3.5-128K</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed flagship large-scale language model, trained on massive Chinese and English corpora, with powerful general capabilities that can meet most requirements for dialogue Q&#x26;A, content generation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.</td></tr><tr><td>ERNIE-3.5-8K</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed flagship large-scale language model, trained on massive Chinese and English corpora, with powerful general capabilities that can meet most requirements for dialogue Q&#x26;A, content generation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.</td></tr><tr><td>ERNIE-3.5-8K-Preview</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed flagship large-scale language model, trained on massive Chinese
and English corpora, with powerful general capabilities that can meet most requirements for dialogue Q&#x26;A, content generation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.</td></tr><tr><td>ERNIE-4.0-8K</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed flagship ultra-large language model, with comprehensively upgraded model capabilities compared with ERNIE 3.5, widely suitable for complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.</td></tr><tr><td>ERNIE-4.0-8K-Latest</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Compared with ERNIE-4.0-8K, ERNIE-4.0-8K-Latest has significantly improved capabilities, especially in role-playing and instruction following; compared with ERNIE 3.5, it achieves a comprehensive upgrade in model capabilities and is widely suitable for complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers, and supports 5K input tokens plus 2K output tokens.
</td></tr><tr><td>ERNIE-4.0-8K-Preview</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed flagship ultra-large language model, with comprehensively upgraded model capabilities compared with ERNIE 3.5, widely suitable for complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.</td></tr><tr><td>ERNIE-4.0-Turbo-128K</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu’s self-developed flagship ultra-large language model, with excellent overall performance and wide applicability to complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-128K is one version of the model, and its overall performance on long documents is better than that of ERNIE-3.5-128K.</td></tr><tr><td>ERNIE-4.0-Turbo-8K</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu’s self-developed flagship ultra-large language model, with excellent overall performance and wide applicability to complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of the model.</td></tr><tr><td>ERNIE-4.0-Turbo-8K-Latest</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu’s self-developed flagship ultra-large language model, with excellent overall performance and wide applicability to complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers.
It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of the model.</td></tr><tr><td>ERNIE-4.0-Turbo-8K-Preview</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu’s self-developed flagship ultra-large language model, with excellent overall performance and wide applicability to complex tasks across various fields; it supports automatic connection to Baidu Search plugins to ensure the timeliness of answers. ERNIE-4.0-Turbo-8K-Preview is one version of the model.</td></tr><tr><td>ERNIE-Character-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed vertical-scenario large language model, suitable for game NPCs, customer service dialogues, character role-playing, and other application scenarios. It has a more distinct and consistent persona style, stronger instruction-following ability, and better reasoning performance.</td></tr><tr><td>ERNIE-Lite-8K</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed lightweight large language model, balancing excellent model quality and inference performance, suitable for inference on low-compute AI accelerator cards.</td></tr><tr><td>ERNIE-Lite-Pro-128K</td><td>128k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed lightweight large language model, with better performance than ERNIE Lite while balancing excellent model quality and inference performance, suitable for inference on low-compute AI accelerator cards.
ERNIE-Lite-Pro-128K supports a 128K context length and performs better than ERNIE-Lite-128K.</td></tr><tr><td>ERNIE-Novel-8K</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE-Novel-8K is Baidu’s self-developed general-purpose large language model, with a clear advantage in novel continuation; it is also suitable for scenarios such as short dramas and films.</td></tr><tr><td>ERNIE-Speed-128K</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s latest self-developed high-performance large language model released in 2024, with excellent general capabilities, suitable as a base model for fine-tuning to better handle problems in specific scenarios, while also offering excellent inference performance.</td></tr><tr><td>ERNIE-Speed-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s latest self-developed high-performance large language model released in 2024, with excellent general capabilities, suitable as a base model for fine-tuning to better handle problems in specific scenarios, while also offering excellent inference performance.</td></tr><tr><td>ERNIE-Speed-Pro-128K</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE Speed Pro is Baidu’s latest self-developed high-performance large language model released in 2024, with excellent general capabilities, suitable as a base model for fine-tuning to better handle problems in specific scenarios, while also offering excellent inference performance.
ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supports a 128K context length, and performs better than ERNIE-Speed-128K.</td></tr><tr><td>ERNIE-Tiny-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu’s self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning cost among ERNIE (Wenxin) series models.</td></tr><tr><td>Doubao-1.5-lite-32k</td><td>32k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao 1.5 lite is at a world-leading level among lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative benchmarks for comprehensive performance (MMLU-Pro), reasoning (BBH), mathematics (MATH), and specialized knowledge (GPQA).</td></tr><tr><td>Doubao-1.5-pro-256k</td><td>256k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-1.5-Pro-256k is a comprehensively upgraded version based on Doubao-1.5-Pro. Compared with Doubao-pro-256k/241115, its overall performance has improved by 10%. Output length has been greatly increased, supporting up to 12k tokens.</td></tr><tr><td>Doubao-1.5-pro-32k</td><td>32k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-1.5-pro is a new-generation flagship model with comprehensively upgraded performance and outstanding capabilities in knowledge, code, reasoning, and more.
It has reached a globally leading level on multiple public benchmark evaluations, especially achieving the best results on authoritative benchmarks for knowledge, code, reasoning, and Chinese, with an overall score better than industry-leading models such as GPT-4o and Claude 3.5 Sonnet.</td></tr><tr><td>Doubao-1.5-vision-pro</td><td>32k</td><td>12k</td><td>Not supported</td><td>Chat, image understanding</td><td>Doubao_doubao</td><td>Doubao-1.5-vision-pro is a newly upgraded multimodal large model that supports image recognition at any resolution and extreme aspect ratios, with enhanced visual reasoning, document recognition, fine-detail understanding, and instruction-following capabilities.</td></tr><tr><td>Doubao-embedding</td><td>4k</td><td>-</td><td>Supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Doubao-embedding is a semantic vectorization model developed by ByteDance, mainly designed for vector retrieval scenarios. It supports Chinese and English bilingual input, with a maximum 4K context length. The following versions are currently available:<br><br>text-240715: maximum vector dimension 2560, supports dimensionality reduction to 512, 1024, and 2048. Chinese-English retrieval performance is significantly improved over the text-240515 version, and this version is recommended.<br>text-240515: maximum vector dimension 2048, supports dimensionality reduction to 512 and 1024.</td></tr><tr><td>Doubao-embedding-large</td><td>4k</td><td>-</td><td>Not supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Chinese-English retrieval performance is significantly improved compared with the Doubao-embedding/text-240715 version.</td></tr><tr><td>Doubao-embedding-vision</td><td>8k</td><td>-</td><td>Not supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Doubao-embedding-vision is a newly upgraded image-text multimodal vectorization model, mainly designed for image-text multimodal vector retrieval scenarios.
It supports image input and Chinese-English bilingual text input, with a maximum 8K context length.</td></tr><tr><td>Doubao-lite-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. It supports inference and fine-tuning with a 128k context window.</td></tr><tr><td>Doubao-lite-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. It supports inference and fine-tuning with a 32k context window.</td></tr><tr><td>Doubao-lite-4k</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. It supports inference and fine-tuning with a 4k context window.</td></tr><tr><td>Doubao-pro-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The best-performing flagship model, suited to complex tasks and performing very well in scenarios such as reference-based Q&#x26;A, summarization, content creation, text classification, and role-playing. It supports inference and fine-tuning with a 128k context window.</td></tr><tr><td>Doubao-pro-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The best-performing flagship model, suited to complex tasks and performing very well in scenarios such as reference-based Q&#x26;A, summarization, content creation, text classification, and role-playing.
It supports inference and fine-tuning with a 32k context window.</td></tr><tr><td>Doubao-pro-4k</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The best-performing flagship model, suited to complex tasks and performing very well in scenarios such as reference-based Q&#x26;A, summarization, content creation, text classification, and role-playing. It supports inference and fine-tuning with a 4k context window.</td></tr><tr><td>step-1-128k</td><td>128k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-128k model is an ultra-large language model capable of processing inputs of up to 128,000 tokens. This capability gives it a significant advantage in generating long-form content and performing complex reasoning, making it suitable for applications such as novel and script writing that require rich context.</td></tr><tr><td>step-1-256k</td><td>256k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-256k model is currently one of the largest language models, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality output across a variety of domains.</td></tr><tr><td>step-1-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-32k model expands the context window to support inputs of 32,000 tokens. This lets it excel at handling long articles and complex conversations, making it suitable for tasks that require deep understanding and analysis, such as legal documents and academic research.</td></tr><tr><td>step-1-8k</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-8k model is an efficient language model designed specifically for shorter texts.
It can perform reasoning within an 8,000-token context, making it suitable for applications that require quick responses, such as chatbots and real-time translation.</td></tr><tr><td>step-1-flash</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-flash model focuses on fast responses and efficient processing, making it suitable for real-time applications. Its design enables high-quality language understanding and generation even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.</td></tr><tr><td>step-1.5v-mini</td><td>32k</td><td>-</td><td>Supported</td><td>Chat, image understanding</td><td>StepFun</td><td>The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Although small in size, it still retains strong language processing capabilities, making it suitable for embedded systems and low-power devices.</td></tr><tr><td>step-1v-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat, image understanding</td><td>StepFun</td><td>The step-1v-32k model supports inputs of 32,000 tokens, making it suitable for applications that require longer context. It excels at handling complex conversations and long texts, and is suitable for customer service and content creation.</td></tr><tr><td>step-1v-8k</td><td>8k</td><td>-</td><td>Supported</td><td>Chat, image understanding</td><td>StepFun</td><td>The step-1v-8k model is an optimized version designed specifically for inputs of 8,000 tokens, making it suitable for fast generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.</td></tr><tr><td>step-2-16k</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-2-16k model is a medium-sized language model supporting inputs of 16,000 tokens. 
It performs well across a variety of tasks and is suitable for application scenarios such as education, training, and knowledge management.</td></tr><tr><td>yi-lightning</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>01.AI_yi</td><td>The latest high-performance model, delivering high-quality output while greatly increasing inference speed.<br>Suitable for real-time interaction and highly complex reasoning scenarios; its very high cost-effectiveness provides excellent support for commercial products.</td></tr><tr><td>yi-vision-v2</td><td>16k</td><td>-</td><td>Supported</td><td>Chat, image understanding</td><td>01.AI_yi</td><td>Suitable for scenarios that require analyzing and explaining images and charts, such as image Q&#x26;A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.</td></tr><tr><td>qwen-14b-chat</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Alibaba Cloud's official open-source version of Tongyi Qianwen.</td></tr><tr><td>qwen-72b-chat</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Alibaba Cloud's official open-source version of Tongyi Qianwen.</td></tr><tr><td>qwen-7b-chat</td><td>7.5k</td><td>1.5k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Alibaba Cloud's official open-source version of Tongyi Qianwen.</td></tr><tr><td>qwen-coder-plus</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, code</td><td>Qwen</td><td>Qwen-Coder-Plus is a coding-focused model in the Qwen series, designed to improve code generation and understanding capabilities. Trained on large-scale programming data, the model can handle multiple programming languages and supports functions such as code completion, bug detection, and code refactoring.
It is designed to provide developers with more efficient coding assistance and improve development productivity.</td></tr><tr><td>qwen-coder-plus-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, code</td><td>Qwen</td><td>Qwen-Coder-Plus-Latest is the latest version of Qwen-Coder-Plus, incorporating the newest algorithm optimizations and dataset updates. The model has seen significant performance improvements, can understand context more accurately, and generates code that better meets developers' needs. It also adds support for more programming languages, enhancing multilingual coding capabilities.</td></tr><tr><td>qwen-coder-turbo</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, code</td><td>Qwen</td><td>The code and programming models in the Tongyi Qianwen series are language models specialized for programming and code generation, with fast inference and low cost. This version always points to the latest stable snapshot.</td></tr><tr><td>qwen-coder-turbo-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, code</td><td>Qwen</td><td>The code and programming models in the Tongyi Qianwen series are language models specialized for programming and code generation, with fast inference and low cost. This version always points to the latest snapshot.</td></tr><tr><td>qwen-long</td><td>10m</td><td>6k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Long is a large language model in the Tongyi Qianwen series designed for ultra-long-context scenarios. It supports input in various languages such as Chinese and English, and supports ultra-long-context conversations of up to 10 million tokens (about 15 million Chinese characters or 15,000 pages of documents). Together with the document service launched at the same time, it supports parsing and dialogue for various document formats such as Word, PDF, Markdown, EPUB, and MOBI.
Note: When submitting requests directly via HTTP, up to 1M tokens are supported; beyond that, it is recommended to submit via file.</td></tr><tr><td>qwen-math-plus</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Math-Plus is a model focused on solving mathematical problems, designed to provide efficient mathematical reasoning and computation capabilities. Trained on a large number of math problem sets, the model can handle complex mathematical expressions and problems, supporting a variety of computation needs from basic arithmetic to advanced mathematics. Its application scenarios include education, research, and engineering.</td></tr><tr><td>qwen-math-plus-latest</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Math-Plus-Latest is the latest version of Qwen-Math-Plus, integrating the newest mathematical reasoning techniques and algorithmic improvements. The model performs even better when handling complex math problems, providing more accurate answers and reasoning processes. It also expands understanding of mathematical symbols and formulas, making it suitable for a broader range of mathematical application scenarios.</td></tr><tr><td>qwen-math-turbo</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Math-Turbo is a high-performance math model designed for fast computation and real-time reasoning. The model optimizes computation speed and can handle a large number of math problems in a very short time, making it suitable for application scenarios that require quick feedback, such as online education and real-time data analysis. Its efficient algorithms enable users to get instant results in complex calculations.</td></tr><tr><td>qwen-math-turbo-latest</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Math-Turbo-Latest is the latest version of Qwen-Math-Turbo, further improving computation efficiency and accuracy. 
The model has undergone multiple algorithmic optimizations, can handle more complex mathematical problems, and remains efficient in real-time reasoning. It is suitable for math applications that require fast responses, such as financial analysis and scientific computing.</td></tr><tr><td>qwen-max</td><td>32k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>A hundreds-of-billions-parameter ultra-large language model in the Tongyi Qianwen 2.5 series, supporting input in various languages such as Chinese and English. As the model is upgraded, qwen-max will be updated in a rolling manner.</td></tr><tr><td>qwen-max-latest</td><td>32k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>The best-performing model in the Tongyi Qianwen series. This model is a dynamically updated version, and updates are made without prior notice. It is suitable for complex, multi-step tasks. The model's overall Chinese-English capabilities have been significantly improved, alignment with human preferences has been significantly improved, reasoning ability and understanding of complex instructions have been greatly enhanced, performance on difficult tasks is better, math and coding capabilities have been significantly improved, and its ability to understand and generate structured data such as tables and JSON has been enhanced.</td></tr><tr><td>qwen-plus</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>A well-balanced model in the Tongyi Qianwen series, with reasoning performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks.
The model's overall Chinese-English capabilities have been significantly improved, alignment with human preferences has been significantly improved, reasoning ability and understanding of complex instructions have been greatly enhanced, performance on difficult tasks is better, and math and coding capabilities have been significantly improved.</td></tr><tr><td>qwen-plus-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Plus is an enhanced vision-language model in the Tongyi Qianwen series, designed to improve detail recognition and text recognition capabilities. The model supports images with resolutions of over one million pixels and any aspect ratio, and performs excellently in a variety of vision-language tasks, making it suitable for application scenarios that require high-precision image understanding.</td></tr><tr><td>qwen-turbo</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>The fastest and lowest-cost model in the Tongyi Qianwen series, suitable for simple tasks. The model's overall Chinese-English capabilities have been significantly improved, alignment with human preferences has been significantly improved, reasoning ability and understanding of complex instructions have been greatly enhanced, performance on difficult tasks is better, and math and coding capabilities have been significantly improved.</td></tr><tr><td>qwen-turbo-latest</td><td>1m</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen</td><td>Qwen-Turbo is an efficient model designed for simple tasks, emphasizing speed and cost-effectiveness. It excels at handling basic vision-language tasks and is suitable for applications with strict response-time requirements, such as real-time image recognition and simple Q&#x26;A systems.</td></tr><tr><td>qwen-vl-max</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Qwen-VL-Max (qwen-vl-max) is the ultra-large-scale vision-language model in the Tongyi Qianwen series.
Compared with the enhanced version, it further improves visual reasoning and instruction-following capabilities, providing a higher level of visual perception and cognition. It delivers best-in-class performance on more complex tasks.</td></tr><tr><td>qwen-vl-max-latest</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Qwen-VL-Max is the most advanced version in the Qwen-VL series, designed specifically to solve complex multimodal tasks. It combines advanced visual and language processing technologies, can understand and analyze high-resolution images, has very strong reasoning capabilities, and is suitable for application scenarios that require deep understanding and complex reasoning.</td></tr><tr><td>qwen-vl-ocr</td><td>34k</td><td>4k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Supports OCR only, not conversation.</td></tr><tr><td>qwen-vl-ocr-latest</td><td>34k</td><td>4k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Supports OCR only, not conversation.</td></tr><tr><td>qwen-vl-plus</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Qwen-VL-Plus (qwen-vl-plus) is the enhanced version of the Tongyi Qianwen large-scale vision-language model. It greatly improves detail recognition and text recognition capabilities, and supports images with resolutions of over one million pixels and any aspect ratio. It delivers outstanding performance across a wide range of vision tasks.</td></tr><tr><td>qwen-vl-plus-latest</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Qwen</td><td>Qwen-VL-Plus-Latest is the latest version of Qwen-VL-Plus, with enhanced multimodal understanding capabilities.
It excels at joint processing of images and text, making it suitable for applications that need efficient handling of multiple input formats, such as intelligent customer service and content generation.</td></tr><tr><td>Qwen/Qwen2-1.5B-Instruct</td><td>32k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 1.5B parameters. Based on the Transformer architecture, the model adopts technologies such as SwiGLU activation, attention QKV bias, and grouped-query attention. It performs excellently across multiple benchmarks in language understanding, generation, multilingual ability, coding, math, and reasoning, outperforming most open-source models.</td></tr><tr><td>Qwen/Qwen2-72B-Instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 72B parameters. Based on the Transformer architecture, the model adopts technologies such as SwiGLU activation, attention QKV bias, and grouped-query attention. It can handle large-scale inputs. The model performs excellently across multiple benchmarks in language understanding, generation, multilingual ability, coding, math, and reasoning, outperforming most open-source models.</td></tr><tr><td>Qwen/Qwen2-7B-Instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 7B parameters. Based on the Transformer architecture, the model adopts technologies such as SwiGLU activation, attention QKV bias, and grouped-query attention. It can handle large-scale inputs.
The model performs excellently across multiple benchmarks in language understanding, generation, multilingual ability, coding, math, and reasoning, outperforming most open-source models.</td></tr><tr><td>Qwen/Qwen2-VL-72B-Instruct</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos longer than 20 minutes for high-quality video-based Q&#x26;A, dialogue, and content creation. It also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, and more to perform automatic operations based on visual environments and text instructions.</td></tr><tr><td>Qwen/Qwen2-VL-7B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&#x26;A, dialogue, and content creation, and also has complex reasoning and decision-making capabilities. It can be integrated with mobile devices, robots, and more to perform automatic operations based on visual environments and text instructions.</td></tr><tr><td>Qwen/Qwen2.5-72B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2.5-72B-Instruct is one of the latest large language models released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. 
It supports inputs of up to 128K tokens and can generate long text of over 8K tokens.</td></tr><tr><td>Qwen/Qwen2.5-72B-Instruct-128K</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2.5-72B-Instruct is one of the latest large language models released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long text of over 8K tokens.</td></tr><tr><td>Qwen/Qwen2.5-7B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2.5-7B-Instruct is one of the latest large language models released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering more than 29 languages including Chinese and English. It has seen significant improvements in instruction following, understanding structured data, and generating structured outputs, especially JSON.</td></tr><tr><td>Qwen/Qwen2.5-Coder-32B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, code</td><td>Qwen</td><td>Qwen2.5-Coder-32B-Instruct is one of the latest large language models released by Alibaba Cloud. This 32B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering more than 29 languages including Chinese and English. It has seen significant improvements in instruction following, understanding structured data, and generating structured outputs, especially JSON.</td></tr><tr><td>Qwen/Qwen2.5-Coder-7B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>Qwen2.5-Coder-7B-Instruct is one of the latest large language models released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. 
The model also provides multilingual support, covering more than 29 languages including Chinese and English. It has seen significant improvements in instruction following, understanding structured data, and generating structured outputs, especially JSON.</td></tr><tr><td>Qwen/QwQ-32B-Preview</td><td>32k</td><td>16k</td><td>Not supported</td><td>Chat, reasoning</td><td>Qwen</td><td>QwQ-32B-Preview is an experimental research model developed by the Qwen team to improve AI reasoning capabilities. As a preview version, it demonstrates strong analytical ability, but it also has some important limitations:<br>1. Language mixing and code-switching: the model may mix languages or switch between languages unexpectedly, affecting response clarity.<br>2. Recursive reasoning loops: the model may enter a loop of reasoning, resulting in lengthy answers without a clear conclusion.<br>3. Safety and ethical considerations: the model requires stronger safety measures to ensure reliable and secure performance, and users should use it with caution.<br>4. 
Performance and benchmark limitations: the model performs well in math and programming, but there is still room for improvement in areas such as common-sense reasoning and nuanced language understanding.</td></tr><tr><td>qwen1.5-110b-chat</td><td>32k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen1.5-14b-chat</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen1.5-32b-chat</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen1.5-72b-chat</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen1.5-7b-chat</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2-57b-a14b-instruct</td><td>65k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>Qwen2-72B-Instruct</td><td>-</td><td>-</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2-7b-instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2-math-72b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2-math-7b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-14b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-32b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-72b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-7b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-14b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td>
<td>Chat, code</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-32b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, code</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-7b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, code</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-math-72b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>qwen2.5-math-7b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen</td><td>-</td></tr><tr><td>deepseek-ai/DeepSeek-R1</td><td>64k</td><td>-</td><td>Not supported</td><td>Chat, reasoning</td><td>DeepSeek</td><td>The DeepSeek-R1 model is an open-source reasoning model based on pure reinforcement learning. It performs excellently on tasks such as mathematics, code, and natural language reasoning, with performance comparable to OpenAI's o1 model, and has achieved excellent results on multiple benchmarks.</td></tr><tr><td>deepseek-ai/DeepSeek-V2-Chat</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>DeepSeek</td><td>DeepSeek-V2 is a powerful, cost-effective Mixture-of-Experts (MoE) language model. It was pretrained on a high-quality corpus of 8.1 trillion tokens and further improved through supervised fine-tuning (SFT) and reinforcement learning (RL). Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76x.</td></tr><tr><td>deepseek-ai/DeepSeek-V2.5</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>DeepSeek</td><td>DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two earlier versions. 
The model has been optimized in several areas, including writing and instruction-following abilities, and is better aligned with human preferences.</td></tr><tr><td>deepseek-ai/DeepSeek-V3</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>DeepSeek</td><td>Open-source DeepSeek release with a longer context than the official version and without refusals triggered by sensitive words.</td></tr><tr><td>deepseek-chat</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat</td><td>DeepSeek</td><td>236B parameters, 64K context (API), top-ranked among open-source models in overall Chinese capability (AlignBench), and on par in evaluations with closed-source models such as GPT-4-Turbo and ERNIE 4.0.</td></tr><tr><td>deepseek-coder</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat, code</td><td>DeepSeek</td><td>236B parameters, 64K context (API), top-ranked among open-source models in overall Chinese capability (AlignBench), and on par in evaluations with closed-source models such as GPT-4-Turbo and ERNIE 4.0.</td></tr><tr><td>deepseek-reasoner</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat, reasoning</td><td>DeepSeek</td><td>DeepSeek-Reasoner (DeepSeek-R1) is the latest reasoning model released by DeepSeek, designed to improve reasoning ability through reinforcement learning. Its reasoning process includes extensive reflection and verification, enabling it to handle complex logical reasoning tasks, with chains of thought reaching tens of thousands of characters. DeepSeek-R1 performs excellently on mathematics, code, and other complex problems, and has been widely used in a variety of scenarios, demonstrating strong reasoning ability and flexibility. 
Compared with other models, DeepSeek-R1's reasoning performance is close to that of top-tier closed-source models, showing the potential and competitiveness of open-source models in reasoning.</td></tr><tr><td>hunyuan-code</td><td>4k</td><td>4k</td><td>Not supported</td><td>Chat, code</td><td>Tencent_Hunyuan</td><td>Hunyuan's latest code generation model. The base model was further trained on 200B of high-quality code data and then refined over six months of iterative training on high-quality SFT data. The context window has been expanded to 8K, and it ranks near the top on automatic evaluation metrics for code generation in five major languages; in comprehensive high-quality human evaluations covering 10 aspects of code tasks in five major languages, it ranks in the top tier.</td></tr><tr><td>hunyuan-functioncall</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>Hunyuan's latest MoE-architecture FunctionCall model, trained on high-quality FunctionCall data, has a context window of 32K and leads in evaluation metrics across multiple dimensions.</td></tr><tr><td>hunyuan-large</td><td>28k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>The Hunyuan-large model has about 389B total parameters and about 52B active parameters. It is currently the largest and best-performing open-source Transformer-based MoE model in the industry.</td></tr><tr><td>hunyuan-large-longcontext</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>It excels at long-document tasks such as document summarization and document Q&#x26;A, and can also handle general text generation tasks. 
It performs exceptionally well in analyzing and generating long texts and can effectively handle complex, detailed long-document processing.</td></tr><tr><td>hunyuan-lite</td><td>250k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>Upgraded to an MoE architecture with a 256k context window, it leads many open-source models on multiple benchmark datasets in NLP, code, math, and industry domains.</td></tr><tr><td>hunyuan-pro</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>A trillion-parameter MoE-32K long-context model. It leads on a wide range of benchmarks, supports complex instructions and reasoning, has strong mathematical capabilities, supports function calling, and is specifically optimized for multilingual translation as well as finance, law, and healthcare applications.</td></tr><tr><td>hunyuan-role</td><td>28k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>The latest role-playing model in the Hunyuan series, officially fine-tuned and released by Hunyuan. It is built on the Hunyuan model and further trained on role-playing scenario datasets, delivering better baseline performance in role-playing scenarios.</td></tr><tr><td>hunyuan-standard</td><td>30k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>Uses a better routing strategy while also alleviating load-balancing and expert-convergence issues.<br>MoE-32K offers relatively higher cost-effectiveness and can handle long-text input while balancing performance and price.</td></tr><tr><td>hunyuan-standard-256K</td><td>250k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>Uses a better routing strategy while also alleviating load-balancing and expert-convergence issues. For long-text tasks, its needle-in-a-haystack metric reaches 99.9%. 
MoE-256K further improves both context length and performance, greatly expanding the input length it can handle.</td></tr><tr><td>hunyuan-translation-lite</td><td>4k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>The Hunyuan translation model supports conversational translation in natural language, with bidirectional translation among 15 languages: Chinese, English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.</td></tr><tr><td>hunyuan-turbo</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>The default version of the Hunyuan-turbo model adopts a brand-new Mixture-of-Experts (MoE) architecture. Compared with hunyuan-pro, it offers faster inference and stronger performance.</td></tr><tr><td>hunyuan-turbo-latest</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent_Hunyuan</td><td>The dynamically updated version of the Hunyuan-turbo model and the best-performing version in the Hunyuan model series, consistent with the consumer-facing product (Tencent Yuanbao).</td></tr><tr><td>hunyuan-turbo-vision</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Tencent_Hunyuan</td><td>A new-generation flagship vision-language model in the Hunyuan series, adopting a brand-new Mixture-of-Experts (MoE) architecture. Compared with the previous generation, it has been comprehensively improved in basic recognition, content creation, knowledge Q&#x26;A, analysis, and reasoning capabilities related to image-text understanding. 
Maximum input: 6k, maximum output: 2k.</td></tr><tr><td>hunyuan-vision</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, image understanding</td><td>Tencent_Hunyuan</td><td>Hunyuan's latest multimodal model supports generating text content from image + text input.<br>Basic image recognition: identify the main subjects, elements, and scenes in images<br>Image content creation: summarize images, create ad copy, social media captions, poems, and more<br>Multi-turn image dialogue: input a single image for multi-turn interactive Q&#x26;A<br>Image analysis and reasoning: perform statistical analysis on logical relationships, math problems, code, and charts in images<br>Image knowledge Q&#x26;A: ask and answer questions about knowledge points contained in images, such as historical events and movie posters<br>Image OCR: recognize text in images from both everyday natural scenes and non-natural scenes.</td></tr><tr><td>SparkDesk-Lite</td><td>4k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>Supports online web search with fast, convenient responses; suitable for customized scenarios such as low-compute inference and model fine-tuning.</td></tr><tr><td>SparkDesk-Max</td><td>128k</td><td>-</td><td>Supported</td><td>Chat</td><td>SparkDesk</td><td>Quantized from the latest Spark large-model engine 4.0 Turbo; supports built-in plugins such as web search, weather, and date. 
Core capabilities have been comprehensively upgraded, performance across application scenarios has improved overall, and it supports system personas and FunctionCall (function calling).</td></tr><tr><td>SparkDesk-Max-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>SparkDesk</td><td>Stronger reasoning: better context understanding and logical reasoning ability; longer input: supports 32K tokens of text input, suitable for long-document reading, private knowledge Q&#x26;A, and similar scenarios.</td></tr><tr><td>SparkDesk-Pro</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>Specially optimized for scenarios such as math, code, healthcare, and education; supports built-in plugins such as web search, weather, and date, covering most scenarios including knowledge Q&#x26;A, language understanding, and text creation.</td></tr><tr><td>SparkDesk-Pro-128K</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>Professional-grade large language model with tens of billions of parameters, specially optimized for healthcare, education, and code scenarios, with lower latency in search scenarios. 
Suitable for business scenarios such as text and intelligent Q&#x26;A that require higher performance and response speed.</td></tr><tr><td>moonshot-v1-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with a 128k length, suitable for generating ultra-long texts.</td></tr><tr><td>moonshot-v1-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with a 32k length, suitable for generating long texts.</td></tr><tr><td>moonshot-v1-8k</td><td>8k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with an 8k length, suitable for generating short texts.</td></tr><tr><td>codegeex-4</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat, code</td><td>Zhipu_codegeex</td><td>Zhipu's code model: suitable for code auto-completion tasks</td></tr><tr><td>charglm-3</td><td>4k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Zhipu_glm</td><td>Anthropomorphic model</td></tr><tr><td>emohaa</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Zhipu_glm</td><td>Psychological model: equipped with professional consulting capabilities to help users understand emotions and cope with emotional issues</td></tr><tr><td>glm-3-turbo</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Zhipu_glm</td><td>Scheduled for deprecation on June 30, 2025</td></tr><tr><td>glm-4</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>Previous flagship: released on January 16, 2024, and now replaced by GLM-4-0520</td></tr><tr><td>glm-4-0520</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>High-intelligence model: suitable for handling highly complex and diverse tasks</td></tr><tr><td>glm-4-air</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>High cost-performance: the most balanced model between reasoning ability and 
price</td></tr><tr><td>glm-4-airx</td><td>8k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>Ultra-fast reasoning: ultra-fast inference speed and strong reasoning performance</td></tr><tr><td>glm-4-flash</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>High speed, low price: ultra-fast inference speed</td></tr><tr><td>glm-4-flashx</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>High speed, low price: Flash enhanced version, ultra-fast inference speed</td></tr><tr><td>glm-4-long</td><td>1m</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>Ultra-long input: designed specifically for processing ultra-long texts and memory-based tasks</td></tr><tr><td>glm-4-plus</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu_glm</td><td>High-intelligence flagship: comprehensive performance improvements, with significantly enhanced long-text and complex-task capabilities</td></tr><tr><td>glm-4v</td><td>2k</td><td>-</td><td>Not supported</td><td>Chat, image understanding</td><td>Zhipu_glm</td><td>Image understanding: has image understanding and reasoning capabilities</td></tr><tr><td>glm-4v-flash</td><td>2k</td><td>1k</td><td>Not supported</td><td>Chat, image understanding</td><td>Zhipu_glm</td><td>Free model: has powerful image understanding capabilities</td></tr></tbody></table>
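
The Max input and Max output columns above use the "k"/"m" shorthand described in the note at the top of this page. As a minimal sketch, assuming Python 3 and a hypothetical helper named `to_tokens` (it is not part of Cherry Studio), the conversion can be scripted: it applies the recommended multiply-by-1000 rule by default, the theoretical 1024-based rule when `binary=True`, and returns `None` for "-" entries.

```python
from typing import Optional


def to_tokens(value: str, binary: bool = False) -> Optional[int]:
    """Convert a table value such as "8k", "128k", or "1m" into a token count.

    Hypothetical helper for illustration. By default it follows the recommended
    rule on this page (1k = 1000, 1m = 1,000,000); with binary=True it uses the
    theoretical values (1k = 1024, 1m = 1024k). Returns None for "-", which the
    table uses when no official figure was found.
    """
    v = value.strip().lower()
    if v in ("", "-"):
        return None
    base = 1024 if binary else 1000
    multipliers = {"k": base, "m": base * base}
    if v[-1] in multipliers:
        return int(float(v[:-1]) * multipliers[v[-1]])
    return int(v)  # already a plain token count, e.g. "4096"


for value in ("8k", "128k", "1m", "-"):
    print(value, to_tokens(value), to_tokens(value, binary=True))
# 8k 8000 8192
# 128k 128000 131072
# 1m 1000000 1048576
# - None None
```

The decimal default yields slightly smaller values than the 1024-based rule (8000 instead of 8192 for 8k), erring on the safe side as the note recommends.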

***

### 💡 Get help and submit feedback

If you have questions, encounter bugs, or have suggestions for feature improvements during configuration or use, please refer to [Feedback and Suggestions](/docs/en-us/question-contact/suggestions.md) and use the official channels provided there.

