# Model Data

{% hint style="info" %}

* The following information is for reference only; if you find an error, please contact us for a correction. Some models are offered by different providers, so their context sizes and other model details may vary accordingly.
* When entering data on the client side, you need to convert “k” into an actual token count (in theory, 1k = 1024 tokens and 1m = 1024k tokens, so 8k = 8 × 1024 = 8192 tokens). In practice, multiplying by 1000 is recommended to avoid errors: 8k = 8 × 1000 = 8000 tokens, and 1m = 1 × 1,000,000 = 1,000,000 tokens;
* A maximum output of “-” means that no explicit maximum-output figure for that model was found in official sources.
{% endhint %}
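The “k”/“m” conversion rule above can be sketched in code. This is a hypothetical helper for illustration only (the function name and `base` parameter are not part of any client): `base=1000` follows the practical recommendation, while `base=1024` gives the theoretical value.

```python
from typing import Optional

def parse_token_size(size: str, base: int = 1000) -> Optional[int]:
    """Convert a size label like '8k' or '1m' into a token count.

    base=1000 follows the practical recommendation above; pass
    base=1024 for the theoretical interpretation. Returns None for
    '-' (no official maximum-output figure found).
    """
    s = size.strip().lower()
    if s == "-":
        return None          # no explicit figure from the official source
    if s.endswith("m"):
        return int(float(s[:-1]) * base * base)
    if s.endswith("k"):
        return int(float(s[:-1]) * base)
    return int(s)            # already a plain token count

print(parse_token_size("8k"))             # 8000
print(parse_token_size("8k", base=1024))  # 8192
print(parse_token_size("1m"))             # 1000000
```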

<table><thead><tr><th width="313">Model Name</th><th width="158">Max Input</th><th width="72">Max Output</th><th width="95">Function Calling</th><th width="142">Model Capabilities</th><th width="540">Provider</th><th width="257">Description</th></tr></thead><tbody><tr><td>360gpt-pro</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>The flagship hundred-billion-parameter model with the best results in the 360 Zhinao series, widely suitable for complex task scenarios across various fields.</td></tr><tr><td>360gpt-turbo</td><td>7k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>A ten-billion-parameter model that balances performance and quality, suitable for scenarios with high performance/cost requirements.</td></tr><tr><td>360gpt-turbo-responsibility-8k</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>A ten-billion-parameter model that balances performance and quality, suitable for scenarios with high performance/cost requirements.</td></tr><tr><td>360gpt2-pro</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>360AI_360gpt</td><td>The flagship hundred-billion-parameter model with the best results in the 360 Zhinao series, widely suitable for complex task scenarios across various fields.</td></tr><tr><td>claude-3-5-sonnet-20240620</td><td>200k</td><td>16k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>A snapshot version released on June 20, 2024, Claude 3.5 Sonnet is a model that balances performance and speed, delivering top-tier performance while maintaining high speed, and supports multimodal input.</td></tr><tr><td>claude-3-5-haiku-20241022</td><td>200k</td><td>16k</td><td>Not supported</td><td>Chat</td><td>Anthropic_claude</td><td>A snapshot version released on October 22, 2024, Claude 3.5 Haiku has improved in all skills, including coding, tool use, and reasoning. 
As the fastest model in the Anthropic family, it offers quick response times and is suitable for highly interactive, low-latency applications such as user-facing chatbots and instant code completion. It also excels at specialized tasks such as data extraction and real-time content moderation, making it a versatile tool for broad use across industries. It does not support image input.</td></tr><tr><td>claude-3-5-sonnet-20241022</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>A snapshot version released on October 22, 2024, Claude 3.5 Sonnet offers capabilities beyond Opus and faster speed than Sonnet, while keeping the same price as Sonnet. Sonnet is particularly strong at programming, data science, visual processing, and agent tasks.</td></tr><tr><td>claude-3-5-sonnet-latest</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities beyond Opus and faster speed than Sonnet, while keeping the same price as Sonnet. Sonnet is particularly strong at programming, data science, visual processing, and agent tasks.</td></tr><tr><td>claude-3-haiku-20240307</td><td>200k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>Claude 3 Haiku is Anthropic’s fastest and most compact model, designed for near-instant responses. It delivers fast and accurate targeted performance.</td></tr><tr><td>claude-3-opus-20240229</td><td>200k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>Claude 3 Opus is Anthropic’s most powerful model for highly complex tasks. 
It excels in performance, intelligence, fluency, and comprehension.</td></tr><tr><td>claude-3-sonnet-20240229</td><td>200k</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Anthropic_claude</td><td>A snapshot version released on February 29, 2024, Sonnet is especially good at:<br><br>- Coding: Able to independently write, edit, and run code, with reasoning and debugging capabilities<br>- Data science: Enhances human data science expertise; can handle unstructured data while using multiple tools to gain insights<br>- Visual processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to gain insights beyond the text itself<br>- Agent tasks: Excellent at tool use, making it well suited for agent tasks (i.e., complex multi-step problem-solving tasks requiring interaction with other systems)</td></tr><tr><td>google/gemma-2-27b-it</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Google_gemma</td><td>Gemma is a lightweight, state-of-the-art open model family developed by Google and built using the same research and technology as Gemini models. These models are decoder-only large language models that support English and provide open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for a variety of text generation tasks, including Q&#x26;A, summarization, and reasoning.</td></tr><tr><td>google/gemma-2-9b-it</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Google_gemma</td><td>Gemma is one of Google's lightweight, state-of-the-art open model families. It is a decoder-only large language model that supports English and provides open-weight pre-trained and instruction-tuned variants. Gemma models are suitable for a variety of text generation tasks, including Q&#x26;A, summarization, and reasoning. 
This 9B model was trained on 8 trillion tokens.</td></tr><tr><td>gemini-1.5-pro</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.0-pro-001</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.0 Pro. As an NLP model, it is specialized in multi-turn text and code chat as well as code generation tasks. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-002</td><td>32k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.0 Pro. As an NLP model, it is specialized in multi-turn text and code chat as well as code generation tasks. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-latest</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat, Deprecated or to be deprecated</td><td>Google_gemini</td><td>This is the latest version of Gemini 1.0 Pro. As an NLP model, it is specialized in multi-turn text and code chat as well as code generation tasks. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-vision-001</td><td>16k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>This is the vision version of Gemini 1.0 Pro. 
This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.0-pro-vision-latest</td><td>16k</td><td>2k</td><td>Not supported</td><td>Image Recognition</td><td>Google_gemini</td><td>This is the latest vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>gemini-1.5-flash</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, images, video, and text input.</td></tr><tr><td>gemini-1.5-flash-001</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Flash. It offers the same core functionality as gemini-1.5-flash, but with fixed versioning, making it suitable for production use.</td></tr><tr><td>gemini-1.5-flash-002</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Flash. It offers the same core functionality as gemini-1.5-flash, but with fixed versioning, making it suitable for production use.</td></tr><tr><td>gemini-1.5-flash-8b</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Gemini 1.5 Flash-8B is Google's latest multimodal AI model, designed for efficient processing of large-scale tasks. The model has 8 billion parameters and supports text, image, audio, and video input, making it suitable for a variety of application scenarios such as chat, transcription, and translation. Compared with other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, especially for cost-sensitive users. 
Its rate limits have been doubled, enabling developers to handle large-scale tasks more efficiently. In addition, Flash-8B uses knowledge distillation technology to extract key knowledge from larger models, ensuring a lightweight and efficient design while maintaining core capabilities.</td></tr><tr><td>gemini-1.5-flash-exp-0827</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is an experimental version of Gemini 1.5 Flash that is updated periodically to include the latest improvements. Suitable for exploratory testing and prototype development; not recommended for production environments.</td></tr><tr><td>gemini-1.5-flash-latest</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the cutting-edge version of Gemini 1.5 Flash that is updated periodically to include the latest improvements. Suitable for exploratory testing and prototype development; not recommended for production environments.</td></tr><tr><td>gemini-1.5-pro-001</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Pro, providing fixed model behavior and performance characteristics. Suitable for production environments where stability is required.</td></tr><tr><td>gemini-1.5-pro-002</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the stable version of Gemini 1.5 Pro, providing fixed model behavior and performance characteristics. Suitable for production environments where stability is required.</td></tr><tr><td>gemini-1.5-pro-exp-0801</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. 
It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.5-pro-exp-0827</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. It is especially suitable for tasks requiring complex reasoning.</td></tr><tr><td>gemini-1.5-pro-latest</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the latest version of Gemini 1.5 Pro, dynamically pointing to the latest snapshot version.</td></tr><tr><td>gemini-2.0-flash</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Gemini 2.0 Flash is Google's latest model. Compared with the 1.5 version, it has faster time to first token (TTFT) while maintaining a quality level comparable to Gemini Pro 1.5. The model has significantly improved multimodal understanding, coding ability, complex instruction execution, and function calling, delivering a smoother and more powerful intelligent experience.</td></tr><tr><td>gemini-2.0-flash-exp</td><td>100k</td><td>8k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Gemini 2.0 Flash introduces a multimodal real-time API, improved speed and performance, higher quality, enhanced agent capabilities, and added image generation and speech conversion features.</td></tr><tr><td>gemini-2.0-flash-lite-preview-02-05</td><td>1m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Gemini 2.0 Flash-Lite is Google's latest cost-effective AI model, delivering better quality while maintaining the same speed as 1.5 Flash. It supports a 1 million token context window and can handle multimodal tasks such as images, audio, and code. 
As Google's most cost-effective model to date, it uses a simplified single-pricing strategy and is especially suitable for large-scale applications that need cost control.</td></tr><tr><td>gemini-2.0-flash-thinking-exp</td><td>40k</td><td>8k</td><td>Not supported</td><td>Chat, Reasoning</td><td>Google_gemini</td><td>gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thought process" it goes through when responding. Therefore, compared with the basic Gemini 2.0 Flash model, responses in "thinking mode" have stronger reasoning ability.</td></tr><tr><td>gemini-2.0-flash-thinking-exp-01-21</td><td>1m</td><td>64k</td><td>Not supported</td><td>Chat, Reasoning</td><td>Google_gemini</td><td>Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focused on improving reasoning ability and user interaction experience. The model has strong reasoning capabilities, especially in mathematics and programming, and supports a context window of up to 1 million tokens, making it suitable for complex tasks and deep analysis scenarios. Its uniqueness lies in its ability to generate a thought process, improving the interpretability of AI thinking, while also supporting native code execution, enhancing interaction flexibility and practicality. Through algorithm optimization, the model reduces logical contradictions, further improving the accuracy and consistency of responses.</td></tr><tr><td>gemini-2.0-flash-thinking-exp-1219</td><td>40k</td><td>8k</td><td>Not supported</td><td>Chat, Reasoning, Image Recognition</td><td>Google_gemini</td><td>gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thought process" it goes through when responding. 
Therefore, compared with the basic Gemini 2.0 Flash model, responses in "thinking mode" have stronger reasoning ability.</td></tr><tr><td>gemini-2.0-pro-exp-01-28</td><td>2m</td><td>64k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Preloaded model, not yet launched</td></tr><tr><td>gemini-2.0-pro-exp-02-05</td><td>2m</td><td>8k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model released in February 2025, excelling in world knowledge, code generation, and long-text understanding. The model supports an ultra-long 2 million token context window and can handle content such as 2 hours of video, 22 hours of audio, more than 60,000 lines of code, and more than 1.4 million words. As part of the Gemini 2.0 series, the model adopts a new Flash Thinking training strategy, with significantly improved performance and top rankings on multiple LLM leaderboards, demonstrating strong overall capabilities.</td></tr><tr><td>gemini-exp-1114</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>An experimental model released on November 14, 2024, primarily focused on quality improvements.</td></tr><tr><td>gemini-exp-1121</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition, Code</td><td>Google_gemini</td><td>An experimental model released on November 21, 2024, with improvements in coding, reasoning, and vision capabilities.</td></tr><tr><td>gemini-exp-1206</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>An experimental model released on December 6, 2024, with improvements in coding, reasoning, and vision capabilities.</td></tr><tr><td>gemini-exp-latest</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>An experimental model, dynamically pointing to the latest 
version</td></tr><tr><td>gemini-pro</td><td>33k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Google_gemini</td><td>An alias of gemini-1.0-pro.</td></tr><tr><td>gemini-pro-vision</td><td>16k</td><td>2k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Google_gemini</td><td>This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and migration to the 1.5 series models is recommended.</td></tr><tr><td>grok-2</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by X.ai on December 12, 2024.</td></tr><tr><td>grok-2-1212</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by X.ai on December 12, 2024.</td></tr><tr><td>grok-2-latest</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>A new version of the Grok model released by X.ai on December 12, 2024.</td></tr><tr><td>grok-2-vision-1212</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Grok_grok</td><td>A Grok vision model released by X.ai on December 12, 2024.</td></tr><tr><td>grok-beta</td><td>100k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Grok_grok</td><td>Performance is comparable to Grok 2, but with improved efficiency, speed, and features.</td></tr><tr><td>grok-vision-beta</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Grok_grok</td><td>The latest image-understanding model; it can process a wide range of visual information, including documents, charts, screenshots, and photos.</td></tr><tr><td>internlm/internlm2_5-20b-chat</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>internlm</td><td>InternLM2.5-20B-Chat is an open-source large-scale chat model developed based on the InternLM2 architecture. 
The model has 20 billion parameters and excels at mathematical reasoning, outperforming Llama3 and Gemma2-27B models of the same scale. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supports collecting information from hundreds of web pages for analysis and reasoning, and has stronger instruction understanding, tool selection, and result reflection capabilities.</td></tr><tr><td>meta-llama/Llama-3.2-11B-Vision-Instruct</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Meta_llama</td><td>Current Llama series models can process not only text but also image data. Some Llama 3.2 models include visual understanding capabilities. This model supports both text and image input, can understand images, and outputs text information.</td></tr><tr><td>meta-llama/Llama-3.2-3B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>The Meta Llama 3.2 multilingual large language model (LLM) has 1B and 3B lightweight variants that can run on edge and mobile devices; this model is the 3B version.</td></tr><tr><td>meta-llama/Llama-3.2-90B-Vision-Instruct</td><td>8k</td><td>-</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Meta_llama</td><td>Current Llama series models can process not only text but also image data. Some Llama 3.2 models include visual understanding capabilities. 
This model supports both text and image input, can understand images, and outputs text information.</td></tr><tr><td>meta-llama/Llama-3.3-70B-Instruct</td><td>131k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>Meta's latest 70B LLM, with performance comparable to llama 3.1 405B.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-405B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>The Meta Llama 3.1 multilingual large language model (LLM) collection includes pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this model is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversation and outperform many available open-source and closed-source chat models on common industry benchmarks.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-70B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants in 8B, 70B, and 405B parameter sizes. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs well on multiple industry benchmarks. The model was trained on more than 15 trillion tokens of public data and uses techniques such as supervised fine-tuning and reinforcement learning from human feedback to improve usefulness and safety.</td></tr><tr><td>meta-llama/Meta-Llama-3.1-8B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Meta_llama</td><td>The Meta Llama 3.1 multilingual large language model (LLM) collection includes pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this model is the 8B version. 
The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversation and outperform many available open-source and closed-source chat models on common industry benchmarks.</td></tr><tr><td>abab5.5-chat</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>abab5.5s-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>abab6.5g-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>English and other multilingual persona chat scenarios</td></tr><tr><td>abab6.5s-chat</td><td>245k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>General scenarios</td></tr><tr><td>abab6.5t-chat</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>Minimax_abab</td><td>Chinese persona chat scenarios</td></tr><tr><td>chatgpt-4o-latest</td><td>128k</td><td>16k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>OpenAI</td><td>The chatgpt-4o-latest model version continuously points to the GPT-4o version used in ChatGPT and updates as quickly as possible when major changes occur.</td></tr><tr><td>gpt-4o-2024-11-20</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat</td><td>OpenAI</td><td>The latest gpt-4o snapshot version as of November 20, 2024.</td></tr><tr><td>gpt-4o-audio-preview</td><td>128k</td><td>16k</td><td>Not supported</td><td>Chat</td><td>OpenAI</td><td>OpenAI's real-time voice conversation model</td></tr><tr><td>gpt-4o-audio-preview-2024-10-01</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat</td><td>OpenAI</td><td>OpenAI's real-time voice conversation model</td></tr><tr><td>o1</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, Reasoning, Image Recognition</td><td>OpenAI</td><td>OpenAI's new reasoning model for complex tasks requiring broad common sense. 
It is one of the strongest models currently available and supports image recognition.</td></tr><tr><td>o1-mini-2024-09-12</td><td>128k</td><td>64k</td><td>Not supported</td><td>Chat, Reasoning</td><td>OpenAI</td><td>A fixed snapshot version of o1-mini, smaller and faster than o1-preview, 80% lower cost, and performs well in code generation and small-context operations.</td></tr><tr><td>o1-preview-2024-09-12</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, Reasoning</td><td>OpenAI</td><td>A fixed snapshot version of o1-preview</td></tr><tr><td>gpt-3.5-turbo</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Based on GPT-3.5: GPT-3.5 Turbo is an improved version built on the GPT-3.5 model, developed by OpenAI.<br>Performance goal: Designed to improve inference speed, processing efficiency, and resource utilization by optimizing the model structure and algorithms.<br>Improved inference speed: Compared with GPT-3.5, GPT-3.5 Turbo usually provides faster inference speed under the same hardware conditions, which is especially beneficial for applications requiring large-scale text processing.<br>Higher throughput: When processing a large number of requests or data, GPT-3.5 Turbo can achieve higher concurrent processing capability, thereby improving overall system throughput.<br>Optimized resource consumption: While maintaining performance, it may reduce the demand for hardware resources (such as memory and computing resources), helping to lower operating costs and improve system scalability.<br>Wide range of NLP tasks: GPT-3.5 Turbo is suitable for a variety of natural language processing tasks, including but not limited to text generation, semantic understanding, conversational systems, machine translation, etc.<br>Developer tools and API support: Provides API interfaces that are easy for developers to integrate and use, supporting rapid application development and 
deployment.</td></tr><tr><td>gpt-3.5-turbo-0125</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>An updated GPT-3.5 Turbo with higher accuracy in responding to request formats and a fix for a bug that caused text encoding issues in function calls for non-English languages. Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-3.5-turbo-0613</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>An updated fixed snapshot version of GPT 3.5 Turbo. Now deprecated</td></tr><tr><td>gpt-3.5-turbo-1106</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-3.5-turbo-16k</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat, Deprecated or to be deprecated</td><td>OpenAI_gpt-3</td><td>(deprecated)</td></tr><tr><td>gpt-3.5-turbo-16k-0613</td><td>16k</td><td>4k</td><td>Supported</td><td>Chat, Deprecated or to be deprecated</td><td>OpenAI_gpt-3</td><td>Snapshot of gpt-3.5-turbo as of June 13, 2023. (deprecated)</td></tr><tr><td>gpt-3.5-turbo-instruct</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Similar capabilities to GPT-3-era models. Compatible with the legacy Completions endpoint and not suitable for Chat Completions.</td></tr><tr><td>gpt-3.5o</td><td>16k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>OpenAI_gpt-3</td><td>Same as gpt-4o-lite</td></tr><tr><td>gpt-4</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Currently points to gpt-4-0613.</td></tr><tr><td>gpt-4-0125-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>The latest GPT-4 model, designed to reduce "laziness," where the model fails to complete tasks. 
Returns up to 4,096 output tokens.</td></tr><tr><td>gpt-4-0314</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Snapshot of gpt-4 from March 14, 2023</td></tr><tr><td>gpt-4-0613</td><td>8k</td><td>8k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>Snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.</td></tr><tr><td>gpt-4-1106-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns up to 4,096 output tokens. This is a preview model.</td></tr><tr><td>gpt-4-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>gpt-4-32k will be deprecated on 2025-06-06.</td></tr><tr><td>gpt-4-32k-0613</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat, Deprecated or to be deprecated</td><td>OpenAI_gpt-4</td><td>Will be deprecated on 2025-06-06.</td></tr><tr><td>gpt-4-turbo</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>The latest GPT-4 Turbo model now includes vision capabilities and supports processing visual requests through JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.</td></tr><tr><td>gpt-4-turbo-2024-04-09</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>OpenAI_gpt-4</td><td>GPT-4 Turbo model with vision capabilities. Now, visual requests can be handled through JSON mode and function calling. The current gpt-4-turbo version is this one.</td></tr><tr><td>gpt-4-turbo-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>Currently points to gpt-4-0125-preview.</td></tr><tr><td>gpt-4o</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>OpenAI's highly intelligent flagship model, suitable for complex multi-step tasks. 
GPT-4o is cheaper and faster than GPT-4 Turbo.</td></tr><tr><td>gpt-4o-2024-05-13</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>The original gpt-4o snapshot from May 13, 2024.</td></tr><tr><td>gpt-4o-2024-08-06</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>The first snapshot supporting structured outputs. gpt-4o currently points to this version.</td></tr><tr><td>gpt-4o-mini</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>OpenAI's affordable version of gpt-4o, suitable for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. It currently points to gpt-4o-mini-2024-07-18.</td></tr><tr><td>gpt-4o-mini-2024-07-18</td><td>128k</td><td>16k</td><td>Supported</td><td>Chat, Image Recognition</td><td>OpenAI_gpt-4</td><td>A fixed snapshot version of gpt-4o-mini.</td></tr><tr><td>gpt-4o-realtime-preview</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, Real-time Voice</td><td>OpenAI_gpt-4</td><td>OpenAI's real-time voice conversation model</td></tr><tr><td>gpt-4o-realtime-preview-2024-10-01</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat, Real-time Voice, Image Recognition</td><td>OpenAI_gpt-4</td><td>gpt-4o-realtime-preview currently points to this snapshot version</td></tr><tr><td>o1-mini</td><td>128k</td><td>64k</td><td>Not supported</td><td>Chat, Reasoning</td><td>OpenAI_o1</td><td>Smaller and faster than o1-preview, with 80% lower cost, and performs well in code generation and small-context operations.</td></tr><tr><td>o1-preview</td><td>128k</td><td>32k</td><td>Not supported</td><td>Chat, Reasoning</td><td>OpenAI_o1</td><td>o1-preview is a new reasoning model for complex tasks requiring broad common sense. The model has a 128K context and a knowledge cutoff of October 2023. 
It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks. It is ideal for applications that require deep contextual understanding and autonomous workflows.</td></tr><tr><td>o3-mini</td><td>200k</td><td>100k</td><td>Supported</td><td>Chat, Reasoning</td><td>OpenAI_o1</td><td>o3-mini is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, mathematics, and coding tasks, and supports developer features such as structured outputs, function calling, and batch API. Its knowledge cutoff is October 2023, showing a notable balance between reasoning ability and cost-effectiveness.</td></tr><tr><td>o3-mini-2025-01-31</td><td>200k</td><td>100k</td><td>Supported</td><td>Chat, Reasoning</td><td>OpenAI_o1</td><td>o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, mathematics, and coding tasks, and supports developer features such as structured outputs, function calling, and batch API. 
Its knowledge cutoff is October 2023, showing a notable balance between reasoning ability and cost-effectiveness.</td></tr><tr><td>Baichuan2-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>Compared with similarly sized models in the industry, this model achieves significantly lower pricing while maintaining industry-leading performance.</td></tr><tr><td>Baichuan3-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>Compared with similarly sized models in the industry, this model achieves significantly lower pricing while maintaining industry-leading performance.</td></tr><tr><td>Baichuan3-Turbo-128k</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>The Baichuan model handles complex text through a 128k ultra-long context window and is specially optimized for industries such as finance. While maintaining high performance, it significantly reduces cost, providing enterprises with a high value-for-money solution.</td></tr><tr><td>Baichuan4</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>Baichuan's MoE model provides an efficient and cost-effective solution for enterprise applications through specialized optimization, reduced cost, and improved performance.</td></tr><tr><td>Baichuan4-Air</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>Baichuan's MoE model provides an efficient and cost-effective solution for enterprise applications through specialized optimization, reduced cost, and improved performance.</td></tr><tr><td>Baichuan4-Turbo</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Baichuan_baichuan</td><td>Trained on massive amounts of high-quality scenario data, the usability in high-frequency enterprise scenarios is improved by over 10% compared with Baichuan4, information summarization is improved by 50%, multilingual capability 
by 31%, and content generation by 13%.<br>Specially optimized for reasoning performance, first-token response speed is improved by 51% compared with Baichuan4, and token throughput by 73%.</td></tr><tr><td>ERNIE-3.5-128K</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed flagship large-scale large language model, covering massive Chinese and English corpora and featuring strong general capabilities. It can meet the requirements of most dialogue, Q&#x26;A, content creation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure timely answers.</td></tr><tr><td>ERNIE-3.5-8K</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed flagship large-scale large language model, covering massive Chinese and English corpora and featuring strong general capabilities. It can meet the requirements of most dialogue, Q&#x26;A, content creation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure timely answers.</td></tr><tr><td>ERNIE-3.5-8K-Preview</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed flagship large-scale large language model, covering massive Chinese and English corpora and featuring strong general capabilities. 
It can meet the requirements of most dialogue, Q&#x26;A, content creation, and plugin application scenarios; it supports automatic connection to Baidu Search plugins to ensure timely answers.</td></tr><tr><td>ERNIE-4.0-8K</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed flagship ultra-large-scale large language model, with comprehensively upgraded model capabilities compared with ERNIE 3.5, widely suitable for complex task scenarios across various fields; supports automatic connection to Baidu Search plugins to ensure timely answers.</td></tr><tr><td>ERNIE-4.0-8K-Latest</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE-4.0-8K-Latest has comprehensively improved capabilities compared with ERNIE-4.0-8K, with particularly large improvements in role-playing and instruction-following abilities; compared with ERNIE 3.5, it has comprehensively upgraded model capabilities and is widely suitable for complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure timely answers, and supports 5K tokens input + 2K tokens output.
</td></tr><tr><td>ERNIE-4.0-8K-Preview</td><td>8k</td><td>1k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed flagship ultra-large-scale large language model, with comprehensively upgraded model capabilities compared with ERNIE 3.5, widely suitable for complex task scenarios across various fields; supports automatic connection to Baidu Search plugins to ensure timely answers.</td></tr><tr><td>ERNIE-4.0-Turbo-128K</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale large language model, with excellent overall performance and broad applicability to complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure timely answers. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-128K is one version of the model, and its long-document performance is better than ERNIE-3.5-128K.</td></tr><tr><td>ERNIE-4.0-Turbo-8K</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale large language model, with excellent overall performance and broad applicability to complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure timely answers. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of the model.
</td></tr><tr><td>ERNIE-4.0-Turbo-8K-Latest</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale large language model, with excellent overall performance and broad applicability to complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure timely answers. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K-Latest is one version of the model.</td></tr><tr><td>ERNIE-4.0-Turbo-8K-Preview</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale large language model, with excellent overall performance and broad applicability to complex task scenarios across various fields; it supports automatic connection to Baidu Search plugins to ensure timely answers. ERNIE-4.0-Turbo-8K-Preview is one version of the model.</td></tr><tr><td>ERNIE-Character-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed vertical-scenario large language model, suitable for game NPCs, customer service conversations, role-playing dialogue, and similar application scenarios; it has a more distinctive and consistent persona style, stronger instruction-following ability, and better reasoning performance.</td></tr><tr><td>ERNIE-Lite-8K</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed lightweight large language model, balancing excellent model performance and inference performance, suitable for inference on low-compute AI accelerator cards.</td></tr><tr><td>ERNIE-Lite-Pro-128K</td><td>128k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed lightweight large language model, with better performance than ERNIE Lite, balancing excellent model performance and inference
performance, suitable for inference on low-compute AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and performs better than ERNIE-Lite-128K.</td></tr><tr><td>ERNIE-Novel-8K</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a clear advantage in novel continuation and also suitable for short dramas, films, and similar scenarios.</td></tr><tr><td>ERNIE-Speed-128K</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also offering excellent inference performance.</td></tr><tr><td>ERNIE-Speed-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also offering excellent inference performance.</td></tr><tr><td>ERNIE-Speed-Pro-128K</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>ERNIE Speed Pro is Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also offering excellent inference performance. 
ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supports a 128K context length, and performs better than ERNIE-Speed-128K.</td></tr><tr><td>ERNIE-Tiny-8K</td><td>8k</td><td>1k</td><td>Not supported</td><td>Chat</td><td>Baidu_ernie</td><td>Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning cost in the ERNIE family of models.</td></tr><tr><td>Doubao-1.5-lite-32k</td><td>32k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-1.5-lite is also at a world-class level among lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative benchmarks for general performance (MMLU-Pro), reasoning (BBH), math (MATH), and professional knowledge (GPQA).</td></tr><tr><td>Doubao-1.5-pro-256k</td><td>256k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-1.5-pro-256k is a fully upgraded version based on Doubao-1.5-pro. Compared with Doubao-pro-256k/241115, overall performance is improved by 10%. Output length is greatly increased, supporting up to 12k tokens.</td></tr><tr><td>Doubao-1.5-pro-32k</td><td>32k</td><td>12k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-1.5-pro is a new-generation flagship model with comprehensively upgraded performance and excellent results in knowledge, coding, reasoning, and more.
It has reached world-leading levels on multiple public evaluation benchmarks, especially achieving the best results on authoritative Chinese benchmarks for knowledge, coding, and reasoning, with overall scores superior to industry-leading models such as GPT-4o and Claude 3.5 Sonnet.</td></tr><tr><td>Doubao-1.5-vision-pro</td><td>32k</td><td>12k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Doubao_doubao</td><td>Doubao-1.5-vision-pro is a newly upgraded multimodal large model that supports image recognition at any resolution and extreme aspect ratios, with enhanced visual reasoning, document recognition, detailed information understanding, and instruction-following capabilities.</td></tr><tr><td>Doubao-embedding</td><td>4k</td><td>-</td><td>Supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Doubao-embedding is a semantic vectorization model developed by ByteDance, mainly for vector retrieval use cases. It supports Chinese and English bilingual input and a maximum context length of 4K. The following versions are currently available:<br><br>text-240715: maximum vector dimension 2560, supports 512, 1024, and 2048 dimensionality reduction. 
Chinese-English retrieval performance is significantly improved over the text-240515 version; this version is recommended.<br>text-240515: maximum vector dimension 2048, supports 512 and 1024 dimensionality reduction.</td></tr><tr><td>Doubao-embedding-large</td><td>4k</td><td>-</td><td>Not supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Chinese-English retrieval performance is significantly improved compared with the Doubao-embedding text-240715 version.</td></tr><tr><td>Doubao-embedding-vision</td><td>8k</td><td>-</td><td>Not supported</td><td>Embedding</td><td>Doubao_doubao</td><td>Doubao-embedding-vision is a newly upgraded image-text multimodal vectorization model, mainly for image-text multimodal vector retrieval use cases; it supports image input and Chinese-English bilingual text input, with a maximum context length of 8K.</td></tr><tr><td>Doubao-lite-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast response speed and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 128k context window.</td></tr><tr><td>Doubao-lite-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast response speed and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 32k context window.</td></tr><tr><td>Doubao-lite-4k</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>Doubao-lite offers extremely fast response speed and better cost-effectiveness, providing more flexible choices for customers in different scenarios.
Supports inference and fine-tuning with a 4k context window.</td></tr><tr><td>Doubao-pro-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The flagship model with the best performance, suitable for handling complex tasks. It performs very well in reference Q&#x26;A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.</td></tr><tr><td>Doubao-pro-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The flagship model with the best performance, suitable for handling complex tasks. It performs very well in reference Q&#x26;A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 32k context window.</td></tr><tr><td>Doubao-pro-4k</td><td>4k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Doubao_doubao</td><td>The flagship model with the best performance, suitable for handling complex tasks. It performs very well in reference Q&#x26;A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 4k context window.</td></tr><tr><td>step-1-128k</td><td>128k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-128k model is an ultra-large language model capable of handling inputs up to 128,000 tokens. This capability gives it a significant advantage in generating long-form content and performing complex reasoning, making it suitable for applications such as novel and script creation that require rich context.</td></tr><tr><td>step-1-256k</td><td>256k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-256k model is currently one of the largest language models, supporting inputs of 256,000 tokens. 
It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialog systems, and can provide high-quality output across a variety of fields.</td></tr><tr><td>step-1-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-32k model expands the context window and supports inputs of 32,000 tokens. This makes it excellent for handling long articles and complex conversations, and suitable for tasks requiring deep understanding and analysis, such as legal documents and academic research.</td></tr><tr><td>step-1-8k</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-8k model is an efficient language model designed specifically for shorter texts. It can perform reasoning within an 8,000-token context and is suitable for scenarios requiring quick responses, such as chatbots and real-time translation.</td></tr><tr><td>step-1-flash</td><td>8k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-1-flash model focuses on fast response and efficient processing, making it suitable for real-time applications. Its design allows it to provide strong language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.</td></tr><tr><td>step-1.5v-mini</td><td>32k</td><td>-</td><td>Supported</td><td>Chat, Image Recognition</td><td>StepFun</td><td>The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Although compact, it still retains good language processing capabilities, making it suitable for embedded systems and low-power devices.</td></tr><tr><td>step-1v-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat, Image Recognition</td><td>StepFun</td><td>The step-1v-32k model supports 32,000-token inputs and is suitable for applications that require longer context. 
It performs well in handling complex conversations and long texts, making it suitable for fields such as customer service and content creation.</td></tr><tr><td>step-1v-8k</td><td>8k</td><td>-</td><td>Supported</td><td>Chat, Image Recognition</td><td>StepFun</td><td>The step-1v-8k model is an optimized version designed specifically for 8,000-token inputs and is suitable for fast generation and processing of short texts. It achieves a good balance between speed and accuracy, making it suitable for real-time applications.</td></tr><tr><td>step-2-16k</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>StepFun</td><td>The step-2-16k model is a medium-sized language model that supports 16,000-token inputs. It performs well across a variety of tasks and is suitable for applications such as education, training, and knowledge management.</td></tr><tr><td>yi-lightning</td><td>16k</td><td>-</td><td>Supported</td><td>Chat</td><td>01.AI_yi</td><td>The latest high-performance model, ensuring high-quality output while greatly improving inference speed.<br>Suitable for real-time interaction and highly complex reasoning scenarios, with extremely high cost-effectiveness that provides excellent product support for commercial products.</td></tr><tr><td>yi-vision-v2</td><td>16k</td><td>-</td><td>Supported</td><td>Chat, Image Recognition</td><td>01.AI_yi</td><td>Suitable for scenarios that require analyzing and explaining images and charts, such as image Q&#x26;A, chart understanding, OCR, visual reasoning, education, research report comprehension, or multilingual document reading.</td></tr><tr><td>qwen-14b-chat</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Alibaba Cloud's official Tongyi Qianwen open-source version.</td></tr><tr><td>qwen-72b-chat</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Alibaba Cloud's official Tongyi Qianwen open-source
version.</td></tr><tr><td>qwen-7b-chat</td><td>7.5k</td><td>1.5k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Alibaba Cloud's official Tongyi Qianwen open-source version.</td></tr><tr><td>qwen-coder-plus</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>Qwen-Coder-Plus is a programming-focused model in the Qwen family, designed to improve code generation and comprehension. Trained on large-scale programming data, the model can handle multiple programming languages and supports functions such as code completion, bug detection, and code refactoring. Its design goal is to provide developers with more efficient programming assistance and improve development efficiency.</td></tr><tr><td>qwen-coder-plus-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>Qwen-Coder-Plus-Latest is the latest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. The model has significantly improved performance, can more accurately understand context, and generates code that better meets developers' needs. It also introduces support for more programming languages, enhancing multilingual programming capabilities.</td></tr><tr><td>qwen-coder-turbo</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>The Tongyi Qianwen series code and programming model is a language model specially designed for programming and code generation, with fast inference and low cost. This version always points to the latest stable snapshot</td></tr><tr><td>qwen-coder-turbo-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>The Tongyi Qianwen series code and programming model is a language model specially designed for programming and code generation, with fast inference and low cost. 
This version always points to the latest snapshot</td></tr><tr><td>qwen-long</td><td>10m</td><td>6k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen-Long is Tongyi Qianwen's large language model for ultra-long context processing scenarios. It supports Chinese, English, and other language inputs, and supports ultra-long context conversations of up to 10 million tokens (about 15 million Chinese characters or 15,000 pages of documents). Together with the document service launched at the same time, it can support parsing and conversation for various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: When submitting requests directly via HTTP, 1M tokens is supported; for lengths beyond this, submission via file is recommended.</td></tr><tr><td>qwen-math-plus</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen-Math-Plus is a model focused on solving mathematical problems, designed to provide efficient mathematical reasoning and computation capabilities. Trained on a large number of math question banks, the model can handle complex mathematical expressions and problems, supporting a wide range of computational needs from basic arithmetic to advanced mathematics. Its application scenarios include education, research, and engineering.</td></tr><tr><td>qwen-math-plus-latest</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen-Math-Plus-Latest is the latest version of Qwen-Math-Plus, integrating the latest mathematical reasoning techniques and algorithmic improvements. The model performs better when handling complex mathematical problems and can provide more accurate answers and reasoning processes. 
It also expands understanding of mathematical symbols and formulas, making it suitable for a broader range of mathematical application scenarios.</td></tr><tr><td>qwen-math-turbo</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen-Math-Turbo is a high-performance math model designed for fast computation and real-time reasoning. The model optimizes computational speed and can process a large number of mathematical problems in a very short time, making it suitable for applications that require quick feedback, such as online education and real-time data analysis. Its efficient algorithms enable users to obtain instant results for complex calculations.</td></tr><tr><td>qwen-math-turbo-latest</td><td>4k</td><td>3k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen-Math-Turbo-Latest is the latest version of Qwen-Math-Turbo, further improving computational efficiency and accuracy. The model has undergone multiple algorithmic optimizations, enabling it to handle more complex mathematical problems while maintaining efficiency in real-time reasoning. It is suitable for mathematical applications that require quick responses, such as financial analysis and scientific computing.</td></tr><tr><td>qwen-max</td><td>32k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>Tongyi Qianwen 2.5 series hundred-billion-parameter ultra-large-scale language model, supporting Chinese, English, and other language inputs. As the model is upgraded, qwen-max will be updated progressively.</td></tr><tr><td>qwen-max-latest</td><td>32k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>The best-performing model in the Tongyi Qianwen series. This model is dynamically updated, and model updates are not announced in advance. It is suitable for complex, multi-step tasks. 
Its overall Chinese-English capabilities have been significantly improved, human preference alignment has been significantly improved, reasoning ability and complex instruction understanding have been significantly enhanced, performance on difficult tasks is better, math and code capabilities have been significantly improved, and understanding and generation of structured data such as tables and JSON have been improved.</td></tr><tr><td>qwen-plus</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>A balanced-capability model in the Tongyi Qianwen series, with inference quality and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks. Its overall Chinese-English capabilities have been significantly improved, human preference alignment has been significantly improved, reasoning ability and complex instruction understanding have been significantly enhanced, performance on difficult tasks is better, and math and code capabilities have been significantly improved.</td></tr><tr><td>qwen-plus-latest</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>qwen-plus-latest always points to the latest snapshot of qwen-plus, the balanced-capability model in the Tongyi Qianwen series with inference quality and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks.</td></tr><tr><td>qwen-turbo</td><td>128k</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>The fastest and lowest-cost model in the Tongyi Qianwen series, suitable for simple tasks.
Its overall Chinese-English capabilities have been significantly improved, human preference alignment has been significantly improved, reasoning ability and complex instruction understanding have been significantly enhanced, performance on difficult tasks is better, and math and code capabilities have been significantly improved.</td></tr><tr><td>qwen-turbo-latest</td><td>1m</td><td>8k</td><td>Supported</td><td>Chat</td><td>Qwen_qwen</td><td>qwen-turbo-latest always points to the latest snapshot of qwen-turbo, an efficient model designed for simple tasks that emphasizes speed and cost-effectiveness. It performs well on basic tasks and is suitable for applications with strict response-time requirements, such as simple Q&#x26;A systems.</td></tr><tr><td>qwen-vl-max</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Tongyi Qianwen VL-Max (qwen-vl-max), the ultra-large-scale vision-language model of Tongyi Qianwen. Compared with the enhanced version, it further improves visual reasoning and instruction-following capabilities, providing a higher level of visual perception and cognition. It offers the best performance on more complex tasks.</td></tr><tr><td>qwen-vl-max-latest</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Qwen-VL-Max is the most advanced version in the Qwen-VL series, designed to solve complex multimodal tasks.
It combines advanced vision and language processing technologies, can understand and analyze high-resolution images, has very strong reasoning ability, and is suitable for application scenarios requiring deep understanding and complex reasoning.</td></tr><tr><td>qwen-vl-ocr</td><td>34k</td><td>4k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Supports OCR only, not chat.</td></tr><tr><td>qwen-vl-ocr-latest</td><td>34k</td><td>4k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Supports OCR only, not chat.</td></tr><tr><td>qwen-vl-plus</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Tongyi Qianwen VL-Plus (qwen-vl-plus), the enhanced version of Tongyi Qianwen's large-scale vision-language model. It greatly improves detail recognition and text recognition, and supports images with resolutions over one million pixels and arbitrary aspect ratios. It delivers excellent performance across a wide range of vision tasks.</td></tr><tr><td>qwen-vl-plus-latest</td><td>32k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Qwen_qwen</td><td>Qwen-VL-Plus-Latest is the latest version of Qwen-VL-Plus, with enhanced multimodal understanding capabilities. It performs excellently in combined image and text processing, making it suitable for applications that need to handle various input formats efficiently, such as intelligent customer service and content generation.</td></tr><tr><td>Qwen/Qwen2-1.5B-Instruct</td><td>32k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 1.5B parameters. The model is based on the Transformer architecture and uses techniques such as SwiGLU activation, attention QKV bias, and grouped query attention. 
It performs excellently on multiple benchmarks in language understanding, generation, multilingual capability, coding, mathematics, and reasoning, surpassing most open-source models.</td></tr><tr><td>Qwen/Qwen2-72B-Instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 72B parameters. The model is based on the Transformer architecture and uses techniques such as SwiGLU activation, attention QKV bias, and grouped query attention. It can handle large-scale inputs. The model performs excellently on multiple benchmarks in language understanding, generation, multilingual capability, coding, mathematics, and reasoning, surpassing most open-source models.</td></tr><tr><td>Qwen/Qwen2-7B-Instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 7B parameters. The model is based on the Transformer architecture and uses techniques such as SwiGLU activation, attention QKV bias, and grouped query attention. It can handle large-scale inputs. The model performs excellently on multiple benchmarks in language understanding, generation, multilingual capability, coding, mathematics, and reasoning, surpassing most open-source models.</td></tr><tr><td>Qwen/Qwen2-VL-72B-Instruct</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos longer than 20 minutes and is used for high-quality video-based Q&#x26;A, conversation, and content creation. 
It also has complex reasoning and decision-making capabilities and can be integrated with mobile devices, robots, and more to perform automatic operations based on visual environments and text instructions.</td></tr><tr><td>Qwen/Qwen2-VL-7B-Instruct</td><td>32k</td><td>-</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&#x26;A, conversation, and content creation, and it also has complex reasoning and decision-making capabilities. It can be integrated with mobile devices, robots, and more to perform automatic operations based on visual environments and text instructions.</td></tr><tr><td>Qwen/Qwen2.5-72B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts of more than 8K tokens.</td></tr><tr><td>Qwen/Qwen2.5-72B-Instruct-128K</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts of more than 8K tokens.</td></tr><tr><td>Qwen/Qwen2.5-7B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>Qwen2.5-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in coding and mathematics. 
The model also provides multilingual support covering more than 29 languages, including Chinese and English. The model has significantly improved instruction following, structured data understanding, and structured output generation, especially JSON.</td></tr><tr><td>Qwen/Qwen2.5-Coder-32B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>Qwen2.5-Coder-32B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 32B model has significantly improved capabilities in coding and mathematics. The model also provides multilingual support covering more than 29 languages, including Chinese and English. The model has significantly improved instruction following, structured data understanding, and structured output generation, especially JSON.</td></tr><tr><td>Qwen/Qwen2.5-Coder-7B-Instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>Qwen2.5-Coder-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in coding and mathematics. The model also provides multilingual support covering more than 29 languages, including Chinese and English. The model has significantly improved instruction following, structured data understanding, and structured output generation, especially JSON.</td></tr><tr><td>Qwen/QwQ-32B-Preview</td><td>32k</td><td>16k</td><td>Not supported</td><td>Chat, Reasoning</td><td>Qwen_qwen</td><td>QwQ-32B-Preview is an experimental research model developed by the Qwen team to improve AI reasoning ability. As a preview version, it demonstrates excellent analytical ability, but also has some important limitations:<br>1. Language mixing and code switching: The model may mix languages or unexpectedly switch between languages, affecting the clarity of responses.<br>2. 
Recursive reasoning loops: The model may enter a looped reasoning mode, leading to lengthy answers without a clear conclusion.<br>3. Safety and ethical considerations: The model requires stronger safety measures to ensure reliable and secure performance, and users should use it with caution.<br>4. Performance and benchmark limitations: The model performs well in mathematics and programming, but still has room for improvement in other areas such as common-sense reasoning and nuanced language understanding.</td></tr><tr><td>qwen1.5-110b-chat</td><td>32k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen1.5-14b-chat</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen1.5-32b-chat</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen1.5-72b-chat</td><td>32k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen1.5-7b-chat</td><td>8k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2-57b-a14b-instruct</td><td>65k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>Qwen2-72B-Instruct</td><td>-</td><td>-</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2-7b-instruct</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2-math-72b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2-math-7b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-14b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-32b-instruct</td><td>128k</td><td>8k</td><td>Not 
supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-72b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-7b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-14b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-32b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-coder-7b-instruct</td><td>128k</td><td>8k</td><td>Not supported</td><td>Chat, Code</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-math-72b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>qwen2.5-math-7b-instruct</td><td>4k</td><td>3k</td><td>Not supported</td><td>Chat</td><td>Qwen_qwen</td><td>-</td></tr><tr><td>deepseek-ai/DeepSeek-R1</td><td>64k</td><td>-</td><td>Not supported</td><td>Chat, Reasoning</td><td>DeepSeek</td><td>The DeepSeek-R1 model is an open-source reasoning model based on pure reinforcement learning. It performs exceptionally well on tasks such as mathematics, code, and natural language reasoning, with performance comparable to OpenAI's o1 model, and has achieved excellent results on multiple benchmarks.</td></tr><tr><td>deepseek-ai/DeepSeek-V2-Chat</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>DeepSeek</td><td>DeepSeek-V2 is a powerful, cost-effective Mixture of Experts (MoE) language model. It was pre-trained on a high-quality corpus of 8.1 trillion tokens and further improved through supervised fine-tuning (SFT) and reinforcement learning (RL). 
Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76x.</td></tr><tr><td>deepseek-ai/DeepSeek-V2.5</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>DeepSeek</td><td>DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. The model has been optimized in multiple aspects, including writing and instruction-following abilities, and is better aligned with human preferences.</td></tr><tr><td>deepseek-ai/DeepSeek-V3</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>DeepSeek</td><td>The open-source version of DeepSeek has a longer context than the official version and no issues with refusing to answer due to sensitive words.</td></tr><tr><td>deepseek-chat</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat</td><td>DeepSeek</td><td>236B parameters, 64K context (API), and the top open-source ranking in comprehensive Chinese capabilities (AlignBench); on par with closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations</td></tr><tr><td>deepseek-coder</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat, Code</td><td>DeepSeek</td><td>236B parameters, 64K context (API), and the top open-source ranking in comprehensive Chinese capabilities (AlignBench); on par with closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations</td></tr><tr><td>deepseek-reasoner</td><td>64k</td><td>8k</td><td>Supported</td><td>Chat, Reasoning</td><td>DeepSeek</td><td>DeepSeek-Reasoner (DeepSeek-R1) is the latest reasoning model launched by DeepSeek, designed to improve reasoning capabilities through reinforcement learning training. 
The model's reasoning process includes extensive reflection and verification, enabling it to handle complex logical reasoning tasks, with a chain-of-thought length reaching tens of thousands of words. DeepSeek-R1 performs excellently in solving mathematics, code, and other complex problems, and has been widely used in various scenarios, demonstrating its strong reasoning ability and flexibility. Compared with other models, DeepSeek-R1's reasoning performance is close to top closed-source models, showcasing the potential and competitiveness of open-source models in the reasoning field.</td></tr><tr><td>hunyuan-code</td><td>4k</td><td>4k</td><td>Not supported</td><td>Chat, Code</td><td>Tencent Hunyuan</td><td>Hunyuan's latest code generation model, trained further on a base model with 200B high-quality code data and iterated for half a year with high-quality SFT data. The long context window has been increased to 8K, and it ranks among the top in automatic evaluation metrics for code generation in five major languages; in high-quality human evaluations of comprehensive code tasks across 10 aspects in five major languages, its performance is in the first tier.</td></tr><tr><td>hunyuan-functioncall</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>Hunyuan's latest MoE-architecture FunctionCall model, trained on high-quality FunctionCall data, with a context window of 32K, leading in benchmark metrics across multiple dimensions.</td></tr><tr><td>hunyuan-large</td><td>28k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>The Hunyuan-large model has a total parameter count of about 389B and an activated parameter count of about 52B. 
It is currently the largest open-source MoE model with the Transformer architecture and one of the best-performing in the industry.</td></tr><tr><td>hunyuan-large-longcontext</td><td>128k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>Specialized in long-document tasks such as document summarization and document Q&#x26;A, while also capable of handling general text generation tasks. It excels at analyzing and generating long text, effectively addressing complex and detailed long-form content processing needs.</td></tr><tr><td>hunyuan-lite</td><td>250k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>Upgraded to an MoE structure with a context window of 256k, leading many open-source models across multiple benchmark sets in NLP, code, math, and industry-specific tasks.</td></tr><tr><td>hunyuan-pro</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>A trillion-parameter-scale MoE-32K long-context model. It achieves leading performance on various benchmarks, supports complex instructions and reasoning, has strong mathematical capabilities, supports function calling, and is specially optimized for multilingual translation and applications in finance, law, and medicine.</td></tr><tr><td>hunyuan-role</td><td>28k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>The latest Hunyuan role-playing model, officially fine-tuned and launched by Hunyuan. 
It is trained based on the Hunyuan model combined with role-playing scenario datasets, and delivers better baseline performance in role-playing scenarios.</td></tr><tr><td>hunyuan-standard</td><td>30k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>Uses a better routing strategy while also alleviating the problems of load balancing and expert convergence.<br>MOE-32K offers a relatively higher cost-performance ratio; while balancing effectiveness and price, it can handle long-text inputs.</td></tr><tr><td>hunyuan-standard-256K</td><td>250k</td><td>6k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>Uses a better routing strategy while also alleviating the problems of load balancing and expert convergence. In long-document tasks, the needle-in-a-haystack metric reaches 99.9%. MOE-256K makes further breakthroughs in length and performance, greatly expanding the maximum input length.</td></tr><tr><td>hunyuan-translation-lite</td><td>4k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>The Hunyuan translation model supports natural-language conversational translation; it supports mutual translation among Chinese and 15 languages: English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.</td></tr><tr><td>hunyuan-turbo</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>The default version of the Hunyuan-turbo model uses a brand-new Mixture of Experts (MoE) structure. 
Compared with hunyuan-pro, it has higher inference efficiency and stronger performance.</td></tr><tr><td>hunyuan-turbo-latest</td><td>28k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Tencent Hunyuan</td><td>The dynamically updated version of the Hunyuan-turbo model is the best-performing version in the Hunyuan model series and is consistent with the consumer app version (Tencent Yuanbao).</td></tr><tr><td>hunyuan-turbo-vision</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Tencent Hunyuan</td><td>Hunyuan's new-generation flagship vision-language large model uses a brand-new Mixture of Experts (MoE) structure. Compared with the previous generation, it has comprehensively improved capabilities in image-text understanding, including basic recognition, content creation, knowledge Q&#x26;A, analysis, and reasoning. Maximum input: 6K, maximum output: 2K.</td></tr><tr><td>hunyuan-vision</td><td>8k</td><td>2k</td><td>Supported</td><td>Chat, Image Recognition</td><td>Tencent Hunyuan</td><td>Hunyuan's latest multimodal model supports generating text content from image + text inputs.<br>Basic image recognition: identify subjects, elements, scenes, etc. 
in images<br>Image content creation: summarize images, create ad copy, social media posts, poems, etc.<br>Multi-turn image dialogue: perform multi-turn interactive Q&#x26;A on a single image<br>Image analysis and reasoning: conduct statistical analysis of logical relationships, math problems, code, and charts in images<br>Image knowledge Q&#x26;A: ask and answer questions about knowledge points contained in images, such as historical events and movie posters<br>Image OCR: recognize text in images from natural-life scenes and non-natural scenes.</td></tr><tr><td>SparkDesk-Lite</td><td>4k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>Supports online web search, with fast and convenient responses, suitable for customized scenarios such as low-compute inference and model fine-tuning</td></tr><tr><td>SparkDesk-Max</td><td>128k</td><td>-</td><td>Supported</td><td>Chat</td><td>SparkDesk</td><td>Quantized from the latest Spark large model engine 4.0 Turbo, it supports multiple built-in plugins such as web search, weather, and date. Core capabilities have been comprehensively upgraded, application performance has generally improved across scenarios, and it supports System persona settings and FunctionCall function calls</td></tr><tr><td>SparkDesk-Max-32k</td><td>32k</td><td>-</td><td>Supported</td><td>Chat</td><td>SparkDesk</td><td>Stronger reasoning: better context understanding and logical reasoning; longer input: supports text input of up to 32K tokens, suitable for scenarios such as long-document reading and private knowledge Q&#x26;A</td></tr><tr><td>SparkDesk-Pro</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>Specially optimized for scenarios such as math, code, medicine, and education. 
Supports multiple built-in plugins such as web search, weather, and date, covering most scenarios including knowledge Q&#x26;A, language understanding, and text creation</td></tr><tr><td>SparkDesk-Pro-128K</td><td>128k</td><td>-</td><td>Not supported</td><td>Chat</td><td>SparkDesk</td><td>A professional-grade large language model with tens of billions of parameters, specially optimized for scenarios such as medicine, education, and code. Search scenarios have lower latency. Suitable for business scenarios such as text and intelligent Q&#x26;A that require higher performance and response speed.</td></tr><tr><td>moonshot-v1-128k</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with a length of 128k, suitable for generating ultra-long text.</td></tr><tr><td>moonshot-v1-32k</td><td>32k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with a length of 32k, suitable for generating long text.</td></tr><tr><td>moonshot-v1-8k</td><td>8k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Moonshot</td><td>A model with a length of 8k, suitable for generating short text.</td></tr><tr><td>codegeex-4</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat, Code</td><td>Zhipu CodeGeex</td><td>Zhipu's code model: suitable for code auto-completion tasks</td></tr><tr><td>charglm-3</td><td>4k</td><td>2k</td><td>Not supported</td><td>Chat</td><td>Zhipu GLM</td><td>Anthropomorphic model</td></tr><tr><td>emohaa</td><td>8k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Zhipu GLM</td><td>Psychological model: equipped with professional counseling capabilities to help users understand emotions and cope with emotional issues</td></tr><tr><td>glm-3-turbo</td><td>128k</td><td>4k</td><td>Not supported</td><td>Chat</td><td>Zhipu GLM</td><td>Will soon be deprecated (June 30, 2025)</td></tr><tr><td>glm-4</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>Old flagship: released on January 16, 
2024, and has now been replaced by GLM-4-0520</td></tr><tr><td>glm-4-0520</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>High-intelligence model: suitable for handling highly complex and diverse tasks</td></tr><tr><td>glm-4-air</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>High cost-performance: the most balanced model between reasoning capability and price</td></tr><tr><td>glm-4-airx</td><td>8k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>Ultra-fast reasoning: extremely fast inference speed and strong reasoning performance</td></tr><tr><td>glm-4-flash</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>High speed and low cost: ultra-fast inference speed</td></tr><tr><td>glm-4-flashx</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>High speed and low cost: Flash enhanced version, ultra-fast inference speed</td></tr><tr><td>glm-4-long</td><td>1m</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>Ultra-long input: designed specifically for ultra-long text and memory-intensive tasks</td></tr><tr><td>glm-4-plus</td><td>128k</td><td>4k</td><td>Supported</td><td>Chat</td><td>Zhipu GLM</td><td>High-intelligence flagship: comprehensively improved performance, with significantly enhanced long-text and complex-task capabilities</td></tr><tr><td>glm-4v</td><td>2k</td><td>-</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Zhipu GLM</td><td>Image understanding: has image understanding and reasoning capabilities</td></tr><tr><td>glm-4v-flash</td><td>2k</td><td>1k</td><td>Not supported</td><td>Chat, Image Recognition</td><td>Zhipu GLM</td><td>Free model: has powerful image understanding capabilities</td></tr></tbody></table>
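
The Max Input and Max Output columns use the k/m shorthand explained in the note at the top of this page. As a minimal sketch of the recommended ×1000 convention (the function name is our own, not part of any client API):

```python
from typing import Optional

def shorthand_to_tokens(size: str) -> Optional[int]:
    """Convert a context-size shorthand like '8k' or '1m' to a token count.

    Uses the x1000 convention recommended above (8k -> 8000) rather than
    the theoretical x1024. Returns None for '-' (no official figure).
    """
    size = size.strip().lower()
    if size == "-":
        return None
    if size.endswith("k"):
        return int(float(size[:-1]) * 1_000)
    if size.endswith("m"):
        return int(float(size[:-1]) * 1_000_000)
    return int(size)

# Examples: shorthand_to_tokens("8k") -> 8000, shorthand_to_tokens("1m") -> 1000000
```

If you prefer the theoretical binary convention, swap the multipliers for 1_024 and 1_048_576 (8k would then be 8192).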
