Model Data

The following information is for reference only; if you find any errors, please contact us so they can be corrected. Context sizes and other model information may differ between service providers. When entering values on the client side, convert "k" and "m" into actual numbers. In theory, 1k = 1024 tokens and 1m = 1024k tokens, so 8k is 8 × 1024 = 8192 tokens. In practice, multiplying by 1000 is recommended to prevent errors, for example 8k = 8 × 1000 = 8000 and 1m = 1 × 1,000,000 = 1,000,000. A maximum output of "-" means that no explicit maximum output figure was found in official sources.
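The conversion described above can be sketched in Python. This is a minimal illustration, not part of any client: the function name `parse_token_limit` and its `base` parameter are hypothetical, and the default of 1000 simply follows the recommendation above.

```python
import re
from typing import Optional


def parse_token_limit(label: str, base: int = 1000) -> Optional[int]:
    """Convert a table value such as "8k" or "1m" into a token count.

    `base` is the multiplier for "k": 1000 (the recommended default)
    or 1024 (the theoretical value). "m" is treated as base * base.
    Returns None for "-", which the table uses when no official
    maximum output figure was found.
    """
    text = label.strip().lower()
    if text == "-":
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size label: {label!r}")
    number, suffix = match.groups()
    multiplier = {"": 1, "k": base, "m": base * base}[suffix]
    return int(float(number) * multiplier)


if __name__ == "__main__":
    print(parse_token_limit("8k"))             # recommended: x1000
    print(parse_token_limit("8k", base=1024))  # theoretical: x1024
    print(parse_token_limit("1m"))
    print(parse_token_limit("-"))
```

Lowercasing the label first means values written as "8K" or "1M" are handled the same way as "8k" and "1m".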

Model Name
Max Input
Max Output
Function Calling
Model Capabilities
Service Provider
Description

360gpt-pro

8k

-

Not supported

Conversation

360AI_360gpt

The most effective flagship billion-parameter large model in the 360 AI Brain series, widely applicable to complex task scenarios in various fields.

360gpt-turbo

7k

-

Not supported

Conversation

360AI_360gpt

A 10-billion-parameter large model that balances performance and effectiveness, suitable for scenarios with high performance/cost requirements.

360gpt-turbo-responsibility-8k

8k

-

Not supported

Conversation

360AI_360gpt

A 10-billion-parameter large model that balances performance and effectiveness, suitable for scenarios with high performance/cost requirements.

360gpt2-pro

8k

-

Not supported

Conversation

360AI_360gpt

The most effective flagship billion-parameter large model in the 360 AI Brain series, widely applicable to complex task scenarios in various fields.

claude-3-5-sonnet-20240620

200k

16k

Not supported

Conversation, Image recognition

Anthropic_claude

A snapshot version released on June 20, 2024. Claude 3.5 Sonnet is a model that balances performance and speed, offering top-tier performance while maintaining high speed, and supports multimodal input.

claude-3-5-haiku-20241022

200k

16k

Not supported

Conversation

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved in various skills, including coding, tool use, and reasoning. As the fastest model in the Anthropic series, it provides rapid response times, suitable for applications requiring high interactivity and low latency, such as user-facing chatbots and instant code completion. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for various industries. It does not support image input.

claude-3-5-sonnet-20241022

200k

8k

Not supported

Conversation, Image recognition

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Sonnet offers capabilities beyond Claude 3 Opus and faster speed than Claude 3 Sonnet, at the same price as Claude 3 Sonnet. It is particularly good at programming, data science, visual processing, and agent tasks.

claude-3-5-sonnet-latest

200k

8k

Not supported

Conversation, Image recognition

Anthropic_claude

Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities beyond Claude 3 Opus and faster speed than Claude 3 Sonnet, at the same price as Claude 3 Sonnet. It is particularly good at programming, data science, visual processing, and agent tasks.

claude-3-haiku-20240307

200k

4k

Not supported

Conversation, Image recognition

Anthropic_claude

Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instant responses. It offers fast and accurate targeted performance.

claude-3-opus-20240229

200k

4k

Not supported

Conversation, Image recognition

Anthropic_claude

Claude 3 Opus is Anthropic's most powerful model for handling highly complex tasks. It excels in performance, intelligence, fluency, and understanding.

claude-3-sonnet-20240229

200k

8k

Not supported

Conversation, Image recognition

Anthropic_claude

A snapshot version released on February 29, 2024. Sonnet particularly excels at coding (it can autonomously write, edit, and run code, with reasoning and troubleshooting capabilities); data science (it augments human data science expertise and can process unstructured data while using multiple tools to obtain insights); visual processing (it is proficient at interpreting charts, graphs, and images, and accurately transcribes text to gain insights beyond the text itself); and agent tasks (its excellent tool use makes it well suited to complex multi-step problem-solving tasks that require interaction with other systems).

google/gemma-2-27b-it

8k

-

Not supported

Conversation

Google_gemma

Gemma is a family of lightweight, state-of-the-art open models developed by Google, built with the same research and technology used for the Gemini models. These models are decoder-only large language models that support English, offering open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning.

google/gemma-2-9b-it

8k

-

Not supported

Conversation

Google_gemma

Gemma is a family of lightweight, state-of-the-art open models developed by Google. These are decoder-only large language models that support English, offering open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning. This 9B model was trained on 8 trillion tokens.

gemini-1.5-pro

2m

8k

Not supported

Conversation

Google_gemini

The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.

gemini-1.0-pro-001

33k

8k

Not supported

Conversation

Google_gemini

This is the stable version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-002

32k

8k

Not supported

Conversation

Google_gemini

This is the stable version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-latest

33k

8k

Not supported

Conversation, Deprecated or soon to be deprecated

Google_gemini

This is the latest version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-001

16k

2k

Not supported

Conversation

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-latest

16k

2k

Not supported

Image recognition

Google_gemini

This is the latest vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

gemini-1.5-flash

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, images, video, and text inputs.

gemini-1.5-flash-001

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic functions as gemini-1.5-flash, but with a fixed version, making it suitable for production environments.

gemini-1.5-flash-002

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic functions as gemini-1.5-flash, but with a fixed version, making it suitable for production environments.

gemini-1.5-flash-8b

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

Gemini 1.5 Flash-8B is a new multimodal AI model from Google, designed for efficient processing of large-scale tasks. With 8 billion parameters, it supports text, image, audio, and video input, suitable for various applications such as chat, transcription, and translation. Compared to other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, particularly appealing to cost-sensitive users. Its rate limit has doubled, enabling developers to process large-scale tasks more efficiently. Furthermore, Flash-8B uses "knowledge distillation" technology to extract key knowledge from larger models, ensuring lightweight and efficient performance while maintaining core capabilities.

gemini-1.5-flash-exp-0827

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is an experimental version of Gemini 1.5 Flash, regularly updated to include the latest improvements. Suitable for exploratory testing and prototype development, not recommended for production environments.

gemini-1.5-flash-latest

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is the cutting-edge version of Gemini 1.5 Flash, regularly updated to include the latest improvements. Suitable for exploratory testing and prototype development, not recommended for production environments.

gemini-1.5-pro-001

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is the stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. Suitable for production environments requiring stability.

gemini-1.5-pro-002

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is the stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. Suitable for production environments requiring stability.

gemini-1.5-pro-exp-0801

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

Experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.

gemini-1.5-pro-exp-0827

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

Experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.

gemini-1.5-pro-latest

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

This is the latest version of Gemini 1.5 Pro, dynamically pointing to the latest snapshot version.

gemini-2.0-flash

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

Gemini 2.0 Flash is Google's newly launched model, offering a faster time to first token (TTFT) than version 1.5 while maintaining quality comparable to Gemini 1.5 Pro. It has significantly improved in multimodal understanding, coding, complex instruction following, and function calling, providing a smoother and more capable experience.

gemini-2.0-flash-exp

100k

8k

Supported

Conversation, Image recognition

Google_gemini

Gemini 2.0 Flash introduces multimodal real-time APIs, improved speed and performance, enhanced quality, augmented agent capabilities, and added image generation and speech conversion features.

gemini-2.0-flash-lite-preview-02-05

1m

8k

Not supported

Conversation, Image recognition

Google_gemini

Gemini 2.0 Flash-Lite is Google's newly released cost-effective AI model, offering better quality at the same speed as 1.5 Flash. It supports a 1 million token context window and can handle multimodal tasks involving images, audio, and code. As Google's most cost-effective model, it uses a simplified single pricing strategy, making it particularly suitable for large-scale applications that need to control costs.

gemini-2.0-flash-thinking-exp

40k

8k

Not supported

Conversation, Reasoning

Google_gemini

gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thought process" it undergoes when responding. Therefore, responses in "thinking mode" exhibit stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-flash-thinking-exp-01-21

1m

64k

Not supported

Conversation, Reasoning

Google_gemini

Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focusing on improving reasoning capabilities and user interaction experience. This model has strong reasoning capabilities, especially excelling in mathematics and programming, and supports a context window of up to 1 million tokens, suitable for complex tasks and in-depth analysis scenarios. Its unique feature is the ability to generate thought processes, improving the comprehensibility of AI thinking, while also supporting native code execution, enhancing the flexibility and practicality of interaction. Through optimized algorithms, the model reduces logical inconsistencies, further improving the accuracy and consistency of answers.

gemini-2.0-flash-thinking-exp-1219

40k

8k

Not supported

Conversation, Reasoning, Image recognition

Google_gemini

gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thought process" it undergoes when responding. Therefore, responses in "thinking mode" exhibit stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-pro-exp-01-28

2m

64k

Not supported

Conversation, Image recognition

Google_gemini

Pre-added model, not yet launched.

gemini-2.0-pro-exp-02-05

2m

8k

Not supported

Conversation, Image recognition

Google_gemini

Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model, released in February 2025, excelling in world knowledge, code generation, and long-text understanding. It supports an ultra-long context window of 2 million tokens, capable of processing 2 hours of video, 22 hours of audio, over 60,000 lines of code, and over 1.4 million words of content. As part of the Gemini 2.0 series, it uses a new Flash Thinking training strategy, significantly improving performance and ranking high on multiple LLM leaderboards.

gemini-exp-1114

8k

4k

Not supported

Conversation, Image recognition

Google_gemini

This is an experimental model, released on November 14, 2024, primarily focused on quality improvements.

gemini-exp-1121

8k

4k

Not supported

Conversation, Image recognition, Code

Google_gemini

This is an experimental model, released on November 21, 2024, with improved coding, reasoning, and visual capabilities.

gemini-exp-1206

8k

4k

Not supported

Conversation, Image recognition

Google_gemini

This is an experimental model, released on December 6, 2024, with improved coding, reasoning, and visual capabilities.

gemini-exp-latest

8k

4k

Not supported

Conversation, Image recognition

Google_gemini

This is an experimental model, dynamically pointing to the latest version.

gemini-pro

33k

8k

Not supported

Conversation

Google_gemini

Same as gemini-1.0-pro, an alias for gemini-1.0-pro.

gemini-pro-vision

16k

2k

Not supported

Conversation, Image recognition

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.

grok-2

128k

-

Not supported

Conversation

Grok_grok

New version of the Grok model, released by xAI on December 12, 2024.

grok-2-1212

128k

-

Not supported

Conversation

Grok_grok

New version of the Grok model, released by xAI on December 12, 2024.

grok-2-latest

128k

-

Not supported

Conversation

Grok_grok

New version of the Grok model, released by xAI on December 12, 2024.

grok-2-vision-1212

32k

-

Not supported

Conversation, Image recognition

Grok_grok

Vision version of the Grok model, released by xAI on December 12, 2024.

grok-beta

100k

-

Not supported

Conversation

Grok_grok

Performance comparable to Grok 2, but with improved efficiency, speed, and functionality.

grok-vision-beta

8k

-

Not supported

Conversation, Image recognition

Grok_grok

The latest image understanding model can process various visual information, including documents, charts, screenshots, and photos.

internlm/internlm2_5-20b-chat

32k

-

Supported

Conversation

internlm

InternLM2.5-20B-Chat is an open-source large-scale conversational model developed based on the InternLM2 architecture. This model has 20 billion parameters and excels in mathematical reasoning, outperforming Llama3 and Gemma2-27B models of similar size. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, and possessing stronger instruction understanding, tool selection, and result reflection capabilities.

meta-llama/Llama-3.2-11B-Vision-Instruct

8k

-

Not supported

Conversation, Image recognition

Meta_llama

Llama series models can now process image data as well as text; some Llama 3.2 models add visual understanding capabilities. This model accepts text and image input together, understands images, and outputs text.

meta-llama/Llama-3.2-3B-Instruct

32k

-

Not supported

Conversation

Meta_llama

Meta Llama 3.2 is a family of multilingual large language models (LLMs) in which the 1B and 3B models are lightweight and can run on edge and mobile devices. This model is the 3B version.

meta-llama/Llama-3.2-90B-Vision-Instruct

8k

-

Not supported

Conversation, Image recognition

Meta_llama

Llama series models can now process image data as well as text; some Llama 3.2 models add visual understanding capabilities. This model accepts text and image input together, understands images, and outputs text.

meta-llama/Llama-3.3-70B-Instruct

131k

-

Not supported

Conversation

Meta_llama

Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B.

meta-llama/Meta-Llama-3.1-405B-Instruct

32k

-

Not supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

meta-llama/Meta-Llama-3.1-70B-Instruct

32k

-

Not supported

Conversation

Meta_llama

Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants with 8B, 70B, and 405B parameters. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs excellently across multiple industry benchmarks. The model was trained on over 15 trillion tokens of publicly available data and uses techniques such as supervised fine-tuning and reinforcement learning from human feedback (RLHF) to enhance its usefulness and safety.

meta-llama/Meta-Llama-3.1-8B-Instruct

32k

-

Not supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

abab5.5-chat

16k

-

Supported

Conversation

Minimax_abab

Chinese persona-based dialogue scenarios

abab5.5s-chat

8k

-

Supported

Conversation

Minimax_abab

Chinese persona-based dialogue scenarios

abab6.5g-chat

8k

-

Supported

Conversation

Minimax_abab

English and other multilingual persona-based dialogue scenarios

abab6.5s-chat

245k

-

Supported

Conversation

Minimax_abab

General scenarios

abab6.5t-chat

8k

-

Supported

Conversation

Minimax_abab

Chinese persona-based dialogue scenarios

chatgpt-4o-latest

128k

16k

Not supported

Conversation, Image recognition

OpenAI

The chatgpt-4o-latest model version continuously points to the GPT-4o version used in ChatGPT and updates as quickly as possible when significant changes occur.

gpt-4o-2024-11-20

128k

16k

Supported

Conversation

OpenAI

The latest gpt-4o snapshot version from November 20, 2024.

gpt-4o-audio-preview

128k

16k

Not supported

Conversation

OpenAI

OpenAI's real-time speech conversation model.

gpt-4o-audio-preview-2024-10-01

128k

16k

Supported

Conversation

OpenAI

OpenAI's real-time speech conversation model.

o1

128k

32k

Not supported

Conversation, Reasoning, Image recognition

OpenAI

OpenAI's new reasoning model for complex tasks requiring broad general knowledge. This model has a 200k context, is currently the strongest model globally, and supports image recognition.

o1-mini-2024-09-12

128k

64k

Not supported

Conversation, Reasoning

OpenAI

Fixed snapshot version of o1-mini, smaller and faster than o1-preview, 80% cheaper, performs well in code generation and small context operations.

o1-preview-2024-09-12

128k

32k

Not supported

Conversation, Reasoning

OpenAI

Fixed snapshot version of o1-preview.

gpt-3.5-turbo

16k

4k

Supported

Conversation

OpenAI_gpt-3

Based on GPT-3.5: GPT-3.5 Turbo is an improved version built upon the GPT-3.5 model, developed by OpenAI. Performance Goals: Designed to improve the model's inference speed, processing efficiency, and resource utilization by optimizing its structure and algorithms. Enhanced Inference Speed: Compared to GPT-3.5, GPT-3.5 Turbo typically offers faster inference speeds under the same hardware conditions, which is particularly beneficial for applications requiring large-scale text processing. Higher Throughput: When processing a large number of requests or data, GPT-3.5 Turbo can achieve higher concurrent processing capabilities, thereby improving overall system throughput. Optimized Resource Consumption: While maintaining performance, it may reduce hardware resource requirements (such as memory and computing resources), which helps lower operating costs and increase system scalability. Broad Natural Language Processing Tasks: GPT-3.5 Turbo is suitable for various natural language processing tasks, including but not limited to text generation, semantic understanding, dialogue systems, machine translation, etc. Developer Tools and API Support: Provides easy-to-integrate API interfaces for developers, supporting rapid application development and deployment.

gpt-3.5-turbo-0125

16k

4k

Supported

Conversation

OpenAI_gpt-3

An updated GPT-3.5 Turbo with higher accuracy at responding in requested formats and a fix for a bug that caused text encoding issues in non-English function calls. Returns up to 4,096 output tokens.

gpt-3.5-turbo-0613

16k

4k

Supported

Conversation

OpenAI_gpt-3

Updated GPT 3.5 Turbo fixed snapshot version. Currently deprecated.

gpt-3.5-turbo-1106

16k

4k

Supported

Conversation

OpenAI_gpt-3

Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens.

gpt-3.5-turbo-16k

16k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

(Deprecated)

gpt-3.5-turbo-16k-0613

16k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

Snapshot of gpt-3.5-turbo-16k from June 13, 2023. (Deprecated)

gpt-3.5-turbo-instruct

4k

4k

Supported

Conversation

OpenAI_gpt-3

Capabilities similar to GPT-3 era models. Compatible with the legacy Completions endpoint, not for Chat Completions.

gpt-3.5o

16k

4k

Not supported

Conversation

OpenAI_gpt-3

Same as gpt-4o-lite.

gpt-4

8k

8k

Supported

Conversation

OpenAI_gpt-4

Currently points to gpt-4-0613.

gpt-4-0125-preview

128k

4k

Supported

Conversation

OpenAI_gpt-4

A GPT-4 Turbo preview model designed to reduce "laziness", where the model does not complete a task. Returns up to 4,096 output tokens.

gpt-4-0314

8k

8k

Supported

Conversation

OpenAI_gpt-4

Snapshot of gpt-4 from March 14, 2023.

gpt-4-0613

8k

8k

Supported

Conversation

OpenAI_gpt-4

Snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.

gpt-4-1106-preview

128k

4k

Supported

Conversation

OpenAI_gpt-4

GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, etc. Returns up to 4,096 output tokens. This is a preview model.

gpt-4-32k

32k

4k

Supported

Conversation

OpenAI_gpt-4

gpt-4-32k will be deprecated on 2025-06-06.

gpt-4-32k-0613

32k

4k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-4

Will be deprecated on 2025-06-06.

gpt-4-turbo

128k

4k

Supported

Conversation

OpenAI_gpt-4

The latest version of the GPT-4 Turbo model adds visual capabilities and supports visual requests via JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.

gpt-4-turbo-2024-04-09

128k

4k

Supported

Conversation

OpenAI_gpt-4

GPT-4 Turbo model with visual capabilities. Visual requests can now be handled via JSON mode and function calling. The current version of gpt-4-turbo is this version.

gpt-4-turbo-preview

128k

4k

Supported

Conversation, Image recognition

OpenAI_gpt-4

Currently points to gpt-4-0125-preview.

gpt-4o

128k

16k

Supported

Conversation, Image recognition

OpenAI_gpt-4

OpenAI's highly intelligent flagship model, suitable for complex multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.

gpt-4o-2024-05-13

128k

4k

Supported

Conversation, Image recognition

OpenAI_gpt-4

The original gpt-4o snapshot from May 13, 2024.

gpt-4o-2024-08-06

128k

16k

Supported

Conversation, Image recognition

OpenAI_gpt-4

The first snapshot supporting structured output. gpt-4o currently points to this version.

gpt-4o-mini

128k

16k

Supported

Conversation, Image recognition

OpenAI_gpt-4

OpenAI's affordable gpt-4o version for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18.

gpt-4o-mini-2024-07-18

128k

16k

Supported

Conversation, Image recognition

OpenAI_gpt-4

Fixed snapshot version of gpt-4o-mini.

gpt-4o-realtime-preview

128k

4k

Supported

Conversation, Real-time speech

OpenAI_gpt-4

OpenAI's real-time speech conversation model.

gpt-4o-realtime-preview-2024-10-01

128k

4k

Supported

Conversation, Real-time speech, Image recognition

OpenAI_gpt-4

gpt-4o-realtime-preview currently points to this snapshot version.

o1-mini

128k

64k

Not supported

Conversation, Reasoning

OpenAI_o1

Smaller and faster than o1-preview, 80% cheaper, performs well in code generation and small context operations.

o1-preview

128k

32k

Not supported

Conversation, Reasoning

OpenAI_o1

o1-preview is a new reasoning model for complex tasks requiring broad general knowledge. This model has a 128k context and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks, and is ideal for applications requiring deep contextual understanding and autonomous workflows.

o3-mini

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, mathematics, and coding tasks and supports structured output, function calling, the Batch API, and other developer features. Its knowledge cutoff is October 2023, and it offers a strong balance between reasoning capability and cost-effectiveness.

o3-mini-2025-01-31

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, mathematics, and coding tasks and supports structured output, function calling, the Batch API, and other developer features. Its knowledge cutoff is October 2023, and it offers a strong balance between reasoning capability and cost-effectiveness.

Baichuan2-Turbo

32k

-

Not supported

Conversation

Baichuan_baichuan

Maintains leading performance among comparable models in the industry while significantly reducing prices.

Baichuan3-Turbo

32k

-

Not supported

Conversation

Baichuan_baichuan

Maintains leading performance among comparable models in the industry while significantly reducing prices.

Baichuan3-Turbo-128k

128k

-

Not supported

Conversation

Baichuan_baichuan

Baichuan model processes complex text with a 128k ultra-long context window, specifically optimized for industries like finance, and significantly reduces costs while maintaining high performance, providing cost-effective solutions for enterprises.

Baichuan4

32k

-

Not supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides efficient and cost-effective solutions for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Air

32k

-

Not supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides efficient and cost-effective solutions for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Turbo

32k

-

Not supported

Conversation

Baichuan_baichuan

Trained on massive amounts of high-quality scenario data. Compared to Baichuan4, usability in high-frequency enterprise scenarios improves by more than 10%, information summarization by 50%, multilingual capability by 31%, and content generation by 13%. Inference performance is also specifically optimized: first-token response speed is 51% faster than Baichuan4, and token streaming speed is 73% faster.

ERNIE-3.5-128K

128k

4k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.

ERNIE-3.5-8K

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.

ERNIE-3.5-8K-Preview

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.

ERNIE-4.0-8K

8k

1k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship ultra-large language model, achieving a comprehensive upgrade in model capabilities compared to ERNIE 3.5, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.

ERNIE-4.0-8K-Latest

8k

2k

Supported

Conversation

Baidu_ernie

ERNIE-4.0-8K-Latest offers comprehensive capability improvements over ERNIE-4.0-8K, with significant enhancements in role-playing and instruction following. Compared to ERNIE 3.5, it is a full upgrade in model capabilities and is widely applicable to complex task scenarios in various fields. It supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information, and supports 5K tokens of input plus 2K tokens of output.

ERNIE-4.0-8K-Preview

8k

1k

Supported

Conversation

百度_ernie

Baidu's self-developed flagship ultra-large language model, achieving a comprehensive upgrade in model capabilities compared to ERNIE 3.5, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.

ERNIE-4.0-Turbo-128K

128k

4k

Supported

Conversation

百度_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios in various fields; it supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information, and it performs better than ERNIE 4.0. ERNIE-4.0-Turbo-128K is one version of this model; its long-document performance is superior to that of ERNIE-3.5-128K.

ERNIE-4.0-Turbo-8K

8k

2k

Supported

Conversation

百度_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios in various fields; it supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information, and it performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of this model.

ERNIE-4.0-Turbo-8K-Latest

8k

2k

Supported

Conversation

百度_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios in various fields; it supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information, and it performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K-Latest is one version of this model.

ERNIE-4.0-Turbo-8K-Preview

8k

2k

Supported

Conversation

百度_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information. ERNIE-4.0-Turbo-8K-Preview is a version of the model.

ERNIE-Character-8K

8k

1k

Not supported

Conversation

百度_ernie

Baidu's self-developed vertical-scenario large language model, suitable for applications such as game NPCs, customer service dialogues, and dialogue role-playing. It features a more distinct and consistent persona style, stronger instruction following, and superior reasoning performance.

ERNIE-Lite-8K

8k

4k

Not supported

Conversation

百度_ernie

Baidu's self-developed lightweight large language model, balancing excellent model performance with inference efficiency, suitable for inference on low-compute AI accelerator cards.

ERNIE-Lite-Pro-128K

128k

2k

Supported

Conversation

百度_ernie

Baidu's self-developed lightweight large language model, performing better than ERNIE Lite, balancing excellent model performance with inference efficiency, suitable for inference on low-compute AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and performs better than ERNIE-Lite-128K.

ERNIE-Novel-8K

8k

2k

Not supported

Conversation

百度_ernie

ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a significant advantage in novel continuation capabilities, and can also be used in scenarios such as short dramas and movies.

ERNIE-Speed-128K

128k

4k

Not supported

Conversation

百度_ernie

A high-performance large language model developed by Baidu and released in 2024. It has excellent general capabilities, is well suited to fine-tuning as a base model for specific scenarios, and delivers excellent inference performance.

ERNIE-Speed-8K

8k

1k

Not supported

Conversation

百度_ernie

A high-performance large language model developed by Baidu and released in 2024. It has excellent general capabilities, is well suited to fine-tuning as a base model for specific scenarios, and delivers excellent inference performance.

ERNIE-Speed-Pro-128K

128k

4k

Not supported

Conversation

百度_ernie

ERNIE Speed Pro is a high-performance large language model developed by Baidu and released in 2024. It has excellent general capabilities, is well suited to fine-tuning as a base model for specific scenarios, and delivers excellent inference performance. ERNIE-Speed-Pro-128K is the initial version, released on August 30, 2024; it supports a 128K context length and performs better than ERNIE-Speed-128K.

ERNIE-Tiny-8K

8k

1k

Not supported

Conversation

百度_ernie

Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs in the ERNIE (Wenxin) series.

Doubao-1.5-lite-32k

32k

12k

Supported

Conversation

豆包_doubao

Doubao-1.5-lite is a world-class lightweight language model, performing on par with or better than GPT-4o mini and Claude 3.5 Haiku on authoritative benchmarks for comprehensive ability (MMLU-Pro), reasoning (BBH), mathematics (MATH), and professional knowledge (GPQA).

Doubao-1.5-pro-256k

256k

12k

Supported

Conversation

豆包_doubao

Doubao-1.5-pro-256k is a fully upgraded version based on Doubao-1.5-pro. Compared to Doubao-pro-256k/241115, overall performance has improved by 10%. Output length has increased significantly, with support for up to 12k tokens.

Doubao-1.5-pro-32k

32k

12k

Supported

Conversation

豆包_doubao

Doubao-1.5-pro is a new-generation flagship model with comprehensively upgraded performance, excelling in knowledge, code, reasoning, and other areas. It achieves world-leading results on multiple public evaluation benchmarks, ranking best in particular on authoritative benchmarks for knowledge, code, reasoning, and Chinese, with an overall score superior to industry-leading models such as GPT-4o and Claude 3.5 Sonnet.

Doubao-1.5-vision-pro

32k

12k

Not supported

Conversation, Image recognition

豆包_doubao

Doubao-1.5-vision-pro, a newly upgraded multimodal large model, supports image recognition of arbitrary resolutions and extreme aspect ratios, enhancing visual reasoning, document recognition, detailed information understanding, and instruction following capabilities.

Doubao-embedding

4k

-

Supported

Embedding

豆包_doubao

Doubao-embedding is a semantic vectorization model developed by ByteDance, primarily for vector retrieval scenarios. It supports Chinese and English with a maximum context length of 4K. Two versions are currently available. text-240715 (recommended): maximum vector dimension of 2560, with dimensionality reduction to 512, 1024, or 2048; Chinese and English retrieval performance is significantly improved compared to the text-240515 version. text-240515: maximum vector dimension of 2048, with dimensionality reduction to 512 or 1024.

Doubao-embedding-large

4k

-

Not supported

Embedding

豆包_doubao

Chinese and English retrieval performance is significantly improved compared to the Doubao-embedding/text-240715 version.

Doubao-embedding-vision

8k

-

Not supported

Embedding

豆包_doubao

Doubao-embedding-vision, a newly upgraded image-text multimodal vectorization model, primarily for image-text multimodal vector retrieval scenarios, supporting image input and Chinese/English text input, with a maximum context length of 8K.

Doubao-lite-128k

128k

4k

Supported

Conversation

豆包_doubao

Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-lite-32k

32k

4k

Supported

Conversation

豆包_doubao

Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. Supports inference and fine-tuning with a 32k context window.

Doubao-lite-4k

4k

4k

Supported

Conversation

豆包_doubao

Doubao-lite offers extremely fast responses and better cost-effectiveness, giving customers more flexible choices for different scenarios. Supports inference and fine-tuning with a 4k context window.

Doubao-pro-128k

128k

4k

Supported

Conversation

豆包_doubao

The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-pro-32k

32k

4k

Supported

Conversation

豆包_doubao

The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 32k context window.

Doubao-pro-4k

4k

4k

Supported

Conversation

豆包_doubao

The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 4k context window.

step-1-128k

128k

-

Supported

Conversation

阶跃星辰

The step-1-128k model is an ultra-large language model capable of processing inputs up to 128,000 tokens. This capability gives it significant advantages in generating long-form content and performing complex reasoning, making it suitable for applications requiring rich context, such as writing novels and scripts.

step-1-256k

256k

-

Supported

Conversation

阶跃星辰

The step-1-256k model is one of the largest language models currently available, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality outputs across various domains.

step-1-32k

32k

-

Supported

Conversation

阶跃星辰

The step-1-32k model extends the context window, supporting inputs of 32,000 tokens. This makes it perform exceptionally well when processing long articles and complex conversations, suitable for tasks requiring deep understanding and analysis, such as legal documents and academic research.

step-1-8k

8k

-

Supported

Conversation

阶跃星辰

The step-1-8k model is an efficient language model designed specifically for processing shorter texts. It can perform reasoning within an 8,000-token context, making it suitable for applications requiring rapid responses, such as chatbots and real-time translation.

step-1-flash

8k

-

Supported

Conversation

阶跃星辰

The step-1-flash model focuses on rapid response and efficient processing, suitable for real-time applications. Its design allows it to provide high-quality language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.

step-1.5v-mini

32k

-

Supported

Conversation, Image recognition

阶跃星辰

The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Despite its small size, it retains good language processing capabilities, making it suitable for embedded systems and low-power devices.

step-1v-32k

32k

-

Supported

Conversation, Image recognition

阶跃星辰

The step-1v-32k model supports inputs of 32,000 tokens, suitable for applications requiring longer contexts. It performs excellently in handling complex conversations and long texts, making it suitable for areas such as customer service and content creation.

step-1v-8k

8k

-

Supported

Conversation, Image recognition

阶跃星辰

The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for rapid generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.

step-2-16k

16k

-

Supported

Conversation

阶跃星辰

The step-2-16k model is a medium-sized language model that supports inputs of 16,000 tokens. It performs well in various tasks and is suitable for applications such as education, training, and knowledge management.

yi-lightning

16k

-

Supported

Conversation

零一万物_yi

Latest high-performance model, ensuring high-quality output while significantly increasing inference speed. Suitable for real-time interaction and complex reasoning scenarios, it offers excellent cost-effectiveness to support commercial products.

yi-vision-v2

16k

-

Supported

Conversation, Image recognition

零一万物_yi

Suitable for scenarios requiring analysis and interpretation of images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.

qwen-14b-chat

8k

2k

Supported

Conversation

千问_qwen

Alibaba Cloud's official Qwen - Open Source Edition.

qwen-72b-chat

32k

2k

Supported

Conversation

千问_qwen

Alibaba Cloud's official Qwen - Open Source Edition.

qwen-7b-chat

7.5k

1.5k

Supported

Conversation

千问_qwen

Alibaba Cloud's official Qwen - Open Source Edition.

qwen-coder-plus

128k

8k

Supported

Conversation, Code

千问_qwen

Qwen-Coder-Plus is a programming-specific model in the Qwen series, designed to enhance code generation and understanding. Trained on large-scale programming data, it handles a variety of programming languages and supports code completion, error detection, and code refactoring. Its goal is to give developers more efficient programming assistance and improve development efficiency.

qwen-coder-plus-latest

128k

8k

Supported

Conversation, Code

千问_qwen

Qwen-Coder-Plus-Latest is the newest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. This model shows significant performance improvements, capable of understanding context more accurately and generating code that better meets developer needs. It also introduces support for more programming languages, enhancing its multilingual programming capabilities.

qwen-coder-turbo

128k

8k

Supported

Conversation, Code

千问_qwen

The Qwen series of code and programming models are language models specifically designed for programming and code generation, offering fast inference speed and low cost. This version always points to the latest stable snapshot.

qwen-coder-turbo-latest

128k

8k

Supported

Conversation, Code

千问_qwen

The Qwen series of code and programming models are language models specifically designed for programming and code generation, offering fast inference speed and low cost. This version always points to the latest snapshot.

qwen-long

10m

6k

Supported

Conversation

千问_qwen

Qwen-Long is a large language model from the Qwen series targeting ultra-long context processing scenarios. It supports Chinese, English, and other languages, and allows ultra-long context conversations of up to 10 million tokens (approximately 15 million characters or 15,000 pages of documents). Coupled with the synchronously launched document service, it supports parsing and conversation for various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: For requests submitted directly via HTTP, it supports a length of 1M tokens; for lengths exceeding this, it is recommended to submit via file.

qwen-math-plus

4k

3k

Supported

Conversation

千问_qwen

Qwen-Math-Plus is a model focused on solving mathematical problems, aiming to provide efficient mathematical reasoning and computation. Trained on a large number of math problem sets, it handles complex mathematical expressions and problems and supports computational needs ranging from basic arithmetic to advanced mathematics. Its application scenarios include education, scientific research, and engineering.

qwen-math-plus-latest

4k

3k

Supported

Conversation

千问_qwen

Qwen-Math-Plus-Latest is the newest version of Qwen-Math-Plus, integrating the latest mathematical reasoning technologies and algorithmic improvements. This model performs even better in handling complex mathematical problems, capable of providing more accurate solutions and reasoning processes. It also extends its understanding of mathematical symbols and formulas, suitable for a wider range of mathematical application scenarios.

qwen-math-turbo

4k

3k

Supported

Conversation

千问_qwen

Qwen-Math-Turbo is a high-performance mathematical model designed for rapid computation and real-time reasoning. This model optimizes computation speed, capable of processing a large number of mathematical problems in a very short time, suitable for applications requiring quick feedback, such as online education and real-time data analysis. Its efficient algorithm enables users to obtain instant results in complex calculations.

qwen-math-turbo-latest

4k

3k

Supported

Conversation

千问_qwen

Qwen-Math-Turbo-Latest is the newest version of Qwen-Math-Turbo, further enhancing computational efficiency and accuracy. This model features multiple algorithmic optimizations, capable of handling more complex mathematical problems and maintaining efficiency in real-time reasoning. It is suitable for mathematical applications requiring fast responses, such as financial analysis and scientific computing.

qwen-max

32k

8k

Supported

Conversation

千问_qwen

Qwen-Max is an ultra-large language model of the Qwen 2.5 series at the hundred-billion-parameter scale, supporting Chinese, English, and other languages. As the model is upgraded, qwen-max will be continuously updated.

qwen-max-latest

32k

8k

Supported

Conversation

千问_qwen

The best-performing model in the Qwen series, suitable for complex, multi-step tasks. This model is dynamically updated, and updates are not announced in advance. Its overall Chinese and English capabilities, alignment with human preferences, reasoning, and understanding of complex instructions are substantially improved; it performs better on difficult tasks, is significantly stronger in mathematics and coding, and is better at understanding and generating structured data such as tables and JSON.

qwen-plus

128k

8k

Supported

Conversation

千问_qwen

A balanced model in the Qwen series, with inference performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks. Its overall Chinese and English capabilities, alignment with human preferences, reasoning, and understanding of complex instructions are substantially improved; it performs better on difficult tasks and is significantly stronger in mathematics and coding.

qwen-plus-latest

128k

8k

Supported

Conversation

千问_qwen

The latest version of Qwen-Plus, the balanced model in the Qwen series, with inference performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks.

qwen-turbo

128k

8k

Supported

Conversation

千问_qwen

The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. Its overall Chinese and English capabilities, alignment with human preferences, reasoning, and understanding of complex instructions are substantially improved; it performs better on difficult tasks and is significantly stronger in mathematics and coding.

qwen-turbo-latest

1m

8k

Supported

Conversation

千问_qwen

The latest version of Qwen-Turbo, the fastest and most cost-effective model in the Qwen series. It is designed for simple tasks and emphasizes speed and cost-effectiveness, making it suitable for applications with strict response time requirements, such as simple Q&A systems.

qwen-vl-max

32k

2k

Supported

Conversation, Image recognition

千问_qwen

Qwen-VL-Max (qwen-vl-max) is the ultra-large visual language model of Qwen. Compared to the enhanced version (Qwen-VL-Plus), it further improves visual reasoning and instruction following, providing a higher level of visual perception and cognition. It offers the best performance on more complex tasks.

qwen-vl-max-latest

32k

2k

Supported

Conversation, Image recognition

千问_qwen

Qwen-VL-Max is the highest-tier version in the Qwen-VL series, specifically designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, capable of understanding and analyzing high-resolution images, with extremely strong reasoning capabilities, suitable for applications requiring deep understanding and complex reasoning.

qwen-vl-ocr

34k

4k

Supported

Conversation, Image recognition

千问_qwen

Only supports OCR, not conversation.

qwen-vl-ocr-latest

34k

4k

Supported

Conversation, Image recognition

千问_qwen

Only supports OCR, not conversation.

qwen-vl-plus

8k

2k

Supported

Conversation, Image recognition

千问_qwen

Qwen-VL-Plus (qwen-vl-plus) is the enhanced version of the Qwen large-scale visual language model. It significantly improves detail recognition and text recognition, supports images with resolutions above one million pixels and arbitrary aspect ratios, and provides excellent performance across a wide range of visual tasks.

qwen-vl-plus-latest

32k

2k

Supported

Conversation, Image recognition

千问_qwen

Qwen-VL-Plus-Latest is the newest version of Qwen-VL-Plus, enhancing the model's multimodal understanding capabilities. It excels in combining image and text processing, suitable for applications requiring efficient handling of various input formats, such as intelligent customer service and content generation.

Qwen/Qwen2-1.5B-Instruct

32k

6k

Not supported

Conversation

千问_qwen

Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 1.5 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.

Qwen/Qwen2-72B-Instruct

128k

6k

Not supported

Conversation

千问_qwen

Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 72 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It can handle large-scale inputs. This model performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.

Qwen/Qwen2-7B-Instruct

128k

6k

Not supported

Conversation

千问_qwen

Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 7 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It can handle large-scale inputs. This model performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.

Qwen/Qwen2-VL-72B-Instruct

32k

2k

Not supported

Conversation, Image recognition

千问_qwen

Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos over 20 minutes long for high-quality video-based Q&A, dialogue, and content creation. It also possesses complex reasoning and decision-making abilities, allowing integration with mobile devices, robots, etc., to perform autonomous operations based on visual environments and text instructions.

Qwen/Qwen2-VL-7B-Instruct

32k

-

Not supported

Conversation, Image recognition

千问_qwen

Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also possesses complex reasoning and decision-making abilities, allowing integration with mobile devices, robots, etc., to perform autonomous operations based on visual environments and text instructions.

Qwen/Qwen2.5-72B-Instruct

128k

8k

Not supported

Conversation

千问_qwen

Qwen2.5-72B-Instruct belongs to Qwen2.5, the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.

Qwen/Qwen2.5-72B-Instruct-128K

128k

8k

Not supported

Conversation

千问_qwen

Qwen2.5-72B-Instruct belongs to Qwen2.5, the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.

Qwen/Qwen2.5-7B-Instruct

128k

8k

Not supported

Conversation

千问_qwen

Qwen2.5-7B-Instruct belongs to Qwen2.5, the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/Qwen2.5-Coder-32B-Instruct

128k

8k

Not supported

Conversation, Code

千问_qwen

Qwen2.5-Coder-32B-Instruct is a code-focused model in Qwen2.5, the latest large language model series released by Alibaba Cloud. This 32B model has significantly improved capabilities in areas such as coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/Qwen2.5-Coder-7B-Instruct

128k

8k

Not supported

Conversation, Code

千问_qwen

Qwen2.5-Coder-7B-Instruct is a code-focused model in Qwen2.5, the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/QwQ-32B-Preview

32k

16k

Not supported

Conversation, Reasoning

千问_qwen

QwQ-32B-Preview is an experimental research model developed by the Qwen team, aiming to enhance AI's reasoning capabilities. As a preview version, it demonstrates excellent analytical abilities but also has some important limitations: 1. Language Mixing and Code Switching: The model may mix languages or unexpectedly switch between languages, affecting response clarity. 2. Recursive Reasoning Loops: The model may enter a loop of reasoning, leading to lengthy answers without clear conclusions. 3. Safety and Ethical Considerations: The model needs strengthened safety measures to ensure reliable and secure performance, and users should exercise caution when using it. 4. Performance and Benchmark Limitations: The model performs well in mathematics and programming but still has room for improvement in other areas such as common sense reasoning and nuanced language understanding.

qwen1.5-110b-chat

32k

8k

Not supported

Conversation

千问_qwen

-

qwen1.5-14b-chat

8k

2k

Not supported

Conversation

千问_qwen

-

qwen1.5-32b-chat

32k

2k

Not supported

Conversation

千问_qwen

-

qwen1.5-72b-chat

32k

2k

Not supported

Conversation

千问_qwen

-

qwen1.5-7b-chat

8k

2k

Not supported

Conversation

千问_qwen

-

qwen2-57b-a14b-instruct

65k

6k

Not supported

Conversation

千问_qwen

-

Qwen2-72B-Instruct

-

-

Not supported

Conversation

千问_qwen

-

qwen2-7b-instruct

128k

6k

Not supported

Conversation

千问_qwen

-

qwen2-math-72b-instruct

4k

3k

Not supported

Conversation

千问_qwen

-

qwen2-math-7b-instruct

4k

3k

Not supported

Conversation

千问_qwen

-

qwen2.5-14b-instruct

128k

8k

Not supported

Conversation

千问_qwen

-

qwen2.5-32b-instruct

128k

8k

Not supported

Conversation

千问_qwen

-

qwen2.5-72b-instruct

128k

8k

Not supported

Conversation

千问_qwen

-

qwen2.5-7b-instruct

128k

8k

Not supported

Conversation

千问_qwen

-

qwen2.5-coder-14b-instruct

128k

8k

Not supported

Conversation, Code

千问_qwen

-

qwen2.5-coder-32b-instruct

128k

8k

Not supported

Conversation, Code

千问_qwen

-

qwen2.5-coder-7b-instruct

128k

8k

Not supported

Conversation, Code

千问_qwen

-

qwen2.5-math-72b-instruct

4k

3k

Not supported

Conversation

千问_qwen

-

qwen2.5-math-7b-instruct

4k

3k

Not supported

Conversation

千问_qwen

-

deepseek-ai/DeepSeek-R1

64k

-

Not supported

Conversation, Reasoning

深度求索_deepseek

The DeepSeek-R1 model is an open-source reasoning model based purely on reinforcement learning, excelling in tasks such as mathematics, code, and natural language reasoning. Its performance is comparable to OpenAI's o1 model, achieving excellent results in multiple benchmarks.

deepseek-ai/DeepSeek-V2-Chat

128k

-

Not supported

Conversation

深度求索_deepseek

DeepSeek-V2 is a powerful, cost-efficient Mixture-of-Experts (MoE) language model. It was pre-trained on an 8.1 trillion token high-quality corpus and further enhanced through supervised fine-tuning (SFT) and reinforcement learning (RL). Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times.

deepseek-ai/DeepSeek-V2.5

32k

-

Supported

Conversation

深度求索_deepseek

DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. This model has been optimized in several aspects, including writing and instruction-following abilities, better aligning with human preferences.

deepseek-ai/DeepSeek-V3

128k

4k

Not supported

Conversation

深度求索_deepseek

The open-source version of DeepSeek-V3, with a longer context window than the official version and without issues such as refusals triggered by sensitive words.

deepseek-chat

64k

8k

Supported

Conversation

深度求索_deepseek

236B parameters with a 64K context window (API). Its overall Chinese capability (AlignBench) ranks first among open-source models and is in the same tier as closed-source models such as GPT-4-Turbo and ERNIE 4.0 (Wenxin 4.0) in evaluations.

deepseek-coder

64k

8k

Supported

Conversation, Code

深度求索_deepseek

236B parameters with a 64K context window (API). Its overall Chinese capability (AlignBench) ranks first among open-source models and is in the same tier as closed-source models such as GPT-4-Turbo and ERNIE 4.0 (Wenxin 4.0) in evaluations.

deepseek-reasoner

64k

8k

Supported

Conversation, Reasoning

深度求索_deepseek

DeepSeek-Reasoner (DeepSeek-R1) is DeepSeek's newly launched reasoning model, designed to improve reasoning capabilities through reinforcement learning training. Its reasoning process involves extensive reflection and verification, capable of handling complex logical reasoning tasks, with a chain of thought length up to tens of thousands of characters. DeepSeek-R1 excels in solving mathematical, coding, and other complex problems, and has been widely applied in various scenarios, demonstrating its powerful reasoning capabilities and flexibility. Compared to other models, DeepSeek-R1's reasoning performance approaches that of top-tier closed-source models, showcasing the potential and competitiveness of open-source models in the reasoning domain.

hunyuan-code

4k

4k

Not supported

Conversation, Code

Tencent_hunyuan

Hunyuan's latest code generation model, built on a foundation model augmented with 200B high-quality code data and trained for half a year on high-quality SFT data. The context window is increased to 8K. It ranks among the top in automated code-generation evaluation metrics across five major languages, and in human expert evaluations of ten comprehensive code tasks across the same five languages its performance is in the first tier.

hunyuan-functioncall

28k

4k

Supported

Conversation

Tencent_hunyuan

Hunyuan's latest MoE architecture FunctionCall model, trained with high-quality FunctionCall data, with a context window of 32K, and leading in multiple evaluation metrics.

hunyuan-large

28k

4k

Not supported

Conversation

Tencent_hunyuan

The Hunyuan-large model has approximately 389B total parameters and 52B active parameters, making it the industry's largest and most effective open-source MoE model with a Transformer architecture.

hunyuan-large-longcontext

128k

6k

Not supported

Conversation

Tencent_hunyuan

Excels at long document tasks such as document summarization and Q&A, and also has the ability to handle general text generation tasks. It performs excellently in the analysis and generation of long texts, effectively meeting the needs for processing complex and detailed long-form content.

hunyuan-lite

250k

6k

Not supported

Conversation

Tencent_hunyuan

Upgraded to MoE structure, with a context window of 256k, leading many open-source models in multiple NLP, code, mathematics, and industry evaluation sets.

hunyuan-pro

28k

4k

Supported

Conversation

Tencent_hunyuan

Trillion-parameter MoE-32K long-text model. It achieves leading results on various benchmarks, handles complex instructions and reasoning, has strong mathematical capabilities, supports function calling, and is optimized for multilingual translation and for applications in finance, law, and medicine.

hunyuan-role

28k

4k

Not supported

Conversation

Tencent_hunyuan

Hunyuan's latest role-playing model, officially fine-tuned by Hunyuan. It is based on the Hunyuan model and further trained on role-playing scenario datasets, giving it better baseline performance in role-playing scenarios.

hunyuan-standard

30k

2k

Not supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while alleviating issues of load balancing and expert convergence. MOE-32K offers higher cost-effectiveness, and while balancing performance and price, it can handle long text inputs.

hunyuan-standard-256K

250k

6k

Not supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while alleviating issues of load balancing and expert convergence. For long texts, the needle-in-a-haystack metric reaches 99.9%. MOE-256K further breaks through in length and performance, greatly expanding the input length.

hunyuan-translation-lite

4k

4k

Not supported

Conversation

Tencent_hunyuan

The Hunyuan translation model supports natural-language conversational translation and mutual translation among 15 languages: Chinese, English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.

hunyuan-turbo

28k

4k

Supported

Conversation

Tencent_hunyuan

Default version of the Hunyuan-turbo model, adopting a new Mixture-of-Experts (MoE) structure, with faster inference efficiency and stronger performance compared to hunyuan-pro.

hunyuan-turbo-latest

28k

4k

Supported

Conversation

Tencent_hunyuan

Dynamically updated version of the Hunyuan-turbo model and the best-performing version in the Hunyuan model series, consistent with the consumer-facing product (Tencent Yuanbao).

hunyuan-turbo-vision

8k

2k

Supported

Image recognition, Conversation

Tencent_hunyuan

Hunyuan's new generation flagship visual language large model, adopting a new Mixture-of-Experts (MoE) structure, with comprehensive improvements in basic recognition, content creation, knowledge Q&A, and analytical reasoning related to image-text understanding compared to the previous generation model. Max input 6k, max output 2k.

hunyuan-vision

8k

2k

Supported

Conversation, Image recognition

Tencent_hunyuan

Hunyuan's latest multimodal model, which takes image + text input and generates text content. Basic image recognition: recognizes subjects, elements, and scenes in images. Image content creation: summarizes images and writes ad copy, WeChat Moments posts, poems, and so on. Multi-turn image dialogue: conducts multi-turn interactive Q&A about a single image. Image analysis and reasoning: performs statistical analysis of logical relationships, math problems, code, and charts in images. Image knowledge Q&A: answers knowledge-related questions based on images, such as historical events or movie posters. Image OCR: recognizes text in images from natural life scenes and non-natural scenes.

SparkDesk-Lite

4k

-

Not supported

Conversation

Spark_SparkDesk

Supports online search functionality, fast and convenient response, suitable for low-compute inference and customized scenarios like model fine-tuning.

SparkDesk-Max

128k

-

Supported

Conversation

Spark_SparkDesk

Quantized from the latest SparkDesk 4.0 Turbo large model engine, with multiple built-in plugins such as online search, weather, and date. Core capabilities are comprehensively upgraded, and results improve across application scenarios. Supports persona settings via the System role and function calling (FunctionCall).

SparkDesk-Max-32k

32k

-

Supported

Conversation

Spark_SparkDesk

Stronger reasoning: Stronger context understanding and logical reasoning capabilities. Longer input: Supports 32K tokens of text input, suitable for long document reading, private knowledge Q&A, and other scenarios.

SparkDesk-Pro

128k

-

Not supported

Conversation

Spark_SparkDesk

Optimized for specific scenarios such as mathematics, code, medical, and education. Supports multiple built-in plugins like online search, weather, and date, covering most knowledge Q&A, language understanding, and text generation scenarios.

SparkDesk-Pro-128K

128k

-

Not supported

Conversation

Spark_SparkDesk

Professional-grade large language model with tens of billions of parameters, specifically optimized for medical, educational, and coding scenarios. Lower latency in search scenarios. Suitable for text, intelligent Q&A, and other business scenarios with higher demands for performance and response speed.

moonshot-v1-128k

128k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with a length of 128k, suitable for generating ultra-long texts.

moonshot-v1-32k

32k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with a length of 32k, suitable for generating long texts.

moonshot-v1-8k

8k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with a length of 8k, suitable for generating short texts.

codegeex-4

128k

4k

Not supported

Conversation, Code

Zhipu_codegeex

Zhipu's code model: suitable for automatic code completion tasks.

charglm-3

4k

2k

Not supported

Conversation

Zhipu_glm

Persona model.

emohaa

8k

4k

Not supported

Conversation

Zhipu_glm

Psychological model: possesses professional counseling abilities to help users understand emotions and cope with emotional issues.

glm-3-turbo

128k

4k

Not supported

Conversation

Zhipu_glm

To be deprecated on June 30, 2025.

glm-4

128k

4k

Supported

Conversation

Zhipu_glm

Old flagship: Released on January 16, 2024, now replaced by GLM-4-0520.

glm-4-0520

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence model: suitable for handling highly complex and diverse tasks.

glm-4-air

128k

4k

Supported

Conversation

Zhipu_glm

High cost-effectiveness: the best-balanced model in terms of reasoning ability and price.

glm-4-airx

8k

4k

Supported

Conversation

Zhipu_glm

Ultra-fast inference: combines ultra-fast inference speed with strong reasoning performance.

glm-4-flash

128k

4k

Supported

Conversation

Zhipu_glm

High speed, low cost: ultra-fast inference speed.

glm-4-flashx

128k

4k

Supported

Conversation

Zhipu_glm

High speed, low cost: Enhanced Flash version, ultra-fast inference speed.

glm-4-long

1m

4k

Supported

Conversation

Zhipu_glm

Ultra-long input: specifically designed for processing ultra-long texts and memory-intensive tasks.

glm-4-plus

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence flagship: comprehensively improved performance, significantly enhanced long text and complex task capabilities.

glm-4v

2k

-

Not supported

Conversation, Image recognition

Zhipu_glm

Image understanding: possesses image understanding and reasoning capabilities.

glm-4v-flash

2k

1k

Not supported

Conversation, Image recognition

Zhipu_glm

Free model: possesses powerful image understanding capabilities.
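
The Max Input and Max Output values in this table use the "k"/"m" shorthand, which has to be converted into an actual number when entered on the client side. Below is a minimal Python sketch of that conversion, following the recommended ×1000 rule rather than the theoretical ×1024. The function name `to_tokens` and the choice to return `None` for "-" are assumptions made for illustration, not part of any client.

```python
import re
from typing import Optional

# Multipliers follow the practical recommendation on this page (x1000),
# not the theoretical 1k = 1024 convention.
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def to_tokens(size: str) -> Optional[int]:
    """Convert a table value such as '128k' or '1m' into a token count.

    Returns None for '-', which the table uses when no official
    maximum-output figure was found.
    """
    text = size.strip().lower()
    if text == "-":
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km])", text)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _MULTIPLIERS[unit])


if __name__ == "__main__":
    # Values taken from the Max Input / Max Output columns above.
    for value in ("8k", "128k", "1m", "-"):
        print(value, "->", to_tokens(value))
```

Returning `None` for "-" keeps the "no official figure" case distinct from a real limit, so a client can leave the field empty instead of entering 0.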
