Common Models Reference Information
This document was translated from Chinese by AI and has not yet been reviewed.
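Each entry below lists seven fields, one per line: model name, maximum input, maximum output, function calling, capabilities, provider/series, and description.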
360gpt-pro
8k
-
Not Supported
Conversation
360AI_360gpt
The flagship hundred-billion-parameter model of the 360 AI Brain series and its best performer, applicable to complex tasks across many fields.
360gpt-turbo
7k
-
Not Supported
Conversation
360AI_360gpt
A ten-billion-parameter model that balances performance and output quality, suited to scenarios with demanding performance/cost requirements.
360gpt-turbo-responsibility-8k
8k
-
Not Supported
Conversation
360AI_360gpt
A ten-billion-parameter model that balances performance and output quality, suited to scenarios with demanding performance/cost requirements.
360gpt2-pro
8k
-
Not Supported
Conversation
360AI_360gpt
The flagship hundred-billion-parameter model of the 360 AI Brain series and its best performer, applicable to complex tasks across many fields.
claude-3-5-sonnet-20240620
200k
16k
Not Supported
Conversation, Vision
Anthropic_claude
A snapshot version released on June 20, 2024. Claude 3.5 Sonnet is a model that balances performance and speed, offering top-tier performance while maintaining high speed, and supports multimodal input.
claude-3-5-haiku-20241022
200k
16k
Not Supported
Conversation
Anthropic_claude
A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved across various skills, including coding, tool use, and reasoning. As the fastest model in the Anthropic family, it provides rapid response times, suitable for applications requiring high interactivity and low latency, such as user-facing chatbots and instant code completion. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for wide application across industries. It does not support image input.
claude-3-5-sonnet-20241022
200k
8k
Not Supported
Conversation, Vision
Anthropic_claude
A snapshot version released on October 22, 2024. Claude 3.5 Sonnet offers capabilities surpassing Opus and faster speeds than Sonnet, while maintaining the same price as Sonnet. Sonnet is particularly adept at programming, data science, visual processing, and agentic tasks.
claude-3-5-sonnet-latest
200k
8k
Not Supported
Conversation, Vision
Anthropic_claude
Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities surpassing Opus and faster speeds than Sonnet, while maintaining the same price as Sonnet. Sonnet is particularly adept at programming, data science, visual processing, and agentic tasks.
claude-3-haiku-20240307
200k
4k
Not Supported
Conversation, Vision
Anthropic_claude
Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instantaneous responses. It features fast and accurate targeted performance.
claude-3-opus-20240229
200k
4k
Not Supported
Conversation, Vision
Anthropic_claude
Claude 3 Opus is Anthropic's most powerful model for handling highly complex tasks. It excels in performance, intelligence, fluency, and comprehension.
claude-3-sonnet-20240229
200k
8k
Not Supported
Conversation, Vision
Anthropic_claude
A snapshot version released on February 29, 2024. Sonnet is particularly adept at: (1) coding: it can autonomously write, edit, and run code, with reasoning and troubleshooting capabilities; (2) data science: it enhances human data science expertise and can work through unstructured data while using multiple tools to gain insights; (3) visual processing: it excels at interpreting charts, graphs, and images, accurately transcribing text to extract insights beyond the text itself; (4) agentic tasks: its excellent tool use makes it ideal for complex, multi-step problem-solving that requires interaction with other systems.
google/gemma-2-27b-it
8k
-
Not Supported
Conversation
Google_gamma
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are decoder-only large language models that support English, with open weights available for both pre-trained and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
google/gemma-2-9b-it
8k
-
Not Supported
Conversation
Google_gamma
Gemma is a family of lightweight, state-of-the-art open models developed by Google. They are decoder-only large language models that support English, with open weights available for both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning. This 9B model was trained on 8 trillion tokens.
gemini-1.5-pro
2m
8k
Not Supported
Conversation
Google_gemini
The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.
gemini-1.0-pro-001
33k
8k
Not Supported
Conversation
Google_gemini
This is a stable version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-002
32k
8k
Not Supported
Conversation
Google_gemini
This is a stable version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-latest
33k
8k
Not Supported
Conversation, Deprecated or soon to be deprecated
Google_gemini
This is the latest version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-vision-001
16k
2k
Not Supported
Conversation
Google_gemini
This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-vision-latest
16k
2k
Not Supported
Vision
Google_gemini
This is the latest vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
gemini-1.5-flash
1m
8k
Not Supported
Conversation, Vision
Google_gemini
This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, image, video, and text inputs.
gemini-1.5-flash-001
1m
8k
Not Supported
Conversation, Vision
Google_gemini
This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.
gemini-1.5-flash-002
1m
8k
Not Supported
Conversation, Vision
Google_gemini
This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.
gemini-1.5-flash-8b
1m
8k
Not Supported
Conversation, Vision
Google_gemini
Gemini 1.5 Flash-8B is Google's latest multimodal AI model, designed for efficient handling of large-scale tasks. With 8 billion parameters, the model supports text, image, audio, and video inputs, making it suitable for various application scenarios such as chat, transcription, and translation. Compared to other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, especially for cost-sensitive users. Its rate limit is doubled, allowing developers to handle large-scale tasks more efficiently. Additionally, Flash-8B uses "knowledge distillation" technology to extract key knowledge from larger models, ensuring it is lightweight and efficient while retaining core capabilities.
gemini-1.5-flash-exp-0827
1m
8k
Not Supported
Conversation, Vision
Google_gemini
This is an experimental version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.
gemini-1.5-flash-latest
1m
8k
Not Supported
Conversation, Vision
Google_gemini
This is the cutting-edge version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.
gemini-1.5-pro-001
2m
8k
Not Supported
Conversation, Vision
Google_gemini
This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.
gemini-1.5-pro-002
2m
8k
Not Supported
Conversation, Vision
Google_gemini
This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.
gemini-1.5-pro-exp-0801
2m
8k
Not Supported
Conversation, Vision
Google_gemini
An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.
gemini-1.5-pro-exp-0827
2m
8k
Not Supported
Conversation, Vision
Google_gemini
An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.
gemini-1.5-pro-latest
2m
8k
Not Supported
Conversation, Vision
Google_gemini
This is the latest version of Gemini 1.5 Pro, dynamically pointing to the most recent snapshot version.
gemini-2.0-flash
1m
8k
Not Supported
Conversation, Vision
Google_gemini
Gemini 2.0 Flash is Google's latest model, featuring a faster Time to First Token (TTFT) compared to the 1.5 version, while maintaining a quality level comparable to Gemini Pro 1.5. This model shows significant improvements in multimodal understanding, coding ability, complex instruction following, and function calling, thereby providing a smoother and more powerful intelligent experience.
gemini-2.0-flash-exp
100k
8k
Supported
Conversation, Vision
Google_gemini
Gemini 2.0 Flash introduces a real-time multimodal API, improved speed and performance, enhanced quality, stronger agent capabilities, and adds image generation and voice conversion functions.
gemini-2.0-flash-lite-preview-02-05
1m
8k
Not Supported
Conversation, Vision
Google_gemini
Gemini 2.0 Flash-Lite is Google's latest cost-effective AI model, offering better quality at the same speed as 1.5 Flash. It supports a 1 million token context window and can handle multimodal tasks involving images, audio, and code. As Google's most cost-effective model currently, it uses a simplified single pricing strategy, making it particularly suitable for large-scale application scenarios that require cost control.
gemini-2.0-flash-thinking-exp
40k
8k
Not Supported
Conversation, Reasoning
Google_gemini
gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.
gemini-2.0-flash-thinking-exp-01-21
1m
64k
Not Supported
Conversation, Reasoning
Google_gemini
Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focusing on enhancing reasoning abilities and user interaction experience. The model has strong reasoning capabilities, especially in math and programming, and supports a context window of up to 1 million tokens, suitable for complex tasks and in-depth analysis scenarios. Its unique feature is the ability to generate its thinking process, improving the comprehensibility of AI thinking. It also supports native code execution, enhancing the flexibility and practicality of interactions. By optimizing algorithms, the model reduces logical contradictions, further improving the accuracy and consistency of its answers.
gemini-2.0-flash-thinking-exp-1219
40k
8k
Not Supported
Conversation, Reasoning, Vision
Google_gemini
gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.
gemini-2.0-pro-exp-01-28
2m
64k
Not Supported
Conversation, Vision
Google_gemini
Pre-announced model, not yet online.
gemini-2.0-pro-exp-02-05
2m
8k
Not Supported
Conversation, Vision
Google_gemini
Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model, released in February 2025, excelling in world knowledge, code generation, and long-text understanding. The model supports an ultra-long context window of 2 million tokens, capable of processing content equivalent to 2 hours of video, 22 hours of audio, over 60,000 lines of code, and more than 1.4 million words. As part of the Gemini 2.0 series, this model adopts a new Flash Thinking training strategy, which significantly improves its performance; it ranks highly on several LLM leaderboards.
gemini-exp-1114
8k
4k
Not Supported
Conversation, Vision
Google_gemini
This is an experimental model released on November 14, 2024, primarily focusing on quality improvements.
gemini-exp-1121
8k
4k
Not Supported
Conversation, Vision, Code
Google_gemini
This is an experimental model released on November 21, 2024, with improvements in coding, reasoning, and visual capabilities.
gemini-exp-1206
8k
4k
Not Supported
Conversation, Vision
Google_gemini
This is an experimental model released on December 6, 2024, with improvements in coding, reasoning, and visual capabilities.
gemini-exp-latest
8k
4k
Not Supported
Conversation, Vision
Google_gemini
This is an experimental model, dynamically pointing to the latest version.
gemini-pro
33k
8k
Not Supported
Conversation
Google_gemini
An alias for gemini-1.0-pro.
gemini-pro-vision
16k
2k
Not Supported
Conversation, Vision
Google_gemini
This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.
grok-2
128k
-
Not Supported
Conversation
Grok_grok
A new version of the Grok model released by xAI on December 12, 2024.
grok-2-1212
128k
-
Not Supported
Conversation
Grok_grok
A new version of the Grok model released by xAI on December 12, 2024.
grok-2-latest
128k
-
Not Supported
Conversation
Grok_grok
A new version of the Grok model released by xAI on December 12, 2024.
grok-2-vision-1212
32k
-
Not Supported
Conversation, Vision
Grok_grok
The vision version of the Grok model released by xAI on December 12, 2024.
grok-beta
100k
-
Not Supported
Conversation
Grok_grok
Performance comparable to Grok 2, but with improved efficiency, speed, and functionality.
grok-vision-beta
8k
-
Not Supported
Conversation, Vision
Grok_grok
The latest image understanding model can process various visual information, including documents, charts, screenshots, and photos.
internlm/internlm2_5-20b-chat
32k
-
Supported
Conversation
internlm
InternLM2.5-20B-Chat is an open-source large-scale conversational model developed based on the InternLM2 architecture. With 20 billion parameters, this model excels in mathematical reasoning, surpassing comparable models like Llama3 and Gemma2-27B. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, and possessing stronger instruction understanding, tool selection, and result reflection abilities.
meta-llama/Llama-3.2-11B-Vision-Instruct
8k
-
Not Supported
Conversation, Vision
Meta_llama
Llama series models can now process image data as well as text: some Llama 3.2 models add visual understanding. This model accepts text and image input together, interprets the image, and outputs text.
meta-llama/Llama-3.2-3B-Instruct
32k
-
Not Supported
Conversation
Meta_llama
Part of the Meta Llama 3.2 family of multilingual large language models (LLMs), in which the 1B and 3B models are lightweight enough to run on edge and mobile devices. This model is the 3B version.
meta-llama/Llama-3.2-90B-Vision-Instruct
8k
-
Not Supported
Conversation, Vision
Meta_llama
Llama series models can now process image data as well as text: some Llama 3.2 models add visual understanding. This model accepts text and image input together, interprets the image, and outputs text.
meta-llama/Llama-3.3-70B-Instruct
131k
-
Not Supported
Conversation
Meta_llama
Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B.
meta-llama/Meta-Llama-3.1-405B-Instruct
32k
-
Not Supported
Conversation
Meta_llama
The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.
meta-llama/Meta-Llama-3.1-70B-Instruct
32k
-
Not Supported
Conversation
Meta_llama
Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants in 8B, 70B, and 405B parameter sizes. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs excellently on several industry benchmarks. The model was trained on over 15 trillion tokens of public data and uses techniques like supervised fine-tuning and reinforcement learning with human feedback to enhance its usefulness and safety.
meta-llama/Meta-Llama-3.1-8B-Instruct
32k
-
Not Supported
Conversation
Meta_llama
The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.
abab5.5-chat
16k
-
Supported
Conversation
Minimax_abab
Chinese persona conversation scenarios.
abab5.5s-chat
8k
-
Supported
Conversation
Minimax_abab
Chinese persona conversation scenarios.
abab6.5g-chat
8k
-
Supported
Conversation
Minimax_abab
Persona conversation scenarios in English and other languages.
abab6.5s-chat
245k
-
Supported
Conversation
Minimax_abab
General scenarios.
abab6.5t-chat
8k
-
Supported
Conversation
Minimax_abab
Chinese persona conversation scenarios.
chatgpt-4o-latest
128k
16k
Not Supported
Conversation, Vision
OpenAI
chatgpt-4o-latest continuously points to the GPT-4o version used in ChatGPT and is the first to be updated when that version changes significantly.
gpt-4o-2024-11-20
128k
16k
Supported
Conversation
OpenAI
The latest gpt-4o snapshot version from November 20, 2024.
gpt-4o-audio-preview
128k
16k
Not Supported
Conversation
OpenAI
OpenAI's real-time voice conversation model.
gpt-4o-audio-preview-2024-10-01
128k
16k
Supported
Conversation
OpenAI
OpenAI's real-time voice conversation model.
o1
128k
32k
Not Supported
Conversation, Reasoning, Vision
OpenAI
OpenAI's reasoning model for complex tasks that require broad general knowledge. The model has a 200k context and supports image recognition.
o1-mini-2024-09-12
128k
64k
Not Supported
Conversation, Reasoning
OpenAI
A fixed snapshot version of o1-mini. It is smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.
o1-preview-2024-09-12
128k
32k
Not Supported
Conversation, Reasoning
OpenAI
A fixed snapshot version of o1-preview.
gpt-3.5-turbo
16k
4k
Supported
Conversation
OpenAI_gpt-3
GPT-3.5 Turbo is an improved version of OpenAI's GPT-3.5 model, with a model structure and algorithms optimized for inference speed, processing efficiency, and resource utilization. Inference speed: it is typically faster than GPT-3.5 on the same hardware, which benefits applications that process text at scale. Throughput: it handles more concurrent requests, raising overall system throughput. Resource consumption: it may need less memory and compute at the same performance, which lowers operating costs and improves scalability. NLP tasks: it suits text generation, semantic understanding, dialogue systems, machine translation, and other natural language processing tasks. Developer support: its API is easy to integrate, supporting rapid application development and deployment.
gpt-3.5-turbo-0125
16k
4k
Supported
Conversation
OpenAI_gpt-3
An updated GPT 3.5 Turbo model with higher accuracy in responding to requested formats and a fix for a bug that caused text encoding issues for non-English language function calls. Returns a maximum of 4,096 output tokens.
gpt-3.5-turbo-0613
16k
4k
Supported
Conversation
OpenAI_gpt-3
Updated fixed snapshot version of GPT 3.5 Turbo. Now deprecated.
gpt-3.5-turbo-1106
16k
4k
Supported
Conversation
OpenAI_gpt-3
Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens.
gpt-3.5-turbo-16k
16k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-3
(Deprecated)
gpt-3.5-turbo-16k-0613
16k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-3
A snapshot of gpt-3.5-turbo from June 13, 2023. (Deprecated)
gpt-3.5-turbo-instruct
4k
4k
Supported
Conversation
OpenAI_gpt-3
Capabilities similar to GPT-3 era models. Compatible with the legacy Completions endpoint, not for Chat Completions.
gpt-3.5o
16k
4k
Not Supported
Conversation
OpenAI_gpt-3
Same as gpt-4o-lite.
gpt-4
8k
8k
Supported
Conversation
OpenAI_gpt-4
Currently points to gpt-4-0613.
gpt-4-0125-preview
128k
4k
Supported
Conversation
OpenAI_gpt-4
The latest GPT-4 model, designed to reduce cases of "laziness", where the model does not complete a task. Returns a maximum of 4,096 output tokens.
gpt-4-0314
8k
8k
Supported
Conversation
OpenAI_gpt-4
A snapshot of gpt-4 from March 14, 2023.
gpt-4-0613
8k
8k
Supported
Conversation
OpenAI_gpt-4
A snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.
gpt-4-1106-preview
128k
4k
Supported
Conversation
OpenAI_gpt-4
A GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns a maximum of 4,096 output tokens. This is a preview model.
gpt-4-32k
32k
4k
Supported
Conversation
OpenAI_gpt-4
gpt-4-32k will be deprecated on 2025-06-06.
gpt-4-32k-0613
32k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-4
Will be deprecated on 2025-06-06.
gpt-4-turbo
128k
4k
Supported
Conversation
OpenAI_gpt-4
The latest version of the GPT-4 Turbo model, with vision capabilities. Vision requests can use JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.
gpt-4-turbo-2024-04-09
128k
4k
Supported
Conversation
OpenAI_gpt-4
GPT-4 Turbo model with vision capabilities. Vision requests can now be made via JSON mode and function calling. gpt-4-turbo currently points to this version.
gpt-4-turbo-preview
128k
4k
Supported
Conversation, Vision
OpenAI_gpt-4
Currently points to gpt-4-0125-preview.
gpt-4o
128k
16k
Supported
Conversation, Vision
OpenAI_gpt-4
OpenAI's highly intelligent flagship model, suitable for complex, multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.
gpt-4o-2024-05-13
128k
4k
Supported
Conversation, Vision
OpenAI_gpt-4
The original gpt-4o snapshot from May 13, 2024.
gpt-4o-2024-08-06
128k
16k
Supported
Conversation, Vision
OpenAI_gpt-4
The first snapshot to support structured outputs. gpt-4o currently points to this version.
gpt-4o-mini
128k
16k
Supported
Conversation, Vision
OpenAI_gpt-4
OpenAI's affordable version of gpt-4o, suitable for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18.
gpt-4o-mini-2024-07-18
128k
16k
Supported
Conversation, Vision
OpenAI_gpt-4
A fixed snapshot version of gpt-4o-mini.
gpt-4o-realtime-preview
128k
4k
Supported
Conversation, Real-time Voice
OpenAI_gpt-4
OpenAI's real-time voice conversation model.
gpt-4o-realtime-preview-2024-10-01
128k
4k
Supported
Conversation, Real-time Voice, Vision
OpenAI_gpt-4
gpt-4o-realtime-preview currently points to this snapshot version.
o1-mini
128k
64k
Not Supported
Conversation, Reasoning
OpenAI_o1
Smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.
o1-preview
128k
32k
Not Supported
Conversation, Reasoning
OpenAI_o1
o1-preview is a new reasoning model for complex tasks that require broad general knowledge. The model has a 128k context and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks. It is ideal for applications requiring deep contextual understanding and autonomous workflows.
o3-mini
200k
100k
Supported
Conversation, Reasoning
OpenAI_o1
o3-mini is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency as o1-mini. It focuses on science, math, and coding tasks and supports developer features such as structured output, function calling, and the Batch API. Its knowledge cutoff is October 2023. It balances reasoning capability with cost-effectiveness.
o3-mini-2025-01-31
200k
100k
Supported
Conversation, Reasoning
OpenAI_o1
o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency as o1-mini. It focuses on science, math, and coding tasks and supports developer features such as structured output, function calling, and the Batch API. Its knowledge cutoff is October 2023. It balances reasoning capability with cost-effectiveness.
Baichuan2-Turbo
32k
-
Not Supported
Conversation
Baichuan_baichuan
Compared to similarly sized models in the industry, this model maintains leading performance at a significantly lower price.
Baichuan3-Turbo
32k
-
Not Supported
Conversation
Baichuan_baichuan
Compared to similarly sized models in the industry, this model maintains leading performance at a significantly lower price.
Baichuan3-Turbo-128k
128k
-
Not Supported
Conversation
Baichuan_baichuan
The Baichuan model processes complex text with a 128k ultra-long context window, is specifically optimized for industries like finance, and significantly reduces costs while maintaining high performance, providing a cost-effective solution for enterprises.
Baichuan4
32k
-
Not Supported
Conversation
Baichuan_baichuan
Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.
Baichuan4-Air
32k
-
Not Supported
Conversation
Baichuan_baichuan
Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.
Baichuan4-Turbo
32k
-
Not Supported
Conversation
Baichuan_baichuan
Trained on massive high-quality scenario data. Compared to Baichuan4, usability in high-frequency enterprise scenarios improves by more than 10%, information summarization by 50%, multilingual capability by 31%, and content generation by 13%. Inference performance is specially optimized: first-token response speed is 51% higher and token streaming speed 73% higher than Baichuan4.
ERNIE-3.5-128K
128k
4k
Supported
Conversation
Baidu_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.
ERNIE-3.5-8K
8k
1k
Supported
Conversation
Baidu_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.
ERNIE-3.5-8K-Preview
8k
1k
Supported
Conversation
Baidu_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.
ERNIE-4.0-8K
8k
1k
Supported
Conversation
Baidu_ernie
Baidu's self-developed flagship ultra-large-scale language model. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.
ERNIE-4.0-8K-Latest
8k
2k
Supported
Conversation
Baidu_ernie
ERNIE-4.0-8K-Latest improves on ERNIE-4.0-8K across the board, with significant gains in role-playing and instruction following. Compared to ERNIE 3.5, it is a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information, and supports 5K tokens of input plus 2K tokens of output.
ERNIE-4.0-8K-Preview
8k
1k
Supported
Conversation
Baidu_ernie
Baidu's self-developed flagship ultra-large-scale language model. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.
ERNIE-4.0-Turbo-128K
128k
4k
Supported
Conversation
Baidu_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-128K is a version of the model with better overall performance on long documents than ERNIE-3.5-128K.
ERNIE-4.0-Turbo-8K
8k
2k
Supported
Conversation
Baidu_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is a version of the model.
ERNIE-4.0-Turbo-8K-Latest
8k
2k
Supported
Conversation
Baidu_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. It has better performance compared to ERNIE 4.0. ERNIE-4.0-Turbo-8K is a version of the model.
ERNIE-4.0-Turbo-8K-Preview
8k
2k
Supported
Conversation
Baidu_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information. ERNIE-4.0-Turbo-8K-Preview is a version of the model.
ERNIE-Character-8K
8k
1k
Not Supported
Conversation
Baidu_ernie
Baidu's self-developed vertical large language model, suitable for application scenarios such as game NPCs, customer service dialogues, and dialogue role-playing. It has a more distinct and consistent persona style, stronger instruction-following ability, and better inference performance.
ERNIE-Lite-8K
8k
4k
Not Supported
Conversation
Baidu_ernie
Baidu's self-developed lightweight large language model, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards.
ERNIE-Lite-Pro-128K
128k
2k
Supported
Conversation
Baidu_ernie
Baidu's self-developed lightweight large language model, with better performance than ERNIE Lite, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and has better performance than ERNIE-Lite-128K.
ERNIE-Novel-8K
8k
2k
Not Supported
Conversation
Baidu_ernie
ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a significant advantage in novel continuation capabilities. It can also be used in scenarios like short dramas and movies.
ERNIE-Speed-128K
128k
4k
Not Supported
Conversation
Baidu_ernie
Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance.
ERNIE-Speed-8K
8k
1k
Not Supported
Conversation
Baidu_ernie
Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance.
ERNIE-Speed-Pro-128K
128k
4k
Not Supported
Conversation
Baidu_ernie
ERNIE Speed Pro is Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supporting a 128K context length and having better performance than ERNIE-Speed-128K.
ERNIE-Tiny-8K
8k
1k
Not Supported
Conversation
Baidu_ernie
Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs in the ERNIE series.
Doubao-1.5-lite-32k
32k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-lite is among the world's top-tier lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative evaluation benchmarks for general knowledge (MMLU_pro), reasoning (BBH), math (MATH), and professional knowledge (GPQA).
Doubao-1.5-pro-256k
256k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-pro-256k is a fully upgraded version based on Doubao-1.5-pro. Compared to Doubao-pro-256k/241115, overall performance improves by 10%. The output length is greatly increased, supporting up to 12k tokens.
Doubao-1.5-pro-32k
32k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-pro is a new-generation flagship model with comprehensive performance upgrades, excelling in knowledge, code, reasoning, and more. It achieves world-leading performance on multiple public evaluation benchmarks, with the best scores on knowledge, code, reasoning, and authoritative Chinese benchmarks, and a composite score superior to top industry models like GPT-4o and Claude 3.5 Sonnet.
Doubao-1.5-vision-pro
32k
12k
Not Supported
Conversation, Vision
Doubao_doubao
Doubao-1.5-vision-pro is a newly upgraded multimodal large model. It supports image recognition at any resolution and at extreme aspect ratios, with enhanced visual reasoning, document recognition, detailed information understanding, and instruction-following capabilities.
Doubao-embedding
4k
-
Supported
Embedding
Doubao_doubao
Doubao-embedding is a semantic vectorization model developed by ByteDance, primarily for vector retrieval scenarios. It supports Chinese and English, with a maximum context length of 4K. The following versions are currently available: text-240715: Maximum vector dimension of 2560, supports dimensionality reduction to 512, 1024, and 2048. Chinese and English retrieval performance is significantly improved compared to the text-240515 version, and this version is recommended. text-240515: Maximum vector dimension of 2048, supports dimensionality reduction to 512 and 1024.
Doubao-embedding-large
4k
-
Not Supported
Embedding
Doubao_doubao
Chinese and English retrieval performance is significantly improved compared to the Doubao-embedding/text-240715 version.
Doubao-embedding-vision
8k
-
Not Supported
Embedding
Doubao_doubao
Doubao-embedding-vision, a newly upgraded image-text multimodal vectorization model, is primarily for image-text multi-vector retrieval scenarios. It supports image input and Chinese/English text input, with a maximum context length of 8K.
Doubao-lite-128k
128k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 128k context window.
Doubao-lite-32k
32k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 32k context window.
Doubao-lite-4k
4k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 4k context window.
Doubao-pro-128k
128k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.
Doubao-pro-32k
32k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 32k context window.
Doubao-pro-4k
4k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 4k context window.
step-1-128k
128k
-
Supported
Conversation
StepFun
The step-1-128k model is an ultra-large-scale language model capable of processing inputs of up to 128,000 tokens. This capability gives it a significant advantage in generating long-form content and performing complex reasoning, making it suitable for applications that require rich context, such as writing novels and scripts.
step-1-256k
256k
-
Supported
Conversation
StepFun
The step-1-256k model is one of the largest language models available, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality output in various domains.
step-1-32k
32k
-
Supported
Conversation
StepFun
The step-1-32k model extends the context window to support 32,000 tokens of input. This makes it perform excellently when handling long articles and complex conversations, suitable for tasks that require deep understanding and analysis, such as legal documents and academic research.
step-1-8k
8k
-
Supported
Conversation
StepFun
The step-1-8k model is an efficient language model designed for processing shorter texts. It can perform reasoning within a context of 8,000 tokens, making it suitable for application scenarios that require quick responses, such as chatbots and real-time translation.
step-1-flash
8k
-
Supported
Conversation
StepFun
The step-1-flash model focuses on rapid response and efficient processing, suitable for real-time applications. Its design allows it to provide high-quality language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.
step-1.5v-mini
32k
-
Supported
Conversation, Vision
StepFun
The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Despite its small size, it still retains good language processing capabilities, making it suitable for embedded systems and low-power devices.
step-1v-32k
32k
-
Supported
Conversation, Vision
StepFun
The step-1v-32k model supports inputs of 32,000 tokens, suitable for applications requiring longer context. It performs excellently in handling complex dialogues and long texts, making it suitable for fields such as customer service and content creation.
step-1v-8k
8k
-
Supported
Conversation, Vision
StepFun
The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for fast generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.
step-2-16k
16k
-
Supported
Conversation
StepFun
The step-2-16k model is a medium-sized language model supporting 16,000 tokens of input. It performs well in various tasks and is suitable for application scenarios such as education, training, and knowledge management.
yi-lightning
16k
-
Supported
Conversation
01.AI_yi
The latest high-performance model, ensuring high-quality output while significantly increasing inference speed. Suitable for real-time interaction and highly complex reasoning scenarios, its extremely high cost-effectiveness can provide excellent support for commercial products.
yi-vision-v2
16k
-
Supported
Conversation, Vision
01.AI_yi
Suitable for scenarios that require analyzing and interpreting images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.
qwen-14b-chat
8k
2k
Supported
Conversation
Qwen_qwen
Alibaba Cloud's official open-source version of Tongyi Qianwen.
qwen-72b-chat
32k
2k
Supported
Conversation
Qwen_qwen
Alibaba Cloud's official open-source version of Tongyi Qianwen.
qwen-7b-chat
7.5k
1.5k
Supported
Conversation
Qwen_qwen
Alibaba Cloud's official open-source version of Tongyi Qianwen.
qwen-coder-plus
128k
8k
Supported
Conversation, Code
Qwen_qwen
Qwen-Coder-Plus is a programming-specific model in the Qwen series, designed to enhance code generation and understanding capabilities. Trained on large-scale programming data, this model can handle multiple programming languages and supports functions like code completion, error detection, and code refactoring. Its design goal is to give developers more efficient programming assistance.
qwen-coder-plus-latest
128k
8k
Supported
Conversation, Code
Qwen_qwen
Qwen-Coder-Plus-Latest is the newest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. This model shows significant performance improvements, enabling it to understand context more accurately and generate code that better meets developers' needs. It also introduces support for more programming languages, enhancing its multilingual programming capabilities.
qwen-coder-turbo
128k
8k
Supported
Conversation, Code
Qwen_qwen
The Tongyi Qianwen series of code and programming models are language models specifically for programming and code generation, featuring fast inference speed and low cost. This version always points to the latest stable snapshot.
qwen-coder-turbo-latest
128k
8k
Supported
Conversation, Code
Qwen_qwen
The Tongyi Qianwen series of code and programming models are language models specifically for programming and code generation, featuring fast inference speed and low cost. This version always points to the latest snapshot.
qwen-long
10m
6k
Supported
Conversation
Qwen_qwen
Qwen-Long is a large language model from Tongyi Qianwen for ultra-long context processing scenarios. It supports input in different languages such as Chinese and English, and supports ultra-long context dialogues of up to 10 million tokens (about 15 million words or 15,000 pages of documents). Combined with the synchronously launched document service, it can parse and have dialogues on various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: For requests submitted directly via HTTP, it supports a length of 1M tokens. For lengths exceeding this, it is recommended to submit via file.
qwen-math-plus
4k
3k
Supported
Conversation
Qwen_qwen
Qwen-Math-Plus is a model focused on solving mathematical problems, designed to provide efficient mathematical reasoning and calculation capabilities. Trained on a large number of math problems, this model can handle complex mathematical expressions and problems, supporting a variety of calculation needs from basic arithmetic to higher mathematics. Its application scenarios include education, scientific research, and engineering.
qwen-math-plus-latest
4k
3k
Supported
Conversation
Qwen_qwen
Qwen-Math-Plus-Latest is the newest version of Qwen-Math-Plus, integrating the latest mathematical reasoning techniques and algorithm improvements. This model performs better in handling complex mathematical problems, providing more accurate solutions and reasoning processes. It also expands its understanding of mathematical symbols and formulas, making it suitable for a wider range of mathematical applications.
qwen-math-turbo
4k
3k
Supported
Conversation
Qwen_qwen
Qwen-Math-Turbo is a high-performance mathematical model designed for fast calculation and real-time inference. This model optimizes calculation speed, enabling it to process a large number of mathematical problems in a very short time, suitable for application scenarios that require quick feedback, such as online education and real-time data analysis. Its efficient algorithms allow users to get instant results in complex calculations.
qwen-math-turbo-latest
4k
3k
Supported
Conversation
Qwen_qwen
Qwen-Math-Turbo-Latest is the newest version of Qwen-Math-Turbo, further improving calculation efficiency and accuracy. This model has undergone multiple algorithmic optimizations, enabling it to handle more complex mathematical problems and maintain high efficiency in real-time inference. It is suitable for mathematical applications that require rapid response, such as financial analysis and scientific computing.
qwen-max
32k
8k
Supported
Conversation
Qwen_qwen
A hundred-billion-parameter-scale ultra-large language model in the Tongyi Qianwen 2.5 series, supporting input in languages such as Chinese and English. As the model is upgraded, qwen-max will be updated on a rolling basis.
qwen-max-latest
32k
8k
Supported
Conversation
Qwen_qwen
The best-performing model in the Tongyi Qianwen series. This model is a dynamically updated version, and model updates will not be announced in advance. It is suitable for complex, multi-step tasks. Its overall Chinese and English abilities, alignment with human preferences, reasoning, and understanding of complex instructions are all significantly improved; it performs better on difficult tasks, and its math and code abilities are significantly stronger. It also has enhanced understanding and generation capabilities for structured data like tables and JSON.
qwen-plus
128k
8k
Supported
Conversation
Qwen_qwen
A well-balanced model in the Tongyi Qianwen series, with inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks. Its overall Chinese and English abilities, alignment with human preferences, reasoning, and understanding of complex instructions are all significantly improved; it performs better on difficult tasks, and its math and code abilities are significantly stronger.
qwen-plus-latest
128k
8k
Supported
Conversation
Qwen_qwen
The latest snapshot of Qwen-Plus, a well-balanced model in the Tongyi Qianwen series with inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks.
qwen-turbo
128k
8k
Supported
Conversation
Qwen_qwen
The fastest and most cost-effective model in the Tongyi Qianwen series, suitable for simple tasks. Its overall Chinese and English abilities, alignment with human preferences, reasoning, and understanding of complex instructions are all significantly improved; it performs better on difficult tasks, and its math and code abilities are significantly stronger.
qwen-turbo-latest
1m
8k
Supported
Conversation
Qwen_qwen
Qwen-Turbo is an efficient model designed for simple tasks, emphasizing speed and cost-effectiveness. It is suitable for applications with strict response time requirements, such as simple Q&A systems.
qwen-vl-max
32k
2k
Supported
Conversation, Vision
Qwen_qwen
Tongyi Qianwen VL-Max (qwen-vl-max) is the ultra-large-scale visual language model from Tongyi Qianwen. Compared to the enhanced version (qwen-vl-plus), it further improves visual reasoning and instruction-following capabilities, providing a higher level of visual perception and cognition. It offers the best performance on more complex tasks.
qwen-vl-max-latest
32k
2k
Supported
Conversation, Vision
Qwen_qwen
Qwen-VL-Max is the most advanced version in the Qwen-VL series, designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, capable of understanding and analyzing high-resolution images with extremely strong reasoning abilities, suitable for applications requiring deep understanding and complex reasoning.
qwen-vl-ocr
34k
4k
Supported
Conversation, Vision
Qwen_qwen
Only supports OCR, not conversation.
qwen-vl-ocr-latest
34k
4k
Supported
Conversation, Vision
Qwen_qwen
Only supports OCR, not conversation.
qwen-vl-plus
8k
2k
Supported
Conversation, Vision
Qwen_qwen
Tongyi Qianwen VL-Plus (qwen-vl-plus), the enhanced version of the Tongyi Qianwen large-scale visual language model. It significantly improves detail recognition and text recognition capabilities, supports images with resolutions over one million pixels and any aspect ratio. It provides excellent performance on a wide range of visual tasks.
qwen-vl-plus-latest
32k
2k
Supported
Conversation, Vision
Qwen_qwen
Qwen-VL-Plus-Latest is the newest version of Qwen-VL-Plus, enhancing the model's multimodal understanding capabilities. It excels in the combined processing of images and text, making it suitable for applications that need to efficiently handle multiple input formats, such as intelligent customer service and content generation.
Qwen/Qwen2-1.5B-Instruct
32k
6k
Not Supported
Conversation
Qwen_qwen
Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series with 1.5B parameters. Based on the Transformer architecture, the model uses SwiGLU activation, attention QKV bias, and grouped-query attention. It excels on multiple benchmarks for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.
Qwen/Qwen2-72B-Instruct
128k
6k
Not Supported
Conversation
Qwen_qwen
Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series with 72B parameters. Based on the Transformer architecture, the model uses SwiGLU activation, attention QKV bias, and grouped-query attention. It can handle large-scale inputs. The model excels on multiple benchmarks for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.
Qwen/Qwen2-7B-Instruct
128k
6k
Not Supported
Conversation
Qwen_qwen
Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series with 7B parameters. Based on the Transformer architecture, the model uses SwiGLU activation, attention QKV bias, and grouped-query attention. It can handle large-scale inputs. The model excels on multiple benchmarks for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.
Qwen/Qwen2-VL-72B-Instruct
32k
2k
Not Supported
Conversation, Vision
Qwen_qwen
Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos over 20 minutes long for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.
Qwen/Qwen2-VL-7B-Instruct
32k
-
Not Supported
Conversation, Vision
Qwen_qwen
Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.
Qwen/Qwen2.5-72B-Instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
Qwen2.5-72B-Instruct belongs to the latest series of large language models released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long texts of over 8K tokens.
Qwen/Qwen2.5-72B-Instruct-128K
128k
8k
Not Supported
Conversation
Qwen_qwen
Qwen2.5-72B-Instruct belongs to the latest series of large language models released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long texts of over 8K tokens.
Qwen/Qwen2.5-7B-Instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
Qwen2.5-7B-Instruct belongs to the latest series of large language models released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).
Qwen/Qwen2.5-Coder-32B-Instruct
128k
8k
Not Supported
Conversation, Code
Qwen_qwen
Qwen2.5-Coder-32B-Instruct is a 32B code-focused model built on Qwen2.5, the latest series of large language models released by Alibaba Cloud. The Qwen2.5 series has significantly improved capabilities in areas such as coding and mathematics, provides multilingual support covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).
Qwen/Qwen2.5-Coder-7B-Instruct
128k
8k
Not Supported
Conversation, Code
Qwen_qwen
Qwen2.5-Coder-7B-Instruct is a 7B code-focused model built on Qwen2.5, the latest series of large language models released by Alibaba Cloud. The Qwen2.5 series has significantly improved capabilities in areas such as coding and mathematics, provides multilingual support covering over 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).
Qwen/QwQ-32B-Preview
32k
16k
Not Supported
Conversation, Reasoning
Qwen_qwen
QwQ-32B-Preview is an experimental research model developed by the Qwen team, aimed at enhancing the reasoning capabilities of artificial intelligence. As a preview version, it demonstrates excellent analytical abilities, but also has some important limitations: 1. Language mixing and code-switching: The model may mix languages or switch between languages unexpectedly, affecting the clarity of the response. 2. Recursive reasoning loops: The model may enter a cyclic reasoning mode, leading to lengthy answers without a clear conclusion. 3. Safety and ethical considerations: The model requires strengthened safety measures to ensure reliable and safe performance, and users should exercise caution when using it. 4. Performance and benchmark limitations: The model performs excellently in mathematics and programming, but there is still room for improvement in other areas such as common sense reasoning and nuanced language understanding.
qwen1.5-110b-chat
32k
8k
Not Supported
Conversation
Qwen_qwen
-
qwen1.5-14b-chat
8k
2k
Not Supported
Conversation
Qwen_qwen
-
qwen1.5-32b-chat
32k
2k
Not Supported
Conversation
Qwen_qwen
-
qwen1.5-72b-chat
32k
2k
Not Supported
Conversation
Qwen_qwen
-
qwen1.5-7b-chat
8k
2k
Not Supported
Conversation
Qwen_qwen
-
qwen2-57b-a14b-instruct
65k
6k
Not Supported
Conversation
Qwen_qwen
-
Qwen2-72B-Instruct
-
-
Not Supported
Conversation
Qwen_qwen
-
qwen2-7b-instruct
128k
6k
Not Supported
Conversation
Qwen_qwen
-
qwen2-math-72b-instruct
4k
3k
Not Supported
Conversation
Qwen_qwen
-
qwen2-math-7b-instruct
4k
3k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-14b-instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-32b-instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-72b-instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-7b-instruct
128k
8k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-coder-14b-instruct
128k
8k
Not Supported
Conversation, Code
Qwen_qwen
-
qwen2.5-coder-32b-instruct
128k
8k
Not Supported
Conversation, Code
Qwen_qwen
-
qwen2.5-coder-7b-instruct
128k
8k
Not Supported
Conversation, Code
Qwen_qwen
-
qwen2.5-math-72b-instruct
4k
3k
Not Supported
Conversation
Qwen_qwen
-
qwen2.5-math-7b-instruct
4k
3k
Not Supported
Conversation
Qwen_qwen
-
deepseek-ai/DeepSeek-R1
64k
-
Not Supported
Conversation, Reasoning
DeepSeek_deepseek
The DeepSeek-R1 model is an open-source reasoning model trained with large-scale reinforcement learning. It excels in tasks such as mathematics, code, and natural language reasoning, with performance comparable to OpenAI's o1 model and excellent results on several benchmarks.
deepseek-ai/DeepSeek-V2-Chat
128k
-
Not Supported
Conversation
DeepSeek_deepseek
DeepSeek-V2 is a powerful, cost-effective Mixture-of-Experts (MoE) language model. It was pre-trained on a high-quality corpus of 8.1 trillion tokens and further enhanced with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times.
deepseek-ai/DeepSeek-V2.5
32k
-
Supported
Conversation
DeepSeek_deepseek
DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. This model has been optimized in several aspects, including writing and instruction-following abilities, to better align with human preferences.
deepseek-ai/DeepSeek-V3
128k
4k
Not Supported
Conversation
DeepSeek_deepseek
Open-source version of DeepSeek-V3. Compared to the official version, it has a longer context window and does not refuse requests because of sensitive-word filtering.
deepseek-chat
64k
8k
Supported
Conversation
DeepSeek_deepseek
236B parameters, 64K context (API), top-ranked on the open-source leaderboard for Chinese comprehensive ability (AlignBench), and in the same tier as closed-source models like GPT-4-Turbo and ERNIE 4.0 in evaluations.
deepseek-coder
64k
8k
Supported
Conversation, Code
DeepSeek_deepseek
236B parameters, 64K context (API), top-ranked on the open-source leaderboard for Chinese comprehensive ability (AlignBench), and in the same tier as closed-source models like GPT-4-Turbo and ERNIE 4.0 in evaluations.
deepseek-reasoner
64k
8k
Supported
Conversation, Reasoning
DeepSeek_deepseek
DeepSeek-Reasoner (DeepSeek-R1) is the latest reasoning model from DeepSeek, designed to enhance reasoning capabilities through reinforcement learning training. The model's reasoning process involves a large amount of reflection and validation, enabling it to handle complex logical reasoning tasks, with a chain-of-thought length that can reach tens of thousands of words. DeepSeek-R1 excels in solving mathematical, coding, and other complex problems and has been widely applied in various scenarios, demonstrating its powerful reasoning ability and flexibility. Compared to other models, DeepSeek-R1's reasoning performance is close to that of top-tier closed-source models, showcasing the potential and competitiveness of open-source models in the field of reasoning.
hunyuan-code
4k
4k
Not Supported
Conversation, Code
Tencent_hunyuan
Hunyuan's latest code generation model. The base model was augmented with 200B high-quality code data and trained with high-quality SFT data for half a year. The context window length has been increased to 8K. It ranks at the top in automatic evaluation metrics for code generation in five major languages. In high-quality manual evaluations of 10 comprehensive code tasks across five major languages, its performance is in the top tier.
hunyuan-functioncall
28k
4k
Supported
Conversation
Tencent_hunyuan
Hunyuan's latest MoE-architecture FunctionCall model, trained with high-quality FunctionCall data, with a context window of up to 32K, leading in evaluation metrics across multiple dimensions.
hunyuan-large
28k
4k
Not Supported
Conversation
Tencent_hunyuan
The Hunyuan-large model has a total of about 389B parameters, with about 52B activated parameters, making it the open-source MoE model with the largest parameter scale and best performance in the industry.
hunyuan-large-longcontext
128k
6k
Not Supported
Conversation
Tencent_hunyuan
Excels at handling long-text tasks such as document summarization and document Q&A, while also being capable of handling general text generation tasks. It performs excellently in the analysis and generation of long texts, effectively handling complex and detailed long-form content processing needs.
hunyuan-lite
250k
6k
Not Supported
Conversation
Tencent_hunyuan
Upgraded to an MOE structure with a 256k context window, leading many open-source models in NLP, code, math, and industry-specific evaluation sets.
hunyuan-pro
28k
4k
Supported
Conversation
Tencent_hunyuan
A trillion-parameter scale MOE-32K long-text model. It achieves an absolute leading level on various benchmarks, with complex instruction and reasoning capabilities, complex mathematical abilities, and supports functioncall. It is specially optimized for applications in multilingual translation, finance, law, and medicine.
hunyuan-role
28k
4k
Not Supported
Conversation
Tencent_hunyuan
Hunyuan's latest role-playing model. This is a role-playing model officially fine-tuned and launched by Hunyuan, based on the Hunyuan model and augmented with role-playing scenario datasets, providing better foundational performance in role-playing scenarios.
hunyuan-standard
30k
2k
Not Supported
Conversation
Tencent_hunyuan
Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. MOE-32K has a relatively higher cost-performance ratio and can handle long text inputs while balancing performance and price.
hunyuan-standard-256K
250k
6k
Not Supported
Conversation
Tencent_hunyuan
Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. For long texts, the "needle in a haystack" metric reaches 99.9%. MOE-256K further breaks through in length and performance, greatly expanding the input length.
hunyuan-translation-lite
4k
4k
Not Supported
Conversation
Tencent_hunyuan
The Hunyuan translation model supports natural language conversational translation. It supports mutual translation among 15 languages: Chinese, English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.
hunyuan-turbo
28k
4k
Supported
Conversation
Tencent_hunyuan
The default version of the Hunyuan-turbo model, which uses a new Mixture-of-Experts (MoE) structure, resulting in faster inference efficiency and stronger performance compared to hunyuan-pro.
hunyuan-turbo-latest
28k
4k
Supported
Conversation
Tencent_hunyuan
The dynamically updated version of the Hunyuan-turbo model. It is the best-performing version in the Hunyuan model series and matches the version used in the consumer-facing product (Tencent Yuanbao).
hunyuan-turbo-vision
8k
2k
Supported
Vision, Conversation
Tencent_hunyuan
Hunyuan's new generation flagship visual language model, using a new Mixture-of-Experts (MoE) structure. Its capabilities in basic recognition, content creation, knowledge Q&A, and analysis/reasoning related to image-text understanding are comprehensively improved compared to the previous generation model. Max input 6k, max output 2k.
hunyuan-vision
8k
2k
Supported
Conversation, Vision
Tencent_hunyuan
Hunyuan's latest multimodal model, supporting image + text input to generate text content. Basic Image Recognition: Recognizes subjects, elements, scenes, etc., in images. Image Content Creation: Summarizes images, creates advertising copy, social media posts, poems, etc. Multi-turn Image Dialogue: Engages in multi-turn interactive Q&A about a single image. Image Analysis and Reasoning: Performs statistical analysis on logical relationships, math problems, code, and charts in images. Image Knowledge Q&A: Answers questions about knowledge points contained in images, such as historical events, movie posters. Image OCR: Recognizes text in images from natural life scenes and non-natural scenes.
SparkDesk-Lite
4k
-
Not Supported
Conversation
Spark_SparkDesk
Supports online web search and responds quickly. Suitable for low-compute inference, model fine-tuning, and other customized scenarios.
SparkDesk-Max
128k
-
Supported
Conversation
Spark_SparkDesk
Quantized from the latest Spark Large Model Engine 4.0 Turbo. It supports multiple built-in plugins such as web search, weather, and date. Core capabilities are fully upgraded, with universal improvements in application effects across various scenarios. Supports System role persona and FunctionCall.
SparkDesk-Max-32k
32k
-
Supported
Conversation
Spark_SparkDesk
Stronger reasoning: Enhanced context understanding and logical reasoning abilities. Longer input: Supports 32K tokens of text input, suitable for long document reading, private knowledge Q&A, and other scenarios.
SparkDesk-Pro
128k
-
Not Supported
Conversation
Spark_SparkDesk
Specially optimized for scenarios such as math, code, medicine, and education. Supports multiple built-in plugins like web search, weather, and date, covering most knowledge Q&A, language understanding, and text creation scenarios.
SparkDesk-Pro-128K
128k
-
Not Supported
Conversation
Spark_SparkDesk
Professional-grade large language model with tens of billions of parameters. It has been specially optimized for scenarios in medicine, education, and code, with lower latency in search scenarios. Suitable for business scenarios with higher requirements for performance and response speed, such as text processing and intelligent Q&A.
moonshot-v1-128k
128k
4k
Supported
Conversation
Moonshot AI_moonshot
A model with a length of 128k, suitable for generating ultra-long text.
moonshot-v1-32k
32k
4k
Supported
Conversation
Moonshot AI_moonshot
A model with a length of 32k, suitable for generating long text.
moonshot-v1-8k
8k
4k
Supported
Conversation
Moonshot AI_moonshot
A model with a length of 8k, suitable for generating short text.
codegeex-4
128k
4k
Not Supported
Conversation, Code
Zhipu_codegeex
Zhipu's code model: suitable for automatic code completion tasks.
charglm-3
4k
2k
Not Supported
Conversation
Zhipu_glm
Persona model.
emohaa
8k
4k
Not Supported
Conversation
Zhipu_glm
Psychology model: possesses professional counseling abilities to help users understand emotions and cope with emotional problems.
glm-3-turbo
128k
4k
Not Supported
Conversation
Zhipu_glm
To be deprecated (June 30, 2025).
glm-4
128k
4k
Supported
Conversation
Zhipu_glm
Old flagship: released on January 16, 2024, now replaced by GLM-4-0520.
glm-4-0520
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence model: suitable for handling highly complex and diverse tasks.
glm-4-air
128k
4k
Supported
Conversation
Zhipu_glm
High cost-performance: the most balanced model between inference capability and price.
glm-4-airx
8k
4k
Supported
Conversation
Zhipu_glm
Extremely fast inference: ultra-fast inference speed with strong reasoning results.
glm-4-flash
128k
4k
Supported
Conversation
Zhipu_glm
High speed, low price: ultra-fast inference speed.
glm-4-flashx
128k
4k
Supported
Conversation
Zhipu_glm
High speed, low price: Enhanced version of Flash, ultra-fast inference speed.
glm-4-long
1m
4k
Supported
Conversation
Zhipu_glm
Ultra-long input: specially designed for handling ultra-long text and memory-intensive tasks.
glm-4-plus
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence flagship: comprehensive performance improvement, with significantly enhanced long-text and complex task capabilities.
glm-4v
2k
-
Not Supported
Conversation, Vision
Zhipu_glm
Image understanding: possesses image understanding and reasoning capabilities.
glm-4v-flash
2k
1k
Not Supported
Conversation, Vision
Zhipu_glm
Free model: possesses powerful image understanding capabilities.
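The entries above all follow the same seven-line layout, so they can be read into records for filtering by size or capability. The following is a minimal sketch in Python, not an official tool. It assumes the line order is model name, context/input size, max output, function calling, capabilities, provider, and description. The names `ModelEntry`, `parse_size`, and `parse_entry` are hypothetical, and treating `k` as 1,000 and `m` as 1,000,000 is an assumption, since the table does not define these units.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    # Field names are illustrative; the table's column headers are assumed.
    name: str
    context_size: Optional[int]  # None when the table shows "-"
    max_output: Optional[int]    # None when the table shows "-"
    function_calling: bool
    capabilities: List[str]
    provider: str
    description: str


def parse_size(text: str) -> Optional[int]:
    """Convert a size such as '128k', '1m', or '-' into a number."""
    text = text.strip().lower()
    if text == "-":
        return None
    units = {"k": 1_000, "m": 1_000_000}  # assumed unit values
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def parse_entry(lines: List[str]) -> ModelEntry:
    """Build one record from seven consecutive lines of the table."""
    name, size, output, calls, caps, provider, description = lines[:7]
    return ModelEntry(
        name=name.strip(),
        context_size=parse_size(size),
        max_output=parse_size(output),
        function_calling=calls.strip() == "Supported",
        capabilities=[c.strip() for c in caps.split(",")],
        provider=provider.strip(),
        description=description.strip(),
    )


glm_4_long = parse_entry([
    "glm-4-long", "1m", "4k", "Supported", "Conversation", "Zhipu_glm",
    "Ultra-long input: specially designed for handling ultra-long text "
    "and memory-intensive tasks.",
])
glm_4v = parse_entry([
    "glm-4v", "2k", "-", "Not Supported", "Conversation, Vision", "Zhipu_glm",
    "Image understanding: possesses image understanding and reasoning "
    "capabilities.",
])
```

With records in this form, a list of entries can be filtered in one line, for example keeping only those where `function_calling` is true and `"Vision"` is in `capabilities`.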