Model Data
This document was translated from Chinese by AI and has not yet been reviewed.
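Each entry below lists a model's ID, context window, maximum output length, a feature-support flag, capability tags, provider group, and a short description. As a quick orientation, here is a minimal sketch of how an ID from this table is typically used, assuming the models are served through an OpenAI-compatible gateway; the base URL and API key are placeholders, not values from this document.

```python
# Minimal sketch, assuming an OpenAI-compatible gateway serves these models.
# BASE_URL and the API key are placeholders, not values from this table.
from openai import OpenAI

client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="gpt-4o",  # any model ID from the table below
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,  # keep within the model's listed maximum output length
)
print(resp.choices[0].message.content)
```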
360gpt-pro
8k
-
Not supported
Conversation
360AI_360gpt
The best-performing flagship hundred-billion-parameter large model in the 360 AI Brain series, widely applicable to complex task scenarios across fields.
360gpt-turbo
7k
-
Not supported
Conversation
360AI_360gpt
A 10-billion-parameter large model that balances speed and output quality, suitable for scenarios demanding both high performance and cost efficiency.
360gpt-turbo-responsibility-8k
8k
-
Not supported
Conversation
360AI_360gpt
A 10-billion-parameter large model that balances speed and output quality, suitable for scenarios demanding both high performance and cost efficiency.
360gpt2-pro
8k
-
Not supported
Conversation
360AI_360gpt
The best-performing flagship hundred-billion-parameter large model in the 360 AI Brain series, widely applicable to complex task scenarios across fields.
claude-3-5-sonnet-20240620
200k
16k
Not supported
Conversation, Image recognition
Anthropic_claude
A snapshot version released on June 20, 2024. Claude 3.5 Sonnet balances capability and speed, delivering top-tier performance while maintaining high speed, and supports multimodal input.
claude-3-5-haiku-20241022
200k
16k
Not supported
Conversation
Anthropic_claude
A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved in various skills, including coding, tool use, and reasoning. As the fastest model in the Anthropic series, it provides rapid response times, suitable for applications requiring high interactivity and low latency, such as user-facing chatbots and instant code completion. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for various industries. It does not support image input.
claude-3-5-sonnet-20241022
200k
8k
Not supported
Conversation, Image recognition
Anthropic_claude
A snapshot version released on October 22, 2024. Claude 3.5 Sonnet offers capabilities beyond Claude 3 Opus at speeds faster than Claude 3 Sonnet, while keeping the same price as Claude 3 Sonnet. It is particularly strong at programming, data science, visual processing, and agent tasks.
claude-3-5-sonnet-latest
200k
8k
Not supported
Conversation, Image recognition
Anthropic_claude
Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities beyond Claude 3 Opus at speeds faster than Claude 3 Sonnet, while keeping the same price as Claude 3 Sonnet. It is particularly strong at programming, data science, visual processing, and agent tasks.
claude-3-haiku-20240307
200k
4k
Not supported
Conversation, Image recognition
Anthropic_claude
Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instant responses with fast, accurate, targeted performance.
claude-3-opus-20240229
200k
4k
Not supported
Conversation, Image recognition
Anthropic_claude
Claude 3 Opus is Anthropic's most powerful model for handling highly complex tasks. It excels in performance, intelligence, fluency, and understanding.
claude-3-sonnet-20240229
200k
8k
Not supported
Conversation, Image recognition
Anthropic_claude
A snapshot version released on February 29, 2024. Sonnet particularly excels at: - Coding: autonomously writing, editing, and running code, with reasoning and troubleshooting along the way. - Data science: augmenting human data-science expertise, navigating unstructured data and using multiple tools to extract insights. - Visual processing: interpreting charts, graphs, and images, and accurately transcribing text to surface insights beyond the text itself. - Agent tasks: excellent tool use, well suited to agentic tasks (complex, multi-step problem solving that requires interacting with other systems).
google/gemma-2-27b-it
8k
-
Not supported
Conversation
Google_gemma
Gemma is a family of lightweight, state-of-the-art open models developed by Google, built with the same research and technology used for the Gemini models. These models are decoder-only large language models that support English, offering open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning.
google/gemma-2-9b-it
8k
-
Not supported
Conversation
Google_gemma
Gemma is a family of lightweight, state-of-the-art open models developed by Google. It is a decoder-only large language model that supports English, offering open weights in both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning. This 9B model was trained on 8 trillion tokens.
gemini-1.5-pro
2m
8k
Not supported
Conversation
Google_gemini
The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.
gemini-1.0-pro-001
33k
8k
Not supported
Conversation
Google_gemini
This is the stable version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-002
32k
8k
Not supported
Conversation
Google_gemini
This is the stable version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-latest
33k
8k
Not supported
Conversation, Deprecated or soon to be deprecated
Google_gemini
This is the latest version of Gemini 1.0 Pro. As an NLP model, it specializes in handling multi-turn text and code chat as well as code generation tasks. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-vision-001
16k
2k
Not supported
Conversation
Google_gemini
This is the vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
gemini-1.0-pro-vision-latest
16k
2k
Not supported
Image recognition
Google_gemini
This is the latest vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
gemini-1.5-flash
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, images, video, and text inputs.
gemini-1.5-flash-001
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is a stable version of Gemini 1.5 Flash. It offers the same core capabilities as gemini-1.5-flash but as a pinned version, suitable for production environments.
gemini-1.5-flash-002
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is a stable version of Gemini 1.5 Flash. It offers the same core capabilities as gemini-1.5-flash but as a pinned version, suitable for production environments.
gemini-1.5-flash-8b
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
Gemini 1.5 Flash-8B is a new multimodal AI model from Google, designed for efficient processing of large-scale tasks. With 8 billion parameters, it supports text, image, audio, and video input, suitable for various applications such as chat, transcription, and translation. Compared to other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, particularly appealing to cost-sensitive users. Its rate limit has doubled, enabling developers to process large-scale tasks more efficiently. Furthermore, Flash-8B uses "knowledge distillation" technology to extract key knowledge from larger models, ensuring lightweight and efficient performance while maintaining core capabilities.
gemini-1.5-flash-exp-0827
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is an experimental version of Gemini 1.5 Flash, regularly updated to include the latest improvements. Suitable for exploratory testing and prototype development, not recommended for production environments.
gemini-1.5-flash-latest
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is the cutting-edge version of Gemini 1.5 Flash, regularly updated to include the latest improvements. Suitable for exploratory testing and prototype development, not recommended for production environments.
gemini-1.5-pro-001
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is the stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. Suitable for production environments requiring stability.
gemini-1.5-pro-002
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is the stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. Suitable for production environments requiring stability.
gemini-1.5-pro-exp-0801
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
Experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.
gemini-1.5-pro-exp-0827
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
Experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can process up to 60,000 lines of code or 2,000 pages of text. Particularly suitable for tasks requiring complex reasoning.
gemini-1.5-pro-latest
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
This is the latest version of Gemini 1.5 Pro, dynamically pointing to the latest snapshot version.
gemini-2.0-flash
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
Gemini 2.0 Flash is Google's newly launched model, offering a faster time to first token (TTFT) than the 1.5 generation while maintaining quality on par with Gemini Pro 1.5. It brings significant improvements in multimodal understanding, coding, complex instruction following, and function calling, providing a smoother and more capable experience.
gemini-2.0-flash-exp
100k
8k
Supported
Conversation, Image recognition
Google_gemini
Gemini 2.0 Flash introduces multimodal real-time APIs, improved speed and performance, enhanced quality, stronger agent capabilities, and new image generation and speech output features.
gemini-2.0-flash-lite-preview-02-05
1m
8k
Not supported
Conversation, Image recognition
Google_gemini
Gemini 2.0 Flash-Lite is Google's newly released cost-effective AI model, offering better quality while maintaining the same speed as 1.5 Flash; it supports a 1 million token context window and can handle multimodal tasks such as images, audio, and code; as Google's most cost-effective model, it adopts a simplified single pricing strategy, particularly suitable for large-scale applications that need to control costs.
gemini-2.0-flash-thinking-exp
40k
8k
Not supported
Conversation, Reasoning
Google_gemini
gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thought process" it undergoes when responding. Therefore, responses in "thinking mode" exhibit stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.
gemini-2.0-flash-thinking-exp-01-21
1m
64k
Not supported
Conversation, Reasoning
Google_gemini
Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focusing on improving reasoning capabilities and user interaction experience. This model has strong reasoning capabilities, especially excelling in mathematics and programming, and supports a context window of up to 1 million tokens, suitable for complex tasks and in-depth analysis scenarios. Its unique feature is the ability to generate thought processes, improving the comprehensibility of AI thinking, while also supporting native code execution, enhancing the flexibility and practicality of interaction. Through optimized algorithms, the model reduces logical inconsistencies, further improving the accuracy and consistency of answers.
gemini-2.0-flash-thinking-exp-1219
40k
8k
Not supported
Conversation, Reasoning, Image recognition
Google_gemini
gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thought process" it undergoes when responding. Therefore, responses in "thinking mode" exhibit stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.
gemini-2.0-pro-exp-01-28
2m
64k
Not supported
Conversation, Image recognition
Google_gemini
Placeholder entry added ahead of release; not yet launched.
gemini-2.0-pro-exp-02-05
2m
8k
Not supported
Conversation, Image recognition
Google_gemini
Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model, released in February 2025, excelling in world knowledge, code generation, and long-text understanding. It supports an ultra-long context window of 2 million tokens, enough to process 2 hours of video, 22 hours of audio, more than 60,000 lines of code, or over 1.4 million words. As part of the Gemini 2.0 series, it uses the new Flash Thinking training strategy, significantly improving performance and ranking highly on multiple LLM leaderboards, demonstrating strong all-around capability.
gemini-exp-1114
8k
4k
Not supported
Conversation, Image recognition
Google_gemini
This is an experimental model, released on November 14, 2024, primarily focused on quality improvements.
gemini-exp-1121
8k
4k
Not supported
Conversation, Image recognition, Code
Google_gemini
This is an experimental model, released on November 21, 2024, with improved coding, reasoning, and visual capabilities.
gemini-exp-1206
8k
4k
Not supported
Conversation, Image recognition
Google_gemini
This is an experimental model, released on December 6, 2024, with improved coding, reasoning, and visual capabilities.
gemini-exp-latest
8k
4k
Not supported
Conversation, Image recognition
Google_gemini
This is an experimental model, dynamically pointing to the latest version.
gemini-pro
33k
8k
Not supported
Conversation
Google_gemini
An alias for gemini-1.0-pro.
gemini-pro-vision
16k
2k
Not supported
Conversation, Image recognition
Google_gemini
This is the vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025. It is recommended to migrate to the 1.5 series models.
grok-2
128k
-
Not supported
Conversation
Grok_grok
New version of the Grok model released by xAI on December 12, 2024.
grok-2-1212
128k
-
Not supported
Conversation
Grok_grok
New version of the Grok model released by xAI on December 12, 2024.
grok-2-latest
128k
-
Not supported
Conversation
Grok_grok
New version of the Grok model released by xAI on December 12, 2024.
grok-2-vision-1212
32k
-
Not supported
Conversation, Image recognition
Grok_grok
Vision version of the Grok model released by xAI on December 12, 2024.
grok-beta
100k
-
Not supported
Conversation
Grok_grok
Performance comparable to Grok 2, but with improved efficiency, speed, and functionality.
grok-vision-beta
8k
-
Not supported
Conversation, Image recognition
Grok_grok
xAI's latest image-understanding model, able to process a wide range of visual information, including documents, charts, screenshots, and photos.
internlm/internlm2_5-20b-chat
32k
-
Supported
Conversation
internlm
InternLM2.5-20B-Chat is an open-source large-scale conversational model developed based on the InternLM2 architecture. This model has 20 billion parameters and excels in mathematical reasoning, outperforming Llama3 and Gemma2-27B models of similar size. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, and possessing stronger instruction understanding, tool selection, and result reflection capabilities.
meta-llama/Llama-3.2-11B-Vision-Instruct
8k
-
Not supported
Conversation, Image recognition
Meta_llama
The Llama series now handles image data as well as text: some Llama 3.2 models add visual understanding. This model accepts combined text and image input, understands the image content, and responds with text.
meta-llama/Llama-3.2-3B-Instruct
32k
-
Not supported
Conversation
Meta_llama
Part of the Meta Llama 3.2 family of multilingual large language models (LLMs); the 1B and 3B variants are lightweight models that can run on edge and mobile devices. This is the 3B version.
meta-llama/Llama-3.2-90B-Vision-Instruct
8k
-
Not supported
Conversation, Image recognition
Meta_llama
The Llama series now handles image data as well as text: some Llama 3.2 models add visual understanding. This model accepts combined text and image input, understands the image content, and responds with text.
meta-llama/Llama-3.3-70B-Instruct
131k
-
Not supported
Conversation
Meta_llama
Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B.
meta-llama/Meta-Llama-3.1-405B-Instruct
32k
-
Not supported
Conversation
Meta_llama
The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.
meta-llama/Meta-Llama-3.1-70B-Instruct
32k
-
Not supported
Conversation
Meta_llama
Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants with 8B, 70B, and 405B parameters. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs excellently across multiple industry benchmarks. It was trained on more than 15 trillion tokens of publicly available data and uses techniques such as supervised fine-tuning and reinforcement learning from human feedback (RLHF) to improve helpfulness and safety.
meta-llama/Meta-Llama-3.1-8B-Instruct
32k
-
Not supported
Conversation
Meta_llama
The Meta Llama 3.1 multilingual large language model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.
abab5.5-chat
16k
-
Supported
Conversation
Minimax_abab
Chinese persona-based dialogue scenarios
abab5.5s-chat
8k
-
Supported
Conversation
Minimax_abab
Chinese persona-based dialogue scenarios
abab6.5g-chat
8k
-
Supported
Conversation
Minimax_abab
English and other multilingual persona-based dialogue scenarios
abab6.5s-chat
245k
-
Supported
Conversation
Minimax_abab
General scenarios
abab6.5t-chat
8k
-
Supported
Conversation
Minimax_abab
Chinese persona-based dialogue scenarios
chatgpt-4o-latest
128k
16k
Not supported
Conversation, Image recognition
OpenAI
The chatgpt-4o-latest model version continuously points to the GPT-4o version used in ChatGPT and is updated promptly whenever that version changes significantly.
gpt-4o-2024-11-20
128k
16k
Supported
Conversation
OpenAI
The latest gpt-4o snapshot version from November 20, 2024.
gpt-4o-audio-preview
128k
16k
Not supported
Conversation
OpenAI
OpenAI's speech-capable conversation model, supporting audio input and output.
gpt-4o-audio-preview-2024-10-01
128k
16k
Supported
Conversation
OpenAI
OpenAI's speech-capable conversation model, supporting audio input and output.
o1
128k
32k
Not supported
Conversation, Reasoning, Image recognition
OpenAI
OpenAI's new reasoning model for complex tasks that require broad general knowledge. It offers a 200k context window, supports image recognition, and ranked among the strongest models available at its release.
o1-mini-2024-09-12
128k
64k
Not supported
Conversation, Reasoning
OpenAI
Fixed snapshot of o1-mini: smaller and faster than o1-preview and 80% cheaper, performing well at code generation and small-context tasks.
o1-preview-2024-09-12
128k
32k
Not supported
Conversation, Reasoning
OpenAI
Fixed snapshot version of o1-preview.
gpt-3.5-turbo
16k
4k
Supported
Conversation
OpenAI_gpt-3
GPT-3.5 Turbo is an improved version of the GPT-3.5 model developed by OpenAI. Its structure and algorithms are optimized for faster inference, greater processing efficiency, and better resource utilization: on the same hardware it typically responds faster than GPT-3.5, sustains higher concurrency and overall throughput when handling large request volumes, and can reduce memory and compute requirements, lowering operating costs and improving scalability. It suits a wide range of natural language processing tasks, including text generation, semantic understanding, dialogue systems, and machine translation, and provides easy-to-integrate API interfaces for rapid application development and deployment.
gpt-3.5-turbo-0125
16k
4k
Supported
Conversation
OpenAI_gpt-3
An updated GPT-3.5 Turbo with higher accuracy when responding in requested formats, plus a fix for a text-encoding bug affecting non-English function calls. Returns up to 4,096 output tokens.
gpt-3.5-turbo-0613
16k
4k
Supported
Conversation
OpenAI_gpt-3
Fixed snapshot of GPT-3.5 Turbo from June 13, 2023. Now deprecated.
gpt-3.5-turbo-1106
16k
4k
Supported
Conversation
OpenAI_gpt-3
Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens.
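As a concrete illustration of the JSON mode mentioned above, here is a minimal sketch against the standard OpenAI Chat Completions API; the prompt content is invented for the example.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},  # JSON mode: output is valid JSON
    messages=[
        # JSON mode requires the word "JSON" to appear in the prompt
        {"role": "system", "content": "Reply in JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
)
print(resp.choices[0].message.content)  # e.g. {"colors": ["red", "yellow", "blue"]}
```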
gpt-3.5-turbo-16k
16k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-3
(Deprecated)
gpt-3.5-turbo-16k-0613
16k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-3
Snapshot of gpt-3.5-turbo-16k from June 13, 2023. (Deprecated)
gpt-3.5-turbo-instruct
4k
4k
Supported
Conversation
OpenAI_gpt-3
Capabilities similar to GPT-3 era models. Compatible with the legacy Completions endpoint, not for Chat Completions.
gpt-3.5o
16k
4k
Not supported
Conversation
OpenAI_gpt-3
Same as gpt-4o-lite.
gpt-4
8k
8k
Supported
Conversation
OpenAI_gpt-4
Currently points to gpt-4-0613.
gpt-4-0125-preview
128k
4k
Supported
Conversation
OpenAI_gpt-4
The latest GPT-4 preview model, designed to reduce "laziness" cases where the model fails to complete a task. Returns up to 4,096 output tokens.
gpt-4-0314
8k
8k
Supported
Conversation
OpenAI_gpt-4
Snapshot of gpt-4 from March 14, 2023.
gpt-4-0613
8k
8k
Supported
Conversation
OpenAI_gpt-4
Snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.
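Since this snapshot is noted for enhanced function calling, a brief sketch of the standard OpenAI tool-calling flow may help; the get_weather tool is a hypothetical example, not part of the API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical tool definition; the layout is the standard "tools"
# parameter of the Chat Completions API, while get_weather itself is invented.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model chooses to call the tool, the arguments arrive as a JSON string.
print(resp.choices[0].message.tool_calls)
```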
gpt-4-1106-preview
128k
4k
Supported
Conversation
OpenAI_gpt-4
GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, etc. Returns up to 4,096 output tokens. This is a preview model.
gpt-4-32k
32k
4k
Supported
Conversation
OpenAI_gpt-4
gpt-4-32k will be deprecated on June 6, 2025.
gpt-4-32k-0613
32k
4k
Supported
Conversation, Deprecated or soon to be deprecated
OpenAI_gpt-4
Will be deprecated on June 6, 2025.
gpt-4-turbo
128k
4k
Supported
Conversation
OpenAI_gpt-4
The latest version of the GPT-4 Turbo model adds visual capabilities and supports visual requests via JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.
gpt-4-turbo-2024-04-09
128k
4k
Supported
Conversation
OpenAI_gpt-4
GPT-4 Turbo model with visual capabilities; visual requests can now be handled via JSON mode and function calling. gpt-4-turbo currently points to this version.
gpt-4-turbo-preview
128k
4k
Supported
Conversation, Image recognition
OpenAI_gpt-4
Currently points to gpt-4-0125-preview.
gpt-4o
128k
16k
Supported
Conversation, Image recognition
OpenAI_gpt-4
OpenAI's highly intelligent flagship model, suitable for complex multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.
gpt-4o-2024-05-13
128k
4k
Supported
Conversation, Image recognition
OpenAI_gpt-4
The original gpt-4o snapshot from May 13, 2024.
gpt-4o-2024-08-06
128k
16k
Supported
Conversation, Image recognition
OpenAI_gpt-4
The first snapshot supporting structured output. gpt-4o currently points to this version.
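The structured output mentioned here refers to schema-constrained generation via a response_format of type json_schema. A minimal sketch, with an invented "person" schema:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract the person: 'Ada, 36, London'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # enforce exact schema conformance
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON matching the schema
```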
gpt-4o-mini
128k
16k
Supported
Conversation, Image recognition
OpenAI_gpt-4
OpenAI's affordable gpt-4o version for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18.
gpt-4o-mini-2024-07-18
128k
16k
Supported
Conversation, Image recognition
OpenAI_gpt-4
Fixed snapshot version of gpt-4o-mini.
gpt-4o-realtime-preview
128k
4k
Supported
Conversation, Real-time speech
OpenAI_gpt-4
OpenAI's real-time speech conversation model.
gpt-4o-realtime-preview-2024-10-01
128k
4k
Supported
Conversation, Real-time speech, Image recognition
OpenAI_gpt-4
gpt-4o-realtime-preview currently points to this snapshot version.
o1-mini
128k
64k
Not supported
Conversation, Reasoning
OpenAI_o1
Smaller and faster than o1-preview and 80% cheaper, performing well at code generation and small-context tasks.
o1-preview
128k
32k
Not supported
Conversation, Reasoning
OpenAI_o1
o1-preview is a new reasoning model for complex tasks that require broad general knowledge. It has a 128k context window and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks, and is ideal for applications requiring deep contextual understanding and autonomous workflows.
o3-mini
200k
100k
Supported
Conversation, Reasoning
OpenAI_o1
o3-mini is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency targets as o1-mini. It focuses on science, mathematics, and coding tasks, supports developer features such as structured output, function calling, and the Batch API, and has a knowledge cutoff of October 2023, striking a strong balance between reasoning capability and cost-effectiveness.
o3-mini-2025-01-31
200k
100k
Supported
Conversation, Reasoning
OpenAI_o1
o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency targets as o1-mini. It focuses on science, mathematics, and coding tasks, supports developer features such as structured output, function calling, and the Batch API, and has a knowledge cutoff of October 2023, striking a strong balance between reasoning capability and cost-effectiveness.
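A hedged sketch of calling o3-mini with the developer features noted above; reasoning models take max_completion_tokens rather than max_tokens, and reasoning_effort tunes how long the model thinks.

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",   # "low" | "medium" | "high"
    max_completion_tokens=4096,  # reasoning models use this instead of max_tokens
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(resp.choices[0].message.content)
```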
Baichuan2-Turbo
32k
-
Not supported
Conversation
百川_baichuan
Performance remains industry-leading among comparable models while the price is significantly reduced.
Baichuan3-Turbo
32k
-
Not supported
Conversation
百川_baichuan
Performance remains industry-leading among comparable models while the price is significantly reduced.
Baichuan3-Turbo-128k
128k
-
Not supported
Conversation
百川_baichuan
Processes complex long texts with a 128k ultra-long context window and is specifically optimized for industries such as finance; it significantly reduces cost while maintaining high performance, offering enterprises a cost-effective solution.
Baichuan4
32k
-
Not supported
Conversation
百川_baichuan
Baichuan's MoE model provides efficient and cost-effective solutions for enterprise applications through specialized optimization, cost reduction, and performance enhancement.
Baichuan4-Air
32k
-
Not supported
Conversation
百川_baichuan
Baichuan's MoE model provides efficient and cost-effective solutions for enterprise applications through specialized optimization, cost reduction, and performance enhancement.
Baichuan4-Turbo
32k
-
Not supported
Conversation
百川_baichuan
Trained on massive amounts of high-quality scenario data, it improves availability in high-frequency enterprise scenarios by more than 10% over Baichuan4, information summarization by 50%, multilingual capability by 31%, and content generation by 13%. Inference performance is specially optimized: first-token response is 51% faster than Baichuan4 and token streaming is 73% faster.
ERNIE-3.5-128K
128k
4k
Supported
Conversation
百度_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.
ERNIE-3.5-8K
8k
1k
Supported
Conversation
百度_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.
ERNIE-3.5-8K-Preview
8k
1k
Supported
Conversation
百度_ernie
Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities, meeting most requirements for dialogue Q&A, creative generation, and plugin application scenarios; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.
ERNIE-4.0-8K
8k
1k
Supported
Conversation
百度_ernie
Baidu's self-developed flagship ultra-large language model, achieving a comprehensive upgrade in model capabilities compared to ERNIE 3.5, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.
ERNIE-4.0-8K-Latest
8k
2k
Supported
Conversation
百度_ernie
ERNIE-4.0-8K-Latest offers comprehensive capability improvements over ERNIE-4.0-8K, with significant gains in role-playing and instruction following; it is a full upgrade over ERNIE 3.5, widely applicable to complex task scenarios across fields, and supports automatic integration with Baidu search plugins to keep Q&A information current. It supports 5k tokens of input plus 2k tokens of output.
ERNIE-4.0-8K-Preview
8k
1k
Supported
Conversation
百度_ernie
Baidu's self-developed flagship ultra-large language model, achieving a comprehensive upgrade in model capabilities compared to ERNIE 3.5, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information.
ERNIE-4.0-Turbo-128K
128k
4k
Supported
Conversation
百度_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios across fields; it supports automatic integration with Baidu search plugins to keep Q&A information current, and it outperforms ERNIE 4.0. ERNIE-4.0-Turbo-128K is one version of the model, and its long-document performance is superior to ERNIE-3.5-128K.
ERNIE-4.0-Turbo-8K
8k
2k
Supported
Conversation
百度_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios across fields; it supports automatic integration with Baidu search plugins to keep Q&A information current, and it outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of the model.
ERNIE-4.0-Turbo-8K-Latest
8k
2k
Supported
Conversation
百度_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance and widely applicable to complex task scenarios across fields; it supports automatic integration with Baidu search plugins to keep Q&A information current, and it outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K-Latest is one version of the model.
ERNIE-4.0-Turbo-8K-Preview
8k
2k
Supported
Conversation
百度_ernie
ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large language model, demonstrating excellent overall performance, widely applicable to complex task scenarios in various fields; supports automatic integration with Baidu search plugins to ensure the timeliness of Q&A information. ERNIE-4.0-Turbo-8K-Preview is a version of the model.
ERNIE-Character-8K
8k
1k
Not supported
Conversation
百度_ernie
Baidu's self-developed vertical-scenario large language model, suitable for applications such as game NPCs, customer service dialogues, and dialogue role-playing. It features a more distinct and consistent persona style, stronger instruction following, and superior reasoning performance.
ERNIE-Lite-8K
8k
4k
Not supported
Conversation
百度_ernie
Baidu's self-developed lightweight large language model, balancing excellent model performance with inference efficiency, suitable for inference on low-compute AI accelerator cards.
ERNIE-Lite-Pro-128K
128k
2k
Supported
Conversation
百度_ernie
Baidu's self-developed lightweight large language model, performing better than ERNIE Lite, balancing excellent model performance with inference efficiency, suitable for inference on low-compute AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and performs better than ERNIE-Lite-128K.
ERNIE-Novel-8K
8k
2k
Not supported
Conversation
百度_ernie
ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a significant advantage in novel continuation capabilities, and can also be used in scenarios such as short dramas and movies.
ERNIE-Speed-128K
128k
4k
Not supported
Conversation
百度_ernie
Baidu's newly released self-developed high-performance large language model in 2024. It has excellent general capabilities, suitable for fine-tuning as a base model to better handle specific scenario problems, and possesses excellent inference performance.
ERNIE-Speed-8K
8k
1k
Not supported
Conversation
百度_ernie
Baidu's newly released self-developed high-performance large language model in 2024. It has excellent general capabilities, suitable for fine-tuning as a base model to better handle specific scenario problems, and possesses excellent inference performance.
ERNIE-Speed-Pro-128K
128k
4k
Not supported
Conversation
百度_ernie
ERNIE Speed Pro is Baidu's newly released self-developed high-performance large language model in 2024. It has excellent general capabilities, suitable for fine-tuning as a base model to better handle specific scenario problems, and possesses excellent inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supporting a 128K context length and performing better than ERNIE-Speed-128K.
ERNIE-Tiny-8K
8k
1k
Not supported
Conversation
百度_ernie
Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs among the Wenxin series models.
Doubao-1.5-lite-32k
32k
12k
Supported
Conversation
豆包_doubao
Doubao-1.5-lite is also a world-class lightweight language model, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative benchmarks for overall ability (MMLU_pro), reasoning (BBH), mathematics (MATH), and professional knowledge (GPQA).
Doubao-1.5-pro-256k
256k
12k
Supported
Conversation
豆包_doubao
Doubao-1.5-Pro-256k is a fully upgraded version of Doubao-1.5-Pro. Overall performance improves by 10% compared to Doubao-pro-256k/241115, and output length is significantly increased, supporting up to 12k tokens.
Doubao-1.5-pro-32k
32k
12k
Supported
Conversation
豆包_doubao
Doubao-1.5-pro is the new-generation flagship model with comprehensively upgraded performance, excelling in knowledge, code, reasoning, and more. It reaches world-leading levels on multiple public evaluation benchmarks, ranking best on knowledge, code, reasoning, and authoritative Chinese benchmarks, with an overall score superior to industry-leading models such as GPT-4o and Claude 3.5 Sonnet.
Doubao-1.5-vision-pro
32k
12k
Not supported
Conversation, Image recognition
豆包_doubao
Doubao-1.5-vision-pro, a newly upgraded multimodal large model, supports image recognition of arbitrary resolutions and extreme aspect ratios, enhancing visual reasoning, document recognition, detailed information understanding, and instruction following capabilities.
Doubao-embedding
4k
-
Supported
Embedding
豆包_doubao
Doubao-embedding is a semantic vectorization model developed by ByteDance, aimed primarily at vector-retrieval scenarios. It supports Chinese and English with a maximum context length of 4k. Available versions: text-240715 (recommended), with a maximum vector dimension of 2560 and optional reduction to 512, 1024, or 2048 dimensions, offering significantly better Chinese and English retrieval performance than text-240515; and text-240515, with a maximum vector dimension of 2048 and optional reduction to 512 or 1024 dimensions.
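A minimal sketch of requesting reduced-dimension vectors, assuming the serving gateway exposes Doubao-embedding through an OpenAI-compatible /embeddings endpoint that honors the dimensions parameter; both the base URL and that assumption are placeholders.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible gateway; base URL and key are placeholders.
client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="sk-...")

resp = client.embeddings.create(
    model="Doubao-embedding",
    input=["向量检索示例文本", "a second passage to embed"],
    dimensions=1024,  # reduced from text-240715's native 2560, per the note above
)
print(len(resp.data[0].embedding))  # -> 1024
```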
Doubao-embedding-large
4k
-
Not supported
Embedding
豆包_doubao
Chinese and English retrieval performance is significantly improved compared to the Doubao-embedding text-240715 version.
Doubao-embedding-vision
8k
-
Not supported
Embedding
豆包_doubao
Doubao-embedding-vision, a newly upgraded image-text multimodal vectorization model, primarily for image-text multimodal vector retrieval scenarios, supporting image input and Chinese/English text input, with a maximum context length of 8K.
Doubao-lite-128k
128k
4k
Supported
Conversation
豆包_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing more flexible choices for customers' different scenarios. Supports inference and fine-tuning with a 128k context window.
Doubao-lite-32k
32k
4k
Supported
Conversation
豆包_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing more flexible choices for customers' different scenarios. Supports inference and fine-tuning with a 32k context window.
Doubao-lite-4k
4k
4k
Supported
Conversation
豆包_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing more flexible choices for customers' different scenarios. Supports inference and fine-tuning with a 4k context window.
Doubao-pro-128k
128k
4k
Supported
Conversation
豆包_doubao
The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.
Doubao-pro-32k
32k
4k
Supported
Conversation
豆包_doubao
The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 32k context window.
Doubao-pro-4k
4k
4k
Supported
Conversation
豆包_doubao
The best performing flagship model, suitable for complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 4k context window.
step-1-128k
128k
-
Supported
Conversation
阶跃星辰
The step-1-128k model is an ultra-large language model capable of processing inputs up to 128,000 tokens. This capability gives it significant advantages in generating long-form content and performing complex reasoning, making it suitable for applications requiring rich context, such as writing novels and scripts.
step-1-256k
256k
-
Supported
Conversation
阶跃星辰
The step-1-256k model is one of the largest language models currently available, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality outputs across various domains.
step-1-32k
32k
-
Supported
Conversation
阶跃星辰
The step-1-32k model extends the context window, supporting inputs of 32,000 tokens. This makes it perform exceptionally well when processing long articles and complex conversations, suitable for tasks requiring deep understanding and analysis, such as legal documents and academic research.
step-1-8k
8k
-
Supported
Conversation
阶跃星辰
The step-1-8k model is an efficient language model designed specifically for processing shorter texts. It can perform reasoning within an 8,000-token context, making it suitable for applications requiring rapid responses, such as chatbots and real-time translation.
step-1-flash
8k
-
Supported
Conversation
阶跃星辰
The step-1-flash model focuses on rapid response and efficient processing, suitable for real-time applications. Its design allows it to provide high-quality language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.
step-1.5v-mini
32k
-
Supported
Conversation, Image recognition
阶跃星辰
The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Despite its small size, it retains good language processing capabilities, making it suitable for embedded systems and low-power devices.
step-1v-32k
32k
-
Supported
Conversation, Image recognition
阶跃星辰
The step-1v-32k model supports inputs of 32,000 tokens, suitable for applications requiring longer contexts. It performs excellently in handling complex conversations and long texts, making it suitable for areas such as customer service and content creation.
step-1v-8k
8k
-
Supported
Conversation, Image recognition
阶跃星辰
The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for rapid generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.
step-2-16k
16k
-
Supported
Conversation
阶跃星辰
The step-2-16k model is a medium-sized language model that supports inputs of 16,000 tokens. It performs well in various tasks and is suitable for applications such as education, training, and knowledge management.
yi-lightning
16k
-
Supported
Conversation
零一万物_yi
Latest high-performance model, ensuring high-quality output while significantly increasing inference speed. Suitable for real-time interaction, complex reasoning scenarios, and offers excellent cost-effectiveness to support commercial products.
yi-vision-v2
16k
-
Supported
Conversation, Image recognition
零一万物_yi
Suitable for scenarios requiring analysis and interpretation of images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.
qwen-14b-chat
8k
2k
Supported
Conversation
千问_qwen
Alibaba Cloud's official Qwen - Open Source Edition.
qwen-72b-chat
32k
2k
Supported
Conversation
千问_qwen
Alibaba Cloud's official Qwen - Open Source Edition.
qwen-7b-chat
7.5k
1.5k
Supported
Conversation
千问_qwen
Alibaba Cloud's official Qwen - Open Source Edition.
qwen-coder-plus
128k
8k
Supported
Conversation, Code
千问_qwen
Qwen-Coder-Plus is a programming-specific model in the Qwen series, designed to enhance code generation and understanding capabilities. This model is trained on a large scale of programming data, capable of handling various programming languages, and supports code completion, error detection, and code refactoring. Its design goal is to provide developers with more efficient programming assistance and improve development efficiency.
qwen-coder-plus-latest
128k
8k
Supported
Conversation, Code
千问_qwen
Qwen-Coder-Plus-Latest is the newest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. This model shows significant performance improvements, capable of understanding context more accurately and generating code that better meets developer needs. It also introduces support for more programming languages, enhancing its multilingual programming capabilities.
qwen-coder-turbo
128k
8k
Supported
Conversation, Code
千问_qwen
The Qwen series of code and programming models are language models specifically designed for programming and code generation, offering fast inference speed and low cost. This version always points to the latest stable snapshot.
qwen-coder-turbo-latest
128k
8k
Supported
Conversation, Code
千问_qwen
The Qwen series of code and programming models are language models specifically designed for programming and code generation, offering fast inference speed and low cost. This version always points to the latest snapshot.
qwen-long
10m
6k
Supported
Conversation
千问_qwen
Qwen-Long is a large language model in the Qwen series for ultra-long-context scenarios. It supports Chinese, English, and other languages, and allows conversations over contexts of up to 10 million tokens (roughly 15 million characters, or 15,000 pages of documents). Together with the document service launched alongside it, it supports parsing of and conversation over document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: requests submitted directly over HTTP support up to 1M tokens; for anything longer, submitting via file is recommended.
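For the file-based path recommended above, a hedged sketch assuming DashScope's OpenAI-compatible file-extract flow; the endpoint, purpose value, and file name are assumptions about the serving platform, not part of this table.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed platform
    api_key="sk-...",
)

# Upload the long document once, then reference it by ID in the conversation.
doc = client.files.create(file=open("annual_report.pdf", "rb"), purpose="file-extract")

resp = client.chat.completions.create(
    model="qwen-long",
    messages=[
        {"role": "system", "content": f"fileid://{doc.id}"},
        {"role": "user", "content": "Summarize the key findings of this document."},
    ],
)
print(resp.choices[0].message.content)
```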
qwen-math-plus
4k
3k
Supported
Conversation
千问_qwen
Qwen-Math-Plus is a model focused on solving mathematical problems, aiming to provide efficient mathematical reasoning and computation capabilities. This model is trained on a large number of math problem sets, capable of handling complex mathematical expressions and problems, supporting various computational needs from basic arithmetic to advanced mathematics. Its application scenarios include education, scientific research, and engineering.
qwen-math-plus-latest
4k
3k
Supported
Conversation
千问_qwen
Qwen-Math-Plus-Latest is the newest version of Qwen-Math-Plus, integrating the latest mathematical reasoning technologies and algorithmic improvements. This model performs even better in handling complex mathematical problems, capable of providing more accurate solutions and reasoning processes. It also extends its understanding of mathematical symbols and formulas, suitable for a wider range of mathematical application scenarios.
qwen-math-turbo
4k
3k
Supported
Conversation
千问_qwen
Qwen-Math-Turbo is a high-performance mathematical model designed for rapid computation and real-time reasoning. This model optimizes computation speed, capable of processing a large number of mathematical problems in a very short time, suitable for applications requiring quick feedback, such as online education and real-time data analysis. Its efficient algorithm enables users to obtain instant results in complex calculations.
qwen-math-turbo-latest
4k
3k
Supported
Conversation
千问_qwen
Qwen-Math-Turbo-Latest is the newest version of Qwen-Math-Turbo, further enhancing computational efficiency and accuracy. This model features multiple algorithmic optimizations, capable of handling more complex mathematical problems and maintaining efficiency in real-time reasoning. It is suitable for mathematical applications requiring fast responses, such as financial analysis and scientific computing.
qwen-max
32k
8k
Supported
Conversation
千问_qwen
Qwen-Max is the hundred-billion-parameter-scale ultra-large language model of the Qwen 2.5 series, supporting Chinese, English, and other languages. As the underlying model is upgraded, qwen-max is updated continuously.
qwen-max-latest
32k
8k
Supported
Conversation
千问_qwen
The best-performing model in the Qwen series. This model is dynamically updated, and model updates will not be announced in advance. It is suitable for complex, multi-step tasks. Its comprehensive Chinese and English capabilities are significantly improved, human preferences are notably enhanced, reasoning capabilities and complex instruction understanding are substantially strengthened, performance on difficult tasks is better, and mathematical and coding capabilities are significantly improved. It also enhances the ability to understand and generate structured data like Tables and JSON.
qwen-plus
128k
8k
Supported
Conversation
千问_qwen
A balanced model in the Qwen series, with inference performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks. Its comprehensive Chinese and English capabilities are significantly improved, human preferences are notably enhanced, reasoning capabilities and complex instruction understanding are substantially strengthened, performance on difficult tasks is better, and mathematical and coding capabilities are significantly improved.
qwen-plus-latest
128k
8k
Supported
Conversation
千问_qwen
Qwen-Plus-Latest dynamically points to the newest Qwen-Plus snapshot: a balanced model in the Qwen series, with inference performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks.
qwen-turbo
128k
8k
Supported
Conversation
千问_qwen
The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. Its comprehensive Chinese and English capabilities are significantly improved, human preferences are notably enhanced, reasoning capabilities and complex instruction understanding are substantially strengthened, performance on difficult tasks is better, and mathematical and coding capabilities are significantly improved.
qwen-turbo-latest
1m
8k
Supported
Conversation
千问_qwen
Qwen-Turbo-Latest dynamically points to the newest Qwen-Turbo snapshot: the fastest and most cost-effective model in the Qwen series, suited to simple tasks with strict response-time requirements.
qwen-vl-max
32k
2k
Supported
Conversation
千问_qwen
Qwen-VL-Max (qwen-vl-max) is Qwen's ultra-large visual language model. Compared to the enhanced version (qwen-vl-plus), it further improves visual reasoning and instruction following, providing a higher level of visual perception and cognition and optimal performance on more complex tasks.
qwen-vl-max-latest
32k
2k
Supported
Conversation, Image recognition
千问_qwen
Qwen-VL-Max is the highest-tier version in the Qwen-VL series, specifically designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, capable of understanding and analyzing high-resolution images, with extremely strong reasoning capabilities, suitable for applications requiring deep understanding and complex reasoning.
qwen-vl-ocr
34k
4k
Supported
Conversation, Image recognition
千问_qwen
Only supports OCR, not conversation.
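A minimal sketch of sending an image to qwen-vl-ocr via OpenAI-compatible multimodal message parts; the image URL and base URL are placeholders.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible gateway; base URL and key are placeholders.
client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="qwen-vl-ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.png"}},
            # The model only performs OCR, so the text part stays minimal.
            {"type": "text", "text": "Read all text in this image."},
        ],
    }],
)
print(resp.choices[0].message.content)
```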
qwen-vl-ocr-latest
34k
4k
Supported
Conversation, Image recognition
千问_qwen
Only supports OCR, not conversation.
qwen-vl-plus
8k
2k
Supported
Conversation, Image recognition
千问_qwen
Qwen-VL-Plus (qwen-vl-plus), the enhanced version of Qwen large-scale visual language model. Significantly improves detail recognition and text recognition capabilities, supporting ultra-million pixel resolutions and arbitrary aspect ratio images. Provides excellent performance across a wide range of visual tasks.
qwen-vl-plus-latest
32k
2k
Supported
Conversation, Image recognition
千问_qwen
Qwen-VL-Plus-Latest is the newest version of Qwen-VL-Plus, enhancing the model's multimodal understanding capabilities. It excels in combining image and text processing, suitable for applications requiring efficient handling of various input formats, such as intelligent customer service and content generation.
Qwen/Qwen2-1.5B-Instruct
32k
6k
Not supported
Conversation
千问_qwen
Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 1.5 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.
Qwen/Qwen2-72B-Instruct
128k
6k
Not supported
Conversation
千问_qwen
Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 72 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It can handle large-scale inputs. This model performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.
Qwen/Qwen2-7B-Instruct
128k
6k
Not supported
Conversation
千问_qwen
Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series, with 7 billion parameters. This model is based on the Transformer architecture and incorporates techniques such as SwiGLU activation function, attention QKV bias, and Grouped Query Attention. It can handle large-scale inputs. This model performs excellently in various benchmarks for language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning, surpassing most open-source models.
Qwen/Qwen2-VL-72B-Instruct
32k
2k
Not supported
Conversation
千问_qwen
Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos over 20 minutes long for high-quality video-based Q&A, dialogue, and content creation. It also possesses complex reasoning and decision-making abilities, allowing integration with mobile devices, robots, etc., to perform autonomous operations based on visual environments and text instructions.
Qwen/Qwen2-VL-7B-Instruct
32k
-
Not supported
Conversation
千问_qwen
Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also possesses complex reasoning and decision-making abilities, allowing integration with mobile devices, robots, etc., to perform autonomous operations based on visual environments and text instructions.
Qwen/Qwen2.5-72B-Instruct
128k
8k
Not supported
Conversation
千问_qwen
Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.
Qwen/Qwen2.5-72B-Instruct-128K
128k
8k
Not supported
Conversation
千问_qwen
Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.
Qwen/Qwen2.5-7B-Instruct
128k
8k
Not supported
Conversation
千问_qwen
Qwen2.5-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. This model also provides multilingual support, covering over 29 languages, including Chinese and English. The model shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).
Qwen/Qwen2.5-Coder-32B-Instruct
128k
8k
Not supported
Conversation, Code
千问_qwen
Qwen2.5-Coder-32B-Instruct is a code-focused model in the latest large language model series released by Alibaba Cloud. This 32B model has significantly improved capabilities in areas such as coding and mathematics. It also provides multilingual support, covering more than 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).
Qwen/Qwen2.5-Coder-7B-Instruct
128k
8k
Not supported
Conversation, Code
Qwen_qwen
Qwen2.5-Coder-7B-Instruct is the code-specialized model in the latest Qwen2.5 series released by Alibaba Cloud. This 7B model has significantly improved capabilities in coding and mathematics, provides multilingual support covering over 29 languages including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).
Qwen/QwQ-32B-Preview
32k
16k
Not supported
Conversation, Reasoning
Qwen_qwen
QwQ-32B-Preview is an experimental research model developed by the Qwen team, aiming to enhance AI's reasoning capabilities. As a preview version, it demonstrates excellent analytical abilities but also has some important limitations: 1. Language Mixing and Code Switching: The model may mix languages or unexpectedly switch between languages, affecting response clarity. 2. Recursive Reasoning Loops: The model may enter a loop of reasoning, leading to lengthy answers without clear conclusions. 3. Safety and Ethical Considerations: The model needs strengthened safety measures to ensure reliable and secure performance, and users should exercise caution when using it. 4. Performance and Benchmark Limitations: The model performs well in mathematics and programming but still has room for improvement in other areas such as common sense reasoning and nuanced language understanding.
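The "Qwen/"-prefixed model names above are typically invoked through an OpenAI-compatible chat-completions endpoint. The following is a minimal sketch under that assumption; the base URL and API key are placeholders, not part of any model's specification.

```python
# Minimal sketch: calling a Qwen/QwQ model via an assumed OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint (assumption)
    api_key="YOUR_API_KEY",                      # placeholder key
)

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B-Preview",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    max_tokens=1024,  # leave room for the model's long reasoning output
)
print(resp.choices[0].message.content)
```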
qwen1.5-110b-chat
32k
8k
Not supported
Conversation
Qwen_qwen
-
qwen1.5-14b-chat
8k
2k
Not supported
Conversation
Qwen_qwen
-
qwen1.5-32b-chat
32k
2k
Not supported
Conversation
Qwen_qwen
-
qwen1.5-72b-chat
32k
2k
Not supported
Conversation
Qwen_qwen
-
qwen1.5-7b-chat
8k
2k
Not supported
Conversation
Qwen_qwen
-
qwen2-57b-a14b-instruct
65k
6k
Not supported
Conversation
Qwen_qwen
-
Qwen2-72B-Instruct
-
-
Not supported
Conversation
Qwen_qwen
-
qwen2-7b-instruct
128k
6k
Not supported
Conversation
Qwen_qwen
-
qwen2-math-72b-instruct
4k
3k
Not supported
Conversation
Qwen_qwen
-
qwen2-math-7b-instruct
4k
3k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-14b-instruct
128k
8k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-32b-instruct
128k
8k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-72b-instruct
128k
8k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-7b-instruct
128k
8k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-coder-14b-instruct
128k
8k
Not supported
Conversation, Code
Qwen_qwen
-
qwen2.5-coder-32b-instruct
128k
8k
Not supported
Conversation, Code
Qwen_qwen
-
qwen2.5-coder-7b-instruct
128k
8k
Not supported
Conversation, Code
Qwen_qwen
-
qwen2.5-math-72b-instruct
4k
3k
Not supported
Conversation
Qwen_qwen
-
qwen2.5-math-7b-instruct
4k
3k
Not supported
Conversation
Qwen_qwen
-
deepseek-ai/DeepSeek-R1
64k
-
Not supported
Conversation, Reasoning
DeepSeek_deepseek
DeepSeek-R1 is an open-source reasoning model trained primarily through reinforcement learning, excelling at mathematics, code, and natural-language reasoning tasks. Its performance is comparable to OpenAI's o1 model, with excellent results across multiple benchmarks.
deepseek-ai/DeepSeek-V2-Chat
128k
-
Not supported
Conversation
DeepSeek_deepseek
DeepSeek-V2 is a powerful, cost-efficient Mixture-of-Experts (MoE) language model. It was pre-trained on an 8.1 trillion token high-quality corpus and further enhanced through supervised fine-tuning (SFT) and reinforcement learning (RL). Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times.
deepseek-ai/DeepSeek-V2.5
32k
-
Supported
Conversation
DeepSeek_deepseek
DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. This model has been optimized in several aspects, including writing and instruction-following abilities, better aligning with human preferences.
deepseek-ai/DeepSeek-V3
128k
4k
Not supported
Conversation
DeepSeek_deepseek
The open-source release of DeepSeek-V3, offering a longer context window than the official API and without issues such as refusals triggered by sensitive-word filtering.
deepseek-chat
64k
8k
Supported
Conversation
DeepSeek_deepseek
A 236B-parameter model with a 64K context window (API). Its overall Chinese-language capability ranks first among open-source models on AlignBench, placing it in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0.
deepseek-coder
64k
8k
Supported
Conversation, Code
DeepSeek_deepseek
A 236B-parameter model with a 64K context window (API). Its overall Chinese-language capability ranks first among open-source models on AlignBench, placing it in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0.
deepseek-reasoner
64k
8k
Supported
Conversation, Reasoning
DeepSeek_deepseek
DeepSeek-Reasoner (DeepSeek-R1) is DeepSeek's reasoning model, trained with reinforcement learning to improve reasoning capability. Its reasoning process involves extensive reflection and verification, handles complex logical tasks, and can produce chains of thought tens of thousands of characters long. It excels at mathematics, coding, and other complex problems, and its reasoning performance approaches that of top-tier closed-source models, demonstrating the potential and competitiveness of open-source models in the reasoning domain.
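Because deepseek-reasoner returns its chain of thought separately from the final answer, a short sketch may help. It assumes DeepSeek's documented OpenAI-compatible API, where the reply exposes the reasoning trace in a `reasoning_content` field alongside `content`; the API key is a placeholder.

```python
# Minimal sketch: reading the reasoning trace and final answer from deepseek-reasoner.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # the model's chain of thought (per DeepSeek's docs)
print(msg.content)            # the final answer
```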
hunyuan-code
4k
4k
Not supported
Conversation, Code
Tencent_hunyuan
Hunyuan's latest code-generation model, built on a base model enhanced with 200B tokens of high-quality code data and trained on six months of high-quality SFT data. The context window has been increased to 8K. It ranks among the top on automated code-generation benchmarks for five major languages, and in expert human evaluations of ten comprehensive code tasks across those five languages its performance is in the first tier.
hunyuan-functioncall
28k
4k
Supported
Conversation
Tencent_hunyuan
Hunyuan's latest MoE-architecture FunctionCall model, trained on high-quality FunctionCall data, with a 32K context window; it leads on multiple evaluation metrics.
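As a sketch of how the FunctionCall capability is typically exercised: the example below assumes hunyuan-functioncall is reachable through an OpenAI-compatible endpoint (the base URL is an assumption) and uses the standard `tools` schema; `get_weather` is a hypothetical tool for illustration.

```python
# Hedged sketch: function calling against an assumed OpenAI-compatible Hunyuan endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hunyuan.cloud.tencent.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                               # placeholder key
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="hunyuan-functioncall",
    messages=[{"role": "user", "content": "What's the weather in Shenzhen?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the structured call the model proposes
```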
hunyuan-large
28k
4k
Not supported
Conversation
Tencent_hunyuan
The Hunyuan-Large model has approximately 389B total parameters and 52B active parameters, making it the industry's largest and best-performing open-source Transformer-based MoE model.
hunyuan-large-longcontext
128k
6k
Not supported
Conversation
Tencent_hunyuan
Excels at long document tasks such as document summarization and Q&A, and also has the ability to handle general text generation tasks. It performs excellently in the analysis and generation of long texts, effectively meeting the needs for processing complex and detailed long-form content.
hunyuan-lite
250k
6k
Not supported
Conversation
Tencent_hunyuan
Upgraded to an MoE structure with a 256k context window, leading many open-source models on multiple NLP, code, mathematics, and industry evaluation sets.
hunyuan-pro
28k
4k
Supported
Conversation
Tencent_hunyuan
A trillion-parameter MoE-32K long-text model. It reaches leading levels on various benchmarks, handles complex instructions and reasoning, offers advanced mathematical capability, supports function calling, and is optimized for multilingual translation and applications in finance, law, and medicine.
hunyuan-role
28k
4k
Not supported
Conversation
Tencent_hunyuan
Hunyuan's latest role-play model, officially fine-tuned by the Hunyuan team on role-play scenario datasets on top of the Hunyuan base model, giving it stronger out-of-the-box performance in role-play scenarios.
hunyuan-standard
30k
2k
Not supported
Conversation
Tencent_hunyuan
Adopts an improved routing strategy while alleviating load-balancing and expert-convergence issues. The MoE-32K variant offers higher cost-effectiveness and, while balancing performance and price, can still handle long text inputs.
hunyuan-standard-256K
250k
6k
Not supported
Conversation
Tencent_hunyuan
Adopts an improved routing strategy while alleviating load-balancing and expert-convergence issues. On long texts, the needle-in-a-haystack metric reaches 99.9%. MoE-256K makes a further breakthrough in length and performance, greatly extending the acceptable input length.
hunyuan-translation-lite
4k
4k
Not supported
Conversation
Tencent_hunyuan
The Hunyuan translation model supports natural-language conversational translation and mutual translation among 15 languages: Chinese, English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.
hunyuan-turbo
28k
4k
Supported
Conversation
Tencent_hunyuan
Default version of the Hunyuan-turbo model, adopting a new Mixture-of-Experts (MoE) structure, with higher inference efficiency and stronger performance than hunyuan-pro.
hunyuan-turbo-latest
28k
4k
Supported
Conversation
Tencent_hunyuan
Dynamically updated version of the Hunyuan-turbo model, the best-performing version in the Hunyuan series, kept consistent with the consumer-facing product (Tencent Yuanbao).
hunyuan-turbo-vision
8k
2k
Supported
Image recognition, Conversation
Tencent_hunyuan
Hunyuan's new-generation flagship vision-language large model, adopting a new Mixture-of-Experts (MoE) structure, with comprehensive improvements over the previous generation in basic recognition, content creation, knowledge Q&A, and analytical reasoning for image-text understanding. Maximum input 6k, maximum output 2k.
hunyuan-vision
8k
2k
Supported
Conversation, Image recognition
Tencent_hunyuan
Hunyuan's latest multimodal model, supporting image + text input to generate text content. Basic image recognition: identifies subjects, elements, and scenes in an image. Image-based content creation: summarizes images and drafts ad copy, WeChat Moments posts, poems, and more. Multi-turn image dialogue: interactive multi-turn Q&A about a single input image. Image analysis and reasoning: statistical analysis and reasoning over logical relationships, math problems, code, and charts in an image. Image knowledge Q&A: answers knowledge questions grounded in an image, such as historical events or movie posters. Image OCR: recognizes text in images from both natural scenes and non-natural scenes.
SparkDesk-Lite
4k
-
Not supported
Conversation
Spark_SparkDesk
Supports online search, with fast and convenient responses; suitable for low-compute inference and customized scenarios such as model fine-tuning.
SparkDesk-Max
128k
-
Supported
Conversation
Spark_SparkDesk
Quantized from the latest SparkDesk 4.0 Turbo engine, with multiple built-in plugins such as online search, weather, and date. Core capabilities are comprehensively upgraded, and application performance improves across most scenarios. Supports system-role personas and FunctionCall function calling.
SparkDesk-Max-32k
32k
-
Supported
Conversation
Spark_SparkDesk
Stronger reasoning: Stronger context understanding and logical reasoning capabilities. Longer input: Supports 32K tokens of text input, suitable for long document reading, private knowledge Q&A, and other scenarios.
SparkDesk-Pro
128k
-
Not supported
Conversation
Spark_SparkDesk
Optimized for specific scenarios such as mathematics, code, medical, and education. Supports multiple built-in plugins like online search, weather, and date, covering most knowledge Q&A, language understanding, and text generation scenarios.
SparkDesk-Pro-128K
128k
-
Not supported
Conversation
Spark_SparkDesk
Professional-grade large language model with tens of billions of parameters, specifically optimized for medical, educational, and coding scenarios. Lower latency in search scenarios. Suitable for text, intelligent Q&A, and other business scenarios with higher demands for performance and response speed.
moonshot-v1-128k
128k
4k
Supported
Conversation
MoonshotAI_moonshot
Model with a 128k context length, suitable for generating ultra-long texts.
moonshot-v1-32k
32k
4k
Supported
Conversation
MoonshotAI_moonshot
Model with a 32k context length, suitable for generating long texts.
moonshot-v1-8k
8k
4k
Supported
Conversation
MoonshotAI_moonshot
Model with an 8k context length, suitable for generating short texts.
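The three moonshot-v1 variants differ only in context length, so a common pattern is to pick the smallest tier that fits the prompt. The helper below is a hypothetical illustration (not part of the Moonshot API), using the 8k/32k/128k figures from the entries above; in practice you would also reserve headroom for the up-to-4k output.

```python
# Hypothetical helper: choose the smallest moonshot-v1 context tier that fits.
def pick_moonshot_model(estimated_tokens: int) -> str:
    """Return the smallest moonshot-v1 variant whose window fits the input."""
    for limit, name in [(8_000, "moonshot-v1-8k"),
                        (32_000, "moonshot-v1-32k"),
                        (128_000, "moonshot-v1-128k")]:
        if estimated_tokens <= limit:
            return name
    raise ValueError("input exceeds the largest 128k context window")

print(pick_moonshot_model(20_000))  # -> moonshot-v1-32k
```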
codegeex-4
128k
4k
Not supported
Conversation, Code
Zhipu_codegeex
Zhipu's code model: suitable for automatic code completion tasks.
charglm-3
4k
2k
Not supported
Conversation
Zhipu_glm
Persona model: designed for character role-play conversations.
emohaa
8k
4k
Not supported
Conversation
Zhipu_glm
Psychological model: possesses professional counseling abilities to help users understand emotions and cope with emotional issues.
glm-3-turbo
128k
4k
Not supported
Conversation
Zhipu_glm
Will be deprecated (June 30, 2025).
glm-4
128k
4k
Supported
Conversation
Zhipu_glm
Old flagship: Released on January 16, 2024, now replaced by GLM-4-0520.
glm-4-0520
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence model: suitable for handling highly complex and diverse tasks.
glm-4-air
128k
4k
Supported
Conversation
Zhipu_glm
High cost-effectiveness: the best balance between inference capability and price.
glm-4-airx
8k
4k
Supported
Conversation
Zhipu_glm
Ultra-fast inference: extremely fast inference speed with strong reasoning performance.
glm-4-flash
128k
4k
Supported
Conversation
Zhipu_glm
High speed, low cost: ultra-fast inference speed.
glm-4-flashx
128k
4k
Supported
Conversation
Zhipu_glm
High speed, low cost: Enhanced Flash version, ultra-fast inference speed.
glm-4-long
1m
4k
Supported
Conversation
Zhipu_glm
Ultra-long input: specifically designed for processing ultra-long texts and memory-intensive tasks.
glm-4-plus
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence flagship: comprehensively improved performance, significantly enhanced long text and complex task capabilities.
glm-4v
2k
-
Not supported
Conversation, Image recognition
Zhipu_glm
Image understanding: possesses image understanding and reasoning capabilities.
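Since glm-4v accepts mixed image + text input, a minimal sketch may help; it uses Zhipu's official `zhipuai` Python SDK, with the API key and image URL as placeholders.

```python
# Minimal sketch: image understanding with glm-4v via the zhipuai SDK.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")  # placeholder key

resp = client.chat.completions.create(
    model="glm-4v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # placeholder image URL for illustration
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```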
glm-4v-flash
2k
1k
Not supported
Conversation, Image recognition
Zhipu_glm
Free model: possesses powerful image understanding capabilities.