Model Data
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| 360gpt-pro | 8k | - | Not supported | Conversation | 360AI_360gpt | The flagship hundred-billion-parameter model of the 360 Zhinao series with the best performance, widely applicable to complex task scenarios across various domains. |
| 360gpt-turbo | 7k | - | Not supported | Conversation | 360AI_360gpt | A ten-billion-parameter model that balances performance and efficiency, suitable for scenarios with higher performance/cost requirements. |
| 360gpt-turbo-responsibility-8k | 8k | - | Not supported | Conversation | 360AI_360gpt | A ten-billion-parameter model that balances performance and efficiency, suitable for scenarios with higher performance/cost requirements. |
| 360gpt2-pro | 8k | - | Not supported | Conversation | 360AI_360gpt | The flagship hundred-billion-parameter model of the 360 Zhinao series with the best performance, widely applicable to complex task scenarios across various domains. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| claude-3-5-sonnet-20240620 | 200k | 16k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on June 20, 2024. Claude 3.5 Sonnet balances performance and speed, delivering top-tier performance while maintaining high speed, and supports multimodal input. |
| claude-3-5-haiku-20241022 | 200k | 16k | Not supported | Conversation | Anthropic_claude | Snapshot released on October 22, 2024. Claude 3.5 Haiku improves skills across the board, including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers quick response times, suiting highly interactive, low-latency applications such as user-facing chatbots and real-time code completion. It also performs well on specialized tasks such as data extraction and real-time content moderation. It does not support image input. |
| claude-3-5-sonnet-20241022 | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on October 22, 2024. Claude 3.5 Sonnet delivers capabilities beyond Claude 3 Opus at faster speeds than Claude 3 Sonnet, while keeping the same price as Sonnet. It is particularly strong in programming, data science, visual processing, and agent tasks. |
| claude-3-5-sonnet-latest | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Dynamically points to the latest Claude 3.5 Sonnet snapshot. Claude 3.5 Sonnet delivers capabilities beyond Claude 3 Opus at faster speeds than Claude 3 Sonnet while keeping the same price, and excels at programming, data science, visual processing, and agent tasks. |
| claude-3-haiku-20240307 | 200k | 4k | Not supported | Conversation, Image Understanding | Anthropic_claude | Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instant responses. It offers fast, accurate, targeted performance. |
| claude-3-opus-20240229 | 200k | 4k | Not supported | Conversation, Image Understanding | Anthropic_claude | Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It excels in performance, intelligence, fluency, and comprehension. |
| claude-3-sonnet-20240229 | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on February 29, 2024. Sonnet is particularly good at coding (writing, editing, and running code autonomously, with reasoning and debugging), data science (augmenting human expertise and handling unstructured data with various tools to extract insights), visual processing (interpreting charts, graphics, and images, and accurately transcribing text to surface insights beyond the text itself), and agent tasks (tool use and complex multi-step problem solving that requires interacting with other systems). |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| google/gemma-2-27b-it | 8k | - | Not supported | Conversation | Google_gamma | Gemma is a lightweight, state-of-the-art open model series developed by Google, built with the same research and technology as the Gemini models. These are decoder-only large language models that support English and provide open weights for both pretrained and instruction-tuned variants. Gemma models suit a variety of text generation tasks, including Q&A, summarization, and reasoning. |
| google/gemma-2-9b-it | 8k | - | Not supported | Conversation | Google_gamma | Gemma is one of Google's lightweight, state-of-the-art open model series: a decoder-only large language model that supports English and provides open weights for both pretrained and instruction-tuned variants. Gemma models suit a range of text generation tasks, including Q&A, summarization, and reasoning. The 9B model was trained on 8 trillion tokens. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| gemini-1.5-pro | 2m | 8k | Not supported | Conversation | Google_gemini | The latest stable release of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text, particularly suitable for tasks requiring complex reasoning. |
| gemini-1.0-pro-001 | 33k | 8k | Not supported | Conversation | Google_gemini | The stable version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat as well as code generation. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
| gemini-1.0-pro-002 | 32k | 8k | Not supported | Conversation | Google_gemini | The stable version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat as well as code generation. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
| gemini-1.0-pro-latest | 33k | 8k | Not supported | Conversation, Deprecated or Soon-to-be Deprecated | Google_gemini | The latest version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat as well as code generation. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
| gemini-1.0-pro-vision-001 | 16k | 2k | Not supported | Conversation | Google_gemini | The vision version of Gemini 1.0 Pro. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
| gemini-1.0-pro-vision-latest | 16k | 2k | Not supported | Image Understanding | Google_gemini | The latest vision version of Gemini 1.0 Pro. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
| gemini-1.5-flash | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The latest stable release of Gemini 1.5 Flash, a balanced multimodal model that can handle audio, image, video, and text inputs. |
| gemini-1.5-flash-001 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Flash. It provides the same core functionality as gemini-1.5-flash but is version-locked, making it suitable for production use. |
| gemini-1.5-flash-002 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Flash. It provides the same core functionality as gemini-1.5-flash but is version-locked, making it suitable for production use. |
| gemini-1.5-flash-8b | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Gemini 1.5 Flash-8B is Google's multimodal model designed for efficient handling of large-scale tasks. With 8 billion parameters, it supports text, image, audio, and video inputs, suiting applications such as chat, transcription, and translation. Compared with other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, making it especially suitable for cost-sensitive users; its rate limits are doubled so developers can process large-scale workloads more efficiently. It also uses knowledge distillation to extract key knowledge from larger models, staying lightweight while retaining core capabilities. |
| gemini-1.5-flash-exp-0827 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Flash, updated periodically with the latest improvements. Suitable for exploratory testing and prototyping; not recommended for production. |
| gemini-1.5-flash-latest | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The cutting-edge version of Gemini 1.5 Flash, updated periodically with the latest improvements. Suitable for exploratory testing and prototyping; not recommended for production. |
| gemini-1.5-pro-001 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Pro with fixed model behavior and performance characteristics, suitable for production environments that require stability. |
| gemini-1.5-pro-002 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Pro with fixed model behavior and performance characteristics, suitable for production environments that require stability. |
| gemini-1.5-pro-exp-0801 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text, particularly suitable for tasks requiring complex reasoning. |
| gemini-1.5-pro-exp-0827 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text, particularly suitable for tasks requiring complex reasoning. |
| gemini-1.5-pro-latest | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The latest version of Gemini 1.5 Pro, dynamically pointing to the newest snapshot. |
| gemini-2.0-flash | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Gemini 2.0 Flash is Google's latest model, offering faster time-to-first-token (TTFT) than the 1.5 versions while maintaining quality comparable to Gemini 1.5 Pro. It brings significant improvements in multimodal understanding, coding ability, complex instruction following, and function calling, providing a smoother and more powerful experience. |
| gemini-2.0-flash-exp | 100k | 8k | Supported | Conversation, Image Understanding | Google_gemini | Gemini 2.0 Flash introduces a multimodal real-time API, improved speed and performance, quality enhancements, stronger agent capabilities, and new image generation and speech output features. |
| gemini-2.0-flash-lite-preview-02-05 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Gemini 2.0 Flash-Lite is Google's newly released cost-effective model, offering better quality at the same speed as 1.5 Flash. It supports a 1,000,000-token context window and handles multimodal tasks including images, audio, and code. As Google's most cost-effective model to date, it uses a simplified single-tier pricing strategy, making it especially suitable for large-scale applications that require cost control. |
| gemini-2.0-flash-thinking-exp | 40k | 8k | Not supported | Conversation, Reasoning | Google_gemini | An experimental model that can surface the thinking process it goes through while generating a response; its thinking-mode responses therefore reason more strongly than the base Gemini 2.0 Flash model. |
| gemini-2.0-flash-thinking-exp-01-21 | 1m | 64k | Not supported | Conversation, Reasoning | Google_gemini | Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest model focused on improving reasoning ability and user interaction. It reasons strongly, especially in mathematics and programming, and supports context windows of up to 1,000,000 tokens, suiting complex tasks and deep analysis. Its distinguishing feature is generating thought processes that improve the interpretability of its reasoning, and it supports native code execution for more flexible, practical interaction. Algorithmic optimizations reduce logical contradictions, further improving answer accuracy and consistency. |
| gemini-2.0-flash-thinking-exp-1219 | 40k | 8k | Not supported | Conversation, Reasoning, Image Understanding | Google_gemini | An experimental model that can surface the thinking process it goes through while generating a response; its thinking-mode responses therefore reason more strongly than the base Gemini 2.0 Flash model. |
| gemini-2.0-pro-exp-01-28 | 2m | 64k | Not supported | Conversation, Image Understanding | Google_gemini | Pre-release model, not yet live. |
| gemini-2.0-pro-exp-02-05 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Gemini 2.0 Pro Exp 02-05 is Google's experimental model released in February 2025, excelling at world knowledge, code generation, and long-text understanding. It supports a 2,000,000-token ultra-long context window and can handle 2 hours of video, 22 hours of audio, over 60,000 lines of code, or more than 1.4 million words. As part of the Gemini 2.0 family, it uses the new Flash Thinking training strategy, significantly improving performance and ranking highly on multiple LLM benchmarks. |
| gemini-exp-1114 | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model released on November 14, 2024, primarily focused on quality improvements. |
| gemini-exp-1121 | 8k | 4k | Not supported | Conversation, Image Understanding, Code | Google_gemini | An experimental model released on November 21, 2024, with improvements in coding, reasoning, and visual capabilities. |
| gemini-exp-1206 | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model released on December 6, 2024, with improvements in coding, reasoning, and visual capabilities. |
| gemini-exp-latest | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model that dynamically points to the latest version. |
| gemini-pro | 33k | 8k | Not supported | Conversation | Google_gemini | An alias for gemini-1.0-pro. |
| gemini-pro-vision | 16k | 2k | Not supported | Conversation, Image Understanding | Google_gemini | The vision version of Gemini 1.0 Pro. It will be deprecated on February 15, 2025; migration to the 1.5 series is recommended. |
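Several of the Gemini IDs above are floating aliases ("-latest" names, or gemini-pro) that move to newer snapshots over time, while "-001"/"-002" IDs are version-locked. For production use it is generally safer to pin a snapshot. A minimal, hypothetical sketch of an alias table; only the gemini-pro mapping is stated in the catalog itself, the rest is an assumption for illustration:

```python
# Hypothetical alias table. "gemini-pro" -> "gemini-1.0-pro" is stated above;
# mapping "gemini-1.5-flash" to the -002 snapshot is an ASSUMPTION for
# illustration (the catalog does not say which snapshot the bare ID tracks).
ALIASES = {
    "gemini-pro": "gemini-1.0-pro",
    "gemini-1.5-flash": "gemini-1.5-flash-002",
}

def pin(model_id: str) -> str:
    """Resolve a floating alias to a pinned ID; pass pinned IDs through unchanged."""
    return ALIASES.get(model_id, model_id)

print(pin("gemini-pro"))          # resolves the stated alias
print(pin("gemini-1.5-pro-002"))  # already pinned, unchanged
```

The same pattern extends to any of the "-latest" IDs once you decide which snapshot your deployment should track.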
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| grok-2 | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on December 12, 2024. |
| grok-2-1212 | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on December 12, 2024. |
| grok-2-latest | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on December 12, 2024. |
| grok-2-vision-1212 | 32k | - | Not supported | Conversation, Image Understanding | Grok_grok | The Grok vision model released by xAI on December 12, 2024. |
| grok-beta | 100k | - | Not supported | Conversation | Grok_grok | Performance comparable to Grok 2, with improvements in efficiency, speed, and features. |
| grok-vision-beta | 8k | - | Not supported | Conversation, Image Understanding | Grok_grok | The latest image understanding model, able to handle a variety of visual inputs including documents, charts, screenshots, and photos. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| internlm/internlm2_5-20b-chat | 32k | - | Supported | Conversation | internlm | InternLM2.5-20B-Chat is an open-source large conversational model built on the InternLM2 architecture. With 20 billion parameters, it excels in mathematical reasoning, outperforming Llama3 and Gemma2-27B models of similar size. It has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, with stronger instruction understanding, tool selection, and result reflection. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| meta-llama/Llama-3.2-11B-Vision-Instruct | 8k | - | Not supported | Conversation, Image Understanding | Meta_llama | Llama series models can now handle both text and image data; some Llama 3.2 variants include visual understanding. This model accepts text and image input simultaneously, understands images, and outputs text. |
| meta-llama/Llama-3.2-3B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | Meta Llama 3.2 is a multilingual large language model family with 1B and 3B lightweight variants suitable for edge and mobile devices; this is the 3B version. |
| meta-llama/Llama-3.2-90B-Vision-Instruct | 8k | - | Not supported | Conversation, Image Understanding | Meta_llama | Llama series models can now handle both text and image data; some Llama 3.2 variants include visual understanding. This model accepts text and image input simultaneously, understands images, and outputs text. |
| meta-llama/Llama-3.3-70B-Instruct | 131k | - | Not supported | Conversation | Meta_llama | Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B. |
| meta-llama/Meta-Llama-3.1-405B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | The Meta Llama 3.1 multilingual LLM family is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual dialogue and outperform many available open-source and closed-source chat models on common industry benchmarks. |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | Meta Llama 3.1 is a multilingual large language model family developed by Meta, including pretrained and instruction-tuned variants at 8B, 70B, and 405B parameter scales. The 70B instruction-tuned model is optimized for multilingual dialogue and performs strongly across multiple industry benchmarks. It was trained on over 15 trillion public tokens and uses supervised fine-tuning and reinforcement learning from human feedback to improve helpfulness and safety. |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | The Meta Llama 3.1 multilingual LLM family is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this is the 8B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual dialogue and outperform many available open-source and closed-source chat models on common industry benchmarks. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| abab5.5-chat | 16k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| abab5.5s-chat | 8k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| abab6.5g-chat | 8k | - | Supported | Conversation | Minimax_abab | English and other multilingual persona chat scenarios. |
| abab6.5s-chat | 245k | - | Supported | Conversation | Minimax_abab | General scenarios. |
| abab6.5t-chat | 8k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| chatgpt-4o-latest | 128k | 16k | Not supported | Conversation, Image Understanding | OpenAI | Continuously points to the GPT-4o version used in ChatGPT and is updated promptly when significant changes occur. |
| gpt-4o-2024-11-20 | 128k | 16k | Supported | Conversation | OpenAI | Latest gpt-4o snapshot, from November 20, 2024. |
| gpt-4o-audio-preview | 128k | 16k | Not supported | Conversation | OpenAI | OpenAI's voice conversation model, which accepts and produces audio through the Chat Completions API. |
| gpt-4o-audio-preview-2024-10-01 | 128k | 16k | Supported | Conversation | OpenAI | OpenAI's voice conversation model; snapshot from October 1, 2024. |
| o1 | 128k | 32k | Not supported | Conversation, Reasoning, Image Understanding | OpenAI | A new reasoning model from OpenAI for complex tasks that require broad general knowledge. Currently among the most powerful models available, it supports image recognition. |
| o1-mini-2024-09-12 | 128k | 64k | Not supported | Conversation, Reasoning | OpenAI | A fixed snapshot of o1-mini: smaller and faster than o1-preview, about 80% cheaper, and strong in code generation and small-context operations. |
| o1-preview-2024-09-12 | 128k | 32k | Not supported | Conversation, Reasoning | OpenAI | A fixed snapshot of o1-preview. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| gpt-3.5-turbo | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | An improved version of GPT-3.5 developed by OpenAI, optimized in model structure and algorithms for faster inference, higher throughput, and lower resource consumption under the same hardware conditions, which helps reduce operating costs and improve scalability for large-scale text processing. It suits a wide range of NLP tasks, including text generation, semantic understanding, dialogue systems, and machine translation, and offers developer-friendly APIs for rapid integration and deployment. |
| gpt-3.5-turbo-0125 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | Updated GPT-3.5 Turbo with more accurate response formatting and a fix for a bug that garbled text encoding in non-English function calls. Returns up to 4,096 output tokens. |
| gpt-3.5-turbo-0613 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | A fixed snapshot of GPT-3.5 Turbo. Now deprecated. |
| gpt-3.5-turbo-1106 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens. |
| gpt-3.5-turbo-16k | 16k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-3 | (Deprecated) |
| gpt-3.5-turbo-16k-0613 | 16k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-3 | Snapshot of gpt-3.5-turbo from June 13, 2023. (Deprecated) |
| gpt-3.5-turbo-instruct | 4k | 4k | Supported | Conversation | OpenAI_gpt-3 | Capabilities similar to GPT-3-era models. Compatible with the legacy Completions endpoint, not Chat Completions. |
| gpt-3.5o | 16k | 4k | Not supported | Conversation | OpenAI_gpt-3 | Same as gpt-4o-lite. |
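The gpt-3.5-turbo-instruct entry above targets the legacy `/v1/completions` endpoint, while the chat models use `/v1/chat/completions`. A minimal sketch of how the two request bodies differ (payload shapes only, following the OpenAI API; no request is sent here, and the prompt text is illustrative):

```python
# Legacy Completions: a single "prompt" string.
completions_body = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Translate 'hello' into French.",
    "max_tokens": 64,
}

# Chat Completions: a list of role-tagged messages instead of a prompt.
chat_body = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a concise translator."},
        {"role": "user", "content": "Translate 'hello' into French."},
    ],
    "max_tokens": 64,
}
```

Sending a `messages` list to the Completions endpoint (or a bare `prompt` to Chat Completions) is rejected, which is why the instruct model is listed as incompatible with Chat Completions.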
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| gpt-4 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Currently points to gpt-4-0613. |
| gpt-4-0125-preview | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | The latest GPT-4 preview model, aimed at reducing "laziness" cases where the model fails to complete a task. Returns up to 4,096 output tokens. |
| gpt-4-0314 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Snapshot of gpt-4 from March 14, 2023. |
| gpt-4-0613 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Snapshot of gpt-4 from June 13, 2023, with enhanced function-calling support. |
| gpt-4-1106-preview | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | GPT-4 Turbo preview with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns up to 4,096 output tokens. |
| gpt-4-32k | 32k | 4k | Supported | Conversation | OpenAI_gpt-4 | gpt-4-32k will be deprecated on 2025-06-06. |
| gpt-4-32k-0613 | 32k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-4 | Will be deprecated on 2025-06-06. |
| gpt-4-turbo | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | The latest GPT-4 Turbo model adds vision capabilities; vision requests can use JSON mode and function calling. Currently points to gpt-4-turbo-2024-04-09. |
| gpt-4-turbo-2024-04-09 | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | GPT-4 Turbo with vision capabilities; vision requests can now use JSON mode and function calling. gpt-4-turbo currently points to this version. |
| gpt-4-turbo-preview | 128k | 4k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | Currently points to gpt-4-0125-preview. |
| gpt-4o | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | OpenAI's high-intelligence flagship model for complex multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo. |
| gpt-4o-2024-05-13 | 128k | 4k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | Original gpt-4o snapshot from May 13, 2024. |
| gpt-4o-2024-08-06 | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | The first snapshot to support Structured Outputs. gpt-4o currently points to this version. |
| gpt-4o-mini | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | OpenAI's affordable gpt-4o variant for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18. |
| gpt-4o-mini-2024-07-18 | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | A fixed snapshot of gpt-4o-mini. |
| gpt-4o-realtime-preview | 128k | 4k | Supported | Conversation, Real-time Voice | OpenAI_gpt-4 | OpenAI's real-time voice conversation model. |
| gpt-4o-realtime-preview-2024-10-01 | 128k | 4k | Supported | Conversation, Real-time Voice, Image Understanding | OpenAI_gpt-4 | gpt-4o-realtime-preview currently points to this snapshot. |
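As noted above, gpt-4o-2024-08-06 was the first snapshot to support Structured Outputs. A sketch of what the `response_format` portion of a Chat Completions request looks like for that feature (field names follow the OpenAI API; the schema itself is an illustrative example, and no request is sent here):

```python
# Structured Outputs request body: the model is constrained to emit JSON
# matching the supplied JSON Schema. The "person" schema is a made-up example.
structured_request = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "user", "content": "Extract the person: Alice is 30 years old."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # enforce exact schema adherence
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
}
```

Earlier snapshots (and the JSON-mode-only models above) accept at most `{"type": "json_object"}`, which guarantees valid JSON but not a particular schema.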
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| o1-mini | 128k | 64k | Not supported | Conversation, Reasoning | OpenAI_o1 | Smaller and faster than o1-preview, about 80% cheaper, and strong in code generation and small-context operations. |
| o1-preview | 128k | 32k | Not supported | Conversation, Reasoning | OpenAI_o1 | o1-preview is a new reasoning model for complex tasks requiring broad general knowledge. It has a 128K context and an October 2023 knowledge cutoff. Focused on advanced reasoning and solving complex problems, including mathematics and science tasks, it is ideal for applications that need deep context understanding and autonomous workflows. |
| o3-mini | 200k | 100k | Supported | Conversation, Reasoning | OpenAI_o1 | o3-mini is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency targets as o1-mini. It focuses on science, math, and coding tasks; supports developer features such as structured outputs, function calling, and the Batch API; and has an October 2023 knowledge cutoff, striking a notable balance between reasoning capability and economy. |
| o3-mini-2025-01-31 | 200k | 100k | Supported | Conversation, Reasoning | OpenAI_o1 | o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency targets as o1-mini. It focuses on science, math, and coding tasks; supports developer features such as structured outputs, function calling, and the Batch API; and has an October 2023 knowledge cutoff, striking a notable balance between reasoning capability and economy. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| Baichuan2-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Maintains industry-leading performance among models of the same size while significantly reducing cost. |
| Baichuan3-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Maintains industry-leading performance among models of the same size while significantly reducing cost. |
| Baichuan3-Turbo-128k | 128k | - | Not supported | Conversation | Baichuan_baichuan | Handles complex texts through a 128k ultra-long context window, is specially optimized for industries such as finance, and greatly reduces cost while maintaining high performance, providing a cost-effective solution for enterprises. |
| Baichuan4 | 32k | - | Not supported | Conversation | Baichuan_baichuan | Baichuan's MoE model, providing cost-effective enterprise solutions through domain-specific optimization, cost reduction, and performance improvement. |
| Baichuan4-Air | 32k | - | Not supported | Conversation | Baichuan_baichuan | Baichuan's MoE model, providing cost-effective enterprise solutions through domain-specific optimization, cost reduction, and performance improvement. |
| Baichuan4-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Trained on massive high-quality scenario data. Availability in high-frequency enterprise scenarios improves by over 10% relative to Baichuan4, information summarization by 50%, multilingual performance by 31%, and content generation by 13%. With dedicated inference optimizations, first-token response speed increases by 51% over Baichuan4 and token throughput by 73%. |
| Model | Context | Max Output | Function Calling | Tags | Series | Description |
|---|---|---|---|---|---|---|
| ERNIE-3.5-128K | 128k | 4k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering massive Chinese and English corpora. It has strong general capabilities for most dialogue Q&A, creative generation, and plugin application scenarios, and supports automatic integration with the Baidu Search plugin to keep Q&A information current. |
| ERNIE-3.5-8K | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering massive Chinese and English corpora. It has strong general capabilities for most dialogue Q&A, creative generation, and plugin application scenarios, and supports automatic integration with the Baidu Search plugin to keep Q&A information current. |
| ERNIE-3.5-8K-Preview | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering massive Chinese and English corpora. It has strong general capabilities for most dialogue Q&A, creative generation, and plugin application scenarios, and supports automatic integration with the Baidu Search plugin to keep Q&A information current. |
| ERNIE-4.0-8K | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship ultra-large-scale language model, a comprehensive capability upgrade over ERNIE 3.5, widely applicable to complex task scenarios across domains; supports automatic integration with the Baidu Search plugin to keep Q&A information current. |
| ERNIE-4.0-8K-Latest | 8k | 2k | Supported | Conversation | Baidu_ernie | A comprehensive capability upgrade over ERNIE-4.0-8K, with significant improvements in role-playing and instruction following. Relative to ERNIE 3.5 it is a full capability upgrade, widely applicable to complex task scenarios across domains; supports automatic integration with the Baidu Search plugin to keep Q&A information current, and supports 5K input tokens plus 2K output tokens. |
| ERNIE-4.0-8K-Preview | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship ultra-large-scale language model, a comprehensive capability upgrade over ERNIE 3.5, widely applicable to complex task scenarios across domains; supports automatic integration with the Baidu Search plugin to keep Q&A information current. |
| ERNIE-4.0-Turbo-128K | 128k | 4k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with the Baidu Search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-128K offers better overall long-document performance than ERNIE-3.5-128K. |
| ERNIE-4.0-Turbo-8K | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with the Baidu Search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of the model. |
| ERNIE-4.0-Turbo-8K-Latest | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with the Baidu Search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K-Latest is one version of the model. |
| ERNIE-4.0-Turbo-8K-Preview | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with the Baidu Search plugin to keep Q&A information current. ERNIE-4.0-Turbo-8K-Preview is one version of the model. |
| ERNIE-Character-8K | 8k | 1k | Not supported | Conversation | Baidu_ernie | Baidu's self-developed vertical-domain large language model, suitable for game NPCs, customer-service dialogue, and role-playing applications. It has a more distinctive and consistent persona style, stronger instruction following, and superior inference performance. |
| ERNIE-Lite-8K | 8k | 4k | Not supported | Conversation | Baidu_ernie | Baidu's self-developed lightweight large language model, balancing excellent model quality with inference performance, suitable for inference on low-power AI acceleration cards. |
| ERNIE-Lite-Pro-128K | 128k | 2k | Supported | Conversation | Baidu_ernie | Baidu's self-developed lightweight large language model with better performance than ERNIE Lite, balancing excellent model quality with inference performance, suitable for inference on low-power AI acceleration cards. ERNIE-Lite-Pro-128K supports a 128K context length and outperforms ERNIE-Lite-128K. |
| ERNIE-Novel-8K | 8k | 2k | Not supported | Conversation | Baidu_ernie | ERNIE-Novel-8K is Baidu's general-purpose large language model with notable strengths in novel continuation; it can also be used for short dramas, movies, and similar scenarios. |
| ERNIE-Speed-128K | 128k | 4k | Not supported | Conversation | Baidu_ernie | Baidu's latest high-performance large language model, released in 2024, with excellent general capability. It is suitable as a base model for fine-tuning to better handle specific scenarios, while also offering outstanding inference performance. |
| ERNIE-Speed-8K | 8k | 1k | Not supported | Conversation | Baidu_ernie | Baidu's latest high-performance large language model, released in 2024, with excellent general capability. It is suitable as a base model for fine-tuning to better handle specific scenarios, while also offering outstanding inference performance. |
| ERNIE-Speed-Pro-128K | 128k | 4k | Not supported | Conversation | Baidu_ernie | ERNIE Speed Pro is Baidu's latest high-performance large language model, released in 2024, with excellent general capability. It is suitable as a base model for fine-tuning to better handle specific scenarios, while also offering outstanding inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024; it supports a 128K context length and outperforms ERNIE-Speed-128K. |
ERNIE-Tiny-8K
8k
1k
Not supported
Conversation
Baidu_ernie
Baidu’s self-developed ultra-high-performance large language model with the lowest deployment and fine-tuning costs within the Wenxin series.
Doubao-1.5-lite-32k
32k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-lite ranks among the world’s top lightweight language models. On comprehensive knowledge (MMLU-Pro), reasoning (BBH), mathematics (MATH), and professional knowledge (GPQA) benchmarks, it matches or surpasses GPT-4o mini and Claude 3.5 Haiku.
Doubao-1.5-pro-256k
256k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-Pro-256k is a comprehensive upgrade of Doubao-1.5-Pro. Compared with Doubao-pro-256k/241115, overall performance improves by 10%, and the maximum output length increases substantially, supporting up to 12k tokens.
Doubao-1.5-pro-32k
32k
12k
Supported
Conversation
Doubao_doubao
Doubao-1.5-pro is a new-generation flagship model with comprehensively upgraded performance. It achieves world-leading results on multiple public benchmarks, excelling particularly in knowledge, coding, reasoning, and authoritative Chinese evaluations, with overall scores surpassing industry-leading models such as GPT-4o and Claude 3.5 Sonnet.
Doubao-1.5-vision-pro
32k
12k
Not supported
Conversation, Image Understanding
Doubao_doubao
Doubao-1.5-vision-pro is a newly upgraded multimodal large model that supports arbitrary resolution and extreme aspect-ratio image recognition, enhancing visual reasoning, document recognition, fine-detail understanding, and instruction following.
Doubao-embedding
4k
-
Supported
Embedding
Doubao_doubao
Doubao-embedding is a semantic vectorization model developed by ByteDance, aimed mainly at vector retrieval scenarios; it supports Chinese and English with a maximum 4K context length. Available versions: text-240715 (recommended): maximum vector dimension 2560, supports reduction to 512, 1024, or 2048 dimensions, with Chinese-English retrieval performance significantly improved over text-240515; text-240515: maximum vector dimension 2048, supports reduction to 512 or 1024 dimensions.
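As a rough sketch of how the version choice and dimensionality reduction above might be used: the payload field names below are illustrative rather than the official API schema, and the truncate-then-renormalize step is the commonly used Matryoshka-style reduction, which the service is assumed (not confirmed) to mirror.

```python
import math

def build_embedding_request(texts, version="text-240715"):
    """Illustrative request payload for a Doubao-style embedding endpoint;
    the official API's field names may differ."""
    return {"model": "doubao-embedding", "version": version, "input": list(texts)}

def reduce_dimension(vector, dim):
    """Reduce an embedding to `dim` components by truncation plus
    re-normalization, so cosine similarity stays meaningful. Assumes the
    text-240715 version, which supports 512/1024/2048 reductions."""
    allowed = {512, 1024, 2048}
    if dim not in allowed:
        raise ValueError(f"unsupported target dimension: {dim}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```

A reduced vector keeps the leading components and unit norm, so downstream indexes built at different dimensions remain comparable.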
Doubao-embedding-large
4k
-
Not supported
Embedding
Doubao_doubao
Chinese-English retrieval performance is significantly improved compared to the Doubao-embedding text-240715 version.
Doubao-embedding-vision
8k
-
Not supported
Embedding
Doubao_doubao
Doubao-embedding-vision is a newly upgraded image-text multimodal vectorization model mainly aimed at image-text multimodal vector retrieval scenarios. It supports image input and Chinese/English text input with a maximum 8K context length.
Doubao-lite-128k
128k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing customers with more flexible options for different scenarios. Supports 128k context window for inference and fine-tuning.
Doubao-lite-32k
32k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing customers with more flexible options for different scenarios. Supports 32k context window for inference and fine-tuning.
Doubao-lite-4k
4k
4k
Supported
Conversation
Doubao_doubao
Doubao-lite offers extreme response speed and better cost-effectiveness, providing customers with more flexible options for different scenarios. Supports 4k context window for inference and fine-tuning.
Doubao-pro-128k
128k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks and effective in reference Q&A, summarization, creative generation, text classification, role-playing, and other scenarios. Supports 128k context window for inference and fine-tuning.
Doubao-pro-32k
32k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks and effective in reference Q&A, summarization, creative generation, text classification, role-playing, and other scenarios. Supports 32k context window for inference and fine-tuning.
Doubao-pro-4k
4k
4k
Supported
Conversation
Doubao_doubao
The flagship model with the best performance, suitable for handling complex tasks and effective in reference Q&A, summarization, creative generation, text classification, role-playing, and other scenarios. Supports 4k context window for inference and fine-tuning.
step-1-128k
128k
-
Supported
Conversation
StepFun (Step)
The step-1-128k model is a large language model capable of handling up to 128,000 tokens of input. This long context gives it significant advantages in generating long-form content and performing complex reasoning, making it suitable for applications such as novel and script writing that require rich context.
step-1-256k
256k
-
Supported
Conversation
StepFun (Step)
The step-1-256k model supports one of the largest context windows currently available, accepting 256,000 tokens of input. It is designed for extremely complex tasks such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality outputs across various domains.
step-1-32k
32k
-
Supported
Conversation
StepFun (Step)
The step-1-32k model expands the context window to support 32,000 tokens of input. This makes it excel at handling long articles and complex dialogues, suitable for tasks that require deep understanding and analysis such as legal documents and academic research.
step-1-8k
8k
-
Supported
Conversation
StepFun (Step)
The step-1-8k model is an efficient language model designed for shorter text processing. It can reason within an 8,000-token context, suitable for applications requiring quick responses such as chatbots and real-time translation.
step-1-flash
8k
-
Supported
Conversation
StepFun (Step)
The step-1-flash model focuses on fast response and efficient processing, suitable for real-time applications. Its design enables high-quality language understanding and generation under limited compute resources, suitable for mobile devices and edge computing.
step-1.5v-mini
32k
-
Supported
Conversation, Image Understanding
StepFun (Step)
The step-1.5v-mini model is a lightweight vision-language model designed to run in resource-constrained environments. Despite its small size, it retains good language and image processing capabilities, making it suitable for embedded systems and low-power devices.
step-1v-32k
32k
-
Supported
Conversation, Image Understanding
StepFun (Step)
The step-1v-32k model supports 32,000 tokens of input, suitable for applications that require longer context. It performs well on complex dialogues and long-form text, making it suitable for customer service and content creation.
step-1v-8k
8k
-
Supported
Conversation, Image Understanding
StepFun (Step)
The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for fast generation and short-text processing. It achieves a good balance between speed and accuracy, ideal for real-time applications.
step-2-16k
16k
-
Supported
Conversation
StepFun (Step)
The step-2-16k model is a medium-sized language model supporting 16,000-token input. It performs well across a variety of tasks, suitable for education, training, and knowledge management scenarios.
yi-lightning
16k
-
Supported
Conversation
01.AI_yi
The latest high-performance model, delivering high-quality output while significantly improving inference speed. It suits real-time interaction and high-complexity reasoning scenarios, and its excellent cost-performance ratio makes it a strong foundation for commercial products.
yi-vision-v2
16K
-
Supported
Conversation, Image Understanding
01.AI_yi
Suitable for scenarios that require analysis and interpretation of images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report comprehension, or multilingual document reading.
qwen-14b-chat
8k
2k
Supported
Conversation
Qianwen_qwen
Alibaba Cloud’s official Tongyi Qianwen open-source edition.
qwen-72b-chat
32k
2k
Supported
Conversation
Qianwen_qwen
Alibaba Cloud’s official Tongyi Qianwen open-source edition.
qwen-7b-chat
7.5k
1.5k
Supported
Conversation
Qianwen_qwen
Alibaba Cloud’s official Tongyi Qianwen open-source edition.
qwen-coder-plus
128k
8k
Supported
Conversation, Code
Qianwen_qwen
Qwen-Coder-Plus is a programming-specialized model in the Qwen series designed to improve code generation and understanding. Trained on large-scale programming data, it can handle multiple programming languages and supports code completion, error detection, and code refactoring. Its goal is to provide developers more efficient programming assistance and improve development productivity.
qwen-coder-plus-latest
128k
8k
Supported
Conversation, Code
Qianwen_qwen
Qwen-Coder-Plus-Latest is the latest version of Qwen-Coder-Plus, containing the newest algorithm optimizations and dataset updates. The model has significant performance improvements, better understanding of context, and generates code that better meets developers’ needs. It also introduces support for more programming languages, enhancing multilingual programming capabilities.
qwen-coder-turbo
128k
8k
Supported
Conversation, Code
Qianwen_qwen
The Qwen coding model, specialized for programming and code generation, with fast inference and low cost. This version always points to the latest stable snapshot.
qwen-coder-turbo-latest
128k
8k
Supported
Conversation, Code
Qianwen_qwen
The Qwen coding model, specialized for programming and code generation, with fast inference and low cost. This version always points to the latest snapshot.
qwen-long
10m
6k
Supported
Conversation
Qianwen_qwen
Qwen-Long is Qwen’s large model for ultra-long context scenarios, supporting Chinese, English, and other languages, with up to 10 million tokens (about 15 million characters or 15,000 pages) of ultra-long context dialogue. Together with its document service launched simultaneously, it supports parsing and dialogue for various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: direct HTTP requests support up to 1M tokens; for lengths beyond this it is recommended to submit via files.
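The 1M-token direct-request cap versus file-based submission above suggests a simple routing step before calling the API. A minimal sketch, assuming the "10M tokens ≈ 15M characters" figure implies roughly 1.5 characters per token (a heuristic, not the official tokenizer):

```python
DIRECT_LIMIT = 1_000_000  # direct HTTP requests cap out at 1M tokens

def estimate_tokens(text):
    """Rough heuristic: the '10M tokens / 15M characters' ratio above
    implies about 1.5 characters per token."""
    return int(len(text) / 1.5)

def choose_submission(text):
    """Pick the submission path: a direct request under the 1M-token cap,
    otherwise upload through the accompanying document/file service."""
    return "direct" if estimate_tokens(text) <= DIRECT_LIMIT else "file"
```

In practice one would replace `estimate_tokens` with the provider's real tokenizer; the point is only that inputs past the cap should go through the file path rather than a raw HTTP body.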
qwen-math-plus
4k
3k
Supported
Conversation
Qianwen_qwen
Qwen-Math-Plus is a model focused on solving math problems, intended to provide efficient mathematical reasoning and computational capabilities. Trained on large math corpora, it can handle complex mathematical expressions and problems, supporting a range of computations from basic arithmetic to advanced mathematics. Use cases include education, research, and engineering.
qwen-math-plus-latest
4k
3k
Supported
Conversation
Qianwen_qwen
Qwen-Math-Plus-Latest is the latest version of Qwen-Math-Plus, integrating the newest math reasoning techniques and algorithmic improvements. The model performs better on complex math problems and can provide more accurate solutions and reasoning processes. It also expands understanding of mathematical symbols and formulas, suitable for broader math applications.
qwen-math-turbo
4k
3k
Supported
Conversation
Qianwen_qwen
Qwen-Math-Turbo is a high-performance math model designed for fast computation and real-time reasoning. The model optimizes computation speed and can process large volumes of math problems in very short timeframes, suitable for applications demanding quick feedback such as online education and real-time data analysis. Its efficient algorithms allow users to get immediate results for complex calculations.
qwen-math-turbo-latest
4k
3k
Supported
Conversation
Qianwen_qwen
Qwen-Math-Turbo-Latest is the latest version of Qwen-Math-Turbo, further improving computation efficiency and accuracy. The model includes multiple algorithmic optimizations to handle more complex math problems while remaining efficient in real-time reasoning. It is suitable for math applications requiring fast responses, such as financial analysis and scientific computation.
qwen-max
32k
8k
Supported
Conversation
Qianwen_qwen
Part of the Qwen 2.5 series, a hundred-billion-parameter ultra-large-scale language model supporting Chinese, English, and other languages. As the underlying model is upgraded, qwen-max receives rolling updates.
qwen-max-latest
32k
8k
Supported
Conversation
Qianwen_qwen
The best-performing model in the Qwen series. This model is dynamically updated and model updates are not announced in advance. It is suitable for complex, multi-step tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced. It also improves understanding and generation of structured data such as tables and JSON.
qwen-plus
128k
8k
Supported
Conversation
Qianwen_qwen
A balanced-capability model in the Qwen series, with reasoning performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced.
qwen-plus-latest
128k
8k
Supported
Conversation
Qianwen_qwen
The latest version of Qwen-Plus, the balanced-capability model in the Qwen series with reasoning performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks. This version is dynamically updated as the model is upgraded.
qwen-turbo
128k
8k
Supported
Conversation
Qianwen_qwen
The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced.
qwen-turbo-latest
1m
8k
Supported
Conversation
Qianwen_qwen
The latest version of Qwen-Turbo, the fastest and most cost-effective model in the Qwen series, designed for simple tasks and applications with strict response-time requirements such as real-time Q&A systems. This version is dynamically updated as the model is upgraded.
qwen-vl-max
32k
2k
Supported
Conversation, Image Understanding
Qianwen_qwen
Qwen-VL-Max (qwen-vl-max) is the ultra-large-scale vision-language model of the Qwen family. Compared to the enhanced version, it further improves visual reasoning and instruction-following capabilities, offering higher visual perception and cognition levels and delivering optimal performance on more complex tasks.
qwen-vl-max-latest
32k
2k
Supported
Conversation, Image Understanding
Qianwen_qwen
Qwen-VL-Max is the top-tier version in the Qwen-VL series, designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, can understand and analyze high-resolution images, has extremely strong reasoning ability, and is suitable for applications requiring deep understanding and complex reasoning.
qwen-vl-ocr
34k
4k
Supported
Conversation, Image Understanding
Qianwen_qwen
Supports OCR (text extraction from images) only; not intended for general conversation.
qwen-vl-ocr-latest
34k
4k
Supported
Conversation, Image Understanding
Qianwen_qwen
Supports OCR (text extraction from images) only; not intended for general conversation.
qwen-vl-plus
8k
2k
Supported
Conversation, Image Understanding
Qianwen_qwen
Qwen-VL-Plus (qwen-vl-plus) is an enhanced version of the Qwen large-scale vision-language model. It greatly improves fine-detail recognition and text recognition, supports images at over one million pixels resolution and arbitrary aspect ratios, and delivers excellent performance across a wide range of vision tasks.
qwen-vl-plus-latest
32k
2k
Supported
Conversation, Image Understanding
Qianwen_qwen
Qwen-VL-Plus-Latest is the latest version of Qwen-VL-Plus, enhancing the model’s multimodal understanding capabilities. It excels at combined processing of images and text and is suitable for applications that need to efficiently handle multiple input formats, such as intelligent customer service and content generation.
Qwen/Qwen2-1.5B-Instruct
32k
6k
Not supported
Conversation
Qianwen_qwen
Qwen2-1.5B-Instruct is an instruction-tuned LLM in the Qwen2 series with 1.5B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It performs well on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.
Qwen/Qwen2-72B-Instruct
128k
6k
Not supported
Conversation
Qianwen_qwen
Qwen2-72B-Instruct is an instruction-tuned LLM in the Qwen2 series with 72B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It can handle large-scale inputs and performs strongly on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.
Qwen/Qwen2-7B-Instruct
128k
6k
Not supported
Conversation
Qianwen_qwen
Qwen2-7B-Instruct is an instruction-tuned LLM in the Qwen2 series with 7B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It can handle large-scale inputs and performs strongly on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.
Qwen/Qwen2-VL-72B-Instruct
32k
2k
Not supported
Conversation, Image Understanding
Qianwen_qwen
Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos longer than 20 minutes for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making abilities and can be integrated with mobile devices and robots to perform autonomous operations based on visual environments and text instructions.
Qwen/Qwen2-VL-7B-Instruct
32k
-
Not supported
Conversation, Image Understanding
Qianwen_qwen
Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also has complex reasoning and decision-making capabilities, allowing integration with mobile devices and robots to operate autonomously based on visual environments and text instructions.
Qwen/Qwen2.5-72B-Instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
Qwen2.5-72B-Instruct is one of Alibaba Cloud’s latest LLM series. This 72B model has significant improvements in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.
Qwen/Qwen2.5-72B-Instruct-128K
128k
8k
Not supported
Conversation
Qianwen_qwen
Qwen2.5-72B-Instruct is one of Alibaba Cloud’s latest LLM series. This 72B model has significant improvements in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.
Qwen/Qwen2.5-7B-Instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
Qwen2.5-7B-Instruct is one of Alibaba Cloud’s latest LLM series. This 7B model has significant improvements in coding and mathematics. The model also provides multilingual support covering over 29 languages including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).
Qwen/Qwen2.5-Coder-32B-Instruct
128k
8k
Not supported
Conversation, Code
Qianwen_qwen
Qwen2.5-Coder-32B-Instruct is one of Alibaba Cloud’s latest LLM series. This 32B coding model has significant improvements in coding and mathematics. The model also provides multilingual support covering over 29 languages including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).
Qwen/Qwen2.5-Coder-7B-Instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
Qwen2.5-Coder-7B-Instruct is one of Alibaba Cloud’s latest LLM series. This 7B coding model has significant improvements in coding and mathematics. The model also provides multilingual support covering over 29 languages including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).
Qwen/QwQ-32B-Preview
32k
16k
Not supported
Conversation, Reasoning
Qianwen_qwen
QwQ-32B-Preview is an experimental research model developed by the Qwen team to enhance AI reasoning capabilities. As a preview version, it demonstrates strong analytical ability but has important limitations: 1. Language mixing and code switching: the model may mix languages or switch between languages unexpectedly, affecting response clarity. 2. Recursive reasoning loops: the model may enter looped reasoning patterns, producing verbose answers without clear conclusions. 3. Safety and ethical considerations: the model needs strengthened safety measures to ensure reliable and safe performance; users should be cautious when using it. 4. Performance and benchmark limitations: the model performs well in math and programming but still has room for improvement in common-sense reasoning and nuanced language understanding.
qwen1.5-110b-chat
32k
8k
Not supported
Conversation
Qianwen_qwen
-
qwen1.5-14b-chat
8k
2k
Not supported
Conversation
Qianwen_qwen
-
qwen1.5-32b-chat
32k
2k
Not supported
Conversation
Qianwen_qwen
-
qwen1.5-72b-chat
32k
2k
Not supported
Conversation
Qianwen_qwen
-
qwen1.5-7b-chat
8k
2k
Not supported
Conversation
Qianwen_qwen
-
qwen2-57b-a14b-instruct
65k
6k
Not supported
Conversation
Qianwen_qwen
-
Qwen2-72B-Instruct
-
-
Not supported
Conversation
Qianwen_qwen
-
qwen2-7b-instruct
128k
6k
Not supported
Conversation
Qianwen_qwen
-
qwen2-math-72b-instruct
4k
3k
Not supported
Conversation
Qianwen_qwen
-
qwen2-math-7b-instruct
4k
3k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-14b-instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-32b-instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-72b-instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-7b-instruct
128k
8k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-coder-14b-instruct
128k
8k
Not supported
Conversation, Code
Qianwen_qwen
-
qwen2.5-coder-32b-instruct
128k
8k
Not supported
Conversation, Code
Qianwen_qwen
-
qwen2.5-coder-7b-instruct
128k
8k
Not supported
Conversation, Code
Qianwen_qwen
-
qwen2.5-math-72b-instruct
4k
3k
Not supported
Conversation
Qianwen_qwen
-
qwen2.5-math-7b-instruct
4k
3k
Not supported
Conversation
Qianwen_qwen
-
deepseek-ai/DeepSeek-R1
64k
-
Not supported
Conversation, Reasoning
DeepSeek_deepseek
The DeepSeek-R1 model is an open-source reasoning model trained with large-scale reinforcement learning. It performs excellently on mathematics, coding, and natural-language reasoning tasks, with performance comparable to OpenAI’s o1 model and outstanding results on multiple benchmarks.
deepseek-ai/DeepSeek-V2-Chat
128k
-
Not supported
Conversation
DeepSeek_deepseek
DeepSeek-V2 is a powerful, cost-effective mixture-of-experts (MoE) language model. It was pretrained on a high-quality corpus of 8.1 trillion tokens and further improved through supervised fine-tuning (SFT) and reinforcement learning (RL). Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% of training cost, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76×.
deepseek-ai/DeepSeek-V2.5
32k
-
Supported
Conversation
DeepSeek_deepseek
DeepSeek-V2.5 is an upgraded version combining DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. The model is optimized in multiple aspects, including writing and instruction-following ability, better aligning with human preferences.
deepseek-ai/DeepSeek-V3
128k
4k
Not supported
Conversation
DeepSeek_deepseek
The open-source DeepSeek-V3 deployment, with a longer context window than the official version and without issues such as refusals triggered by sensitive words.
deepseek-chat
64k
8k
Supported
Conversation
DeepSeek_deepseek
236B parameters, 64K context (API); ranks first among open-source models in Chinese comprehensive ability (AlignBench); is in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations.
deepseek-coder
64k
8k
Supported
Conversation, Code
DeepSeek_deepseek
236B parameters, 64K context (API); ranks first among open-source models in Chinese comprehensive ability (AlignBench); is in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations.
deepseek-reasoner
64k
8k
Supported
Conversation, Reasoning
DeepSeek_deepseek
DeepSeek-Reasoner (DeepSeek-R1) is DeepSeek's latest reasoning model, designed to improve reasoning ability via reinforcement learning training. The model's reasoning process includes extensive reflection and verification and can handle complex logical reasoning tasks with chain-of-thought lengths reaching tens of thousands of characters. DeepSeek-R1 performs excellently on math, code, and other complex problems, has been widely applied across scenarios, and demonstrates strong reasoning ability and flexibility. Compared to other models, DeepSeek-R1 approaches top closed-source models in reasoning performance, showcasing the potential and competitiveness of open-source models in reasoning.
hunyuan-code
4k
4k
Not supported
Conversation, Code
Tencent_hunyuan
Hunyuan's latest code generation model, built by continuing training of a base model on 200B tokens of high-quality code data and iterating for half a year on high-quality SFT data, with the context window increased to 8K. It ranks among the top in automatic code-generation evaluation metrics across five major languages, and in high-quality human evaluations across ten dimensions of code tasks in those five languages its performance is in the leading tier.
hunyuan-functioncall
28k
4k
Supported
Conversation
Tencent_hunyuan
Hunyuan's latest MoE-architecture FunctionCall model, trained on high-quality FunctionCall data, with a context window up to 32K; it leads across multiple evaluation dimensions.
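FunctionCall models like this one consume tool declarations and emit structured calls. A minimal sketch of such a declaration, using the widely adopted OpenAI-style tool schema; the exact schema hunyuan-functioncall expects may differ, and the `get_weather` tool is hypothetical:

```python
def weather_tool_spec():
    """An OpenAI-style function (tool) definition. FunctionCall models
    consume a declaration of this general shape; the exact field layout
    here is illustrative, and get_weather is a hypothetical tool."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
```

The model matches a user request against the `description` fields and returns the chosen function name plus JSON arguments, which the caller then executes.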
hunyuan-large
28k
4k
Not supported
Conversation
Tencent_hunyuan
The Hunyuan-large model has approximately 389B total parameters and about 52B active parameters; it is the largest-scale and best-performing open-source MoE Transformer architecture model currently in the industry.
hunyuan-large-longcontext
128k
6k
Not supported
Conversation
Tencent_hunyuan
Good at handling long-text tasks such as document summarization and document question answering, and also capable of general text generation tasks. It excels at analyzing and generating long texts and effectively addresses complex and detailed long-form content processing needs.
hunyuan-lite
250k
6k
Not supported
Conversation
Tencent_hunyuan
Upgraded to an MoE structure with a 256k context window, leading many open-source models on multiple benchmarks in NLP, code, math, and industry-specific evaluations.
hunyuan-pro
28k
4k
Supported
Conversation
Tencent_hunyuan
Trillion-parameter-scale MoE-32K long-context model. Achieves absolute leading levels across various benchmarks; supports complex instructions and reasoning, possesses advanced mathematical capability, supports FunctionCall, and is specially optimized for multilingual translation and application domains such as finance, law, and healthcare.
hunyuan-role
28k
4k
Not supported
Conversation
Tencent_hunyuan
Hunyuan's latest official role-playing model, fine-tuned on role-play scenario datasets on top of the Hunyuan base model, giving it better foundational performance in role-play scenarios.
hunyuan-standard
30k
2k
Not supported
Conversation
Tencent_hunyuan
Uses an improved routing strategy while mitigating load-balancing and expert-collapse issues. MoE-32K offers better cost-effectiveness; balancing performance and price, it can process long-text inputs.
hunyuan-standard-256K
250k
6k
Not supported
Conversation
Tencent_hunyuan
Uses an improved routing strategy while mitigating load-balancing and expert-collapse issues. On long texts, the needle-in-a-haystack metric reaches 99.9%. MoE-256K further breaks through in length and performance, greatly expanding the allowable input length.
hunyuan-translation-lite
4k
4k
Not supported
Conversation
Tencent_hunyuan
The Hunyuan translation model supports natural-language conversational translation among 15 languages: Chinese, English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.
hunyuan-turbo
28k
4k
Supported
Conversation
Tencent_hunyuan
Hunyuan-turbo is the default version of the Hunyuan model, adopting a new mixture-of-experts (MoE) architecture; compared with hunyuan-pro, it offers higher inference efficiency and stronger performance.
hunyuan-turbo-latest
28k
4k
Supported
Conversation
Tencent_hunyuan
The dynamically updated version of hunyuan-turbo, the best-performing version in the Hunyuan model series, consistent with the consumer-facing (Tencent Yuanbao) version.
hunyuan-turbo-vision
8k
2k
Supported
Conversation, Image Understanding
Tencent_hunyuan
Hunyuan's next-generation flagship vision-language large model, adopting a new mixture-of-experts (MoE) architecture and comprehensively improving over the previous generation in image-text understanding capabilities such as basic recognition, content creation, knowledge Q&A, and analytical reasoning. Maximum input 6K, maximum output 2K.
hunyuan-vision
8k
2k
Supported
Conversation, Image Understanding
Tencent_hunyuan
Hunyuan's latest multimodal model, supports image + text input to generate text content. Image basic recognition: identifies main objects, elements, scenes, etc., in images. Image content creation: summarizes images, creates ad copy, social media posts, poetry, etc. Multi-turn image dialogue: enables multi-turn interactive Q&A about a single image. Image analysis and reasoning: performs statistical analysis on logic relations, math problems, code, and charts found in images. Image knowledge Q&A: answers questions about knowledge points contained in images, such as historical events or movie posters. Image OCR: recognizes text in natural and non-natural scene images.
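Multimodal models like hunyuan-vision are commonly called through an OpenAI-compatible chat endpoint, with the image passed as an `image_url` content part alongside the text. A minimal sketch of building such a request payload follows; the message schema is the widely used OpenAI-style convention and the image URL is a placeholder, not details confirmed by this page:

```python
import json

def build_vision_request(model: str, question: str, image_url: str) -> str:
    """Build an OpenAI-style chat payload that mixes text and one image.

    The content-part layout ({"type": "text"} / {"type": "image_url"})
    follows the common OpenAI-compatible schema; the provider's exact
    format may differ.
    """
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return json.dumps(payload, ensure_ascii=False)

# Placeholder image URL for illustration only.
body = build_vision_request(
    "hunyuan-vision",
    "What objects are in this picture?",
    "https://example.com/photo.jpg",
)
```

The same payload shape covers the multi-turn image dialogue case: keep appending user and assistant messages to `messages` while the image part stays in the first turn.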
SparkDesk-Lite
4k
-
Not supported
Conversation
SparkDesk_SparkDesk
Supports built-in web search, responds quickly, and is well suited to low-compute inference and customized scenarios such as model fine-tuning.
SparkDesk-Max
128k
-
Supported
Conversation
SparkDesk_SparkDesk
Quantized from the latest Spark 4.0 Turbo large-model engine; supports built-in plugins such as web search, weather, and date; core capabilities comprehensively upgraded, with improved performance across scenarios; supports System-role personas and FunctionCall invocation.
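For models marked "Supported" in the function-call column (such as SparkDesk-Max), a tool is typically declared in the request so the model can return a structured call instead of free text. A sketch of such a request follows; the `tools` schema is the widely used OpenAI function-calling convention and `get_weather` is a hypothetical tool, neither confirmed by this page:

```python
# Hypothetical tool definition in the common OpenAI function-calling format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body pairing the user message with the declared tool; the model
# may answer with a tool call naming get_weather and its arguments.
request = {
    "model": "SparkDesk-Max",
    "messages": [
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    "tools": [weather_tool],
}
```

On a tool-call response, the caller executes the named function and sends the result back in a follow-up message so the model can compose the final answer.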
SparkDesk-Max-32k
32k
-
Supported
Conversation
SparkDesk_SparkDesk
Stronger reasoning: improved context understanding and logical reasoning; longer input: supports 32K tokens of text input, suitable for long-document reading, private knowledge question answering, and similar scenarios.
SparkDesk-Pro
128k
-
Not supported
Conversation
SparkDesk_SparkDesk
Specifically optimized for math, code, healthcare, education and other scenarios; supports web search, weather, date and other built-in plugins; covers most knowledge Q&A, language understanding, and text-creation scenarios.
SparkDesk-Pro-128K
128k
-
Not supported
Conversation
SparkDesk_SparkDesk
Professional-grade large language model with tens of billions of parameters, specially optimized for medical, educational, and coding scenarios, and lower latency in search scenarios. Suitable for business scenarios that require higher performance and response speed for text and intelligent Q&A.
moonshot-v1-128k
128k
4k
Supported
Conversation
Moonshot AI_moonshot
Model with length 128k, suitable for generating ultra-long texts.
moonshot-v1-32k
32k
4k
Supported
Conversation
Moonshot AI_moonshot
Model with length 32k, suitable for generating long texts.
moonshot-v1-8k
8k
4k
Supported
Conversation
Moonshot AI_moonshot
Model with length 8k, suitable for generating short texts.
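Since the three moonshot-v1 variants differ only in context length (8k/32k/128k, each with 4k output), a caller can pick the cheapest variant whose window fits the prompt. A sketch of that selection rule follows; the exact token window boundaries (8192/32768/131072) and the policy of reserving the full 4k for output are illustrative assumptions:

```python
def pick_moonshot_model(prompt_tokens: int, max_output: int = 4096) -> str:
    """Pick the smallest moonshot-v1 variant whose context window fits.

    The 8k/32k/128k tiers come from the model table; treating them as
    exact powers-of-two token counts is an assumption for illustration.
    """
    needed = prompt_tokens + max_output  # reserve room for the reply
    tiers = [
        (8_192, "moonshot-v1-8k"),
        (32_768, "moonshot-v1-32k"),
        (131_072, "moonshot-v1-128k"),
    ]
    for window, name in tiers:
        if needed <= window:
            return name
    raise ValueError("prompt too long for any moonshot-v1 context window")
```

For example, a 20,000-token prompt skips the 8k tier and lands on moonshot-v1-32k.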
codegeex-4
128k
4k
Not supported
Conversation, Code
Zhipu_codegeex
Zhipu's code model: suitable for code auto-completion tasks.
charglm-3
4k
2k
Not supported
Conversation
Zhipu_glm
Persona model: designed for anthropomorphic character role-play dialogue.
emohaa
8k
4k
Not supported
Conversation
Zhipu_glm
Psychological counseling model: offers professional counseling capability, helping users understand their emotions and cope with emotional issues.
glm-3-turbo
128k
4k
Not supported
Conversation
Zhipu_glm
Deprecation scheduled (June 30, 2025)
glm-4
128k
4k
Supported
Conversation
Zhipu_glm
Legacy flagship: released January 16, 2024, now superseded by GLM-4-0520
glm-4-0520
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence model: suitable for handling highly complex and diverse tasks
glm-4-air
128k
4k
Supported
Conversation
Zhipu_glm
High cost-effectiveness: the model with the best balance between inference capability and price
glm-4-airx
8k
4k
Supported
Conversation
Zhipu_glm
Ultra-fast inference: extremely fast inference speed with strong reasoning performance
glm-4-flash
128k
4k
Supported
Conversation
Zhipu_glm
High-speed low-cost: ultra-fast inference speed
glm-4-flashx
128k
4k
Supported
Conversation
Zhipu_glm
High-speed low-cost: Flash enhanced version, ultra-fast inference speed
glm-4-long
1m
4k
Supported
Conversation
Zhipu_glm
Ultra-long input: designed specifically for handling ultra-long text and memory-style tasks
glm-4-plus
128k
4k
Supported
Conversation
Zhipu_glm
High-intelligence flagship: overall performance greatly improved, with significantly enhanced long-text and complex-task capabilities
glm-4v
2k
-
Not supported
Conversation, Image Understanding
Zhipu_glm
Image understanding: possesses image understanding and reasoning capabilities
glm-4v-flash
2k
1k
Not supported
Conversation, Image Understanding
Zhipu_glm
Free model: possesses powerful image understanding capability