Model Data

  • The following information is for reference only; if you find errors, please contact us so we can correct them. For some models, context length and other model details may differ between providers;

  • When entering these values in the client, convert “k” to an actual token count (theoretically 1k = 1024 tokens and 1m = 1024k tokens, so 8k = 8 × 1024 = 8192 tokens). In practice, using ×1000 is recommended to avoid errors, e.g., 8k = 8 × 1000 = 8000 and 1m = 1 × 1000000 = 1000000; see the conversion sketch after this list;

  • A maximum output shown as “-” means no official maximum-output figure was found for that model.
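
To make the conversion rule above concrete, here is a minimal sketch (a hypothetical helper, not part of any particular client) that turns the “k”/“m” shorthand used in the table below into token counts, defaulting to the recommended ×1000 interpretation:

```python
def shorthand_to_tokens(value: str, binary: bool = False) -> int:
    """Convert context-length shorthand such as '8k' or '1m' into a token count.

    binary=False uses the recommended 1k = 1000 (leaves a safety margin);
    binary=True uses the theoretical 1k = 1024.
    """
    value = value.strip().lower()
    unit = 1024 if binary else 1000
    if value.endswith("m"):
        return int(float(value[:-1]) * unit * unit)
    if value.endswith("k"):
        return int(float(value[:-1]) * unit)
    return int(value)  # plain number with no suffix

assert shorthand_to_tokens("8k") == 8000               # recommended client setting
assert shorthand_to_tokens("8k", binary=True) == 8192  # theoretical value
assert shorthand_to_tokens("1m") == 1_000_000
```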

| Model Name | Maximum Input | Maximum Output | Function Calls | Model Capability | Provider | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 360gpt-pro | 8k | - | Not supported | Conversation | 360AI_360gpt | The flagship hundred-billion-parameter model of the 360 Zhinao series, with the best performance, widely applicable to complex task scenarios across domains. |
| 360gpt-turbo | 7k | - | Not supported | Conversation | 360AI_360gpt | A ten-billion-parameter model balancing performance and efficiency, suitable for scenarios with higher performance/cost requirements. |
| 360gpt-turbo-responsibility-8k | 8k | - | Not supported | Conversation | 360AI_360gpt | A ten-billion-parameter model balancing performance and efficiency, suitable for scenarios with higher performance/cost requirements. |
| 360gpt2-pro | 8k | - | Not supported | Conversation | 360AI_360gpt | The flagship hundred-billion-parameter model of the 360 Zhinao series, with the best performance, widely applicable to complex task scenarios across domains. |
| claude-3-5-sonnet-20240620 | 200k | 16k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on June 20, 2024. Claude 3.5 Sonnet balances performance and speed, delivering top-tier performance while maintaining high speed, and supports multimodal input. |
| claude-3-5-haiku-20241022 | 200k | 16k | Not supported | Conversation | Anthropic_claude | Snapshot released on October 22, 2024. Claude 3.5 Haiku improves across the board on skills including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers quick response times and suits highly interactive, low-latency applications such as user-facing chatbots and real-time code completion. It also performs well on specialized tasks such as data extraction and real-time content moderation. It does not support image input. |
| claude-3-5-sonnet-20241022 | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on October 22, 2024. Claude 3.5 Sonnet offers capabilities beyond Opus and faster speeds than Claude 3 Sonnet, at the same price as Sonnet. It is particularly strong in programming, data science, visual processing, and agent tasks. |
| claude-3-5-sonnet-latest | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Dynamically points to the latest Claude 3.5 Sonnet snapshot. Claude 3.5 Sonnet offers capabilities beyond Opus and faster speeds than Claude 3 Sonnet at the same price, and excels at programming, data science, visual processing, and agent tasks. |
| claude-3-haiku-20240307 | 200k | 4k | Not supported | Conversation, Image Understanding | Anthropic_claude | Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instant responses with fast, accurate targeted performance. |
| claude-3-opus-20240229 | 200k | 4k | Not supported | Conversation, Image Understanding | Anthropic_claude | Claude 3 Opus is Anthropic's most powerful model for highly complex tasks, excelling in performance, intelligence, fluency, and comprehension. |
| claude-3-sonnet-20240229 | 200k | 8k | Not supported | Conversation, Image Understanding | Anthropic_claude | Snapshot released on February 29, 2024. Sonnet is particularly good at coding (writing, editing, and running code autonomously, with reasoning and debugging); data science (augmenting human expertise and handling unstructured data with various tools to extract insights); visual processing (interpreting charts, graphics, and images, and accurately transcribing text to surface insights beyond the text itself); and agent tasks (tool use and complex multi-step problem solving that requires interacting with other systems). |
| google/gemma-2-27b-it | 8k | - | Not supported | Conversation | Google_gemma | Gemma is a lightweight, state-of-the-art family of open models from Google, built with the same research and technology as the Gemini models. These decoder-only large language models support English and provide open weights for both pretrained and instruction-tuned variants. Gemma models suit a variety of text generation tasks, including Q&A, summarization, and reasoning. |
| google/gemma-2-9b-it | 8k | - | Not supported | Conversation | Google_gemma | Gemma is one of Google's lightweight, state-of-the-art open model families: a decoder-only large language model supporting English, with open weights for both pretrained and instruction-tuned variants. Gemma models suit a range of text generation tasks, including Q&A, summarization, and reasoning. The 9B model was trained on 8 trillion tokens. |
| gemini-1.5-pro | 2m | 8k | Not supported | Conversation | Google_gemini | The latest stable release of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text; particularly suitable for tasks requiring complex reasoning. |
| gemini-1.0-pro-001 | 33k | 8k | Not supported | Conversation | Google_gemini | The stable version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat and for code generation. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| gemini-1.0-pro-002 | 32k | 8k | Not supported | Conversation | Google_gemini | The stable version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat and for code generation. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| gemini-1.0-pro-latest | 33k | 8k | Not supported | Conversation, Deprecated or Soon-to-be Deprecated | Google_gemini | The latest version of Gemini 1.0 Pro, an NLP model specialized for multi-turn text and code chat and for code generation. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| gemini-1.0-pro-vision-001 | 16k | 2k | Not supported | Conversation | Google_gemini | The vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| gemini-1.0-pro-vision-latest | 16k | 2k | Not supported | Image Understanding | Google_gemini | The latest vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| gemini-1.5-flash | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The latest stable release of Gemini 1.5 Flash, a balanced multimodal model that handles audio, image, video, and text inputs. |
| gemini-1.5-flash-001 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable, version-locked release of Gemini 1.5 Flash offering the same core functionality as gemini-1.5-flash; suitable for production use. |
| gemini-1.5-flash-002 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable, version-locked release of Gemini 1.5 Flash offering the same core functionality as gemini-1.5-flash; suitable for production use. |
| gemini-1.5-flash-8b | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Google's multimodal model designed for efficient handling of large-scale tasks. With 8 billion parameters, it supports text, image, audio, and video inputs and suits applications such as chat, transcription, and translation. Compared with other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, making it especially suitable for cost-sensitive users, and its rate limits are doubled so developers can handle large-scale tasks more efficiently. It uses knowledge distillation to extract key knowledge from larger models, staying lightweight and efficient while retaining core capabilities. |
| gemini-1.5-flash-exp-0827 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Flash, updated periodically with the latest improvements. Suitable for exploratory testing and prototyping; not recommended for production. |
| gemini-1.5-flash-latest | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The cutting-edge version of Gemini 1.5 Flash, updated periodically with the latest improvements. Suitable for exploratory testing and prototyping; not recommended for production. |
| gemini-1.5-pro-001 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Pro with fixed model behavior and performance characteristics; suitable for production environments that require stability. |
| gemini-1.5-pro-002 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | A stable version of Gemini 1.5 Pro with fixed model behavior and performance characteristics; suitable for production environments that require stability. |
| gemini-1.5-pro-exp-0801 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text; particularly suitable for tasks requiring complex reasoning. |
| gemini-1.5-pro-exp-0827 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental version of Gemini 1.5 Pro. A powerful multimodal model that can handle up to 60,000 lines of code or 2,000 pages of text; particularly suitable for tasks requiring complex reasoning. |
| gemini-1.5-pro-latest | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | The latest version of Gemini 1.5 Pro, dynamically pointing to the newest snapshot. |
| gemini-2.0-flash | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Google's latest model, offering faster time-to-first-token (TTFT) than the 1.5 versions while maintaining quality on par with Gemini 1.5 Pro. It brings significant improvements in multimodal understanding, coding, complex instruction following, and function calling, delivering a smoother and more capable experience. |
| gemini-2.0-flash-exp | 100k | 8k | Supported | Conversation, Image Understanding | Google_gemini | Introduces a multimodal real-time API, improved speed and performance, quality enhancements, stronger agent capabilities, and new image generation and speech output features. |
| gemini-2.0-flash-lite-preview-02-05 | 1m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Google's newly released cost-effective model, offering better quality at the same speed as 1.5 Flash. It supports a 1,000,000-token context window and handles multimodal tasks spanning images, audio, and code. As Google's most cost-effective model to date, it uses a simplified single-tier pricing strategy, making it especially suitable for large-scale applications that must control cost. |
| gemini-2.0-flash-thinking-exp | 40k | 8k | Not supported | Conversation, Reasoning | Google_gemini | An experimental model that can output the thinking process it goes through while generating a response, so its thinking-mode responses show stronger reasoning than the base Gemini 2.0 Flash model. |
| gemini-2.0-flash-thinking-exp-01-21 | 1m | 64k | Not supported | Conversation, Reasoning | Google_gemini | Google's latest model focused on stronger reasoning and better user interaction. It is particularly strong in mathematics and programming, supports context windows up to 1,000,000 tokens for complex tasks and deep analysis, and generates its thought process to make its reasoning more interpretable. It also supports native code execution for more flexible and practical interaction, and algorithmic optimizations reduce logical contradictions, further improving answer accuracy and consistency. |
| gemini-2.0-flash-thinking-exp-1219 | 40k | 8k | Not supported | Conversation, Reasoning, Image Understanding | Google_gemini | An experimental model that can output the thinking process it goes through while generating a response, so its thinking-mode responses show stronger reasoning than the base Gemini 2.0 Flash model. |
| gemini-2.0-pro-exp-01-28 | 2m | 64k | Not supported | Conversation, Image Understanding | Google_gemini | Pre-release model, not yet online. |
| gemini-2.0-pro-exp-02-05 | 2m | 8k | Not supported | Conversation, Image Understanding | Google_gemini | Google's experimental model released in February 2025, excelling at world knowledge, code generation, and long-text understanding. It supports a 2,000,000-token ultra-long context window and can handle 2 hours of video, 22 hours of audio, more than 60,000 lines of code, or more than 1.4 million words. As part of the Gemini 2.0 family it uses the new Flash Thinking training strategy, significantly improving performance and ranking highly on multiple LLM benchmarks. |
| gemini-exp-1114 | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model released on November 14, 2024, primarily focused on quality improvements. |
| gemini-exp-1121 | 8k | 4k | Not supported | Conversation, Image Understanding, Code | Google_gemini | An experimental model released on November 21, 2024, with improvements in coding, reasoning, and visual capabilities. |
| gemini-exp-1206 | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model released on December 6, 2024, with improvements in coding, reasoning, and visual capabilities. |
| gemini-exp-latest | 8k | 4k | Not supported | Conversation, Image Understanding | Google_gemini | An experimental model that dynamically points to the latest version. |
| gemini-pro | 33k | 8k | Not supported | Conversation | Google_gemini | Alias for gemini-1.0-pro. |
| gemini-pro-vision | 16k | 2k | Not supported | Conversation, Image Understanding | Google_gemini | The vision version of Gemini 1.0 Pro. This model will be deprecated on February 15, 2025; migrating to the 1.5 series is recommended. |
| grok-2 | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on 2024-12-12. |
| grok-2-1212 | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on 2024-12-12. |
| grok-2-latest | 128k | - | Not supported | Conversation | Grok_grok | A new Grok model version released by xAI on 2024-12-12. |
| grok-2-vision-1212 | 32k | - | Not supported | Conversation, Image Understanding | Grok_grok | The Grok vision model released by xAI on 2024-12-12. |
| grok-beta | 100k | - | Not supported | Conversation | Grok_grok | Performance comparable to Grok 2, with improvements in efficiency, speed, and features. |
| grok-vision-beta | 8k | - | Not supported | Conversation, Image Understanding | Grok_grok | The latest image understanding model, handling a wide variety of visual inputs including documents, charts, screenshots, and photos. |
| internlm/internlm2_5-20b-chat | 32k | - | Supported | Conversation | internlm | InternLM2.5-20B-Chat is an open-source conversational model built on the InternLM2 architecture. With 20 billion parameters, it excels at mathematical reasoning, outperforming similarly sized models such as Llama3 and Gemma2-27B. It has significantly improved tool-calling capabilities, supporting information gathering from hundreds of web pages for analysis and reasoning, with stronger instruction understanding, tool selection, and result reflection. |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 8k | - | Not supported | Conversation, Image Understanding | Meta_llama | Llama models can now handle both text and image data, and some Llama 3.2 variants add visual understanding. This model accepts text and image input simultaneously, understands images, and outputs text. |
| meta-llama/Llama-3.2-3B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | Meta Llama 3.2 is a multilingual large language model family whose lightweight 1B and 3B variants suit edge and mobile devices; this is the 3B version. |
| meta-llama/Llama-3.2-90B-Vision-Instruct | 8k | - | Not supported | Conversation, Image Understanding | Meta_llama | Llama models can now handle both text and image data, and some Llama 3.2 variants add visual understanding. This model accepts text and image input simultaneously, understands images, and outputs text. |
| meta-llama/Llama-3.3-70B-Instruct | 131k | - | Not supported | Conversation | Meta_llama | Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B. |
| meta-llama/Meta-Llama-3.1-405B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | The Meta Llama 3.1 multilingual LLM family is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual dialogue and outperform many available open- and closed-source chat models on common industry benchmarks. |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | Meta Llama 3.1 is Meta's multilingual large language model family, with pretrained and instruction-tuned variants at 8B, 70B, and 405B parameters. The 70B instruction-tuned model is optimized for multilingual dialogue and performs strongly across multiple industry benchmarks. It was trained on more than 15 trillion tokens of public data and uses supervised fine-tuning and reinforcement learning from human feedback to improve helpfulness and safety. |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 32k | - | Not supported | Conversation | Meta_llama | The Meta Llama 3.1 multilingual LLM family is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes; this is the 8B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual dialogue and outperform many available open- and closed-source chat models on common industry benchmarks. |
| abab5.5-chat | 16k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| abab5.5s-chat | 8k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| abab6.5g-chat | 8k | - | Supported | Conversation | Minimax_abab | English and other multilingual persona chat scenarios. |
| abab6.5s-chat | 245k | - | Supported | Conversation | Minimax_abab | General scenarios. |
| abab6.5t-chat | 8k | - | Supported | Conversation | Minimax_abab | Chinese persona chat scenarios. |
| chatgpt-4o-latest | 128k | 16k | Not supported | Conversation, Image Understanding | OpenAI | Continuously points to the GPT-4o version used in ChatGPT and is updated promptly whenever significant changes occur. |
| gpt-4o-2024-11-20 | 128k | 16k | Supported | Conversation | OpenAI | Latest gpt-4o snapshot from November 20, 2024. |
| gpt-4o-audio-preview | 128k | 16k | Not supported | Conversation | OpenAI | OpenAI's voice conversation model, supporting audio input and output. |
| gpt-4o-audio-preview-2024-10-01 | 128k | 16k | Supported | Conversation | OpenAI | OpenAI's voice conversation model, supporting audio input and output. |
| o1 | 200k | 100k | Not supported | Conversation, Reasoning, Image Understanding | OpenAI | A new reasoning model from OpenAI for complex tasks that require broad general knowledge. The model has a 200k context window, is currently among the most powerful models available, and supports image recognition. |
| o1-mini-2024-09-12 | 128k | 64k | Not supported | Conversation, Reasoning | OpenAI | A fixed snapshot of o1-mini: smaller and faster than o1-preview, about 80% cheaper, and strong at code generation and small-context operations. |
| o1-preview-2024-09-12 | 128k | 32k | Not supported | Conversation, Reasoning | OpenAI | A fixed snapshot of o1-preview. |
| gpt-3.5-turbo | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | GPT-3.5 Turbo is OpenAI's improved version of GPT-3.5, with optimizations to model structure and algorithms that yield faster inference, higher throughput, and lower resource consumption under the same hardware, helping reduce operating costs and improve scalability. It suits a wide range of NLP tasks, including text generation, semantic understanding, dialogue systems, and machine translation, and offers developer-friendly APIs for rapid integration and deployment. |
| gpt-3.5-turbo-0125 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | Updated GPT-3.5 Turbo with more accurate response formatting and a fix for a text-encoding bug affecting non-English function calls. Returns up to 4,096 output tokens. |
| gpt-3.5-turbo-0613 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | Fixed snapshot of GPT-3.5 Turbo. Now deprecated. |
| gpt-3.5-turbo-1106 | 16k | 4k | Supported | Conversation | OpenAI_gpt-3 | Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns up to 4,096 output tokens. |
| gpt-3.5-turbo-16k | 16k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-3 | (Deprecated) |
| gpt-3.5-turbo-16k-0613 | 16k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-3 | Snapshot of gpt-3.5-turbo from June 13, 2023. (Deprecated) |
| gpt-3.5-turbo-instruct | 4k | 4k | Supported | Conversation | OpenAI_gpt-3 | Capabilities similar to GPT-3-era models. Compatible with the legacy Completions endpoint, not Chat Completions. |
| gpt-3.5o | 16k | 4k | Not supported | Conversation | OpenAI_gpt-3 | Same as gpt-4o-lite. |
| gpt-4 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Currently points to gpt-4-0613. |
| gpt-4-0125-preview | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | The latest GPT-4 preview model, aimed at reducing "laziness," where the model fails to complete a task. Returns up to 4,096 output tokens. |
| gpt-4-0314 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Snapshot of gpt-4 from March 14, 2023. |
| gpt-4-0613 | 8k | 8k | Supported | Conversation | OpenAI_gpt-4 | Snapshot of gpt-4 from June 13, 2023, with improved function calling support. |
| gpt-4-1106-preview | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | GPT-4 Turbo preview model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns up to 4,096 output tokens. |
| gpt-4-32k | 32k | 4k | Supported | Conversation | OpenAI_gpt-4 | gpt-4-32k will be deprecated on 2025-06-06. |
| gpt-4-32k-0613 | 32k | 4k | Supported | Conversation, Deprecated or Soon-to-be Deprecated | OpenAI_gpt-4 | Will be deprecated on 2025-06-06. |
| gpt-4-turbo | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | The latest GPT-4 Turbo model adds vision capabilities; vision requests can use JSON mode and function calling. The current version is gpt-4-turbo-2024-04-09. |
| gpt-4-turbo-2024-04-09 | 128k | 4k | Supported | Conversation | OpenAI_gpt-4 | GPT-4 Turbo with vision capabilities; vision requests can use JSON mode and function calling. gpt-4-turbo currently points to this version. |
| gpt-4-turbo-preview | 128k | 4k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | Currently points to gpt-4-0125-preview. |
| gpt-4o | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | OpenAI's high-intelligence flagship model for complex multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo. |
| gpt-4o-2024-05-13 | 128k | 4k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | Original gpt-4o snapshot from May 13, 2024. |
| gpt-4o-2024-08-06 | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | The first snapshot to support structured outputs. gpt-4o currently points to this version. |
| gpt-4o-mini | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | OpenAI's affordable gpt-4o variant for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18. |
| gpt-4o-mini-2024-07-18 | 128k | 16k | Supported | Conversation, Image Understanding | OpenAI_gpt-4 | A fixed snapshot of gpt-4o-mini. |
| gpt-4o-realtime-preview | 128k | 4k | Supported | Conversation, Real-time Voice | OpenAI_gpt-4 | OpenAI's real-time voice conversation model. |
| gpt-4o-realtime-preview-2024-10-01 | 128k | 4k | Supported | Conversation, Real-time Voice, Image Understanding | OpenAI_gpt-4 | gpt-4o-realtime-preview currently points to this snapshot. |
| o1-mini | 128k | 64k | Not supported | Conversation, Reasoning | OpenAI_o1 | Smaller and faster than o1-preview, about 80% cheaper, and strong at code generation and small-context operations. |
| o1-preview | 128k | 32k | Not supported | Conversation, Reasoning | OpenAI_o1 | A new reasoning model for complex tasks requiring broad general knowledge, with a 128k context and an October 2023 knowledge cutoff. It focuses on advanced reasoning and solving complex problems, including mathematics and science, and is ideal for applications needing deep context understanding and autonomous workflows. |
| o3-mini | 200k | 100k | Supported | Conversation, Reasoning | OpenAI_o1 | OpenAI's latest small reasoning model, offering high intelligence at the same cost and latency as o1-mini. It targets scientific, mathematical, and coding tasks; supports developer features such as structured outputs, function calling, and the Batch API; and has an October 2023 knowledge cutoff, striking a notable balance between reasoning capability and cost. |
| o3-mini-2025-01-31 | 200k | 100k | Supported | Conversation, Reasoning | OpenAI_o1 | o3-mini currently points to this version, offering high intelligence at the same cost and latency as o1-mini. It targets scientific, mathematical, and coding tasks; supports developer features such as structured outputs, function calling, and the Batch API; and has an October 2023 knowledge cutoff, striking a notable balance between reasoning capability and cost. |
| Baichuan2-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Maintains industry-leading performance among models of the same size while significantly reducing cost. |
| Baichuan3-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Maintains industry-leading performance among models of the same size while significantly reducing cost. |
| Baichuan3-Turbo-128k | 128k | - | Not supported | Conversation | Baichuan_baichuan | Handles complex texts with a 128k ultra-long context window, is specially optimized for industries such as finance, and greatly reduces cost while maintaining high performance, offering enterprises a high cost-performance solution. |
| Baichuan4 | 32k | - | Not supported | Conversation | Baichuan_baichuan | Baichuan's MoE model, providing cost-effective enterprise solutions through scenario-specific optimization, cost reduction, and performance improvement. |
| Baichuan4-Air | 32k | - | Not supported | Conversation | Baichuan_baichuan | Baichuan's MoE model, providing cost-effective enterprise solutions through scenario-specific optimization, cost reduction, and performance improvement. |
| Baichuan4-Turbo | 32k | - | Not supported | Conversation | Baichuan_baichuan | Trained on massive high-quality scenario data. Availability in high-frequency enterprise scenarios improves by over 10% versus Baichuan4, information summarization by 50%, multilingual performance by 31%, and content generation by 13%. With reasoning-specific optimizations, first-token response speed increases by 51% and token throughput by 73% relative to Baichuan4. |
| ERNIE-3.5-128K | 128k | 4k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering a massive corpus of Chinese and English data, with strong general capabilities that meet most dialogue Q&A, creative generation, and plugin application requirements; supports automatic integration with Baidu's search plugin to keep Q&A information current. |
| ERNIE-3.5-8K | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering a massive corpus of Chinese and English data, with strong general capabilities that meet most dialogue Q&A, creative generation, and plugin application requirements; supports automatic integration with Baidu's search plugin to keep Q&A information current. |
| ERNIE-3.5-8K-Preview | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship large language model, covering a massive corpus of Chinese and English data, with strong general capabilities that meet most dialogue Q&A, creative generation, and plugin application requirements; supports automatic integration with Baidu's search plugin to keep Q&A information current. |
| ERNIE-4.0-8K | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship ultra-large-scale language model, a comprehensive capability upgrade over ERNIE 3.5, widely applicable to complex task scenarios across domains; supports automatic integration with Baidu's search plugin to keep Q&A information current. |
| ERNIE-4.0-8K-Latest | 8k | 2k | Supported | Conversation | Baidu_ernie | A comprehensive capability upgrade over ERNIE-4.0-8K, with significant improvements in role-playing and instruction following. Compared with ERNIE 3.5 it is a full upgrade in model capability, widely applicable to complex task scenarios across domains; supports automatic integration with Baidu's search plugin to keep Q&A information current, and supports 5k tokens of input plus 2k tokens of output. |
| ERNIE-4.0-8K-Preview | 8k | 1k | Supported | Conversation | Baidu_ernie | Baidu's self-developed flagship ultra-large-scale language model, a comprehensive capability upgrade over ERNIE 3.5, widely applicable to complex task scenarios across domains; supports automatic integration with Baidu's search plugin to keep Q&A information current. |
| ERNIE-4.0-Turbo-128K | 128k | 4k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with Baidu's search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-128K is the version whose overall long-document performance exceeds ERNIE-3.5-128K. |
| ERNIE-4.0-Turbo-8K | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with Baidu's search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K is one version of this model. |
| ERNIE-4.0-Turbo-8K-Latest | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with Baidu's search plugin to keep Q&A information current and outperforms ERNIE 4.0. ERNIE-4.0-Turbo-8K-Latest is one version of this model. |
| ERNIE-4.0-Turbo-8K-Preview | 8k | 2k | Supported | Conversation | Baidu_ernie | ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with excellent overall performance, widely applicable to complex task scenarios across domains; it supports automatic integration with Baidu's search plugin to keep Q&A information current. ERNIE-4.0-Turbo-8K-Preview is one version of this model. |
| ERNIE-Character-8K | 8k | 1k | Not supported | Conversation | Baidu_ernie | Baidu's self-developed vertical-domain large language model, suited to game NPCs, customer-service dialogue, and role-playing applications; it has a more distinctive and consistent persona style, stronger instruction following, and better inference performance. |
| ERNIE-Lite-8K | 8k | 4k | Not supported | Conversation | Baidu_ernie | Baidu's self-developed lightweight large language model, balancing excellent model quality with inference performance; suitable for inference on low-power AI acceleration cards. |
| ERNIE-Lite-Pro-128K | 128k | 2k | Supported | Conversation | Baidu_ernie | Baidu's self-developed lightweight large language model, better-performing than ERNIE Lite, balancing excellent model quality with inference performance; suitable for inference on low-power AI acceleration cards. Supports a 128k context length and outperforms ERNIE-Lite-128K. |
| ERNIE-Novel-8K | 8k | 2k | Not supported | Conversation | Baidu_ernie | Baidu's general-purpose large language model, with a notable advantage in novel continuation; it can also be used for short-drama, film, and similar scenarios. |
| ERNIE-Speed-128K | 128k | 4k | Not supported | Conversation | Baidu_ernie | Baidu's high-performance large language model released in 2024, with excellent general capabilities; well suited as a base model for fine-tuning to specific scenarios, with outstanding inference performance. |
| ERNIE-Speed-8K | 8k | 1k | Not supported | Conversation | Baidu_ernie | Baidu's high-performance large language model released in 2024, with excellent general capabilities; well suited as a base model for fine-tuning to specific scenarios, with outstanding inference performance. |
| ERNIE-Speed-Pro-128K | 128k | 4k | Not supported | Conversation | Baidu_ernie | ERNIE Speed Pro is Baidu's high-performance large language model released in 2024, with excellent general capabilities; well suited as a base model for fine-tuning to specific scenarios, with outstanding inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024; it supports a 128k context length and outperforms ERNIE-Speed-128K. |
| ERNIE-Tiny-8K | 8k | 1k | Not supported | Conversation | Baidu_ernie | Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs in the Wenxin series. |
| Doubao-1.5-lite-32k | 32k | 12k | Supported | Conversation | Doubao_doubao | Doubao-1.5-lite ranks among the world's top lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on comprehensive (MMLU_pro), reasoning (BBH), mathematics (MATH), and professional knowledge (GPQA) benchmarks. |
| Doubao-1.5-pro-256k | 256k | 12k | Supported | Conversation | Doubao_doubao | A comprehensive upgrade based on Doubao-1.5-Pro. Overall performance improves significantly, by 10%, over Doubao-pro-256k/241115, and output length increases greatly, supporting up to 12k tokens. |
| Doubao-1.5-pro-32k | 32k | 12k | Supported | Conversation | Doubao_doubao | A new-generation flagship model with comprehensive performance upgrades, excelling in knowledge, coding, reasoning, and more. It reaches world-leading levels on multiple public benchmarks, particularly knowledge, coding, reasoning, and authoritative Chinese benchmarks, with overall scores surpassing industry-leading models such as GPT-4o and Claude 3.5 Sonnet. |
| Doubao-1.5-vision-pro | 32k | 12k | Not supported | Conversation, Image Understanding | Doubao_doubao | A newly upgraded multimodal large model supporting image recognition at arbitrary resolutions and extreme aspect ratios, with enhanced visual reasoning, document recognition, fine-detail understanding, and instruction following. |
| Doubao-embedding | 4k | - | Supported | Embedding | Doubao_doubao | A semantic vectorization model developed by ByteDance, aimed mainly at vector retrieval scenarios; supports Chinese and English with a maximum 4k context length. Available versions: text-240715 (maximum vector dimension 2560, with reduction to 512, 1024, or 2048; Chinese-English retrieval is significantly better than text-240515; recommended) and text-240515 (maximum vector dimension 2048, with reduction to 512 or 1024). |
| Doubao-embedding-large | 4k | - | Not supported | Embedding | Doubao_doubao | Chinese-English retrieval performance is significantly better than the Doubao-embedding/text-240715 version. |
| Doubao-embedding-vision | 8k | - | Not supported | Embedding | Doubao_doubao | A newly upgraded image-text multimodal vectorization model aimed mainly at image-text multimodal vector retrieval; supports image input and Chinese/English text input with a maximum 8k context length. |
| Doubao-lite-128k | 128k | 4k | Supported | Conversation | Doubao_doubao | Doubao-lite offers extreme response speed and better value for money, giving customers more flexible options across scenarios. Supports a 128k context window for inference and fine-tuning. |
| Doubao-lite-32k | 32k | 4k | Supported | Conversation | Doubao_doubao | Doubao-lite offers extreme response speed and better value for money, giving customers more flexible options across scenarios. Supports a 32k context window for inference and fine-tuning. |
| Doubao-lite-4k | 4k | 4k | Supported | Conversation | Doubao_doubao | Doubao-lite offers extreme response speed and better value for money, giving customers more flexible options across scenarios. Supports a 4k context window for inference and fine-tuning. |
| Doubao-pro-128k | 128k | 4k | Supported | Conversation | Doubao_doubao | The best-performing flagship model, suitable for complex tasks and effective in reference Q&A, summarization, creative writing, text classification, role-playing, and other scenarios. Supports a 128k context window for inference and fine-tuning. |
| Doubao-pro-32k | 32k | 4k | Supported | Conversation | Doubao_doubao | The best-performing flagship model, suitable for complex tasks and effective in reference Q&A, summarization, creative writing, text classification, role-playing, and other scenarios. Supports a 32k context window for inference and fine-tuning. |
| Doubao-pro-4k | 4k | 4k | Supported | Conversation | Doubao_doubao | The best-performing flagship model, suitable for complex tasks and effective in reference Q&A, summarization, creative writing, text classification, role-playing, and other scenarios. Supports a 4k context window for inference and fine-tuning. |
| step-1-128k | 128k | - | Supported | Conversation | StepFun_step | A very large language model that can handle up to 128,000 tokens of input, giving it significant advantages in long-form content generation and complex reasoning; suitable for context-rich applications such as novel and script writing. |
| step-1-256k | 256k | - | Supported | Conversation | StepFun_step | One of the largest-context language models currently available, supporting 256,000 tokens of input. Designed for extremely complex task requirements such as large-scale data analysis and multi-turn dialogue systems, providing high-quality output across domains. |
| step-1-32k | 32k | - | Supported | Conversation | StepFun_step | Expands the context window to 32,000 tokens of input, excelling at long articles and complex dialogues; suitable for tasks requiring deep understanding and analysis, such as legal documents and academic research. |
| step-1-8k | 8k | - | Supported | Conversation | StepFun_step | An efficient language model designed for shorter texts, reasoning within an 8,000-token context; suitable for quick-response applications such as chatbots and real-time translation. |
| step-1-flash | 8k | - | Supported | Conversation | StepFun_step | Focused on fast response and efficient processing for real-time applications; designed to deliver high-quality language understanding and generation under limited compute resources, suiting mobile devices and edge computing. |
| step-1.5v-mini | 32k | - | Supported | Conversation, Image Understanding | StepFun_step | A lightweight version designed for resource-constrained environments; despite its small size it retains good language processing capabilities, suiting embedded systems and low-power devices. |
| step-1v-32k | 32k | - | Supported | Conversation, Image Understanding | StepFun_step | Supports 32,000 tokens of input for applications needing longer context; performs well on complex dialogues and long-form text, suiting customer service and content creation. |
| step-1v-8k | 8k | - | Supported | Conversation, Image Understanding | StepFun_step | An optimized version designed for 8,000-token inputs, suited to fast generation and short-text processing; balances speed and accuracy well, ideal for real-time applications. |
| step-2-16k | 16k | - | Supported | Conversation | StepFun_step | A medium-sized language model supporting 16,000 tokens of input; performs well across a variety of tasks, suiting education, training, and knowledge management scenarios. |
| yi-lightning | 16k | - | Supported | Conversation | 01.AI_yi | The latest high-performance model, ensuring high-quality output while greatly improving inference speed. Suitable for real-time interaction and high-complexity reasoning scenarios, its excellent cost-performance ratio provides strong support for commercial products. |
| yi-vision-v2 | 16k | - | Supported | Conversation, Image Understanding | 01.AI_yi | Suitable for scenarios requiring analysis and interpretation of images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report comprehension, and multilingual document reading. |
qwen-14b-chat

8k

2k

Supported

Conversation

Qianwen_qwen

Alibaba Cloud’s official Tongyi Qianwen open-source edition.

qwen-72b-chat

32k

2k

Supported

Conversation

Qianwen_qwen

Alibaba Cloud’s official Tongyi Qianwen open-source edition.

qwen-7b-chat

7.5k

1.5k

Supported

Conversation

Qianwen_qwen

Alibaba Cloud’s official Tongyi Qianwen open-source edition.

qwen-coder-plus

128k

8k

Supported

Conversation, Code

Qianwen_qwen

Qwen-Coder-Plus is a programming-specialized model in the Qwen series designed to improve code generation and understanding. Trained on large-scale programming data, it can handle multiple programming languages and supports code completion, error detection, and code refactoring. Its goal is to provide developers more efficient programming assistance and improve development productivity.

qwen-coder-plus-latest

128k

8k

Supported

Conversation, Code

Qianwen_qwen

Qwen-Coder-Plus-Latest is the latest version of Qwen-Coder-Plus, containing the newest algorithm optimizations and dataset updates. The model has significant performance improvements, better understanding of context, and generates code that better meets developers’ needs. It also introduces support for more programming languages, enhancing multilingual programming capabilities.

qwen-coder-turbo

128k

8k

Supported

Conversation, Code

Qianwen_qwen

The Qwen coder and programming models are specialized for programming and code generation, with fast inference and low cost. This version always points to the latest stable snapshot.

qwen-coder-turbo-latest

128k

8k

Supported

Conversation, Code

Qianwen_qwen

The Qwen coder and programming models are specialized for programming and code generation, with fast inference and low cost. This version always points to the latest snapshot.

qwen-long

10m

6k

Supported

Conversation

Qianwen_qwen

Qwen-Long is Qwen’s large model for ultra-long context scenarios, supporting Chinese, English, and other languages, with up to 10 million tokens (about 15 million characters or 15,000 pages) of ultra-long context dialogue. Together with its document service launched simultaneously, it supports parsing and dialogue for various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: direct HTTP requests support up to 1M tokens; for lengths beyond this it is recommended to submit via files.

qwen-math-plus

4k

3k

Supported

Conversation

Qianwen_qwen

Qwen-Math-Plus is a model focused on solving math problems, intended to provide efficient mathematical reasoning and computational capabilities. Trained on large math corpora, it can handle complex mathematical expressions and problems, supporting a range of computations from basic arithmetic to advanced mathematics. Use cases include education, research, and engineering.

qwen-math-plus-latest

4k

3k

Supported

Conversation

Qianwen_qwen

Qwen-Math-Plus-Latest is the latest version of Qwen-Math-Plus, integrating the newest math reasoning techniques and algorithmic improvements. The model performs better on complex math problems and can provide more accurate solutions and reasoning processes. It also expands understanding of mathematical symbols and formulas, suitable for broader math applications.

qwen-math-turbo

4k

3k

Supported

Conversation

Qianwen_qwen

Qwen-Math-Turbo is a high-performance math model designed for fast computation and real-time reasoning. The model optimizes computation speed and can process large volumes of math problems in very short timeframes, suitable for applications demanding quick feedback such as online education and real-time data analysis. Its efficient algorithms allow users to get immediate results for complex calculations.

qwen-math-turbo-latest

4k

3k

Supported

Conversation

Qianwen_qwen

Qwen-Math-Turbo-Latest is the latest version of Qwen-Math-Turbo, further improving computation efficiency and accuracy. The model includes multiple algorithmic optimizations to handle more complex math problems while remaining efficient in real-time reasoning. It is suitable for math applications requiring fast responses, such as financial analysis and scientific computation.

qwen-max

32k

8k

Supported

Conversation

Qianwen_qwen

The Qwen 2.5 series is a hundred-billion-parameter ultra-large-scale language model supporting Chinese, English, and other languages. As the model upgrades, qwen-max will receive rolling updates.

qwen-max-latest

32k

8k

Supported

Conversation

Qianwen_qwen

The best-performing model in the Qwen series. This model is dynamically updated and model updates are not announced in advance. It is suitable for complex, multi-step tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced. It also improves understanding and generation of structured data such as tables and JSON.

qwen-plus

128k

8k

Supported

Conversation

Qianwen_qwen

A balanced-capability model in the Qwen series, with reasoning performance and speed between Qwen-Max and Qwen-Turbo, suitable for moderately complex tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced.

qwen-plus-latest

128k

8k

Supported

Conversation

Qianwen_qwen

Qwen-Plus is an enhanced vision-language model in the Qwen series, designed to improve fine-detail recognition and text recognition. The model supports images at over one million pixels resolution and arbitrary aspect ratios, performing well across various vision-language tasks and suitable for applications requiring high-precision image understanding.

qwen-turbo

128k

8k

Supported

Conversation

Qianwen_qwen

The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. The model’s Chinese and English overall capabilities are significantly improved, human preference alignment is significantly enhanced, reasoning and complex instruction understanding are greatly strengthened, performance on difficult tasks is improved, and math and coding capabilities are significantly enhanced.

qwen-turbo-latest

1m

8k

Supported

Conversation

Qianwen_qwen

Qwen-Turbo is an efficient model designed for simple tasks, emphasizing speed and cost-effectiveness. It performs well on basic vision-language tasks and is suitable for applications with strict response-time requirements such as real-time image recognition and simple Q&A systems.

qwen-vl-max

32k

2k

Supported

Conversation

Qianwen_qwen

Qwen-VL-Max (qwen-vl-max) is the ultra-large-scale vision-language model of the Qwen family. Compared to the enhanced version, it further improves visual reasoning and instruction-following capabilities, offering higher visual perception and cognition levels and delivering optimal performance on more complex tasks.

qwen-vl-max-latest

32k

2k

Supported

Conversation, Image Understanding

Qianwen_qwen

Qwen-VL-Max is the top-tier version in the Qwen-VL series, designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, can understand and analyze high-resolution images, has extremely strong reasoning ability, and is suitable for applications requiring deep understanding and complex reasoning.

qwen-vl-ocr

34k

4k

Supported

Conversation, Image Understanding

Qianwen_qwen

Only supports OCR, does not support conversation.

qwen-vl-ocr-latest

34k

4k

Supported

Conversation, Image Understanding

Qianwen_qwen

Only supports OCR, does not support conversation.

qwen-vl-plus

8k

2k

Supported

Conversation, Image Understanding

Qianwen_qwen

Qwen-VL-Plus (qwen-vl-plus) is an enhanced version of the Qwen large-scale vision-language model. It greatly improves fine-detail recognition and text recognition, supports images at over one million pixels resolution and arbitrary aspect ratios, and delivers excellent performance across a wide range of vision tasks.

qwen-vl-plus-latest

32k

2k

Supported

Conversation, Image Understanding

Qianwen_qwen

Qwen-VL-Plus-Latest is the latest version of Qwen-VL-Plus, enhancing the model’s multimodal understanding capabilities. It excels at combined processing of images and text and is suitable for applications that need to efficiently handle multiple input formats, such as intelligent customer service and content generation.

Qwen/Qwen2-1.5B-Instruct

32k

6k

Not supported

Conversation

Qianwen_qwen

Qwen2-1.5B-Instruct is an instruction-tuned LLM in the Qwen2 series with 1.5B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It performs well on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.

Qwen/Qwen2-72B-Instruct

128k

6k

Not supported

Conversation

Qianwen_qwen

Qwen2-72B-Instruct is an instruction-tuned LLM in the Qwen2 series with 72B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It can handle large-scale inputs and performs strongly on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.

Qwen/Qwen2-7B-Instruct

128k

6k

Not supported

Conversation

Qianwen_qwen

Qwen2-7B-Instruct is an instruction-tuned LLM in the Qwen2 series with 7B parameters. Based on the Transformer architecture, it uses SwiGLU activation, attention QKV bias, and grouped-query attention techniques. It can handle large-scale inputs and performs strongly on language understanding, generation, multilingual capabilities, coding, math, and reasoning benchmarks, surpassing most open-source models.

Qwen/Qwen2-VL-72B-Instruct

32k

2k

Not supported

Conversation

Qianwen_qwen

Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos longer than 20 minutes for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making abilities and can be integrated with mobile devices and robots to perform autonomous operations based on visual environments and text instructions.

Qwen/Qwen2-VL-7B-Instruct

32k

-

Not supported

Conversation

Qianwen_qwen

Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also has complex reasoning and decision-making capabilities, allowing integration with mobile devices and robots to operate autonomously based on visual environments and text instructions.

Qwen/Qwen2.5-72B-Instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

Qwen2.5-72B-Instruct is one of Alibaba Cloud’s latest LLM series. This 72B model has significant improvements in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.

Qwen/Qwen2.5-72B-Instruct-128K

128k

8k

Not supported

Conversation

Qianwen_qwen

Qwen2.5-72B-Instruct is one of Alibaba Cloud’s latest LLM series. This 72B model has significant improvements in coding and mathematics. It supports inputs up to 128K tokens and can generate long texts exceeding 8K tokens.

Qwen/Qwen2.5-7B-Instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

Qwen2.5-7B-Instruct is one of Alibaba Cloud’s latest LLM series. This 7B model has significant improvements in coding and mathematics. The model also provides multilingual support covering over 29 languages including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/Qwen2.5-Coder-32B-Instruct

128k

8k

Not supported

Conversation, Code

Qianwen_qwen

Qwen2.5-32B-Instruct is one of Alibaba Cloud’s latest LLM series. This 32B model has significant improvements in coding and mathematics. The model also provides multilingual support covering over 29 languages including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/Qwen2.5-Coder-7B-Instruct

128k

8k

Not supported

Conversation, Code

Qianwen_qwen

Qwen2.5-Coder-7B-Instruct is the code-focused model in Alibaba Cloud's latest LLM series. This 7B model has significant improvements in coding and mathematics. It also provides multilingual support covering more than 29 languages, including Chinese and English, and shows significant improvements in instruction following, understanding structured data, and generating structured outputs (especially JSON).

Qwen/QwQ-32B-Preview

32k

16k

Not supported

Conversation, Reasoning

Qianwen_qwen

QwQ-32B-Preview is an experimental research model developed by the Qwen team to enhance AI reasoning capabilities. As a preview version, it demonstrates strong analytical ability but has important limitations: 1. Language mixing and code switching: the model may mix languages or switch between languages unexpectedly, affecting response clarity. 2. Recursive reasoning loops: the model may enter looped reasoning patterns, producing verbose answers without clear conclusions. 3. Safety and ethical considerations: the model needs strengthened safety measures to ensure reliable and safe performance; users should be cautious when using it. 4. Performance and benchmark limitations: the model performs well in math and programming but still has room for improvement in common-sense reasoning and nuanced language understanding.

qwen1.5-110b-chat

32k

8k

Not supported

Conversation

Qianwen_qwen

-

qwen1.5-14b-chat

8k

2k

Not supported

Conversation

Qianwen_qwen

-

qwen1.5-32b-chat

32k

2k

Not supported

Conversation

Qianwen_qwen

-

qwen1.5-72b-chat

32k

2k

Not supported

Conversation

Qianwen_qwen

-

qwen1.5-7b-chat

8k

2k

Not supported

Conversation

Qianwen_qwen

-

qwen2-57b-a14b-instruct

65k

6k

Not supported

Conversation

Qianwen_qwen

-

Qwen2-72B-Instruct

-

-

Not supported

Conversation

Qianwen_qwen

-

qwen2-7b-instruct

128k

6k

Not supported

Conversation

Qianwen_qwen

-

qwen2-math-72b-instruct

4k

3k

Not supported

Conversation

Qianwen_qwen

-

qwen2-math-7b-instruct

4k

3k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-14b-instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-32b-instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-72b-instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-7b-instruct

128k

8k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-coder-14b-instruct

128k

8k

Not supported

Conversation, Code

Qianwen_qwen

-

qwen2.5-coder-32b-instruct

128k

8k

Not supported

Conversation, Code

Qianwen_qwen

-

qwen2.5-coder-7b-instruct

128k

8k

Not supported

Conversation, Code

Qianwen_qwen

-

qwen2.5-math-72b-instruct

4k

3k

Not supported

Conversation

Qianwen_qwen

-

qwen2.5-math-7b-instruct

4k

3k

Not supported

Conversation

Qianwen_qwen

-

deepseek-ai/DeepSeek-R1

64k

-

Not supported

Conversation, Reasoning

DeepSeek_deepseek

DeepSeek-R1 is an open-source reasoning model trained primarily through pure reinforcement learning. It performs excellently on mathematics, coding, and natural-language reasoning tasks, with performance comparable to OpenAI's o1 model and outstanding results on multiple benchmarks.

deepseek-ai/DeepSeek-V2-Chat

128k

-

Not supported

Conversation

DeepSeek_deepseek

DeepSeek-V2 is a powerful, cost-effective mixture-of-experts (MoE) language model. It was pretrained on a high-quality corpus of 8.1 trillion tokens and further improved through supervised fine-tuning (SFT) and reinforcement learning (RL). Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% of training cost, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76×.

deepseek-ai/DeepSeek-V2.5

32k

-

Supported

Conversation

DeepSeek_deepseek

DeepSeek-V2.5 is an upgraded version combining DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. The model is optimized in multiple aspects, including writing and instruction-following ability, better aligning with human preferences.

deepseek-ai/DeepSeek-V3

128k

4k

Not supported

Conversation

DeepSeek_deepseek

The open-source DeepSeek release, with a longer context window than the official hosted version and without issues such as refusing to answer because of sensitive-word filtering.

deepseek-chat

64k

8k

Supported

Conversation

DeepSeek_deepseek

236B parameters, 64K context (API); ranks first among open-source models in Chinese comprehensive ability (AlignBench); is in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations.
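
Since this entry lists function calls as supported, a short illustration may help. Below is a minimal sketch of a tool-call request, assuming the OpenAI Python SDK pointed at DeepSeek's OpenAI-compatible endpoint; the `get_weather` tool, its schema, and the API-key placeholder are illustrative assumptions, not part of this table.

```python
# Minimal function-calling sketch against an OpenAI-compatible endpoint.
# The get_weather tool and its schema are hypothetical, for illustration only.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # placeholder
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call appears here
# instead of plain text content.
print(resp.choices[0].message.tool_calls)
```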

deepseek-coder

64k

8k

Supported

Conversation, Code

DeepSeek_deepseek

236B parameters, 64K context (API); ranks first among open-source models in Chinese comprehensive ability (AlignBench); is in the same tier as closed-source models such as GPT-4-Turbo and Wenxin 4.0 in evaluations.

deepseek-reasoner

64k

8k

Supported

Conversation, Reasoning

DeepSeek_deepseek

DeepSeek-Reasoner (DeepSeek-R1) is DeepSeek's latest reasoning model, designed to improve reasoning ability via reinforcement learning training. The model's reasoning process includes extensive reflection and verification and can handle complex logical reasoning tasks with chain-of-thought lengths reaching tens of thousands of characters. DeepSeek-R1 performs excellently on math, code, and other complex problems, has been widely applied across scenarios, and demonstrates strong reasoning ability and flexibility. Compared to other models, DeepSeek-R1 approaches top closed-source models in reasoning performance, showcasing the potential and competitiveness of open-source models in reasoning.
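
For reference, here is a minimal sketch of calling this reasoning model, assuming the OpenAI Python SDK and DeepSeek's OpenAI-compatible endpoint; the `reasoning_content` field follows DeepSeek's published API docs for exposing the chain of thought, but verify against the current documentation before relying on it.

```python
# Minimal sketch: reading the reasoning trace from deepseek-reasoner.
# base_url and the reasoning_content field follow DeepSeek's public API docs;
# treat both as assumptions to verify, not guarantees of this table.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 9.11 greater than 9.8?"}],
)

msg = resp.choices[0].message
print("Reasoning:", msg.reasoning_content)  # chain-of-thought portion
print("Answer:", msg.content)               # final answer
```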

hunyuan-code

4k

4k

Not supported

Conversation, Code

Tencent_hunyuan

Hunyuan's latest code-generation model. Further trained on a base model with 200B of high-quality code data and iteratively refined for half a year on high-quality SFT data, with the context window increased to 8K, it ranks among the top in automatic code-generation evaluation metrics across five major languages; in high-quality human evaluations across ten criteria for code tasks in those five languages, its performance is in the leading tier.

hunyuan-functioncall

28k

4k

Supported

Conversation

Tencent_hunyuan

Hunyuan's latest MoE-architecture FunctionCall model, trained on high-quality FunctionCall data, with a context window of up to 32K, leading on multiple dimensions of evaluation metrics.

hunyuan-large

28k

4k

Not supported

Conversation

Tencent_hunyuan

The Hunyuan-large model has approximately 389B total parameters and about 52B active parameters; it is currently the industry's largest and best-performing open-source MoE model based on the Transformer architecture.

hunyuan-large-longcontext

128k

6k

Not supported

Conversation

Tencent_hunyuan

Good at handling long-text tasks such as document summarization and document question answering, and also capable of general text generation tasks. It excels at analyzing and generating long texts and effectively addresses complex and detailed long-form content processing needs.

hunyuan-lite

250k

6k

Not supported

Conversation

Tencent_hunyuan

Upgraded to an MoE structure with a 256k context window, leading many open-source models on multiple benchmarks across NLP, code, math, and industry-specific evaluations.

hunyuan-pro

28k

4k

Supported

Conversation

Tencent_hunyuan

Trillion-parameter-scale MOE-32K long-context model. Achieves decisively leading levels on various benchmarks; supports complex instructions and reasoning, possesses advanced mathematical capability, supports FunctionCall, and is specially optimized for multilingual translation and application domains such as finance, law, and healthcare.

hunyuan-role

28k

4k

Not supported

Conversation

Tencent_hunyuan

Hunyuan's latest role-playing model, officially fine-tuned by Hunyuan: it is further trained on role-play scenario datasets on top of the Hunyuan base model, providing better foundational performance in role-play scenarios.

hunyuan-standard

30k

2k

Not supported

Conversation

Tencent_hunyuan

Uses an improved routing strategy while mitigating load balancing and expert collapse issues. MOE-32K offers better cost-effectiveness; while balancing performance and price, it enables processing of long-text inputs.

hunyuan-standard-256K

250k

6k

Not supported

Conversation

Tencent_hunyuan

Uses an improved routing strategy while mitigating load balancing and expert collapse issues. For long texts, the 'needle-in-a-haystack' metric reaches 99.9%. MOE-256K further breaks through in length and performance, greatly expanding the allowable input length.

hunyuan-translation-lite

4k

4k

Not supported

Conversation

Tencent_hunyuan

Hunyuan translation model supports natural-language conversational translation; supports mutual translation among 15 languages including Chinese and English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.

hunyuan-turbo

28k

4k

Supported

Conversation

Tencent_hunyuan

Hunyuan-turbo is the default version of the Hunyuan model, adopting a new mixture-of-experts (MoE) architecture; compared with hunyuan-pro, it offers higher inference efficiency and stronger performance.

hunyuan-turbo-latest

28k

4k

Supported

Conversation

Tencent_hunyuan

Dynamically updated version of hunyuan-turbo and the best-performing version in the Hunyuan model series, consistent with the consumer-facing (Tencent Yuanbao) version.

hunyuan-turbo-vision

8k

2k

Supported

Conversation, Image Understanding

Tencent_hunyuan

Hunyuan's next-generation flagship vision-language large model, adopting a new mixture-of-experts (MoE) architecture and comprehensively improving over the previous generation on image-text understanding capabilities such as basic recognition, content creation, knowledge Q&A, and analytical reasoning. Maximum input 6K, maximum output 2K.

hunyuan-vision

8k

2k

Supported

Conversation, Image Understanding

Tencent_hunyuan

Hunyuan's latest multimodal model, supports image + text input to generate text content. Image basic recognition: identifies main objects, elements, scenes, etc., in images. Image content creation: summarizes images, creates ad copy, social media posts, poetry, etc. Multi-turn image dialogue: enables multi-turn interactive Q&A about a single image. Image analysis and reasoning: performs statistical analysis on logic relations, math problems, code, and charts found in images. Image knowledge Q&A: answers questions about knowledge points contained in images, such as historical events or movie posters. Image OCR: recognizes text in natural and non-natural scene images.
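
As a rough illustration of the image + text input this entry describes, here is a minimal sketch in the OpenAI-style multimodal message format that many client tools use; the gateway URL and model routing are assumptions for illustration only, and Tencent's native Hunyuan API has its own SDK and request shape.

```python
# Minimal sketch of image + text input in the OpenAI-style message format.
# The gateway base_url and image URL are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://your-gateway.example/v1",  # hypothetical OpenAI-compatible gateway
)

resp = client.chat.completions.create(
    model="hunyuan-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What objects are in this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder image
        ],
    }],
)
print(resp.choices[0].message.content)
```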

SparkDesk-Lite

4k

-

Not supported

Conversation

SparkDesk_SparkDesk

Supports online web search; responds quickly and conveniently. Suitable for low-compute inference and customized scenarios such as model fine-tuning.

SparkDesk-Max

128k

-

Supported

Conversation

SparkDesk_SparkDesk

Quantized from the latest Spark large model engine 4.0 Turbo; supports web search, weather, date, and other built-in plugins; core capabilities are comprehensively upgraded, with improved performance across scenarios; supports System-role personas and FunctionCall invocation.

SparkDesk-Max-32k

32k

-

Supported

Conversation

SparkDesk_SparkDesk

Stronger reasoning: improved context understanding and logical reasoning. Longer input: supports 32K tokens of text input, suitable for long-document reading, private knowledge Q&A, and similar scenarios.

SparkDesk-Pro

128k

-

Not supported

Conversation

SparkDesk_SparkDesk

Specifically optimized for math, code, healthcare, education and other scenarios; supports web search, weather, date and other built-in plugins; covers most knowledge Q&A, language understanding, and text-creation scenarios.

SparkDesk-Pro-128K

128k

-

Not supported

Conversation

SparkDesk_SparkDesk

A professional-grade large language model with tens of billions of parameters, specially optimized for medical, educational, and coding scenarios, with lower latency in search scenarios. Suitable for business scenarios that require higher performance and faster response in text processing and intelligent Q&A.

moonshot-v1-128k

128k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with length 128k, suitable for generating ultra-long texts.

moonshot-v1-32k

32k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with length 32k, suitable for generating long texts.

moonshot-v1-8k

8k

4k

Supported

Conversation

Moonshot AI_moonshot

Model with length 8k, suitable for generating short texts.

codegeex-4

128k

4k

Not supported

Conversation, Code

Zhipu_codegeex

Zhipu's code model: suitable for code auto-completion tasks.

charglm-3

4k

2k

Not supported

Conversation

Zhipu_glm

Anthropomorphic (character role-play) model

emohaa

8k

4k

Not supported

Conversation

Zhipu_glm

Psychological-support model: provides professional counseling capabilities, helping users understand their emotions and cope with emotional issues.

glm-3-turbo

128k

4k

Not supported

Conversation

Zhipu_glm

Deprecation scheduled (June 30, 2025)

glm-4

128k

4k

Supported

Conversation

Zhipu_glm

Legacy flagship: released January 16, 2024, now superseded by GLM-4-0520

glm-4-0520

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence model: suitable for handling highly complex and diverse tasks

glm-4-air

128k

4k

Supported

Conversation

Zhipu_glm

High cost-effectiveness: the model with the best balance between inference capability and price

glm-4-airx

8k

4k

Supported

Conversation

Zhipu_glm

Ultra-fast inference: extremely fast inference speed with strong reasoning performance

glm-4-flash

128k

4k

Supported

Conversation

Zhipu_glm

High-speed low-cost: ultra-fast inference speed

glm-4-flashx

128k

4k

Supported

Conversation

Zhipu_glm

High-speed low-cost: Flash enhanced version, ultra-fast inference speed

glm-4-long

1m

4k

Supported

Conversation

Zhipu_glm

Ultra-long input: designed specifically for handling ultra-long text and memory-style tasks

glm-4-plus

128k

4k

Supported

Conversation

Zhipu_glm

High-intelligence flagship: overall performance greatly improved, with significantly enhanced long-text and complex-task capabilities

glm-4v

2k

-

Not supported

Conversation, Image Understanding

Zhipu_glm

Image understanding: possesses image understanding and reasoning capabilities

glm-4v-flash

2k

1k

Not supported

Conversation, Image Understanding

Zhipu_glm

Free model: possesses powerful image understanding capability
