Common Models Reference Information

This document was translated from Chinese by AI and has not yet been reviewed.

The following information is for reference only; if you find any errors, please contact us so they can be corrected. For some models, the context size and other details may vary between providers.
When entering a value in the client, convert "k" to the actual number of tokens. In theory, 1k = 1024 tokens and 1m = 1024k tokens, so 8k is 8×1024 = 8192 tokens. In practice, multiplying by 1000 is recommended to avoid errors: enter 8k as 8×1000 = 8000 and 1m as 1×1000000 = 1000000.
A max output of "-" indicates that no clear maximum output figure for the model was found in official sources.
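
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_tokens` and its `mode` parameter are hypothetical and are not part of any client. The default follows the recommended ×1000 rule, and `mode="binary"` gives the theoretical ×1024 value.

```python
import re
from typing import Optional

# Hypothetical helper (not part of any client): converts a size label from
# the table, such as "8k", "200K" or "1m", into a plain token count.
_MULTIPLIERS = {
    # Recommended in practice: slightly under the theoretical limit.
    "conservative": {"": 1, "k": 1000, "m": 1000 * 1000},
    # Theoretical values: 1k = 1024 tokens, 1m = 1024k tokens.
    "binary": {"": 1, "k": 1024, "m": 1024 * 1024},
}


def to_tokens(label: str, mode: str = "conservative") -> Optional[int]:
    """Return the token count for a label, or None for "-" or an empty cell."""
    text = label.strip().lower()
    if text in ("", "-"):
        return None  # no official figure was found for this model
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([km]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size label: {label!r}")
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIERS[mode][suffix])


if __name__ == "__main__":
    for label in ("8k", "200K", "1m", "-"):
        print(label, to_tokens(label), to_tokens(label, mode="binary"))
```

The conservative default is chosen so that a value typed into the client never exceeds what a provider actually accepts, even if that provider counts 1k as 1000 tokens.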

Model Name

Max Input

Max Output

Function Calling

Model Capabilities

Provider

Introduction

360gpt-pro

Not Supported

Conversation

360AI_360gpt

The flagship hundred-billion-parameter large model in the 360 AI Brain series and the best-performing model in that series, widely applicable to complex task scenarios in various fields.

360gpt-turbo

Not Supported

Conversation

360AI_360gpt

A ten-billion-parameter large model that balances performance and output quality, suited to scenarios with demanding performance and cost requirements.

360gpt-turbo-responsibility-8k

Not Supported

Conversation

360AI_360gpt

A ten-billion-parameter large model that balances performance and output quality, suited to scenarios with demanding performance and cost requirements.

360gpt2-pro

Not Supported

Conversation

360AI_360gpt

The flagship hundred-billion-parameter large model in the 360 AI Brain series and the best-performing model in that series, widely applicable to complex task scenarios in various fields.

claude-3-5-sonnet-20240620

200k

16k

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on June 20, 2024. Claude 3.5 Sonnet is a model that balances performance and speed, offering top-tier performance while maintaining high speed, and supports multimodal input.

claude-3-5-haiku-20241022

200k

16k

Not Supported

Conversation

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Haiku has improved across various skills, including coding, tool use, and reasoning. As the fastest model in the Anthropic family, it provides rapid response times, suitable for applications requiring high interactivity and low latency, such as user-facing chatbots and instant code completion. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for wide application across industries. It does not support image input.

claude-3-5-sonnet-20241022

200k

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on October 22, 2024. Claude 3.5 Sonnet offers capabilities surpassing Claude 3 Opus and faster speeds than Claude 3 Sonnet, while maintaining the same price as Claude 3 Sonnet. It is particularly adept at programming, data science, visual processing, and agentic tasks.

claude-3-5-sonnet-latest

200k

Not Supported

Conversation, Vision

Anthropic_claude

Dynamically points to the latest Claude 3.5 Sonnet version. Claude 3.5 Sonnet offers capabilities surpassing Claude 3 Opus and faster speeds than Claude 3 Sonnet, while maintaining the same price as Claude 3 Sonnet. It is particularly adept at programming, data science, visual processing, and agentic tasks.

claude-3-haiku-20240307

200k

Not Supported

Conversation, Vision

Anthropic_claude

Claude 3 Haiku is Anthropic's fastest and most compact model, designed for near-instantaneous responses. It features fast and accurate targeted performance.

claude-3-opus-20240229

200k

Not Supported

Conversation, Vision

Anthropic_claude

Claude 3 Opus is Anthropic's most powerful model for handling highly complex tasks. It excels in performance, intelligence, fluency, and comprehension.

claude-3-sonnet-20240229

200k

Not Supported

Conversation, Vision

Anthropic_claude

A snapshot version released on February 29, 2024. Sonnet is particularly adept at: coding (it can autonomously write, edit, and run code, with reasoning and troubleshooting capabilities); data science (it enhances human data science expertise and can process unstructured data while using multiple tools to gain insights); visual processing (it excels at interpreting charts, graphs, and images, accurately transcribing text to extract insights beyond the text itself); and agentic tasks (excellent tool use makes it ideal for complex, multi-step problem-solving that requires interaction with other systems).

google/gemma-2-27b-it

Not Supported

Conversation

Google_gemma

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are decoder-only large language models that support English, with open weights available for both pre-trained and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

google/gemma-2-9b-it

Not Supported

Conversation

Google_gemma

Gemma is a family of lightweight, state-of-the-art open models developed by Google. They are decoder-only large language models that support English, with open weights available for both pre-trained and instruction-tuned variants. Gemma models are suitable for various text generation tasks, including question answering, summarization, and reasoning. This 9B model was trained on 8 trillion tokens.

gemini-1.5-pro

Not Supported

Conversation

Google_gemini

The latest stable version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.

gemini-1.0-pro-001

33k

Not Supported

Conversation

Google_gemini

This is a stable version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-002

32k

Not Supported

Conversation

Google_gemini

gemini-1.0-pro-latest

33k

Not Supported

Conversation, Deprecated or soon to be deprecated

Google_gemini

This is the latest version of Gemini 1.0 Pro. As an NLP model, it specializes in tasks like multi-turn text and code chat, as well as code generation. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-001

16k

Not Supported

Conversation

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.0-pro-vision-latest

16k

Not Supported

Vision

Google_gemini

This is the latest vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

gemini-1.5-flash

Not Supported

Conversation, Vision

Google_gemini

This is the latest stable version of Gemini 1.5 Flash. As a balanced multimodal model, it can process audio, image, video, and text inputs.

gemini-1.5-flash-001

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.

gemini-1.5-flash-002

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Flash. It offers the same basic features as gemini-1.5-flash but is version-pinned, making it suitable for production environments.

gemini-1.5-flash-8b

Not Supported

Conversation, Vision

Google_gemini

Gemini 1.5 Flash-8B is Google's latest multimodal AI model, designed for efficient handling of large-scale tasks. With 8 billion parameters, the model supports text, image, audio, and video inputs, making it suitable for various application scenarios such as chat, transcription, and translation. Compared to other Gemini models, Flash-8B is optimized for speed and cost-effectiveness, especially for cost-sensitive users. Its rate limit is doubled, allowing developers to handle large-scale tasks more efficiently. Additionally, Flash-8B uses "knowledge distillation" technology to extract key knowledge from larger models, ensuring it is lightweight and efficient while retaining core capabilities.

gemini-1.5-flash-exp-0827

Not Supported

Conversation, Vision

Google_gemini

This is an experimental version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.

gemini-1.5-flash-latest

Not Supported

Conversation, Vision

Google_gemini

This is the cutting-edge version of Gemini 1.5 Flash, which is regularly updated with the latest improvements. It is suitable for exploratory testing and prototyping, but not recommended for production environments.

gemini-1.5-pro-001

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.

gemini-1.5-pro-002

Not Supported

Conversation, Vision

Google_gemini

This is a stable version of Gemini 1.5 Pro, offering fixed model behavior and performance characteristics. It is suitable for production environments that require stability.

gemini-1.5-pro-exp-0801

Not Supported

Conversation, Vision

Google_gemini

An experimental version of Gemini 1.5 Pro. As a powerful multimodal model, it can handle up to 60,000 lines of code or 2,000 pages of text. It is particularly suitable for tasks requiring complex reasoning.

gemini-1.5-pro-exp-0827

Not Supported

Conversation, Vision

Google_gemini

gemini-1.5-pro-latest

Not Supported

Conversation, Vision

Google_gemini

This is the latest version of Gemini 1.5 Pro, dynamically pointing to the most recent snapshot version.

gemini-2.0-flash

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash is Google's latest model, featuring a faster Time to First Token (TTFT) compared to the 1.5 version, while maintaining a quality level comparable to Gemini Pro 1.5. This model shows significant improvements in multimodal understanding, coding ability, complex instruction following, and function calling, thereby providing a smoother and more powerful intelligent experience.

gemini-2.0-flash-exp

100k

Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash introduces a real-time multimodal API, improved speed and performance, enhanced quality, stronger agent capabilities, and adds image generation and voice conversion functions.

gemini-2.0-flash-lite-preview-02-05

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Flash-Lite is Google's latest cost-effective AI model, offering better quality at the same speed as 1.5 Flash. It supports a 1 million token context window and can handle multimodal tasks involving images, audio, and code. As Google's most cost-effective model currently, it uses a simplified single pricing strategy, making it particularly suitable for large-scale application scenarios that require cost control.

gemini-2.0-flash-thinking-exp

40k

Not Supported

Conversation, Reasoning

Google_gemini

gemini-2.0-flash-thinking-exp is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-flash-thinking-exp-01-21

64k

Not Supported

Conversation, Reasoning

Google_gemini

Gemini 2.0 Flash Thinking EXP-01-21 is Google's latest AI model, focusing on enhancing reasoning abilities and user interaction experience. The model has strong reasoning capabilities, especially in math and programming, and supports a context window of up to 1 million tokens, suitable for complex tasks and in-depth analysis scenarios. Its unique feature is the ability to generate its thinking process, improving the comprehensibility of AI thinking. It also supports native code execution, enhancing the flexibility and practicality of interactions. By optimizing algorithms, the model reduces logical contradictions, further improving the accuracy and consistency of its answers.

gemini-2.0-flash-thinking-exp-1219

40k

Not Supported

Conversation, Reasoning, Vision

Google_gemini

gemini-2.0-flash-thinking-exp-1219 is an experimental model that can generate the "thinking process" it goes through when formulating a response. Therefore, "thinking mode" responses have stronger reasoning capabilities compared to the basic Gemini 2.0 Flash model.

gemini-2.0-pro-exp-01-28

64k

Not Supported

Conversation, Vision

Google_gemini

Pre-announced model, not yet online.

gemini-2.0-pro-exp-02-05

Not Supported

Conversation, Vision

Google_gemini

Gemini 2.0 Pro Exp 02-05 is Google's latest experimental model, released in February 2025, excelling in world knowledge, code generation, and long-text understanding. The model supports an ultra-long context window of 2 million tokens, capable of processing content equivalent to 2 hours of video, 22 hours of audio, over 60,000 lines of code, and more than 1.4 million words. As part of the Gemini 2.0 series, this model adopts a new Flash Thinking training strategy, significantly improving its performance and ranking high on several LLM leaderboards, demonstrating strong comprehensive capabilities.

gemini-exp-1114

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model released on November 14, 2024, primarily focusing on quality improvements.

gemini-exp-1121

Not Supported

Conversation, Vision, Code

Google_gemini

This is an experimental model released on November 21, 2024, with improvements in coding, reasoning, and visual capabilities.

gemini-exp-1206

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model released on December 6, 2024, with improvements in coding, reasoning, and visual capabilities.

gemini-exp-latest

Not Supported

Conversation, Vision

Google_gemini

This is an experimental model, dynamically pointing to the latest version.

gemini-pro

33k

Not Supported

Conversation

Google_gemini

An alias for gemini-1.0-pro.

gemini-pro-vision

16k

Not Supported

Conversation, Vision

Google_gemini

This is the vision version of Gemini 1.0 Pro. This model will be discontinued on February 15, 2025, and it is recommended to migrate to the 1.5 series models.

grok-2

128k

Not Supported

Conversation

Grok_grok

A new version of the Grok model released by xAI on December 12, 2024.

grok-2-1212

128k

Not Supported

Conversation

Grok_grok

A new version of the Grok model released by xAI on December 12, 2024.

grok-2-latest

128k

Not Supported

Conversation

Grok_grok

A new version of the Grok model released by xAI on December 12, 2024.

grok-2-vision-1212

32k

Not Supported

Conversation, Vision

Grok_grok

The vision version of the Grok model released by xAI on December 12, 2024.

grok-beta

100k

Not Supported

Conversation

Grok_grok

Performance comparable to Grok 2, but with improved efficiency, speed, and functionality.

grok-vision-beta

Not Supported

Conversation, Vision

Grok_grok

The latest image understanding model, capable of processing a wide range of visual information, including documents, charts, screenshots, and photos.

internlm/internlm2_5-20b-chat

32k

Supported

Conversation

internlm

InternLM2.5-20B-Chat is an open-source large-scale conversational model developed based on the InternLM2 architecture. With 20 billion parameters, this model excels in mathematical reasoning, surpassing comparable models like Llama3 and Gemma2-27B. InternLM2.5-20B-Chat has significantly improved tool-calling capabilities, supporting information collection from hundreds of web pages for analysis and reasoning, and possessing stronger instruction understanding, tool selection, and result reflection abilities.

meta-llama/Llama-3.2-11B-Vision-Instruct

Not Supported

Conversation, Vision

Meta_llama

The current Llama series models can not only process text data but also image data. Some models in Llama 3.2 have added visual understanding functions. This model supports simultaneous input of text and image data, understands the image, and outputs text information.

meta-llama/Llama-3.2-3B-Instruct

32k

Not Supported

Conversation

Meta_llama

The Meta Llama 3.2 family of multilingual large language models (LLMs) includes lightweight 1B and 3B models that can run on edge and mobile devices. This model is the 3B version.

meta-llama/Llama-3.2-90B-Vision-Instruct

Not Supported

Conversation, Vision

Meta_llama

meta-llama/Llama-3.3-70B-Instruct

131k

Not Supported

Conversation

Meta_llama

Meta's latest 70B LLM, with performance comparable to Llama 3.1 405B.

meta-llama/Meta-Llama-3.1-405B-Instruct

32k

Not Supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 405B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

meta-llama/Meta-Llama-3.1-70B-Instruct

32k

Not Supported

Conversation

Meta_llama

Meta Llama 3.1 is a family of multilingual large language models developed by Meta, including pre-trained and instruction-tuned variants in 8B, 70B, and 405B parameter sizes. This 70B instruction-tuned model is optimized for multilingual conversation scenarios and performs excellently on several industry benchmarks. The model was trained on over 15 trillion tokens of public data and uses techniques like supervised fine-tuning and reinforcement learning with human feedback to enhance its usefulness and safety.

meta-llama/Meta-Llama-3.1-8B-Instruct

32k

Not Supported

Conversation

Meta_llama

The Meta Llama 3.1 multilingual Large Language Model (LLM) collection is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. This model is the 8B version. The Llama 3.1 instruction-tuned text models (8B, 70B, 405B) are optimized for multilingual conversations and outperform many available open-source and closed-source chat models on common industry benchmarks.

abab5.5-chat

16k

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

abab5.5s-chat

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

abab6.5g-chat

Supported

Conversation

Minimax_abab

Persona conversation scenarios in English and other languages.

abab6.5s-chat

245k

Supported

Conversation

Minimax_abab

General scenarios.

abab6.5t-chat

Supported

Conversation

Minimax_abab

Chinese persona conversation scenarios.

chatgpt-4o-latest

128k

16k

Not Supported

Conversation, Vision

OpenAI

The chatgpt-4o-latest model continuously points to the GPT-4o version used in ChatGPT and is updated promptly whenever that version changes significantly.

gpt-4o-2024-11-20

128k

16k

Supported

Conversation

OpenAI

The latest gpt-4o snapshot version from November 20, 2024.

gpt-4o-audio-preview

128k

16k

Not Supported

Conversation

OpenAI

OpenAI's real-time voice conversation model.

gpt-4o-audio-preview-2024-10-01

128k

16k

Supported

Conversation

OpenAI

OpenAI's real-time voice conversation model.

128k

32k

Not Supported

Conversation, Reasoning, Vision

OpenAI

OpenAI's new reasoning model for complex tasks that require extensive common sense. The model has a 200k context, is currently the most powerful model in the world, and supports image recognition.

o1-mini-2024-09-12

128k

64k

Not Supported

Conversation, Reasoning

OpenAI

A fixed snapshot version of o1-mini. It is smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.

o1-preview-2024-09-12

128k

32k

Not Supported

Conversation, Reasoning

OpenAI

A fixed snapshot version of o1-preview.

gpt-3.5-turbo

16k

Supported

Conversation

OpenAI_gpt-3

GPT-3.5 Turbo is an improved version of the GPT-3.5 model, developed by OpenAI. It is designed to improve inference speed, processing efficiency, and resource utilization through an optimized model structure and algorithms. Compared to GPT-3.5, it typically offers faster inference on the same hardware, which is particularly beneficial for applications requiring large-scale text processing. When handling a large number of requests or a large volume of data, it can process more work concurrently, increasing overall system throughput. While maintaining performance, it may place lower demands on hardware resources (such as memory and compute), which helps lower operating costs and improve system scalability. It is suitable for a variety of natural language processing tasks, including but not limited to text generation, semantic understanding, dialogue systems, and machine translation. It is available through an API that is easy for developers to integrate and use, supporting rapid application development and deployment.

gpt-3.5-turbo-0125

16k

Supported

Conversation

OpenAI_gpt-3

An updated GPT 3.5 Turbo model with higher accuracy in responding to requested formats and a fix for a bug that caused text encoding issues for non-English language function calls. Returns a maximum of 4,096 output tokens.

gpt-3.5-turbo-0613

16k

Supported

Conversation

OpenAI_gpt-3

Updated fixed snapshot version of GPT 3.5 Turbo. Now deprecated.

gpt-3.5-turbo-1106

16k

Supported

Conversation

OpenAI_gpt-3

Features improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens.

gpt-3.5-turbo-16k

16k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

(Deprecated)

gpt-3.5-turbo-16k-0613

16k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-3

A snapshot of gpt-3.5-turbo from June 13, 2023. (Deprecated)

gpt-3.5-turbo-instruct

Supported

Conversation

OpenAI_gpt-3

Capabilities similar to GPT-3 era models. Compatible with the legacy Completions endpoint, not for Chat Completions.

gpt-3.5o

16k

Not Supported

Conversation

OpenAI_gpt-3

Same as gpt-4o-lite.

gpt-4

Supported

Conversation

OpenAI_gpt-4

Currently points to gpt-4-0613.

gpt-4-0125-preview

128k

Supported

Conversation

OpenAI_gpt-4

The latest GPT-4 model, designed to reduce "laziness" where the model does not complete tasks. Returns a maximum of 4,096 output tokens.

gpt-4-0314

Supported

Conversation

OpenAI_gpt-4

A snapshot of gpt-4 from March 14, 2023.

gpt-4-0613

Supported

Conversation

OpenAI_gpt-4

A snapshot of gpt-4 from June 13, 2023, with enhanced function calling support.

gpt-4-1106-preview

128k

Supported

Conversation

OpenAI_gpt-4

A GPT-4 Turbo model with improved instruction following, JSON mode, reproducible outputs, function calling, and more. Returns a maximum of 4,096 output tokens. This is a preview model.

gpt-4-32k

32k

Supported

Conversation

OpenAI_gpt-4

gpt-4-32k will be deprecated on 2025-06-06.

gpt-4-32k-0613

32k

Supported

Conversation, Deprecated or soon to be deprecated

OpenAI_gpt-4

Will be deprecated on 2025-06-06.

gpt-4-turbo

128k

Supported

Conversation

OpenAI_gpt-4

The latest version of the GPT-4 Turbo model adds vision capabilities, supporting visual requests via JSON mode and function calling. The current version of this model is gpt-4-turbo-2024-04-09.

gpt-4-turbo-2024-04-09

128k

Supported

Conversation

OpenAI_gpt-4

GPT-4 Turbo model with vision capabilities. Vision requests can now be made via JSON mode and function calling. gpt-4-turbo currently points to this version.

gpt-4-turbo-preview

128k

Supported

Conversation, Vision

OpenAI_gpt-4

Currently points to gpt-4-0125-preview.

gpt-4o

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

OpenAI's highly intelligent flagship model, suitable for complex, multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.

gpt-4o-2024-05-13

128k

Supported

Conversation, Vision

OpenAI_gpt-4

The original gpt-4o snapshot from May 13, 2024.

gpt-4o-2024-08-06

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

The first snapshot to support structured outputs. gpt-4o currently points to this version.

gpt-4o-mini

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

OpenAI's affordable version of gpt-4o, suitable for fast, lightweight tasks. GPT-4o mini is cheaper and more powerful than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18.

gpt-4o-mini-2024-07-18

128k

16k

Supported

Conversation, Vision

OpenAI_gpt-4

A fixed snapshot version of gpt-4o-mini.

gpt-4o-realtime-preview

128k

Supported

Conversation, Real-time Voice

OpenAI_gpt-4

OpenAI's real-time voice conversation model.

gpt-4o-realtime-preview-2024-10-01

128k

Supported

Conversation, Real-time Voice, Vision

OpenAI_gpt-4

gpt-4o-realtime-preview currently points to this snapshot version.

o1-mini

128k

64k

Not Supported

Conversation, Reasoning

OpenAI_o1

Smaller, faster, and 80% cheaper than o1-preview, performing well in code generation and small-context operations.

o1-preview

128k

32k

Not Supported

Conversation, Reasoning

OpenAI_o1

o1-preview is a new reasoning model for complex tasks that require extensive common sense. The model has a 128K context and a knowledge cutoff of October 2023. It focuses on advanced reasoning and solving complex problems, including mathematical and scientific tasks. It is ideal for applications requiring deep contextual understanding and autonomous workflows.

o3-mini

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, math, and coding tasks and supports developer features like structured output, function calling, and the batch API. It has a knowledge cutoff of October 2023 and strikes a strong balance between reasoning capability and cost-effectiveness.

o3-mini-2025-01-31

200k

100k

Supported

Conversation, Reasoning

OpenAI_o1

o3-mini currently points to this version. o3-mini-2025-01-31 is OpenAI's latest small reasoning model, offering high intelligence while maintaining the same cost and latency as o1-mini. It focuses on science, math, and coding tasks and supports developer features like structured output, function calling, and the batch API. It has a knowledge cutoff of October 2023 and strikes a strong balance between reasoning capability and cost-effectiveness.

Baichuan2-Turbo

32k

Not Supported

Conversation

Baichuan_baichuan

Compared to similarly sized models in the industry, this model maintains leading performance at a significantly lower price.

Baichuan3-Turbo

32k

Not Supported

Conversation

Baichuan_baichuan

Compared to similarly sized models in the industry, this model maintains leading performance at a significantly lower price.

Baichuan3-Turbo-128k

128k

Not Supported

Conversation

Baichuan_baichuan

The Baichuan model processes complex text with a 128k ultra-long context window, is specifically optimized for industries like finance, and significantly reduces costs while maintaining high performance, providing a cost-effective solution for enterprises.

Baichuan4

32k

Not Supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Air

32k

Not Supported

Conversation

Baichuan_baichuan

Baichuan's MoE model provides a highly efficient and cost-effective solution for enterprise applications through specialized optimization, cost reduction, and performance enhancement.

Baichuan4-Turbo

32k

Not Supported

Conversation

Baichuan_baichuan

Trained on massive amounts of high-quality scenario data. Compared to Baichuan4, usability in high-frequency enterprise scenarios improves by more than 10%, information summarization by 50%, multilingual capabilities by 31%, and content generation by 13%. Inference performance is also specially optimized: first-token response speed is 51% faster and token streaming speed is 73% faster than Baichuan4.

ERNIE-3.5-128K

128k

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship large language model, covering massive Chinese and English corpora, with powerful general capabilities to meet most dialogue, Q&A, creative generation, and plugin application requirements. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-3.5-8K

Supported

Conversation

Baidu_ernie

ERNIE-3.5-8K-Preview

Supported

Conversation

Baidu_ernie

ERNIE-4.0-8K

Supported

Conversation

Baidu_ernie

Baidu's self-developed flagship ultra-large-scale language model. Compared to ERNIE 3.5, it has a comprehensive upgrade in model capabilities, widely applicable to complex task scenarios in various fields. Supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information.

ERNIE-4.0-8K-Latest

Supported

Conversation

Baidu_ernie

ERNIE-4.0-8K-Latest improves on ERNIE-4.0-8K across the board, with significant enhancements in role-playing and instruction following. Compared to ERNIE 3.5, it is a comprehensive upgrade in model capabilities and is widely applicable to complex task scenarios in various fields. It supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information, and supports 5K tokens of input plus 2K tokens of output.

ERNIE-4.0-8K-Preview

Supported

Conversation

Baidu_ernie

ERNIE-4.0-Turbo-128K

128k

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. It supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information, and it performs better than ERNIE 4.0. ERNIE-4.0-Turbo-128K is a version of the model with better overall performance on long documents than ERNIE-3.5-128K.

ERNIE-4.0-Turbo-8K

Supported

Conversation

Baidu_ernie

ERNIE 4.0 Turbo is Baidu's self-developed flagship ultra-large-scale language model with outstanding overall performance, widely applicable to complex task scenarios in various fields. It supports automatic integration with the Baidu search plugin to ensure the timeliness of Q&A information, and it performs better than ERNIE 4.0. ERNIE-4.0-Turbo-8K is a version of this model.

ERNIE-4.0-Turbo-8K-Latest

Supported

Conversation

Baidu_ernie

ERNIE-4.0-Turbo-8K-Preview

Supported

Conversation

Baidu_ernie

ERNIE-Character-8K

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed vertical large language model, suitable for application scenarios such as game NPCs, customer service dialogues, and dialogue role-playing. It has a more distinct and consistent persona style, stronger instruction-following ability, and better inference performance.

ERNIE-Lite-8K

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed lightweight large language model, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards.

ERNIE-Lite-Pro-128K

128k

Supported

Conversation

Baidu_ernie

Baidu's self-developed lightweight large language model, with better performance than ERNIE Lite, balancing excellent model performance with inference efficiency, suitable for inference on low-power AI accelerator cards. ERNIE-Lite-Pro-128K supports a 128K context length and has better performance than ERNIE-Lite-128K.

ERNIE-Novel-8K

Not Supported

Conversation

Baidu_ernie

ERNIE-Novel-8K is Baidu's self-developed general-purpose large language model, with a significant advantage in novel continuation capabilities. It can also be used in scenarios like short dramas and movies.

ERNIE-Speed-128K

128k

Not Supported

Conversation

Baidu_ernie

Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance.

ERNIE-Speed-8K

Not Supported

Conversation

Baidu_ernie

ERNIE-Speed-Pro-128K

128k

Not Supported

Conversation

Baidu_ernie

ERNIE Speed Pro is Baidu's latest self-developed high-performance large language model released in 2024, with excellent general capabilities. It is suitable as a base model for fine-tuning to better handle specific scenario problems, while also having excellent inference performance. ERNIE-Speed-Pro-128K is the initial version released on August 30, 2024, supporting a 128K context length and having better performance than ERNIE-Speed-128K.

ERNIE-Tiny-8K

Not Supported

Conversation

Baidu_ernie

Baidu's self-developed ultra-high-performance large language model, with the lowest deployment and fine-tuning costs in the ERNIE series.

Doubao-1.5-lite-32k

32k

12k

Supported

Conversation

Doubao_doubao

Doubao-1.5-lite is among the world's top-tier lightweight language models, matching or surpassing GPT-4o mini and Claude 3.5 Haiku on authoritative evaluation benchmarks for general knowledge (MMLU_pro), reasoning (BBH), math (MATH), and professional knowledge (GPQA).

Doubao-1.5-pro-256k

256k

12k

Supported

Conversation

Doubao_doubao

Doubao-1.5-pro-256k is a fully upgraded version based on Doubao-1.5-pro. Compared to Doubao-pro-256k/241115, overall performance improves by 10%. The output length is greatly increased, supporting up to 12k tokens.

Doubao-1.5-pro-32k

32k

12k

Supported

Conversation

Doubao_doubao

Doubao-1.5-pro is a new-generation flagship model with comprehensive performance upgrades, excelling in knowledge, code, reasoning, and more. It achieves world-leading performance on multiple public evaluation benchmarks, with the best scores on knowledge, code, reasoning, and authoritative Chinese benchmarks, and a composite score superior to top industry models such as GPT-4o and Claude 3.5 Sonnet.

Doubao-1.5-vision-pro

32k

12k

Not Supported

Conversation, Vision

Doubao_doubao

Doubao-1.5-vision-pro is a newly upgraded multimodal large model. It supports image recognition at any resolution and at extreme aspect ratios, with enhanced visual reasoning, document recognition, detailed information understanding, and instruction-following capabilities.

Doubao-embedding

Supported

Embedding

Doubao_doubao

Doubao-embedding is a semantic vectorization model developed by ByteDance, primarily for vector retrieval scenarios. It supports Chinese and English, with a maximum context length of 4K. The following versions are currently available: text-240715: Maximum vector dimension of 2560, supports dimensionality reduction to 512, 1024, and 2048. Chinese and English retrieval performance is significantly improved compared to the text-240515 version, and this version is recommended. text-240515: Maximum vector dimension of 2048, supports dimensionality reduction to 512 and 1024.
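The dimensionality reduction mentioned above can be illustrated with a minimal sketch. It assumes that reduction means keeping the leading components of the full vector and rescaling the result to unit length; this is a common convention, but whether the Doubao service works this way is an assumption, not something this table confirms. The function name `reduce_embedding` and the stand-in vector are hypothetical.

```python
import math

# Reduced sizes listed for text-240715 in the description above.
ALLOWED_DIMS = (512, 1024, 2048)


def reduce_embedding(vector, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Assumption: truncation plus re-normalization is how the reduced
    vector is obtained; check the provider's documentation before use.
    """
    if dim not in ALLOWED_DIMS:
        raise ValueError(f"unsupported dimension: {dim}")
    if dim > len(vector):
        raise ValueError("target dimension exceeds vector length")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Stand-in for a real 2560-dimensional embedding (not real model output).
full = [0.5] * 2560
small = reduce_embedding(full, 512)
```

Re-normalizing after truncation keeps cosine similarity and dot-product similarity interchangeable for the shortened vectors.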

Doubao-embedding-large

Not Supported

Embedding

Doubao_doubao

Chinese and English retrieval performance is significantly improved compared to the Doubao-embedding/text-240715 version.

Doubao-embedding-vision

Not Supported

Embedding

Doubao_doubao

Doubao-embedding-vision, a newly upgraded image-text multimodal vectorization model, is primarily for image-text multi-vector retrieval scenarios. It supports image input and Chinese/English text input, with a maximum context length of 8K.

Doubao-lite-128k

128k

Supported

Conversation

Doubao_doubao

Doubao-lite offers extremely fast response speeds and better cost-effectiveness, providing more flexible choices for customers in different scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-lite-32k

32k

Supported

Conversation

Doubao_doubao

Doubao-lite-4k

Supported

Conversation

Doubao_doubao

Doubao-pro-128k

128k

Supported

Conversation

Doubao_doubao

The flagship model with the best performance, suitable for handling complex tasks, with excellent results in reference Q&A, summarization, creation, text classification, role-playing, and other scenarios. Supports inference and fine-tuning with a 128k context window.

Doubao-pro-32k

32k

Supported

Conversation

Doubao_doubao

Doubao-pro-4k

Supported

Conversation

Doubao_doubao

step-1-128k

128k

Supported

Conversation

StepFun

The step-1-128k model is an ultra-large-scale language model capable of processing inputs of up to 128,000 tokens. This capability gives it a significant advantage in generating long-form content and performing complex reasoning, making it suitable for applications that require rich context, such as writing novels and scripts.

step-1-256k

256k

Supported

Conversation

StepFun

The step-1-256k model is one of the largest language models available, supporting inputs of 256,000 tokens. It is designed to meet extremely complex task requirements, such as large-scale data analysis and multi-turn dialogue systems, and can provide high-quality output in various domains.

step-1-32k

32k

Supported

Conversation

StepFun

The step-1-32k model extends the context window to support 32,000 tokens of input. This makes it perform excellently when handling long articles and complex conversations, suitable for tasks that require deep understanding and analysis, such as legal documents and academic research.

step-1-8k

Supported

Conversation

StepFun

The step-1-8k model is an efficient language model designed for processing shorter texts. It can perform reasoning within a context of 8,000 tokens, making it suitable for application scenarios that require quick responses, such as chatbots and real-time translation.

step-1-flash

Supported

Conversation

StepFun

The step-1-flash model focuses on rapid response and efficient processing, suitable for real-time applications. Its design allows it to provide high-quality language understanding and generation capabilities even with limited computing resources, making it suitable for mobile devices and edge computing scenarios.

step-1.5v-mini

32k

Supported

Conversation, Vision

StepFun

The step-1.5v-mini model is a lightweight version designed to run in resource-constrained environments. Despite its small size, it still retains good language processing capabilities, making it suitable for embedded systems and low-power devices.

step-1v-32k

32k

Supported

Conversation, Vision

StepFun

The step-1v-32k model supports inputs of 32,000 tokens, suitable for applications requiring longer context. It performs excellently in handling complex dialogues and long texts, making it suitable for fields such as customer service and content creation.

step-1v-8k

Supported

Conversation, Vision

StepFun

The step-1v-8k model is an optimized version designed for 8,000-token inputs, suitable for fast generation and processing of short texts. It strikes a good balance between speed and accuracy, making it suitable for real-time applications.

step-2-16k

16k

Supported

Conversation

StepFun

The step-2-16k model is a medium-sized language model supporting 16,000 tokens of input. It performs well in various tasks and is suitable for application scenarios such as education, training, and knowledge management.

yi-lightning

16k

Supported

Conversation

01.AI_yi

The latest high-performance model, ensuring high-quality output while significantly increasing inference speed. Suitable for real-time interaction and highly complex reasoning scenarios, its extremely high cost-effectiveness can provide excellent support for commercial products.

yi-vision-v2

16k

Supported

Conversation, Vision

01.AI_yi

Suitable for scenarios that require analyzing and interpreting images and charts, such as image Q&A, chart understanding, OCR, visual reasoning, education, research report understanding, or multilingual document reading.

qwen-14b-chat

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-72b-chat

32k

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-7b-chat

7.5k

1.5k

Supported

Conversation

Qwen_qwen

Alibaba Cloud's official open-source version of Tongyi Qianwen.

qwen-coder-plus

128k

Supported

Conversation, Code

Qwen_qwen

Qwen-Coder-Plus is a programming-specific model in the Qwen series, designed to enhance code generation and understanding capabilities. Trained on a large scale of programming data, this model can handle multiple programming languages and supports functions like code completion, error detection, and code refactoring. Its design goal is to provide developers with more efficient programming assistance and improve development efficiency.

qwen-coder-plus-latest

128k

Supported

Conversation, Code

Qwen_qwen

Qwen-Coder-Plus-Latest is the newest version of Qwen-Coder-Plus, incorporating the latest algorithm optimizations and dataset updates. This model shows significant performance improvements, enabling it to understand context more accurately and generate code that better meets developers' needs. It also introduces support for more programming languages, enhancing its multilingual programming capabilities.

qwen-coder-turbo

128k

Supported

Conversation, Code

Qwen_qwen

The Tongyi Qianwen series of code and programming models are language models specifically for programming and code generation, featuring fast inference speed and low cost. This version always points to the latest stable snapshot.

qwen-coder-turbo-latest

128k

Supported

Conversation, Code

Qwen_qwen

qwen-long

10m

Supported

Conversation

Qwen_qwen

Qwen-Long is a large language model from Tongyi Qianwen for ultra-long context processing scenarios. It supports input in different languages such as Chinese and English, and supports ultra-long context dialogues of up to 10 million tokens (about 15 million words or 15,000 pages of documents). Combined with the synchronously launched document service, it can parse and have dialogues on various document formats such as Word, PDF, Markdown, EPUB, and MOBI. Note: For requests submitted directly via HTTP, it supports a length of 1M tokens. For lengths exceeding this, it is recommended to submit via file.
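The two limits in the note above (1M tokens for direct HTTP requests, 10m tokens overall) can be turned into a simple routing check. The sketch below is illustrative only: it uses the conservative decimal conversion recommended at the top of this reference (1k = 1000, 1m = 1000000), and the function names and the returned labels `"http"`, `"file"`, and `"too_long"` are hypothetical, not part of any Qwen API.

```python
# Conservative decimal units, as recommended in this reference's notes.
UNIT_FACTORS = {"k": 1_000, "m": 1_000_000}


def parse_token_count(label):
    """Convert a table entry such as '128k', '7.5k', or '10m' to an integer."""
    label = label.strip().lower()
    if label and label[-1] in UNIT_FACTORS:
        return int(float(label[:-1]) * UNIT_FACTORS[label[-1]])
    return int(label)


HTTP_LIMIT = parse_token_count("1m")      # direct HTTP request limit
CONTEXT_LIMIT = parse_token_count("10m")  # maximum context for qwen-long


def choose_submission(token_count):
    """Pick a submission path for a qwen-long request (hypothetical labels)."""
    if token_count > CONTEXT_LIMIT:
        return "too_long"
    if token_count > HTTP_LIMIT:
        return "file"
    return "http"
```

For example, `choose_submission(parse_token_count("128k"))` selects the direct HTTP path, while a 2,000,000-token document would be routed to file submission.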

qwen-math-plus

Supported

Conversation

Qwen_qwen

Qwen-Math-Plus is a model focused on solving mathematical problems, designed to provide efficient mathematical reasoning and calculation capabilities. Trained on a large number of math problems, this model can handle complex mathematical expressions and problems, supporting a variety of calculation needs from basic arithmetic to higher mathematics. Its application scenarios include education, scientific research, and engineering.

qwen-math-plus-latest

Supported

Conversation

Qwen_qwen

Qwen-Math-Plus-Latest is the newest version of Qwen-Math-Plus, integrating the latest mathematical reasoning techniques and algorithm improvements. This model performs better in handling complex mathematical problems, providing more accurate solutions and reasoning processes. It also expands its understanding of mathematical symbols and formulas, making it suitable for a wider range of mathematical applications.

qwen-math-turbo

Supported

Conversation

Qwen_qwen

Qwen-Math-Turbo is a high-performance mathematical model designed for fast calculation and real-time inference. This model optimizes calculation speed, enabling it to process a large number of mathematical problems in a very short time, suitable for application scenarios that require quick feedback, such as online education and real-time data analysis. Its efficient algorithms allow users to get instant results in complex calculations.

qwen-math-turbo-latest

Supported

Conversation

Qwen_qwen

Qwen-Math-Turbo-Latest is the newest version of Qwen-Math-Turbo, further improving calculation efficiency and accuracy. This model has undergone multiple algorithmic optimizations, enabling it to handle more complex mathematical problems and maintain high efficiency in real-time inference. It is suitable for mathematical applications that require rapid response, such as financial analysis and scientific computing.

qwen-max

32k

Supported

Conversation

Qwen_qwen

The Tongyi Qianwen 2.5 series hundred-billion-level ultra-large-scale language model supports input in different languages such as Chinese and English. As the model is upgraded, qwen-max will be updated on a rolling basis.

qwen-max-latest

32k

Supported

Conversation

Qwen_qwen

The best-performing model in the Tongyi Qianwen series. This model is a dynamically updated version, and model updates will not be announced in advance. It is suitable for complex, multi-step tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved. It also has enhanced understanding and generation capabilities for structured data like tables and JSON.

qwen-plus

128k

Supported

Conversation

Qwen_qwen

A well-balanced model in the Tongyi Qianwen series, with inference performance and speed between Tongyi Qianwen-Max and Tongyi Qianwen-Turbo, suitable for moderately complex tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved.

qwen-plus-latest

128k

Supported

Conversation

Qwen_qwen

Qwen-Plus is an enhanced version of the visual language model in the Tongyi Qianwen series, designed to improve detail recognition and text recognition capabilities. This model supports images with resolutions over one million pixels and any aspect ratio, performing excellently in a wide range of visual language tasks, making it suitable for applications requiring high-precision image understanding.

qwen-turbo

128k

Supported

Conversation

Qwen_qwen

The fastest and most cost-effective model in the Tongyi Qianwen series, suitable for simple tasks. The model's comprehensive abilities in Chinese and English are significantly improved, human preference is significantly enhanced, reasoning ability and complex instruction understanding are significantly strengthened, performance on difficult tasks is better, and math and code abilities are significantly improved.

qwen-turbo-latest

Supported

Conversation

Qwen_qwen

Qwen-Turbo is an efficient model designed for simple tasks, emphasizing speed and cost-effectiveness. It performs excellently in basic visual language tasks and is suitable for applications with strict response time requirements, such as real-time image recognition and simple Q&A systems.

qwen-vl-max

32k

Supported

Conversation, Vision

Qwen_qwen

Tongyi Qianwen VL-Max (qwen-vl-max) is the ultra-large-scale visual language model from Tongyi Qianwen. Compared to the enhanced version (qwen-vl-plus), it further improves visual reasoning and instruction-following capabilities, providing a higher level of visual perception and cognition. It offers the best performance on more complex tasks.

qwen-vl-max-latest

32k

Supported

Conversation, Vision

Qwen_qwen

Qwen-VL-Max is the most advanced version in the Qwen-VL series, designed to solve complex multimodal tasks. It combines advanced visual and language processing technologies, capable of understanding and analyzing high-resolution images with extremely strong reasoning abilities, suitable for applications requiring deep understanding and complex reasoning.

qwen-vl-ocr

34k

Supported

Conversation, Vision

Qwen_qwen

Only supports OCR, not conversation.

qwen-vl-ocr-latest

34k

Supported

Conversation, Vision

Qwen_qwen

Only supports OCR, not conversation.

qwen-vl-plus

Supported

Conversation, Vision

Qwen_qwen

Tongyi Qianwen VL-Plus (qwen-vl-plus), the enhanced version of the Tongyi Qianwen large-scale visual language model. It significantly improves detail recognition and text recognition capabilities, supports images with resolutions over one million pixels and any aspect ratio. It provides excellent performance on a wide range of visual tasks.

qwen-vl-plus-latest

32k

Supported

Conversation, Vision

Qwen_qwen

Qwen-VL-Plus-Latest is the newest version of Qwen-VL-Plus, enhancing the model's multimodal understanding capabilities. It excels in the combined processing of images and text, making it suitable for applications that need to efficiently handle multiple input formats, such as intelligent customer service and content generation.

Qwen/Qwen2-1.5B-Instruct

32k

Not Supported

Conversation

Qwen_qwen

Qwen2-1.5B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 1.5B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-72B-Instruct

128k

Not Supported

Conversation

Qwen_qwen

Qwen2-72B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 72B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It can handle large-scale inputs. The model excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-7B-Instruct

128k

Not Supported

Conversation

Qwen_qwen

Qwen2-7B-Instruct is an instruction-tuned large language model in the Qwen2 series with a parameter size of 7B. Based on the Transformer architecture, the model uses SwiGLU activation functions, attention QKV biases, and group query attention. It can handle large-scale inputs. The model excels in multiple benchmark tests for language understanding, generation, multilingual capabilities, coding, math, and reasoning, surpassing most open-source models.

Qwen/Qwen2-VL-72B-Instruct

32k

Not Supported

Conversation, Vision

Qwen_qwen

Qwen2-VL is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can understand videos over 20 minutes long for high-quality video-based Q&A, dialogue, and content creation. It also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.

Qwen/Qwen2-VL-7B-Instruct

32k

Not Supported

Conversation, Vision

Qwen_qwen

Qwen2-VL-7B-Instruct is the latest iteration of the Qwen-VL model, achieving state-of-the-art performance in visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, and MTVQA. Qwen2-VL can be used for high-quality video-based Q&A, dialogue, and content creation, and also has complex reasoning and decision-making capabilities, and can be integrated with mobile devices, robots, etc., for automated operations based on visual environments and text instructions.

Qwen/Qwen2.5-72B-Instruct

128k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-72B-Instruct is part of the latest large language model series released by Alibaba Cloud. This 72B model has significantly improved capabilities in areas such as coding and mathematics. It supports inputs of up to 128K tokens and can generate long texts of over 8K tokens.

Qwen/Qwen2.5-72B-Instruct-128K

128k

Not Supported

Conversation

Qwen_qwen

Qwen/Qwen2.5-7B-Instruct

128k

Not Supported

Conversation

Qwen_qwen

Qwen2.5-7B-Instruct is part of the latest large language model series released by Alibaba Cloud. This 7B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering over 29 languages, including Chinese and English. It shows significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).

Qwen/Qwen2.5-Coder-32B-Instruct

128k

Not Supported

Conversation, Code

Qwen_qwen

Qwen2.5-32B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 32B model has significantly improved capabilities in areas such as coding and mathematics. The model also provides multilingual support, covering over 29 languages, including Chinese and English. The model has significant improvements in instruction following, understanding structured data, and generating structured output (especially JSON).

Qwen/Qwen2.5-Coder-7B-Instruct

128k

Not Supported

Conversation

Qwen_qwen

Qwen/QwQ-32B-Preview

32k

16k

Not Supported

Conversation, Reasoning

Qwen_qwen

QwQ-32B-Preview is an experimental research model developed by the Qwen team, aimed at enhancing the reasoning capabilities of artificial intelligence. As a preview version, it demonstrates excellent analytical abilities, but also has some important limitations: 1. Language mixing and code-switching: The model may mix languages or switch between languages unexpectedly, affecting the clarity of the response. 2. Recursive reasoning loops: The model may enter a cyclic reasoning mode, leading to lengthy answers without a clear conclusion. 3. Safety and ethical considerations: The model requires strengthened safety measures to ensure reliable and safe performance, and users should exercise caution when using it. 4. Performance and benchmark limitations: The model performs excellently in mathematics and programming, but there is still room for improvement in other areas such as common sense reasoning and nuanced language understanding.

qwen1.5-110b-chat

32k

Not Supported

Conversation

Qwen_qwen

qwen1.5-14b-chat

Not Supported

Conversation

Qwen_qwen

qwen1.5-32b-chat

32k

Not Supported

Conversation

Qwen_qwen

qwen1.5-72b-chat

32k

Not Supported

Conversation

Qwen_qwen

qwen1.5-7b-chat

Not Supported

Conversation

Qwen_qwen

qwen2-57b-a14b-instruct

65k

Not Supported

Conversation

Qwen_qwen

Qwen2-72B-Instruct

Not Supported

Conversation

Qwen_qwen

qwen2-7b-instruct

128k

Not Supported

Conversation

Qwen_qwen

qwen2-math-72b-instruct

Not Supported

Conversation

Qwen_qwen

qwen2-math-7b-instruct

Not Supported

Conversation

Qwen_qwen

qwen2.5-14b-instruct

128k

Not Supported

Conversation

Qwen_qwen

qwen2.5-32b-instruct

128k

Not Supported

Conversation

Qwen_qwen

qwen2.5-72b-instruct

128k

Not Supported

Conversation

Qwen_qwen

qwen2.5-7b-instruct

128k

Not Supported

Conversation

Qwen_qwen

qwen2.5-coder-14b-instruct

128k

Not Supported

Conversation, Code

Qwen_qwen

qwen2.5-coder-32b-instruct

128k

Not Supported

Conversation, Code

Qwen_qwen

qwen2.5-coder-7b-instruct

128k

Not Supported

Conversation, Code

Qwen_qwen

qwen2.5-math-72b-instruct

Not Supported

Conversation

Qwen_qwen

qwen2.5-math-7b-instruct

Not Supported

Conversation

Qwen_qwen

deepseek-ai/DeepSeek-R1

64k

Not Supported

Conversation, Reasoning

DeepSeek_deepseek

The DeepSeek-R1 model is an open-source reasoning model based purely on reinforcement learning. It excels in tasks such as mathematics, code, and natural language reasoning, with performance comparable to OpenAI's o1 model and achieving excellent results in several benchmark tests.

deepseek-ai/DeepSeek-V2-Chat

128k

Not Supported

Conversation

DeepSeek_deepseek

DeepSeek-V2 is a powerful, cost-effective Mixture-of-Experts (MoE) language model. It was pre-trained on a high-quality corpus of 8.1 trillion tokens and further enhanced with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times.

deepseek-ai/DeepSeek-V2.5

32k

Supported

Conversation

DeepSeek_deepseek

DeepSeek-V2.5 is an upgraded version of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding capabilities of the two previous versions. This model has been optimized in several aspects, including writing and instruction-following abilities, to better align with human preferences.

deepseek-ai/DeepSeek-V3

128k

Not Supported

Conversation

DeepSeek_deepseek

Open-source version of deepseek. Compared to the official version, it has a longer context and no issues with sensitive word refusal.

deepseek-chat

64k

Supported

Conversation

DeepSeek_deepseek

236B parameters, 64K context (API), top-ranked on the open-source leaderboard for Chinese comprehensive ability (AlignBench), and in the same tier as closed-source models like GPT-4-Turbo and ERNIE 4.0 in evaluations.

deepseek-coder

64k

Supported

Conversation, Code

DeepSeek_deepseek

deepseek-reasoner

64k

Supported

Conversation, Reasoning

DeepSeek_deepseek

DeepSeek-Reasoner (DeepSeek-R1) is the latest reasoning model from DeepSeek, designed to enhance reasoning capabilities through reinforcement learning training. The model's reasoning process involves a large amount of reflection and validation, enabling it to handle complex logical reasoning tasks, with a chain-of-thought length that can reach tens of thousands of words. DeepSeek-R1 excels in solving mathematical, coding, and other complex problems and has been widely applied in various scenarios, demonstrating its powerful reasoning ability and flexibility. Compared to other models, DeepSeek-R1's reasoning performance is close to that of top-tier closed-source models, showcasing the potential and competitiveness of open-source models in the field of reasoning.

hunyuan-code

Not Supported

Conversation, Code

Tencent_hunyuan

Hunyuan's latest code generation model. The base model was augmented with 200B high-quality code data and trained with high-quality SFT data for half a year. The context window length has been increased to 8K. It ranks at the top in automatic evaluation metrics for code generation in five major languages. In high-quality manual evaluations of 10 comprehensive code tasks across five major languages, its performance is in the top tier.

hunyuan-functioncall

28k

Supported

Conversation

Tencent_hunyuan

Hunyuan's latest MOE architecture FunctionCall model, trained with high-quality FunctionCall data, with a context window of up to 32K, leading in evaluation metrics across multiple dimensions.

hunyuan-large

28k

Not Supported

Conversation

Tencent_hunyuan

The Hunyuan-large model has a total of about 389B parameters, with about 52B activated parameters, making it the open-source MoE model with the largest parameter scale and best performance in the industry.

hunyuan-large-longcontext

128k

Not Supported

Conversation

Tencent_hunyuan

Excels at handling long-text tasks such as document summarization and document Q&A, while also being capable of handling general text generation tasks. It performs excellently in the analysis and generation of long texts, effectively handling complex and detailed long-form content processing needs.

hunyuan-lite

250k

Not Supported

Conversation

Tencent_hunyuan

Upgraded to an MOE structure with a 256k context window, leading many open-source models in NLP, code, math, and industry-specific evaluation sets.

hunyuan-pro

28k

Supported

Conversation

Tencent_hunyuan

A trillion-parameter scale MOE-32K long-text model. It achieves an absolute leading level on various benchmarks, with complex instruction and reasoning capabilities, complex mathematical abilities, and supports functioncall. It is specially optimized for applications in multilingual translation, finance, law, and medicine.

hunyuan-role

28k

Not Supported

Conversation

Tencent_hunyuan

Hunyuan's latest role-playing model. This is a role-playing model officially fine-tuned and launched by Hunyuan, based on the Hunyuan model and augmented with role-playing scenario datasets, providing better foundational performance in role-playing scenarios.

hunyuan-standard

30k

Not Supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. MOE-32K has a relatively higher cost-performance ratio and can handle long text inputs while balancing performance and price.

hunyuan-standard-256K

250k

Not Supported

Conversation

Tencent_hunyuan

Adopts a better routing strategy, while also alleviating the problems of load balancing and expert convergence. For long texts, the "needle in a haystack" metric reaches 99.9%. MOE-256K further breaks through in length and performance, greatly expanding the input length.

hunyuan-translation-lite

Not Supported

Conversation

Tencent_hunyuan

The Hunyuan translation model supports natural language conversational translation; it supports mutual translation between Chinese and 15 languages including English, Japanese, French, Portuguese, Spanish, Turkish, Russian, Arabic, Korean, Italian, German, Vietnamese, Malay, and Indonesian.

hunyuan-turbo

28k

Supported

Conversation

Tencent_hunyuan

The default version of the Hunyuan-turbo model, which uses a new Mixture-of-Experts (MoE) structure, resulting in faster inference efficiency and stronger performance compared to hunyuan-pro.

hunyuan-turbo-latest

28k

Supported

Conversation

Tencent_hunyuan

The dynamically updated version of the Hunyuan-turbo model. It is the best-performing version in the Hunyuan model series and is consistent with the consumer-facing product (Tencent Yuanbao).

hunyuan-turbo-vision

Supported

Vision, Conversation

Tencent_hunyuan

Hunyuan's new generation flagship visual language model, using a new Mixture-of-Experts (MoE) structure. Its capabilities in basic recognition, content creation, knowledge Q&A, and analysis/reasoning related to image-text understanding are comprehensively improved compared to the previous generation model. Max input 6k, max output 2k.

hunyuan-vision

Supported

Conversation, Vision

Tencent_hunyuan

Hunyuan's latest multimodal model, supporting image + text input to generate text content. Basic Image Recognition: Recognizes subjects, elements, scenes, etc., in images. Image Content Creation: Summarizes images, creates advertising copy, social media posts, poems, etc. Multi-turn Image Dialogue: Engages in multi-turn interactive Q&A about a single image. Image Analysis and Reasoning: Performs statistical analysis on logical relationships, math problems, code, and charts in images. Image Knowledge Q&A: Answers questions about knowledge points contained in images, such as historical events, movie posters. Image OCR: Recognizes text in images from natural life scenes and non-natural scenes.

SparkDesk-Lite

Not Supported

Conversation

Spark_SparkDesk

Supports online web search and responds quickly. Suitable for customized scenarios such as low-compute inference and model fine-tuning.

SparkDesk-Max

128k

Supported

Conversation

Spark_SparkDesk

Quantized from the latest Spark Large Model Engine 4.0 Turbo. It supports multiple built-in plugins such as web search, weather, and date. Core capabilities are fully upgraded, with across-the-board improvements in application results in various scenarios. Supports system-role personas and function calling.

SparkDesk-Max-32k

32k

Supported

Conversation

Spark_SparkDesk

Stronger reasoning: Enhanced context understanding and logical reasoning abilities. Longer input: Supports 32K tokens of text input, suitable for long document reading, private knowledge Q&A, and other scenarios.

SparkDesk-Pro

128k

Not Supported

Conversation

Spark_SparkDesk

Specially optimized for scenarios such as math, code, medicine, and education. Supports multiple built-in plugins like web search, weather, and date, covering most knowledge Q&A, language understanding, and text creation scenarios.

SparkDesk-Pro-128K

128k

Not Supported

Conversation

Spark_SparkDesk

Professional-grade large language model with tens of billions of parameters. It has been specially optimized for scenarios in medicine, education, and code, with lower latency in search scenarios. Suitable for business scenarios that have higher requirements for performance and response speed, such as text and intelligent Q&A.

moonshot-v1-128k

128k

Supported

Conversation

Moonshot AI_moonshot

A model with a length of 128k, suitable for generating ultra-long text.

moonshot-v1-32k

32k

Supported

Conversation

Moonshot AI_moonshot

A model with a length of 32k, suitable for generating long text.

moonshot-v1-8k

Supported

Conversation

Moonshot AI_moonshot

A model with a length of 8k, suitable for generating short text.

codegeex-4

128k

Not Supported

Conversation, Code

Zhipu_codegeex

Zhipu's code model: suitable for automatic code completion tasks.

charglm-3

Not Supported

Conversation

Zhipu_glm

Persona model.

emohaa

Not Supported

Conversation

Zhipu_glm

Psychology model: possesses professional counseling abilities to help users understand emotions and cope with emotional problems.

glm-3-turbo

128k

Not Supported

Conversation

Zhipu_glm

To be deprecated (June 30, 2025).

glm-4

128k

Supported

Conversation

Zhipu_glm

Old flagship: released on January 16, 2024, now replaced by GLM-4-0520.

glm-4-0520

128k

Supported

Conversation

Zhipu_glm

High-intelligence model: suitable for handling highly complex and diverse tasks.

glm-4-air

128k

Supported

Conversation

Zhipu_glm

Cost-effective: the best balance between inference capability and price.

glm-4-airx

Supported

Conversation

Zhipu_glm

Extremely fast inference: ultra-fast inference speed with strong inference quality.

glm-4-flash

128k

Supported

Conversation

Zhipu_glm

High speed, low price: ultra-fast inference speed.

glm-4-flashx

128k

Supported

Conversation

Zhipu_glm

High speed, low price: Enhanced version of Flash, ultra-fast inference speed.

glm-4-long

Supported

Conversation

Zhipu_glm

Ultra-long input: specially designed for handling ultra-long text and memory-intensive tasks.

glm-4-plus

128k

Supported

Conversation

Zhipu_glm

High-intelligence flagship: comprehensive performance improvement, with significantly enhanced long-text and complex task capabilities.

glm-4v

Not Supported

Conversation, Vision

Zhipu_glm

Image understanding: possesses image understanding and reasoning capabilities.

glm-4v-flash

Not Supported

Conversation, Vision

Zhipu_glm

Free model: possesses powerful image understanding capabilities.
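
The Max Input values in the table above are written with a "k" suffix, and this document's introductory notes recommend converting them to plain token counts before entering them in a client (multiplying by 1000 in practice, with 1024 as the theoretical factor). A minimal Python sketch of that conversion follows. The function name `context_to_tokens` and its `base` parameter are hypothetical and exist only for this illustration; they are not part of any client or provider API.

```python
from typing import Optional


def context_to_tokens(label: str, base: int = 1000) -> Optional[int]:
    """Convert a size label such as '28k' or '1m' to a token count.

    base=1000 follows the document's practical recommendation;
    pass base=1024 for the theoretical value (1k = 1024, 1m = 1024k).
    Returns None for '-' or an empty string, which the table uses
    when no official figure is available.
    """
    text = label.strip().lower()
    if text in ("", "-"):
        return None
    if text.endswith("k"):
        return round(float(text[:-1]) * base)
    if text.endswith("m"):
        return round(float(text[:-1]) * base * base)
    # No suffix: assume the label is already a plain token count.
    return int(text)


if __name__ == "__main__":
    for label in ("28k", "32k", "128k", "-"):
        print(label, context_to_tokens(label))
```

With the default factor, "128k" becomes 128000; with `base=1024` it becomes 131072. The default stays at 1000 because the document recommends the smaller value to avoid exceeding a provider's actual limit.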
