Conversation Interface
Assistant and Topic
Assistant
An assistant applies personalized settings, such as prompt presets and parameter presets, to the selected model so that it better matches the work you expect from it.
The system default assistant comes with fairly general preset parameters (and no prompt). You can use it directly, or go to the Agent page to find a preset that fits your needs.
Topic
An assistant is a superset of topics: multiple topics (i.e., conversations) can be created under a single assistant, and all topics share the assistant's model settings, such as its parameters and prompt.


Buttons in the chat box

New Topic Create a new topic within the current assistant.
Upload Image or Document Uploading images requires model support. Uploaded documents will be automatically parsed into text and provided to the model as context.
Web Search You must configure web search-related information in Settings. The search results are returned to the large model as context. For details, see Online Mode.
Knowledge Base Enable the knowledge base. For details, see Knowledge Base Tutorial.
MCP Server Enable the MCP server feature. For details, see MCP Usage Tutorial.
Generate Image Shown only when the selected chat model supports image generation. (For non-chat image generation models, go to the Drawing page.)
Select Model Switch to the specified model for the upcoming conversation while retaining context.
Quick Phrases Preset commonly used phrases in Settings first, then invoke them here to insert them directly. Variables are supported.
Clear Messages Delete all content under this topic.
Expand Make the chat box larger to facilitate long-text input.
Clear Context Without deleting content, truncate the context available to the model—in other words, the model will “forget” previous conversation content.
Estimated Token Count Displays estimated token counts. The four figures are, respectively: current context count, maximum context count (∞ means unlimited context), word count in the current input box, and estimated token count.
Translate Translate the content in the current input box into English.
Conversation Settings

Model Settings
The model settings here are synchronized with the assistant settings' parameters. For details, see Assistant Settings.
Message Settings
Message Divider: Use a divider to separate the message body from the action bar.


Use serif font: Switches the font style. You can also use custom CSS to change the font.
Show line numbers in code: Displays line numbers for code blocks when the model outputs code snippets.


Collapsible code blocks: When enabled, long code snippets are automatically collapsed.
Wrappable code blocks: When enabled, long lines in code snippets (wider than the window) wrap automatically.
Auto-collapse thinking content: When enabled, models that support thinking automatically collapse the thinking process once it completes.
Message Style: Switches the conversation interface between bubble style and list style.
Code Style: Switches the display style of code snippets.
Math Formula Engine: KaTeX renders faster because it is specifically designed for performance; MathJax renders more slowly but is more feature-complete and supports more mathematical symbols and commands.
Message Font Size: Adjusts the font size in the conversation interface.
Input Settings
Show Estimated Token Count: Displays the estimated number of tokens the input text will consume (not the actual tokens consumed by the context; for reference only).
Paste long text as file: When you paste long text copied from elsewhere into the input box, it is displayed as a file attachment to reduce interference with subsequent input.
Render input messages with Markdown: When off, only the model's replies are rendered as Markdown; your sent messages are shown as plain text.


Translate by hitting space 3 times quickly: After typing a message in the conversation input box, pressing the spacebar three times in a row translates the input into English.
Note: This operation overwrites the original text.
Target Language: Sets the target language for the input box's translate button and the quick triple-space translation.
Assistant Settings
In the assistant interface, select the assistant name you want to configure → choose the corresponding setting in the context menu.
Edit Assistant

Prompt Settings
Name: A custom, easily recognizable name for the assistant.
Prompt: The system prompt. You can refer to the prompt writing style on the Agent page when editing it.
Model Settings
Default Model: Pins a default model for this assistant. When adding the assistant from the Agent page or copying it, the initial model will be this one. If not set, the initial model is the global default (i.e., the Default assistant model).
Auto Reset Model: When on, if you switch to another model during use under a topic, creating a new topic resets it to the assistant's default model. When off, a new topic inherits the model used in the previous topic.
For example, if the assistant’s default model is gpt-3.5-turbo, and I create Topic 1 under this assistant, then switch to gpt-4o during the conversation in Topic 1, then:
If auto reset is on: when creating Topic 2, the default model selected for Topic 2 will be gpt-3.5-turbo;
If auto reset is off: when creating Topic 2, the default model selected for Topic 2 will be gpt-4o.
Temperature: Controls the randomness and creativity of the model's text generation (default 0.7). Specifically:
Low temperature (0–0.3):
More deterministic and focused outputs
Suitable for scenarios requiring accuracy, such as code generation and data analysis
Tends to choose the most likely words
Medium temperature (0.4–0.7):
Balances creativity and coherence
Suitable for daily conversation and general writing
Recommended for chatbot conversations (around 0.5)
High temperature (0.8–1.0):
Produces more creative and diverse outputs
Suitable for creative writing and brainstorming
But may reduce textual coherence
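The ranges above can be sketched as a simple task-to-temperature mapping when building an OpenAI-compatible chat request body. This is an illustrative sketch, not CherryStudio's internal code; the model name is a placeholder.

```python
# Illustrative mapping of task type to a temperature in the ranges above.
TEMPERATURE_BY_TASK = {
    "code": 0.2,      # low: deterministic, accurate output
    "chat": 0.5,      # medium: balanced, recommended for chatbots
    "creative": 0.9,  # high: diverse, creative output
}

def build_request(task: str, messages: list) -> dict:
    """Build an OpenAI-compatible chat request body (sketch only)."""
    return {
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": messages,
        # Fall back to the documented default of 0.7 for unknown tasks.
        "temperature": TEMPERATURE_BY_TASK.get(task, 0.7),
    }

body = build_request("code", [{"role": "user", "content": "Write a sort function"}])
```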
Top P (Nucleus Sampling): Default is 1. Smaller values make the generated content more monotonous and easier to understand; larger values give the model a wider, more diverse vocabulary in its replies.
Nucleus sampling affects output by controlling the probability threshold for word selection:
Smaller values (0.1–0.3):
Consider only the highest-probability words
More conservative and controllable outputs
Suitable for scenarios like code comments and technical documentation
Medium values (0.4–0.6):
Balance vocabulary diversity and accuracy
Suitable for general conversations and writing tasks
Larger values (0.7–1.0):
Consider a broader range of word choices
Produce richer and more diverse content
Suitable for scenarios requiring diverse expression, such as creative writing
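The probability-threshold idea behind nucleus sampling can be shown with a small, self-contained sketch: keep the smallest set of highest-probability tokens whose cumulative probability reaches the threshold, then renormalize. The probabilities here are made up for illustration.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest top-probability set whose cumulative mass >= top_p,
    then renormalize so the kept probabilities sum to 1 (conceptual sketch)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break  # threshold reached: ignore all lower-probability tokens
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

# Made-up next-token distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "such": 0.05}
print(nucleus_filter(probs, 0.6))  # keeps only "the" and "a"
```

With `top_p = 1.0` every candidate survives, which is why the default value leaves the model's vocabulary fully open.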
Context Window: The number of messages kept in context. The larger the value, the longer the context and the more tokens consumed:
5–10: suitable for normal conversations
>10: complex tasks that require longer memory (e.g., step-by-step long-form generation according to an outline, where you need to ensure logical coherence in generated context)
Note: The more messages, the greater the token consumption
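The effect of the context window can be pictured as trimming the conversation history to the most recent N messages before sending it to the model. This is a conceptual sketch; CherryStudio's actual trimming logic may differ.

```python
def trim_context(history: list, window: int) -> list:
    """Keep only the most recent `window` messages as model context (sketch)."""
    if window <= 0:
        return []
    return history[-window:]

# 20 messages of hypothetical history; only the last 10 reach the model.
history = [{"role": "user", "content": f"message {i}"} for i in range(20)]
context = trim_context(history, 10)
print(len(context))  # 10 — older messages are "forgotten"
```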
Enable message length limit (MaxToken): The maximum number of tokens per response. In large language models, max tokens is a key parameter that directly affects the quality and length of generated answers.
For example: in CherryStudio, when testing whether a model is reachable after entering your key, you only need to know that the model returns any message, not its content. In that case, set MaxToken to 1.
Most models cap MaxToken at 32k tokens; some allow 64k or even more. Check the model's introduction page for details.
How much to set depends on your needs; you can also refer to the suggestions below.
Suggestions:
Casual chat: 500–800
Short text generation: 800–2000
Code generation: 2000–3600
Long text generation: 4000 and above (requires model support)
Generally, the model's answer is constrained to within MaxToken. Truncation (e.g., when writing long code) or incomplete output may still occur; in such cases, adjust the value flexibly.
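The connectivity-test idea above can be sketched as building a request body capped at a single response token. The model name is a placeholder for whatever model you are testing; actually sending the request is left out.

```python
import json

def connectivity_probe_body() -> str:
    """Build a minimal OpenAI-compatible request body for a connectivity test:
    any reply at all proves the key and endpoint work, so cap output at 1 token."""
    body = {
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 1,           # we only care that *something* comes back
    }
    return json.dumps(body)
```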
Streaming Output (Stream): Streaming output transmits and processes data as a continuous stream rather than sending everything at once. Data can therefore be processed and displayed as soon as it is generated, greatly improving responsiveness.
In the CherryStudio client and similar environments, this is simply the typewriter effect.
When off (non-streaming): the model outputs the entire segment at once after finishing generating the message (imagine receiving a message in WeChat);
When on: output arrives character by character, as if the model sends each character to you the moment it generates it, until the reply is complete.
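The two modes can be simulated with a generator standing in for the model's token stream; the generator and reply text are made up for illustration.

```python
def fake_model_stream(text: str):
    """Stand-in for a model's token stream: yields one character at a time."""
    for ch in text:
        yield ch

reply = "Hello!"

# Streaming on: handle each chunk as it arrives (the typewriter effect).
streamed = ""
for chunk in fake_model_stream(reply):
    streamed += chunk  # in a UI, each chunk would be rendered immediately

# Streaming off: wait for the whole message, then display it once.
whole = "".join(fake_model_stream(reply))
```

Either way the final text is identical; only when the user sees it differs.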
Custom Parameters: Adds extra fields to the request body, such as presence_penalty. Most users will not need these.
Parameters such as top_p, max_tokens, and stream, described above, are themselves fields of this kind.
Filling format: parameter name, parameter type (text, number, etc.), and value. See the reference documentation for details.
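Conceptually, custom parameters are merged into the request body alongside the built-in ones. This sketch assumes an OpenAI-compatible request format; parameter names must match whatever the provider's API actually accepts.

```python
def build_body(messages: list, custom: dict) -> dict:
    """Merge user-supplied custom parameters into a chat request body (sketch)."""
    body = {
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": messages,
        "temperature": 0.7,
        "top_p": 1,
        "stream": True,
    }
    body.update(custom)  # custom fields are added on top of the built-in ones
    return body

body = build_body([{"role": "user", "content": "hi"}],
                  {"presence_penalty": 0.5})  # hypothetical custom field
```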