Voice Features
This feature has been shelved because the developer who contributed it stopped maintaining the pull request.
Cherry Studio Voice Features Instructions
1. Overview of Voice Features
Cherry Studio provides three voice modules: TTS (text-to-speech), ASR (automatic speech recognition), and voice calls. Together they let you speak to the AI and hear its replies instead of typing and reading.
TTS (text-to-speech): converts AI reply text into spoken output
ASR (automatic speech recognition): converts your speech into text input
Voice calls: combines TTS and ASR for a ChatGPT-style spoken conversation
2. TTS (text-to-speech) Feature
Supported service types
Cherry Studio supports four types of TTS services:
OpenAI: uses OpenAI's TTS API, requires an API key
Browser TTS: uses the browser's built-in speech synthesis, free and requires no configuration
Siliconflow: uses Siliconflow's TTS service, requires an API key
Free online TTS: uses a free online TTS service, no API key required
Setup method
Go to the settings page and select the "Voice Features" tab
In the "TTS" sub-tab:
Enable the TTS feature (turn on the switch)
Select the TTS service type
Configure the corresponding parameters according to the selected service type:
OpenAI: enter the API key, API address, and choose voice and model
Browser TTS: choose voice
Siliconflow: enter the API key, API address, choose voice, model, response format and speech rate
Free online TTS: choose voice and output format
Configure TTS filtering options (optional):
Filter out thinking process
Filter out Markdown markup
Filter out code blocks
Set whether to show the TTS progress bar
Click the "Test TTS" button to test whether the configuration is correct
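The three filtering options above can be pictured as a small text-cleaning step that runs before synthesis. The sketch below is illustrative only and is not Cherry Studio's actual implementation: the option names, the function name filterForTts, and the assumption that the thinking process is wrapped in <think> tags are all hypothetical.

```typescript
// Hypothetical option names; the real settings keys in Cherry Studio may differ.
interface TtsFilterOptions {
  filterThinking: boolean;   // drop the model's thinking process
  filterMarkdown: boolean;   // drop Markdown markup but keep the words
  filterCodeBlocks: boolean; // drop fenced code blocks
}

// Assumption: the thinking process is wrapped in <think>...</think> tags.
function filterForTts(text: string, opts: TtsFilterOptions): string {
  let out = text;
  if (opts.filterThinking) {
    out = out.replace(/<think>[\s\S]*?<\/think>/g, "");
  }
  if (opts.filterCodeBlocks) {
    // `{3} matches a three-backtick fence marker.
    out = out.replace(/`{3}[\s\S]*?`{3}/g, "");
  }
  if (opts.filterMarkdown) {
    out = out
      .replace(/^#{1,6}\s+/gm, "")             // heading markers
      .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links: keep only the label
      .replace(/[*`]/g, "");                   // emphasis and inline-code markers
  }
  // Collapse the blank runs left behind by removed blocks.
  return out.replace(/\n{3,}/g, "\n\n").trim();
}
```

This simple version also strips literal asterisks (for example in "2 * 3"), so a production filter would need a real Markdown parser.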
How to use
After enabling TTS, AI replies will be automatically converted to spoken output
In the chat interface, a TTS play button will appear under each AI reply
Click the play button to play/pause the audio
If the TTS progress bar is enabled, playback progress will be shown under the text
Long texts will be automatically segmented for synthesis and played continuously
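One plausible way to segment long text is to cut at sentence boundaries and pack sentences into chunks under a length limit. The sketch below shows that idea under stated assumptions: the function name segmentForTts and the 200-character default are hypothetical, and the real segmentation rules used by the app are not documented here.

```typescript
// Split text into chunks of at most maxLen characters, preferring sentence boundaries.
function segmentForTts(text: string, maxLen: number = 200): string[] {
  if (maxLen <= 0) throw new RangeError("maxLen must be positive");
  // A sentence is a run of non-terminator characters plus any trailing
  // Western or CJK terminators. Newlines also end a sentence.
  const sentences = text.match(/[^.!?。！？\n]+[.!?。！？]*/g) ?? [];
  const segments: string[] = [];
  let current = "";
  for (const raw of sentences) {
    let sentence = raw.trim();
    // A single sentence longer than maxLen is hard-split.
    while (sentence.length > maxLen) {
      if (current) {
        segments.push(current);
        current = "";
      }
      segments.push(sentence.slice(0, maxLen));
      sentence = sentence.slice(maxLen);
    }
    if (sentence === "") continue;
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length <= maxLen) {
      current = candidate; // still fits: keep packing
    } else {
      segments.push(current); // full: start a new segment
      current = sentence;
    }
  }
  if (current) segments.push(current);
  return segments;
}
```

Each returned segment could then be synthesized in turn and queued for playback, so audio for the first segment can start before the rest is ready. Note that this naive splitter also breaks at the dot in decimals such as "3.14".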
3. ASR (automatic speech recognition) Feature
Supported service types
Cherry Studio supports three types of ASR services:
OpenAI: uses OpenAI's Whisper model, requires an API key
Browser: uses the browser's built-in speech recognition, free and requires no configuration
Local server: connects to a local WebSocket server for speech recognition
Setup method
Go to the settings page and select the "Voice Features" tab
In the "ASR" sub-tab:
Enable the ASR feature (turn on the switch)
Select the ASR service type
Configure the corresponding parameters according to the selected service type:
OpenAI: enter the API key, API address, and choose the model
Browser: no additional configuration required
Local server: optionally start the ASR server automatically when the app launches
Select the speech recognition language (default is Chinese)
Click the "Test ASR" button to test whether the configuration is correct
How to use
After enabling ASR, a speech recognition button will appear next to the input box
Click the speech recognition button to start recording
After speaking, the speech will be converted to text and filled into the input box
Click the button again to end recording
Recognition is continuous across multiple sentences and works in accumulation mode: newly recognized text is added to what has already been recognized rather than replacing it
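Accumulation mode can be read as "keep every finished sentence and show the sentence in progress after them". The sketch below models that reading. It is an interpretation, not the app's code: the RecognitionEvent shape and the TranscriptAccumulator class are hypothetical, and real ASR services name these fields differently.

```typescript
// Hypothetical shape of one recognition result.
interface RecognitionEvent {
  text: string;
  isFinal: boolean; // true once the recognizer has settled on this sentence
}

class TranscriptAccumulator {
  private finalized: string[] = []; // sentences the recognizer has committed
  private interim = "";             // the sentence still being spoken

  // Feed one event; returns the text to show in the input box.
  push(event: RecognitionEvent): string {
    const text = event.text.trim();
    if (event.isFinal) {
      if (text) this.finalized.push(text);
      this.interim = "";
    } else {
      this.interim = text; // interim results replace each other
    }
    return this.current();
  }

  current(): string {
    return this.finalized
      .concat(this.interim)
      .filter((s) => s.length > 0)
      .join(" ");
  }

  // Call when recording ends and the text has been handed to the input box.
  reset(): void {
    this.finalized = [];
    this.interim = "";
  }
}
```

Joining with a space suits languages that separate words with spaces. For Chinese, the default recognition language, an empty separator would be the more natural choice.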
4. Voice Call Feature
Feature characteristics
Combines TTS and ASR for a ChatGPT-style spoken conversation
Uses a draggable floating window interface
Supports hold-to-talk mode
Supports custom hotkeys
Supports window collapsing
You can choose a dedicated voice call model
Supports custom prompts
Setup method
Go to the settings page and select the "Voice Features" tab
In the "Call Feature" sub-tab:
Enable the voice call feature (turn on the switch)
Click the "Select Model" button to choose the AI model used for voice calls
Edit the voice call prompt in the prompt text box (optional)
Click the "Save" button to save the prompt, or click "Reset" to restore the default prompt
How to use
In the chat interface, click the voice call button (phone icon) to the right of the input box
The voice call window will open and play a welcome voice message
Long-press the "Hold to Talk" button to start recording (or use the configured hotkey)
Release the button to end recording and send it to the AI for processing
The AI generates a reply, which is played via TTS
Use the control buttons in the window:
Mute/Unmute button: controls TTS output
Pause/Resume button: pauses or resumes the conversation
Settings button: configure hotkeys
Collapse button: collapse the window, leaving only the hold-to-talk row
Click the close button to end the call
Hotkey settings
In the voice call window, click the settings button
In the settings panel that opens, click the hotkey button
Press the key you want to set (such as Space, Shift, etc.)
Click the "Save" button to save the setting
During a call, hold the configured hotkey to record, then release it to stop recording and send
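The hold-to-record, release-to-send behavior can be modeled as a two-state machine driven by key events. The sketch below is a minimal illustration with hypothetical names (HoldToTalk, TalkAction). It does not touch the microphone and only decides what should happen for each key event.

```typescript
type TalkAction = "start-recording" | "stop-and-send" | "none";

class HoldToTalk {
  private readonly hotkey: string;
  private recording = false;

  constructor(hotkey: string) {
    this.hotkey = hotkey;
  }

  keyDown(key: string): TalkAction {
    // A held key fires repeated key-down events; ignore them while recording.
    if (key !== this.hotkey || this.recording) return "none";
    this.recording = true;
    return "start-recording";
  }

  keyUp(key: string): TalkAction {
    // Releasing any other key, or releasing while idle, does nothing.
    if (key !== this.hotkey || !this.recording) return "none";
    this.recording = false;
    return "stop-and-send";
  }
}
```

In a browser-based window, keyDown and keyUp would typically be called from keydown and keyup listeners, passing the event's key name.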
5. FAQ and Solutions
TTS-related issues
Issue: TTS cannot play sound. Solution: Check whether TTS is enabled, ensure the correct service type is selected and required parameters are configured.
Issue: Poor TTS playback quality. Solution: Try switching to a different TTS service type or voice.
Issue: An error message is displayed during TTS playback. Solution: Check that the API key is correct and that the network connection is working.
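Most of the TTS checks above come down to "is TTS enabled, and is every parameter for the selected service filled in". The sketch below expresses that checklist as code. The field names, the TtsSettings shape, and the function diagnoseTts are hypothetical. Treating every parameter listed in the setup section as required is also an assumption, since some of them may have defaults in the app.

```typescript
type TtsServiceType = "openai" | "browser" | "siliconflow" | "free-online";

// Hypothetical field names, mirroring the parameters listed for each service type.
const REQUIRED_FIELDS: Record<TtsServiceType, string[]> = {
  openai: ["apiKey", "apiUrl", "voice", "model"],
  browser: ["voice"],
  siliconflow: ["apiKey", "apiUrl", "voice", "model", "responseFormat", "speed"],
  "free-online": ["voice", "outputFormat"],
};

interface TtsSettings {
  enabled: boolean;
  serviceType: TtsServiceType;
  params: Record<string, string>;
}

// Returns a list of human-readable problems; an empty list means the
// configuration passes these basic checks.
function diagnoseTts(settings: TtsSettings): string[] {
  const problems: string[] = [];
  if (!settings.enabled) problems.push("TTS is disabled");
  for (const field of REQUIRED_FIELDS[settings.serviceType]) {
    const value = settings.params[field];
    if (value === undefined || value.trim() === "") {
      problems.push(`Missing parameter: ${field}`);
    }
  }
  return problems;
}
```

A check like this cannot tell whether an API key is valid or the network is reachable. Those still need the "Test TTS" button.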
ASR-related issues
Issue: ASR cannot recognize speech. Solution: Check whether ASR is enabled, ensure the correct service type is selected and required parameters are configured.
Issue: Low ASR recognition accuracy. Solution: Try switching to a different ASR service type, or adjust microphone position and volume.
Issue: ASR server connection failed. Solution: Check whether the local server is running properly, or try restarting the app.
Voice call-related issues
Issue: Voice call window cannot open. Solution: Check whether the voice call feature is enabled and ensure TTS and ASR are configured correctly.
Issue: The hold-to-talk button does not respond. Solution: Check whether microphone permissions have been granted, or try restarting the voice call.
Issue: AI replies have no spoken output. Solution: Check whether TTS is enabled and ensure it is not muted.
6. Advanced Settings and Customization Options
TTS advanced settings
Filtering options: you can choose to filter thinking process, Markdown markup, and code blocks to make TTS playback smoother
Progress bar display: you can choose whether to show the TTS progress bar
Custom voices and models: you can add custom voice and model options
ASR advanced settings
Auto-start server: you can set whether to automatically start the ASR server when the app launches
Language selection: you can choose different speech recognition languages
Voice call advanced settings
Custom prompts: you can customize the voice call prompt to guide how the AI responds in voice call mode
Dedicated model selection: you can choose a dedicated AI model for voice calls, separate from the model used in the current conversation
Hotkey customization: you can set custom hotkeys to control recording
7. Usage Recommendations
Choose a suitable TTS service:
For high-quality speech, use OpenAI or Siliconflow
If you don't want to configure an API key, use Browser TTS or free online TTS
Choose a suitable ASR service:
For high accuracy, use OpenAI
If you don't want to configure an API key, use the browser's built-in speech recognition
Optimize voice call experience:
Using headphones can prevent TTS output from being picked up again by ASR
Using it in a quiet environment can improve recognition accuracy
Using a custom prompt can make AI replies more suitable for voice playback
Adjust settings according to needs:
If you mainly type but want replies read aloud, you can enable only the TTS feature
If you mainly use voice input, you can enable only the ASR feature
If you need a full voice conversation experience, enable the voice call feature
We hope this manual helps you make full use of Cherry Studio's voice features and enjoy a more natural and convenient AI interaction experience!