# Voice Features

{% hint style="warning" %}
This feature has been shelved because the developer who contributed it stopped maintaining the PR.
{% endhint %}

Cherry Studio Voice Features User Guide

I. Overview of Voice Features

Cherry Studio provides three voice modules: TTS (text-to-speech), ASR (automatic speech recognition), and voice calls. Together they let you talk with the AI by voice instead of only typing.

* TTS (text-to-speech): converts AI response text into spoken output
* ASR (automatic speech recognition): converts your speech into text input
* Voice calls: combines TTS and ASR to deliver a voice conversation experience similar to ChatGPT

II. TTS (Text-to-Speech) Features

1. Supported service types

Cherry Studio supports four types of TTS services:

* OpenAI: uses OpenAI's TTS API, requires an API key
* Browser TTS: uses the browser's built-in speech synthesis feature, free and requires no configuration
* SiliconFlow: uses SiliconFlow's TTS service, requires an API key
* Free Online TTS: uses a free online TTS service, no API key required
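For the OpenAI option, the settings described in this guide (API key, API address, voice, and model) correspond to the fields of a request to OpenAI's `/audio/speech` endpoint. The following is a minimal sketch of how such a request could be assembled. The names `OpenAiTtsSettings`, `SpeechRequest`, and `buildSpeechRequest` are hypothetical and are not taken from Cherry Studio's code; the example values `tts-1` and `alloy` are only illustrations.

```typescript
// Hypothetical settings shape; field names are illustrative, not Cherry Studio's.
interface OpenAiTtsSettings {
  apiKey: string;
  apiAddress: string; // for example "https://api.openai.com/v1"
  model: string;      // for example "tts-1"
  voice: string;      // for example "alloy"
}

interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request for OpenAI's /audio/speech endpoint.
function buildSpeechRequest(settings: OpenAiTtsSettings, text: string): SpeechRequest {
  const base = settings.apiAddress.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/audio/speech`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${settings.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: settings.model, voice: settings.voice, input: text }),
  };
}
```

The returned object can be handed to an HTTP client of your choice; the response body is the synthesized audio.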

2. Setup method
   1. Go to the Settings page and select the "Voice Features" tab
   2. In the "TTS" sub-tab:
      * Enable the TTS feature (turn on the switch)
      * Select the TTS service type
      * Configure the corresponding parameters according to the selected service type:
        * OpenAI: enter API key, API address, and select voice and model
        * Browser TTS: select a voice
        * SiliconFlow: enter API key, API address, select voice, model, response format, and speech rate
        * Free Online TTS: select voice and output format
   3. Configure TTS filtering options (optional):
      * Filter reasoning process
      * Filter Markdown markup
      * Filter code blocks
   4. Set whether to display the TTS progress bar
   5. Click the "Test TTS" button to test whether the configuration is correct
3. How to use

* After enabling TTS, AI responses will automatically be converted into spoken output
* In the chat interface, a TTS play button will be shown below each AI response
* Click the play button to play/pause the voice
* If the TTS progress bar is enabled, playback progress will be shown below the text
* Long text will automatically be synthesized in segments and played continuously
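The segmented synthesis mentioned in the last point can be pictured as splitting the response at sentence boundaries and synthesizing each piece in turn. The following is a minimal sketch under stated assumptions: the function name `splitForTts` and the 200-character default limit are hypothetical, and the actual segment size used by Cherry Studio is not documented here.

```typescript
// Splits text into segments of at most maxChars characters, preferring sentence
// boundaries. maxChars is an assumed limit, not a documented Cherry Studio value.
function splitForTts(text: string, maxChars = 200): string[] {
  // A "sentence" is a run of text followed by ., !, ? (or CJK equivalents), or the tail.
  const sentences = text.match(/[^.!?。！？]+[.!?。！？]*\s*/g) ?? [];
  const segments: string[] = [];
  const push = (s: string): void => {
    const trimmed = s.trim();
    if (trimmed) segments.push(trimmed); // skip whitespace-only pieces
  };

  let current = "";
  for (const sentence of sentences) {
    // Start a new segment if adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      push(current);
      current = "";
    }
    current += sentence;
    // Hard-split a single sentence that is longer than the limit.
    while (current.length > maxChars) {
      push(current.slice(0, maxChars));
      current = current.slice(maxChars);
    }
  }
  push(current);
  return segments;
}
```

Each returned segment would then be synthesized and queued for playback in order, so that audio for the first segment can start while later segments are still being generated.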

III. ASR (Automatic Speech Recognition) Features

1. Supported service types

Cherry Studio supports three types of ASR services:

* OpenAI: uses OpenAI's Whisper model, requires an API key
* Browser: uses the browser's built-in speech recognition feature, free and requires no configuration
* Local server: connects to a local WebSocket server for speech recognition

2. Setup method
   1. Go to the Settings page and select the "Voice Features" tab
   2. In the "ASR" sub-tab:
      * Enable the ASR feature (turn on the switch)
      * Select the ASR service type
      * Configure the corresponding parameters according to the selected service type:
        * OpenAI: enter API key, API address, and select model
        * Browser: no additional configuration required
        * Local server: you can choose whether to automatically start the ASR server when the app launches
      * Select the speech recognition language (default is Chinese)
   3. Click the "Test ASR" button to test whether the configuration is correct
3. How to use

* After enabling ASR, a speech recognition button will appear next to the input box
* Click the speech recognition button to start recording
* After speaking, the speech will be converted to text and entered into the input box
* Click the button again to stop recording
* Speech recognition supports continuous recognition of multiple sentences using accumulation mode
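One way to read "accumulation mode" is that each finished sentence is appended to the text already recognized, while the sentence still being spoken is shown as a provisional tail. The sketch below illustrates that idea only; `RecognitionResult` and `TranscriptAccumulator` are hypothetical names, and the space separator is an assumption that suits space-delimited languages (pass `""` for Chinese).

```typescript
// One recognition event: isFinal marks a completed sentence; otherwise the
// text is an interim guess that may still change.
interface RecognitionResult {
  text: string;
  isFinal: boolean;
}

class TranscriptAccumulator {
  private finalized: string[] = [];
  private interim = "";
  private separator: string;

  // separator joins sentences; " " is an assumed default.
  constructor(separator = " ") {
    this.separator = separator;
  }

  // Feeds one result and returns the text to show in the input box.
  add(result: RecognitionResult): string {
    const text = result.text.trim();
    if (result.isFinal) {
      if (text) this.finalized.push(text); // keep the finished sentence
      this.interim = "";
    } else {
      this.interim = text; // replace the previous interim guess
    }
    return this.current();
  }

  // Finished sentences followed by the current interim guess, if any.
  current(): string {
    return [...this.finalized, this.interim].filter(Boolean).join(this.separator);
  }

  // Clears everything, for example when recording stops and the text is sent.
  reset(): void {
    this.finalized = [];
    this.interim = "";
  }
}
```

Because interim results replace each other rather than being appended, the input box never shows the same partial sentence twice.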

IV. Voice Call Features

1. Features

* Combines TTS and ASR to deliver a voice conversation experience similar to ChatGPT's voice mode
* Uses a draggable floating window interface
* Supports press-and-hold-to-speak mode
* Supports custom hotkeys
* Supports window collapsing
* You can choose a dedicated voice call model
* Supports custom prompt text

2. Setup method
   1. Go to the Settings page and select the "Voice Features" tab
   2. In the "Call Features" sub-tab:
      * Enable the voice call feature (turn on the switch)
      * Click the "Select Model" button to choose the AI model for voice calls
      * Customize the voice call prompt in the prompt text box (optional)
      * Click the "Save" button to save the prompt, or click the "Reset" button to restore the default prompt
3. How to use
   1. In the chat interface, click the voice call button (phone icon) to the right of the input box
   2. The voice call window opens and plays a welcome message
   3. Press and hold the "Press and Hold to Speak" button to start recording (or hold the configured hotkey)
   4. Release the button to stop recording and send the recording to the AI for processing
   5. The AI generates a response and plays it through TTS
   6. Use the control buttons in the window:
      * Mute/Unmute button: controls TTS output
      * Pause/Resume button: pauses or resumes the conversation
      * Settings button: configures hotkeys
      * Collapse button: collapses the window, leaving only the press-and-hold-to-speak row
   7. Click the close button to end the call
4. Hotkey settings
   1. In the voice call window, click the Settings button
   2. In the pop-up settings panel, click the Hotkey button
   3. Press the key you want to use (such as the Space key or Shift key)
   4. Click the "Save" button to save the settings
   5. During a call, hold down the configured hotkey to start recording, and release it to stop recording and send
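The hold-to-record behavior of the hotkey can be modeled as a small state machine: the first key-down starts recording, auto-repeat events are ignored, and key-up stops recording and sends. The sketch below is an illustration only. `createPushToTalk`, `KeyEventLike`, and `PttAction` are hypothetical names; the `type`, `code`, and `repeat` fields mirror the properties of the same names on the browser's `KeyboardEvent`.

```typescript
// Minimal subset of a keyboard event, so the logic can run without a browser.
type KeyEventLike = { type: "keydown" | "keyup"; code: string; repeat?: boolean };
type PttAction = "start" | "stopAndSend" | "none";

// Returns a handler that maps key events for one hotkey to recording actions.
function createPushToTalk(hotkeyCode: string): (event: KeyEventLike) => PttAction {
  let recording = false;
  return (event: KeyEventLike): PttAction => {
    if (event.code !== hotkeyCode) return "none"; // some other key
    if (event.type === "keydown" && !recording && !event.repeat) {
      recording = true;
      return "start";
    }
    if (event.type === "keyup" && recording) {
      recording = false;
      return "stopAndSend";
    }
    return "none"; // auto-repeat while held, or key-up without a recording
  };
}
```

In a browser, the handler could be called from `keydown` and `keyup` listeners, with the returned action deciding whether to start the recorder or to stop it and send the audio.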

V. Common Issues and Solutions

1. TTS-related issues

* Issue: TTS cannot play sound. Solution: Check whether TTS is enabled, and make sure the correct service type is selected and the necessary parameters are configured.
* Issue: TTS playback quality is poor. Solution: Try switching to a different TTS service type or voice.
* Issue: An error message is shown during TTS playback. Solution: Check that the API key and API address are correct and that the network connection is working.

2. ASR-related issues

* Issue: ASR cannot recognize speech. Solution: Check whether ASR is enabled, and make sure the correct service type is selected and the necessary parameters are configured.
* Issue: ASR recognition accuracy is low. Solution: Try switching to a different ASR service type, or adjust the microphone position and volume.
* Issue: The ASR server connection fails. Solution: Check whether the local server is running properly, or try restarting the app.

3. Voice call-related issues

* Issue: The voice call window cannot be opened. Solution: Check whether the voice call feature is enabled, and ensure that TTS and ASR are configured correctly.
* Issue: Press-and-hold-to-speak does not respond. Solution: Check whether microphone permission has been granted, or try restarting the voice call.
* Issue: AI responses have no voice output. Solution: Check whether TTS is enabled and make sure it is not muted.

VI. Advanced Settings and Custom Options

1. TTS advanced settings

* Filtering options: you can choose to filter the reasoning process, Markdown markup, and code blocks to make TTS playback smoother
* Progress bar display: you can choose whether to display the TTS progress bar
* Custom voices and models: you can add custom voice and model options
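To make the filtering options concrete, the sketch below shows one way text could be cleaned before it is sent to a TTS service. It is an illustration under stated assumptions, not Cherry Studio's implementation: `TtsFilterOptions` and `filterForTts` are hypothetical names, the reasoning process is assumed to be wrapped in `<think>` tags, and only a few common Markdown markers are handled.

```typescript
// Hypothetical option names mirroring the three filter switches.
interface TtsFilterOptions {
  filterThinking: boolean;   // drop the reasoning process
  filterMarkdown: boolean;   // drop Markdown markup, keep the words
  filterCodeBlocks: boolean; // drop fenced code blocks entirely
}

const FENCE = "`".repeat(3); // the three-backtick code fence marker
const CODE_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

function filterForTts(text: string, options: TtsFilterOptions): string {
  let out = text;
  if (options.filterThinking) {
    // Assumption: reasoning is delimited by <think> tags; real delimiters may differ.
    out = out.replace(/<think>[\s\S]*?<\/think>/g, "");
  }
  if (options.filterCodeBlocks) {
    out = out.replace(CODE_BLOCK, "");
  }
  if (options.filterMarkdown) {
    out = out
      .replace(/^#{1,6}[ \t]+/gm, "")            // heading markers
      .replace(/^[ \t]*[-*+][ \t]+/gm, "")       // bullet markers
      .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")   // links become their link text
      .replace(/[*`]/g, "");                     // emphasis and inline-code markers
  }
  // Collapse the blank lines left behind by removed blocks.
  return out.replace(/\n{3,}/g, "\n\n").trim();
}
```

Removing code blocks before stripping Markdown matters here: once the backticks are gone, a code block can no longer be recognized and its contents would be read aloud.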

2. ASR advanced settings

* Auto-start server: you can set whether to automatically start the ASR server when the app launches
* Language selection: you can choose different speech recognition languages

3. Voice call advanced settings

* Custom prompt: you can customize the voice call prompt to guide how AI responds in voice call mode
* Dedicated model selection: you can choose a dedicated AI model for voice calls, separate from the model used in the current conversation
* Hotkey customization: you can set custom hotkeys to control recording

VII. Usage Tips

1. Choose the right TTS service:
   * If voice quality matters most, OpenAI or SiliconFlow is recommended
   * If you do not want to configure an API key, use Browser TTS or Free Online TTS
2. Choose the right ASR service:
   * If recognition accuracy matters most, OpenAI is recommended
   * If you do not want to configure an API key, use the browser's built-in speech recognition
3. Optimize the voice call experience:
   * Using headphones can prevent TTS output from being captured again by ASR
   * Using it in a quiet environment can improve recognition accuracy
   * Using custom prompts can make AI responses more suitable for voice playback
4. Adjust settings according to your needs:
   * If you mainly type your messages but want to hear the responses, enable only the TTS feature
   * If you mainly want to dictate your messages, enable only the ASR feature
   * If you want a complete voice conversation experience, enable the voice call feature

We hope this user guide helps you make full use of Cherry Studio's voice features and enjoy a more natural and convenient AI interaction experience!

