Voice Features

Cherry Studio Voice Features Instructions

1. Overview of Voice Features

Cherry Studio provides three voice features: TTS (text-to-speech), ASR (automatic speech recognition), and voice calls. Together they let you speak to the AI and hear its replies instead of typing and reading.

  • TTS (text-to-speech): converts AI reply text into spoken output

  • ASR (automatic speech recognition): converts your speech into text input

  • Voice calls: combines TTS and ASR to achieve a ChatGPT-like voice conversation experience

2. TTS (text-to-speech) Feature

  1. Supported service types

Cherry Studio supports four types of TTS services:

  • OpenAI: uses OpenAI's TTS API, requires an API key

  • Browser TTS: uses the browser's built-in speech synthesis, free and requires no configuration

  • Siliconflow: uses Siliconflow's TTS service, requires an API key

  • Free online TTS: uses a free online TTS service, no API key required

  2. Setup method

    1. Go to the settings page and select the "Voice Features" tab

    2. In the "TTS" sub-tab:

      • Enable the TTS feature (turn on the switch)

      • Select the TTS service type

      • Configure the corresponding parameters according to the selected service type:

        • OpenAI: enter the API key, API address, and choose voice and model

        • Browser TTS: choose voice

        • Siliconflow: enter the API key, API address, choose voice, model, response format and speech rate

        • Free online TTS: choose voice and output format

    3. Configure TTS filtering options (optional):

      • Filter out thinking process

      • Filter out Markdown markup

      • Filter out code blocks

    4. Set whether to show the TTS progress bar

    5. Click the "Test TTS" button to test whether the configuration is correct

  3. How to use

  • After enabling TTS, AI replies will be automatically converted to spoken output

  • In the chat interface, a TTS play button will appear under each AI reply

  • Click the play button to play/pause the audio

  • If the TTS progress bar is enabled, playback progress will be shown under the text

  • Long texts will be automatically segmented for synthesis and played continuously
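The automatic segmentation of long texts can be pictured with a small Python sketch. This is not Cherry Studio's actual code: the function name `split_for_tts`, the 200-character default, and the rule of preferring sentence boundaries are all assumptions made for illustration.

```python
import re

# Assumed rule: split after Latin sentence punctuation followed by whitespace,
# or directly after CJK sentence punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_for_tts(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Otherwise pack as many whole sentences as fit into one chunk.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would be synthesized separately and the resulting audio played back to back, which is why playback of a long reply can sound continuous.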

3. ASR (automatic speech recognition) Feature

  1. Supported service types

Cherry Studio supports three types of ASR services:

  • OpenAI: uses OpenAI's Whisper model, requires an API key

  • Browser: uses the browser's built-in speech recognition, free and requires no configuration

  • Local server: connects to a local WebSocket server for speech recognition

  2. Setup method

    1. Go to the settings page and select the "Voice Features" tab

    2. In the "ASR" sub-tab:

      • Enable the ASR feature (turn on the switch)

      • Select the ASR service type

      • Configure the corresponding parameters according to the selected service type:

        • OpenAI: enter the API key, API address, and choose the model

        • Browser: no additional configuration required

        • Local server: you can set whether to automatically start the ASR server when the app launches

      • Select the speech recognition language (default is Chinese)

    3. Click the "Test ASR" button to test whether the configuration is correct

  3. How to use

  • After enabling ASR, a speech recognition button will appear next to the input box

  • Click the speech recognition button to start recording

  • After speaking, the speech will be converted to text and filled into the input box

  • Click the button again to end recording

  • Recognition is continuous and works in accumulation mode: each newly recognized sentence is added to the text already recognized instead of replacing it
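One plausible reading of accumulation mode is sketched below in Python. The class `TranscriptAccumulator` and the split into partial and final results are assumptions for illustration (many speech recognizers report results this way); the source does not describe how Cherry Studio implements it.

```python
class TranscriptAccumulator:
    """Keeps finalized sentences and appends the sentence currently being spoken."""

    def __init__(self) -> None:
        self._final: list[str] = []  # sentences the recognizer has finalized
        self._partial = ""           # latest interim guess for the current sentence

    def on_result(self, text: str, is_final: bool) -> str:
        """Feed one recognizer result and return the text to show in the input box."""
        text = text.strip()
        if is_final:
            if text:
                self._final.append(text)
            self._partial = ""
        else:
            # Interim results replace each other; they never touch finalized text.
            self._partial = text
        return self.text

    @property
    def text(self) -> str:
        parts = self._final + ([self._partial] if self._partial else [])
        return " ".join(parts)
```

The point of the design is that a new sentence never overwrites an earlier one, so several sentences spoken in a row end up in the input box together.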

4. Voice Call Feature

  1. Feature characteristics

  • Combines TTS and ASR to achieve a ChatGPT-like voice conversation experience

  • Uses a draggable floating window interface

  • Supports long-press-to-speak mode

  • Supports custom hotkeys

  • Supports window collapsing

  • You can choose a dedicated voice call model

  • Supports custom prompts

  2. Setup method

    1. Go to the settings page and select the "Voice Features" tab

    2. In the "Call Feature" sub-tab:

      • Enable the voice call feature (turn on the switch)

      • Click the "Select Model" button to choose the AI model used for voice calls

      • Customize the voice call prompt in the prompt text box (optional)

      • Click the "Save" button to save the prompt, or click "Reset" to restore the default prompt

  3. How to use

    1. In the chat interface, click the voice call button (phone icon) to the right of the input box

    2. The voice call window will open and play a welcome voice message

    3. Long-press the "Hold to Talk" button to start recording (or use the configured hotkey)

    4. Release the button to end recording and send it to the AI for processing

    5. The AI generates a reply, which is played via TTS

    6. Use the control buttons in the window:

      • Mute/Unmute button: controls TTS output

      • Pause/Resume button: pauses or resumes the conversation

      • Settings button: configure hotkeys

      • Collapse button: collapse the window, leaving only the hold-to-talk row

    7. Click the close button to end the call
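A single turn of the call flow above (record, recognize, generate, speak) can be sketched as one Python function. The function `voice_call_turn` and its callback parameters are hypothetical stand-ins for the configured ASR service, voice call model, and TTS service; they are not Cherry Studio APIs.

```python
from typing import Callable


def voice_call_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],          # stand-in for the ASR service
    generate_reply: Callable[[str, str], str],  # stand-in for the voice call model
    speak: Callable[[str], None],               # stand-in for the TTS service
    *,
    prompt: str = "",
    muted: bool = False,
) -> str:
    """Run one hold-to-talk turn and return the AI reply text."""
    user_text = recognize(audio)                # recording is sent after release
    reply = generate_reply(prompt, user_text)   # custom prompt guides the reply style
    if not muted:                               # Mute only suppresses TTS output
        speak(reply)
    return reply
```

In this sketch muting skips speech output but still produces the reply text, which matches the description of the Mute/Unmute button as controlling TTS output only.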

  4. Hotkey settings

    1. In the voice call window, click the settings button

    2. In the settings panel that opens, click the hotkey button

    3. Press the key you want to set (such as Space, Shift, etc.)

    4. Click the "Save" button to save the setting

    5. During a call, hold the configured hotkey to start recording; release it to end recording and send
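The hold-to-record, release-to-send behavior can be modeled as a small state machine. The Python sketch below is illustrative only: the names `CallState` and `PushToTalk` are invented here, and the rule that a new recording cannot start while a reply is still being processed is an assumption.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()


class PushToTalk:
    """Tracks hotkey presses for a hold-to-talk recording."""

    def __init__(self, hotkey: str = "Space") -> None:
        self.hotkey = hotkey
        self.state = CallState.IDLE
        self.sent: list[str] = []  # transcripts sent to the AI so far

    def key_down(self, key: str) -> None:
        # Only the configured hotkey starts a recording. Auto-repeated key_down
        # events while the key is held are ignored because state is RECORDING.
        if key == self.hotkey and self.state is CallState.IDLE:
            self.state = CallState.RECORDING

    def key_up(self, key: str, transcript: str) -> None:
        # Releasing the hotkey ends the recording and sends it.
        if key == self.hotkey and self.state is CallState.RECORDING:
            self.sent.append(transcript)
            self.state = CallState.PROCESSING

    def reply_finished(self) -> None:
        # Assumed: the call returns to idle once the reply has been handled.
        if self.state is CallState.PROCESSING:
            self.state = CallState.IDLE
```

Ignoring repeated key-down events matters in practice, because most operating systems fire the key-down event repeatedly while a key is held.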

5. FAQ and Solutions

  1. TTS-related issues

  • Issue: TTS cannot play sound. Solution: Check whether TTS is enabled, ensure the correct service type is selected and required parameters are configured.

  • Issue: Poor TTS playback quality. Solution: Try switching to a different TTS service type or voice.

  • Issue: Error message displayed during TTS playback. Solution: Check whether the API key is correct and whether the network connection is normal.
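The first troubleshooting check (TTS enabled, service type selected, parameters filled in) can be written down as a short checklist function. In this Python sketch the field names and the helper `missing_tts_fields` are hypothetical, and treating every parameter listed in the setup steps as required is an assumption.

```python
# Parameters per service type, taken from the TTS setup steps.
# The snake_case keys are illustrative names, not real setting keys.
REQUIRED_TTS_FIELDS = {
    "openai": ["api_key", "api_address", "voice", "model"],
    "browser": ["voice"],
    "siliconflow": ["api_key", "api_address", "voice", "model",
                    "response_format", "speech_rate"],
    "free_online": ["voice", "output_format"],
}


def missing_tts_fields(enabled: bool, service: str, settings: dict) -> list[str]:
    """Return a list of problems that would stop TTS from playing sound."""
    if not enabled:
        return ["TTS is disabled"]
    if service not in REQUIRED_TTS_FIELDS:
        return [f"unknown service type: {service}"]
    # A field counts as missing when it is absent or an empty string.
    return [
        field
        for field in REQUIRED_TTS_FIELDS[service]
        if settings.get(field) in (None, "")
    ]
```

For example, an OpenAI configuration with only a key and a voice would report `api_address` and `model` as missing.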

  2. ASR-related issues

  • Issue: ASR cannot recognize speech. Solution: Check whether ASR is enabled, ensure the correct service type is selected and required parameters are configured.

  • Issue: Low ASR recognition accuracy. Solution: Try switching to a different ASR service type, or adjust microphone position and volume.

  • Issue: ASR server connection failed. Solution: Check whether the local server is running properly, or try restarting the app.

  3. Voice call-related issues

  • Issue: Voice call window cannot open. Solution: Check whether the voice call feature is enabled and ensure TTS and ASR are configured correctly.

  • Issue: Long-press to speak does not respond. Solution: Check whether microphone permissions have been granted, or try restarting the voice call.

  • Issue: AI replies have no spoken output. Solution: Check whether TTS is enabled and ensure it is not muted.

6. Advanced Settings and Customization Options

  1. TTS advanced settings

  • Filtering options: you can choose to filter thinking process, Markdown markup, and code blocks to make TTS playback smoother

  • Progress bar display: you can choose whether to show the TTS progress bar

  • Custom voices and models: you can add custom voice and model options
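To show what the filtering options do to a reply before it is spoken, here is a rough Python sketch. It is not the application's implementation: the function `filter_for_tts` is invented for illustration, the assumption that the thinking process is wrapped in `<think>...</think>` tags may not match every model, and the Markdown handling covers only a few common constructs.

```python
import re

FENCE = "`" * 3  # Markdown code fence delimiter


def filter_for_tts(text: str, *, drop_thinking: bool = True,
                   drop_code_blocks: bool = True,
                   drop_markdown: bool = True) -> str:
    """Remove content that reads badly aloud before sending text to TTS."""
    if drop_thinking:
        # Assumption: the thinking process is wrapped in <think>...</think>.
        text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    if drop_code_blocks:
        text = re.sub(FENCE + r".*?" + FENCE, "", text, flags=re.DOTALL)
    if drop_markdown:
        text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headings
        text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullets
        text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)                     # links
        text = re.sub(r"\*\*|\*|`", "", text)                                    # emphasis, inline code
    # Collapse the blank lines left behind by removed blocks.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```

With all three filters on, a reply made of a heading, a bold sentence, and a code block is reduced to the heading text and the sentence, so the voice does not read out symbols or source code.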

  2. ASR advanced settings

  • Auto-start server: you can set whether to automatically start the ASR server when the app launches

  • Language selection: you can choose different speech recognition languages

  3. Voice call advanced settings

  • Custom prompts: you can customize the voice call prompt to guide how the AI responds in voice call mode

  • Dedicated model selection: you can choose a dedicated AI model for voice calls, separate from the model used in the current conversation

  • Hotkey customization: you can set custom hotkeys to control recording

7. Usage Recommendations

  1. Choose a suitable TTS service:

    • For high-quality speech, use OpenAI or Siliconflow

    • If you don't want to configure an API key, use Browser TTS or free online TTS

  2. Choose a suitable ASR service:

    • For high recognition accuracy, use OpenAI

    • If you don't want to configure an API key, use the browser's built-in speech recognition

  3. Optimize voice call experience:

    • Using headphones can prevent TTS output from being picked up again by ASR

    • Using it in a quiet environment can improve recognition accuracy

    • Using a custom prompt can make AI replies more suitable for voice playback

  4. Adjust settings according to needs:

    • If you mainly type your messages and only want replies read aloud, you can enable only the TTS feature

    • If you mainly use voice input, you can enable only the ASR feature

    • If you need a full voice conversation experience, enable the voice call feature

We hope this manual helps you make full use of Cherry Studio's voice features and enjoy a more natural and convenient AI interaction experience!
