docs(tts): mention xAI custom voice support (#18776)

Point users to xAI's custom voices feature — clone your voice in the
console, paste the voice_id into tts.xai.voice_id. No code changes
needed; the existing TTS pipeline already handles arbitrary voice IDs.

- config.py: link to xAI custom voices docs in voice_id comment
- setup.py: prompt accepts custom voice IDs during xAI TTS setup
- tts.md: short section linking to xAI console and docs
Siddharth Balyan 2026-05-02 16:08:01 +05:30 committed by GitHub
parent af98122793
commit 5d3be898a8
GPG Key ID: B5690EEEBB952194
3 changed files with 22 additions and 2 deletions


@@ -830,7 +830,7 @@ DEFAULT_CONFIG = {
             # Voices: alloy, echo, fable, onyx, nova, shimmer
         },
         "xai": {
-            "voice_id": "eve",
+            "voice_id": "eve",  # or custom voice ID — see https://docs.x.ai/developers/model-capabilities/audio/custom-voices
             "language": "en",
             "sample_rate": 24000,
             "bit_rate": 128000,

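A minimal sketch of how a user-supplied `voice_id` might override the default shown in this hunk. The helpers `deep_merge` and `resolve_xai_voice_id` are hypothetical names for illustration, not functions from this repository. The trimmed `DEFAULT_CONFIG` mirrors only the values visible above and assumes the `xai` block sits under a top-level `tts` key, as the `tts.xai.voice_id` path in the commit message suggests.

```python
import copy

# Trimmed copy of the defaults visible in the hunk; the real dict has more keys.
# Assumption: the "xai" block lives under a top-level "tts" key.
DEFAULT_CONFIG = {
    "tts": {
        "xai": {
            "voice_id": "eve",
            "language": "en",
            "sample_rate": 24000,
            "bit_rate": 128000,
        },
    },
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied recursively on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_xai_voice_id(user_config: dict) -> str:
    """Return the user's voice_id if set, otherwise the built-in default."""
    merged = deep_merge(DEFAULT_CONFIG, user_config)
    return merged["tts"]["xai"]["voice_id"]
```

With an empty user config this returns the default `eve`; with `{"tts": {"xai": {"voice_id": "nlbqfwie"}}}` it returns the custom ID while the other xAI settings keep their defaults.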

@@ -1190,6 +1190,13 @@ def _setup_tts_provider(config: dict):
                     "Falling back to Edge TTS."
                 )
                 selected = "edge"
+        if selected == "xai":
+            print()
+            voice_id = prompt("xAI voice_id (Enter for 'eve', or paste a custom voice ID)")
+            if voice_id and voice_id.strip():
+                config.setdefault("tts", {}).setdefault("xai", {})["voice_id"] = voice_id.strip()
+                print_success(f"xAI voice_id set to: {voice_id.strip()}")
     elif selected == "minimax":
         existing = get_env_value("MINIMAX_API_KEY")
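
The added xAI branch can be isolated into a small function for testing. This is a sketch, not code from the commit: `apply_xai_voice_id` is a hypothetical name, and the interactive `prompt` and `print_success` helpers are left out so the example runs without the repository.

```python
from typing import Optional


def apply_xai_voice_id(config: dict, raw_answer: Optional[str]) -> bool:
    """Store a trimmed voice_id under tts.xai; return True if config changed."""
    voice_id = (raw_answer or "").strip()
    if not voice_id:
        # Empty or whitespace-only input (the user pressed Enter):
        # leave the config untouched.
        return False
    # Create the nested dicts on demand, as the setup.py change does.
    config.setdefault("tts", {}).setdefault("xai", {})["voice_id"] = voice_id
    return True
```

Pressing Enter writes nothing, so whatever default is configured elsewhere (`eve` in the config.py hunk) would presumably still apply.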


@@ -69,7 +69,7 @@ tts:
     model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts
     voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc.
   xai:
-    voice_id: "eve" # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts)
+    voice_id: "eve" # or a custom voice ID — see docs below
     language: "en" # ISO 639-1 code
     sample_rate: 24000 # 22050 / 24000 (default) / 44100 / 48000
     bit_rate: 128000 # MP3 bitrate; only applies when codec=mp3
@@ -127,6 +127,19 @@ Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, KittenTTS, and Piper audio are se
 If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
 :::

+### xAI Custom Voices (voice cloning)
+
+xAI supports cloning your voice and using it with TTS. Create a custom voice in the [xAI Console](https://console.x.ai/team/default/voice/voice-library), then set the resulting `voice_id` in your config:
+
+```yaml
+tts:
+  provider: xai
+  xai:
+    voice_id: "nlbqfwie" # your custom voice ID
+```
+
+See the [xAI Custom Voices docs](https://docs.x.ai/developers/model-capabilities/audio/custom-voices) for details on recording, supported formats, and limits.
+
 ### Piper (local, 44 languages)

 Piper is a fast, local neural TTS engine from the Open Home Foundation (the Home Assistant maintainers). It runs entirely on CPU, supports **44 languages** with pre-trained voices, and needs no API key.