feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)

Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.

What ships:

- New 'piper' built-in provider in tools/tts_tool.py
  - Lazy import via _import_piper()
  - Module-level voice cache keyed on (model_path, use_cuda) so switching
    voices doesn't invalidate older cached voices
  - _resolve_piper_voice_path() accepts either an absolute .onnx path or a
    voice name (auto-downloaded on first use via 'python -m
    piper.download_voices --download-dir <cache>')
  - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
    get_hermes_dir)
  - Optional SynthesisConfig knobs: length_scale, noise_scale,
    noise_w_scale, volume, normalize_audio, use_cuda — passed through
    only when configured, so older piper-tts versions aren't broken
  - WAV output then ffmpeg conversion path (same as neutts/kittentts) so
    Telegram voice bubbles work when ffmpeg is present
  - Piper added to BUILTIN_TTS_PROVIDERS so a user's
    tts.providers.piper.command cannot shadow the native provider
    (regression test included)
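The cache-keying scheme above can be sketched roughly as follows (illustrative only — `FakeVoice` and `get_voice` are stand-ins; the real cache lives in tools/tts_tool.py):

```python
# Minimal sketch of a module-level voice cache keyed on (model_path, use_cuda).
# FakeVoice is a hypothetical stand-in for piper.PiperVoice.
from typing import Any, Dict

_voice_cache: Dict[str, Any] = {}

class FakeVoice:
    load_count = 0  # counts how many real "load" calls happened

    @classmethod
    def load(cls, model_path: str, use_cuda: bool = False) -> "FakeVoice":
        cls.load_count += 1
        return cls()

def get_voice(model_path: str, use_cuda: bool = False) -> Any:
    # Keying on both path and the CUDA flag means switching voices (or
    # toggling GPU) adds a new cache entry instead of evicting older ones.
    key = f"{model_path}::cuda={use_cuda}"
    if key not in _voice_cache:
        _voice_cache[key] = FakeVoice.load(model_path, use_cuda=use_cuda)
    return _voice_cache[key]
```

A second call with the same path and flag returns the cached instance; a different voice triggers a fresh load without dropping earlier entries.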

- 'hermes tools' wizard entry
  - Piper appears under Voice & TTS with the 'local · free' badge;
    'pip install piper-tts' runs automatically via a post_setup handler
  - Prints the voice-catalog URL and default-voice info after install

- config.yaml defaults
  - tts.piper.voice defaults to en_US-lessac-medium
  - Commented advanced knobs for discoverability

- Docs
  - New 'Piper (local, 44 languages)' section in features/tts.md
    explaining install path, voice switching, pre-downloaded voices,
    and advanced knobs
  - Piper listed in the ten-provider table and ffmpeg table
  - Custom-command-providers section updated to drop the Piper example
    (now native) and add a piper-custom example for users with their own
    trained .onnx models
  - overview.md bumps provider count to ten

- Tests (tests/tools/test_tts_piper.py, 16 tests)
  - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
  - _resolve_piper_voice_path across every branch: direct .onnx path,
    cached voice name, fresh download with correct CLI args, download
    failure, successful-exit-but-missing-files, empty voice to default
  - _generate_piper_tts: loads voice once, reuses cache, voice-name
    download wiring, advanced knobs flow through SynthesisConfig
  - text_to_speech_tool end-to-end dispatch and missing-package error
  - check_tts_requirements: piper availability toggles the return value
  - Regression guard: piper cannot be shadowed by a command provider
    with the same name
  - Pre-existing test_tts_mistral test broadened to mock the new
    piper/kittentts/command-provider checks (otherwise it spuriously
    fails when piper is installed in the test venv)

E2E verification (live):

Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.
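The WAV-to-Opus step can be sketched like this (an assumption-laden sketch mirroring the shared neutts/kittentts path; the `-c:a libopus` flag is illustrative and may differ from the shipped command):

```python
# Sketch of the WAV -> Ogg/Opus conversion used for Telegram voice bubbles.
# Falls back to returning the WAV when ffmpeg is absent.
import shutil
import subprocess

def convert_to_opus(wav_path: str, ogg_path: str) -> str:
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        # No ffmpeg: keep the WAV; the platform sends a plain audio file
        # instead of a voice bubble.
        return wav_path
    subprocess.run(
        [ffmpeg, "-i", wav_path, "-y", "-loglevel", "error",
         "-c:a", "libopus", ogg_path],
        check=True, timeout=30,
    )
    return ogg_path
```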

This caught a real bug during development: the first pass used '-d' as
the download-dir flag, but the actual piper.download_voices CLI wants
'--download-dir'. Fixed before the PR was opened.
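For reference, the corrected invocation shape (as exercised by the tests; `build_download_cmd` is an illustrative helper, not a shipped function):

```python
# Builds the voice-download command with the correct '--download-dir' flag
# (the buggy first pass used '-d'). Sketch of the command built inside
# _resolve_piper_voice_path.
import sys

def build_download_cmd(voice: str, download_dir: str) -> list[str]:
    return [sys.executable, "-m", "piper.download_voices", voice,
            "--download-dir", download_dir]
```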
This commit is contained in:
Teknium 2026-04-30 02:53:20 -07:00 committed by GitHub
parent 2662bfb756
commit 8d302e37a8
8 changed files with 634 additions and 34 deletions


@@ -779,7 +779,7 @@ DEFAULT_CONFIG = {
# limit (OpenAI 4096, xAI 15000, MiniMax 10000, ElevenLabs 5k-40k model-aware,
# Gemini 5000, Edge 5000, Mistral 4000, NeuTTS/KittenTTS 2000).
"tts": {
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "xai" | "minimax" | "mistral" | "neutts" (local)
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "xai" | "minimax" | "mistral" | "gemini" | "neutts" (local) | "kittentts" (local) | "piper" (local)
"edge": {
"voice": "en-US-AriaNeural",
# Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
@@ -809,6 +809,19 @@ DEFAULT_CONFIG = {
"model": "neuphonic/neutts-air-q4-gguf", # HuggingFace model repo
"device": "cpu", # cpu, cuda, or mps
},
"piper": {
# Voice name (e.g. "en_US-lessac-medium") downloaded on first
# use, OR an absolute path to a pre-downloaded .onnx file.
# Full voice list: https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md
"voice": "en_US-lessac-medium",
# "voices_dir": "", # Override voice cache dir; default = ~/.hermes/cache/piper-voices/
# "use_cuda": False, # Requires onnxruntime-gpu
# "length_scale": 1.0, # 2.0 = twice as slow
# "noise_scale": 0.667,
# "noise_w_scale": 0.8,
# "volume": 1.0,
# "normalize_audio": True,
},
},
"stt": {


@@ -227,6 +227,14 @@ TOOL_CATEGORIES = {
"tts_provider": "kittentts",
"post_setup": "kittentts",
},
{
"name": "Piper",
"badge": "local · free",
"tag": "Local neural TTS, 44 languages (voices ~20-90MB)",
"env_vars": [],
"tts_provider": "piper",
"post_setup": "piper",
},
],
},
"web": {
@@ -624,6 +632,33 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "piper":
try:
__import__("piper")
_print_success(" piper-tts is already installed")
except ImportError:
import subprocess
_print_info(" Installing piper-tts (~14MB wheel, voices downloaded on first use)...")
try:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "-U", "piper-tts", "--quiet"],
capture_output=True, text=True, timeout=300,
)
if result.returncode == 0:
_print_success(" piper-tts installed")
else:
_print_warning(" piper-tts install failed:")
_print_info(f" {result.stderr.strip()[:300]}")
_print_info(" Run manually: python -m pip install -U piper-tts")
return
except subprocess.TimeoutExpired:
_print_warning(" piper-tts install timed out (>5min)")
_print_info(" Run manually: python -m pip install -U piper-tts")
return
_print_info(" Default voice: en_US-lessac-medium (downloaded on first TTS call)")
_print_info(" Full voice list: https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md")
_print_info(" Switch voices by setting tts.piper.voice in ~/.hermes/config.yaml")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard


@@ -81,29 +81,39 @@ class TestResolveCommandProviderConfig:
def test_user_declared_command_provider_resolves(self):
cfg = {
"providers": {
"piper": {"type": "command", "command": "piper foo"},
"piper-cli": {"type": "command", "command": "piper-cli foo"},
},
}
-resolved = _resolve_command_provider_config("piper", cfg)
+resolved = _resolve_command_provider_config("piper-cli", cfg)
assert resolved is not None
-assert resolved["command"] == "piper foo"
+assert resolved["command"] == "piper-cli foo"
def test_type_command_is_implied_when_command_is_set(self):
cfg = {"providers": {"piper": {"command": "piper foo"}}}
resolved = _resolve_command_provider_config("piper", cfg)
cfg = {"providers": {"piper-cli": {"command": "piper-cli foo"}}}
resolved = _resolve_command_provider_config("piper-cli", cfg)
assert resolved is not None
def test_other_type_values_reject(self):
cfg = {"providers": {"piper": {"type": "python", "command": "piper foo"}}}
assert _resolve_command_provider_config("piper", cfg) is None
cfg = {"providers": {"piper-cli": {"type": "python", "command": "piper-cli foo"}}}
assert _resolve_command_provider_config("piper-cli", cfg) is None
def test_empty_command_rejects(self):
cfg = {"providers": {"piper": {"type": "command", "command": " "}}}
assert _resolve_command_provider_config("piper", cfg) is None
cfg = {"providers": {"piper-cli": {"type": "command", "command": " "}}}
assert _resolve_command_provider_config("piper-cli", cfg) is None
def test_case_insensitive_lookup(self):
cfg = {"providers": {"piper": {"type": "command", "command": "x"}}}
assert _resolve_command_provider_config("PIPER", cfg) is not None
cfg = {"providers": {"piper-cli": {"type": "command", "command": "x"}}}
assert _resolve_command_provider_config("PIPER-CLI", cfg) is not None
def test_native_piper_cannot_be_shadowed_by_command_entry(self):
"""Regression guard for PR that added native Piper as a built-in.
A user's ``tts.providers.piper`` must not override the built-in."""
cfg = {
"providers": {
"piper": {"type": "command", "command": "some-script"},
},
}
assert _resolve_command_provider_config("piper", cfg) is None
class TestGetNamedProviderConfig:
@@ -145,16 +155,16 @@ class TestIterCommandProviders:
cfg = {
"providers": {
"openai": {"type": "command", "command": "shouldnt show up"},
"piper": {"type": "command", "command": "piper"},
"piper-cli": {"type": "command", "command": "piper-cli"},
"voxcpm": {"type": "command", "command": "voxcpm"},
"broken": {"type": "command", "command": ""},
},
}
names = sorted(name for name, _ in _iter_command_providers(cfg))
assert names == ["piper", "voxcpm"]
assert names == ["piper-cli", "voxcpm"]
def test_has_any_command_provider_detects_declared(self):
cfg = {"providers": {"piper": {"type": "command", "command": "piper"}}}
cfg = {"providers": {"piper-cli": {"type": "command", "command": "piper-cli"}}}
assert _has_any_command_tts_provider(cfg) is True
def test_has_any_command_provider_when_none(self):
@@ -216,16 +226,16 @@ class TestConfigGetters:
class TestMaxTextLengthForCommandProviders:
def test_default_for_command_provider(self):
cfg = {"providers": {"piper": {"type": "command", "command": "x"}}}
assert _resolve_max_text_length("piper", cfg) == DEFAULT_COMMAND_TTS_MAX_TEXT_LENGTH
cfg = {"providers": {"piper-cli": {"type": "command", "command": "x"}}}
assert _resolve_max_text_length("piper-cli", cfg) == DEFAULT_COMMAND_TTS_MAX_TEXT_LENGTH
def test_override_under_providers(self):
cfg = {"providers": {"piper": {"type": "command", "command": "x", "max_text_length": 2500}}}
assert _resolve_max_text_length("piper", cfg) == 2500
cfg = {"providers": {"piper-cli": {"type": "command", "command": "x", "max_text_length": 2500}}}
assert _resolve_max_text_length("piper-cli", cfg) == 2500
def test_override_under_legacy_tts_name_block(self):
cfg = {"piper": {"type": "command", "command": "x", "max_text_length": 7777}}
assert _resolve_max_text_length("piper", cfg) == 7777
cfg = {"piper-cli": {"type": "command", "command": "x", "max_text_length": 7777}}
assert _resolve_max_text_length("piper-cli", cfg) == 7777
def test_non_command_unknown_provider_still_falls_back(self):
assert _resolve_max_text_length("unknown", {}) > 0


@@ -216,5 +216,8 @@ class TestCheckTtsRequirementsMistral:
with patch("tools.tts_tool._import_edge_tts", side_effect=ImportError), \
patch("tools.tts_tool._import_elevenlabs", side_effect=ImportError), \
patch("tools.tts_tool._import_openai_client", side_effect=ImportError), \
patch("tools.tts_tool._check_neutts_available", return_value=False):
patch("tools.tts_tool._check_neutts_available", return_value=False), \
patch("tools.tts_tool._check_kittentts_available", return_value=False), \
patch("tools.tts_tool._check_piper_available", return_value=False), \
patch("tools.tts_tool._has_any_command_tts_provider", return_value=False):
assert check_tts_requirements() is False


@@ -0,0 +1,306 @@
"""
Tests for the native Piper TTS provider.
These tests pin the resolution / caching / dispatch paths for Piper
without requiring the ``piper-tts`` package to actually be installed
(the synthesis step is monkey-patched to avoid needing the ONNX wheel).
"""
import json
import os
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from tools import tts_tool
from tools.tts_tool import (
BUILTIN_TTS_PROVIDERS,
DEFAULT_PIPER_VOICE,
PROVIDER_MAX_TEXT_LENGTH,
_check_piper_available,
_resolve_piper_voice_path,
check_tts_requirements,
text_to_speech_tool,
)
# ---------------------------------------------------------------------------
# Registry / constants
# ---------------------------------------------------------------------------
class TestPiperRegistration:
def test_piper_is_a_builtin_provider(self):
assert "piper" in BUILTIN_TTS_PROVIDERS
def test_piper_has_a_text_length_cap(self):
assert PROVIDER_MAX_TEXT_LENGTH.get("piper", 0) > 0
# ---------------------------------------------------------------------------
# _check_piper_available
# ---------------------------------------------------------------------------
class TestCheckPiperAvailable:
def test_returns_bool_without_raising(self):
# We don't care about the current environment's answer — just that
# the probe never raises on a machine without piper installed.
assert isinstance(_check_piper_available(), bool)
# ---------------------------------------------------------------------------
# _resolve_piper_voice_path
# ---------------------------------------------------------------------------
class TestResolvePiperVoicePath:
def test_direct_onnx_path_returned_as_is(self, tmp_path):
model = tmp_path / "custom.onnx"
model.write_bytes(b"fake onnx bytes")
result = _resolve_piper_voice_path(str(model), tmp_path)
assert result == str(model)
def test_cached_voice_name_not_redownloaded(self, tmp_path):
"""If both <voice>.onnx and <voice>.onnx.json exist in the
download dir, no subprocess is spawned."""
voice = "en_US-test-medium"
(tmp_path / f"{voice}.onnx").write_bytes(b"model")
(tmp_path / f"{voice}.onnx.json").write_text("{}")
with patch("tools.tts_tool.subprocess.run") as mock_run:
result = _resolve_piper_voice_path(voice, tmp_path)
mock_run.assert_not_called()
assert result == str(tmp_path / f"{voice}.onnx")
def test_missing_voice_triggers_download(self, tmp_path):
voice = "en_US-new-medium"
def fake_run(cmd, *a, **kw):
# Simulate a successful download: write the expected files.
(tmp_path / f"{voice}.onnx").write_bytes(b"model")
(tmp_path / f"{voice}.onnx.json").write_text("{}")
return MagicMock(returncode=0, stderr="", stdout="")
with patch("tools.tts_tool.subprocess.run", side_effect=fake_run) as mock_run:
result = _resolve_piper_voice_path(voice, tmp_path)
mock_run.assert_called_once()
# Verify the command shape: python -m piper.download_voices <voice> --download-dir <dir>
call_args = mock_run.call_args.args[0]
assert "piper.download_voices" in " ".join(call_args)
assert voice in call_args
assert "--download-dir" in call_args
assert str(tmp_path) in call_args
assert result == str(tmp_path / f"{voice}.onnx")
def test_download_failure_raises_runtime(self, tmp_path):
voice = "en_US-broken-medium"
fake_result = MagicMock(returncode=1, stderr="voice not found", stdout="")
with patch("tools.tts_tool.subprocess.run", return_value=fake_result):
with pytest.raises(RuntimeError, match="Piper voice download failed"):
_resolve_piper_voice_path(voice, tmp_path)
def test_download_success_but_missing_file_raises(self, tmp_path):
voice = "en_US-weird-medium"
fake_result = MagicMock(returncode=0, stderr="", stdout="")
# Subprocess "succeeds" but doesn't actually write the files.
with patch("tools.tts_tool.subprocess.run", return_value=fake_result):
with pytest.raises(RuntimeError, match="completed but .+ is missing"):
_resolve_piper_voice_path(voice, tmp_path)
def test_empty_voice_falls_back_to_default_name(self, tmp_path):
(tmp_path / f"{DEFAULT_PIPER_VOICE}.onnx").write_bytes(b"model")
(tmp_path / f"{DEFAULT_PIPER_VOICE}.onnx.json").write_text("{}")
result = _resolve_piper_voice_path("", tmp_path)
assert result.endswith(f"{DEFAULT_PIPER_VOICE}.onnx")
# ---------------------------------------------------------------------------
# _generate_piper_tts — stubbed so we don't need piper-tts installed
# ---------------------------------------------------------------------------
class _StubPiperVoice:
"""Stand-in for piper.PiperVoice used by the synthesis tests."""
loaded: list[str] = []
calls: list[tuple] = []
@classmethod
def load(cls, model_path, use_cuda=False):
cls.loaded.append(model_path)
instance = cls()
instance.model_path = model_path
instance.use_cuda = use_cuda
return instance
def synthesize_wav(self, text, wav_file, syn_config=None):
# Minimal valid WAV: an empty frame set is fine for our size check.
# The wave module accepts any frames; we just need the file to exist
# with non-zero bytes after close.
wav_file.setnchannels(1)
wav_file.setsampwidth(2)
wav_file.setframerate(22050)
wav_file.writeframes(b"\x00\x00" * 1024)
_StubPiperVoice.calls.append((text, getattr(self, "model_path", ""), syn_config))
@pytest.fixture(autouse=True)
def _reset_piper_cache():
"""Clear the module-level voice cache between tests."""
tts_tool._piper_voice_cache.clear()
_StubPiperVoice.loaded = []
_StubPiperVoice.calls = []
yield
tts_tool._piper_voice_cache.clear()
class TestGeneratePiperTts:
def _prepare_voice_files(self, tmp_path, voice=DEFAULT_PIPER_VOICE):
model = tmp_path / f"{voice}.onnx"
model.write_bytes(b"model")
(tmp_path / f"{voice}.onnx.json").write_text("{}")
return model
def test_loads_voice_and_writes_wav(self, tmp_path, monkeypatch):
model = self._prepare_voice_files(tmp_path)
monkeypatch.setattr(tts_tool, "_import_piper", lambda: _StubPiperVoice)
out_path = str(tmp_path / "out.wav")
config = {"piper": {"voice": str(model)}}
result = tts_tool._generate_piper_tts("hello", out_path, config)
assert result == out_path
assert Path(out_path).exists()
assert Path(out_path).stat().st_size > 0
assert _StubPiperVoice.loaded == [str(model)]
assert _StubPiperVoice.calls[0][0] == "hello"
def test_voice_cache_reused_across_calls(self, tmp_path, monkeypatch):
model = self._prepare_voice_files(tmp_path)
monkeypatch.setattr(tts_tool, "_import_piper", lambda: _StubPiperVoice)
config = {"piper": {"voice": str(model)}}
tts_tool._generate_piper_tts("one", str(tmp_path / "a.wav"), config)
tts_tool._generate_piper_tts("two", str(tmp_path / "b.wav"), config)
# load() should have been called exactly once for the same model+cuda key.
assert _StubPiperVoice.loaded == [str(model)]
# But both synthesize calls went through.
assert [c[0] for c in _StubPiperVoice.calls] == ["one", "two"]
def test_voice_name_triggers_download(self, tmp_path, monkeypatch):
"""A config voice of ``en_US-lessac-medium`` should be resolved via
_resolve_piper_voice_path (which would normally download)."""
monkeypatch.setattr(tts_tool, "_import_piper", lambda: _StubPiperVoice)
def fake_resolve(voice, download_dir):
model = download_dir / f"{voice}.onnx"
model.write_bytes(b"model")
return str(model)
monkeypatch.setattr(tts_tool, "_resolve_piper_voice_path", fake_resolve)
config = {"piper": {"voice": "en_US-lessac-medium", "voices_dir": str(tmp_path)}}
result = tts_tool._generate_piper_tts("hi", str(tmp_path / "out.wav"), config)
assert Path(result).exists()
assert _StubPiperVoice.loaded[0].endswith("en_US-lessac-medium.onnx")
def test_advanced_knobs_passed_as_synconfig(self, tmp_path, monkeypatch):
model = self._prepare_voice_files(tmp_path)
monkeypatch.setattr(tts_tool, "_import_piper", lambda: _StubPiperVoice)
# Fake SynthesisConfig so we can assert the knobs flowed through.
fake_syn_cls = MagicMock()
class FakePiperModule:
SynthesisConfig = fake_syn_cls
# The SynthesisConfig import happens inline inside _generate_piper_tts
# via ``from piper import SynthesisConfig``. Inject a fake piper
# module so that import resolves.
monkeypatch.setitem(sys.modules, "piper", FakePiperModule)
config = {
"piper": {
"voice": str(model),
"length_scale": 2.0,
"volume": 0.8,
},
}
tts_tool._generate_piper_tts(
"slow voice", str(tmp_path / "out.wav"), config,
)
# SynthesisConfig was constructed with the advanced knobs.
fake_syn_cls.assert_called_once()
kwargs = fake_syn_cls.call_args.kwargs
assert kwargs["length_scale"] == 2.0
assert kwargs["volume"] == 0.8
# ---------------------------------------------------------------------------
# text_to_speech_tool end-to-end (provider == "piper")
# ---------------------------------------------------------------------------
class TestTextToSpeechToolWithPiper:
def test_dispatches_to_piper(self, tmp_path, monkeypatch):
model = tmp_path / f"{DEFAULT_PIPER_VOICE}.onnx"
model.write_bytes(b"model")
(tmp_path / f"{DEFAULT_PIPER_VOICE}.onnx.json").write_text("{}")
monkeypatch.setattr(tts_tool, "_import_piper", lambda: _StubPiperVoice)
cfg = {"provider": "piper", "piper": {"voice": str(model)}}
monkeypatch.setattr(tts_tool, "_load_tts_config", lambda: cfg)
result = text_to_speech_tool(text="hi", output_path=str(tmp_path / "clip.wav"))
data = json.loads(result)
assert data["success"] is True, data
assert data["provider"] == "piper"
assert Path(data["file_path"]).exists()
def test_missing_package_surfaces_error(self, tmp_path, monkeypatch):
def raise_import():
raise ImportError("No module named 'piper'")
monkeypatch.setattr(tts_tool, "_import_piper", raise_import)
cfg = {"provider": "piper"}
monkeypatch.setattr(tts_tool, "_load_tts_config", lambda: cfg)
result = text_to_speech_tool(text="hi", output_path=str(tmp_path / "clip.wav"))
data = json.loads(result)
assert data["success"] is False
assert "piper-tts" in data["error"]
# ---------------------------------------------------------------------------
# check_tts_requirements
# ---------------------------------------------------------------------------
class TestCheckTtsRequirementsPiper:
def test_piper_install_satisfies_requirements(self, monkeypatch):
# Drop every other provider so we can isolate the piper signal.
monkeypatch.setattr(tts_tool, "_import_edge_tts", lambda: (_ for _ in ()).throw(ImportError()))
monkeypatch.setattr(tts_tool, "_import_elevenlabs", lambda: (_ for _ in ()).throw(ImportError()))
monkeypatch.setattr(tts_tool, "_import_openai_client", lambda: (_ for _ in ()).throw(ImportError()))
monkeypatch.setattr(tts_tool, "_import_mistral_client", lambda: (_ for _ in ()).throw(ImportError()))
monkeypatch.setattr(tts_tool, "_check_neutts_available", lambda: False)
monkeypatch.setattr(tts_tool, "_check_kittentts_available", lambda: False)
monkeypatch.setattr(tts_tool, "_has_any_command_tts_provider", lambda: False)
monkeypatch.setattr(tts_tool, "_has_openai_audio_backend", lambda: False)
for env in ("MINIMAX_API_KEY", "XAI_API_KEY", "GEMINI_API_KEY",
"GOOGLE_API_KEY", "MISTRAL_API_KEY", "ELEVENLABS_API_KEY"):
monkeypatch.delenv(env, raising=False)
# Now toggle the piper check on and off.
monkeypatch.setattr(tts_tool, "_check_piper_available", lambda: False)
assert check_tts_requirements() is False
monkeypatch.setattr(tts_tool, "_check_piper_available", lambda: True)
assert check_tts_requirements() is True


@@ -12,6 +12,7 @@ Built-in TTS providers:
- xAI TTS: Grok voices, needs XAI_API_KEY
- NeuTTS (local, free, no API key): On-device TTS via neutts
- KittenTTS (local, free, no API key): On-device 25MB model
- Piper (local, free, no API key): OHF-Voice/piper1-gpl neural VITS, 44 languages
Custom command providers:
- Users can declare any number of named providers with ``type: command``
@@ -109,6 +110,18 @@ def _import_kittentts():
return KittenTTS
def _import_piper():
"""Lazy import Piper. Returns the PiperVoice class or raises ImportError.
Piper is an optional, fully-local neural TTS engine (Home Assistant /
Open Home Foundation). ``pip install piper-tts`` provides cross-platform
wheels (Linux / macOS / Windows, x86_64 + ARM64) with embedded espeak-ng.
Voice models (.onnx + .onnx.json) are downloaded on first use.
"""
from piper import PiperVoice
return PiperVoice
# ===========================================================================
# Defaults
# ===========================================================================
@@ -120,6 +133,7 @@ DEFAULT_ELEVENLABS_STREAMING_MODEL_ID = "eleven_flash_v2_5"
DEFAULT_OPENAI_MODEL = "gpt-4o-mini-tts"
DEFAULT_KITTENTTS_MODEL = "KittenML/kitten-tts-nano-0.8-int8" # 25MB
DEFAULT_KITTENTTS_VOICE = "Jasper"
DEFAULT_PIPER_VOICE = "en_US-lessac-medium" # balanced size/quality
DEFAULT_OPENAI_VOICE = "alloy"
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"
DEFAULT_MINIMAX_MODEL = "speech-2.8-hd"
@@ -163,6 +177,7 @@ PROVIDER_MAX_TEXT_LENGTH: Dict[str, int] = {
"elevenlabs": 10000, # fallback when model-aware lookup can't resolve (multilingual_v2)
"neutts": 2000, # local model, quality falls off on long text
"kittentts": 2000, # local 25MB model
"piper": 5000, # local VITS model, phoneme-based; practical cap
}
# ElevenLabs caps vary by model_id. https://elevenlabs.io/docs/overview/models
@@ -308,6 +323,7 @@ BUILTIN_TTS_PROVIDERS = frozenset({
"gemini",
"neutts",
"kittentts",
"piper",
})
DEFAULT_COMMAND_TTS_TIMEOUT_SECONDS = 120
@@ -1293,6 +1309,167 @@ def _generate_neutts(text: str, output_path: str, tts_config: Dict[str, Any]) ->
return output_path
# ===========================================================================
# Provider: Piper (local, neural VITS, 44 languages)
# ===========================================================================
# Module-level cache for Piper voice instances. Voices are keyed on their
# absolute .onnx model path so switching voices doesn't invalidate older
# cached voices.
_piper_voice_cache: Dict[str, Any] = {}
def _check_piper_available() -> bool:
"""Check whether the piper-tts package is importable."""
try:
import importlib.util
return importlib.util.find_spec("piper") is not None
except Exception:
return False
def _get_piper_voices_dir() -> Path:
"""Return the directory where Hermes caches Piper voice models.
Resolves to ``~/.hermes/cache/piper-voices/`` under the active
HERMES_HOME so voice downloads follow profile boundaries.
"""
from hermes_constants import get_hermes_dir
root = Path(get_hermes_dir("cache/piper-voices", "piper_voices_cache"))
root.mkdir(parents=True, exist_ok=True)
return root
def _resolve_piper_voice_path(voice: str, download_dir: Path) -> str:
"""Resolve *voice* (a model name or path) to a concrete .onnx file path.
Accepts any of:
- Absolute / expanded path to an .onnx file the user already has
- A voice *name* like ``en_US-lessac-medium`` (downloads to
``download_dir`` on first use via ``python -m piper.download_voices``)
Raises RuntimeError if the model can't be located or downloaded.
"""
if not voice:
voice = DEFAULT_PIPER_VOICE
# Case 1: user gave a direct file path.
candidate = Path(voice).expanduser()
if candidate.suffix.lower() == ".onnx" and candidate.exists():
return str(candidate)
# Case 2: user gave a voice *name*. See if it's already downloaded.
cached = download_dir / f"{voice}.onnx"
if cached.exists() and (download_dir / f"{voice}.onnx.json").exists():
return str(cached)
# Case 3: download the voice. piper ships a download helper module.
import sys as _sys
logger.info("[Piper] Downloading voice '%s' to %s (first use)", voice, download_dir)
try:
result = subprocess.run(
[_sys.executable, "-m", "piper.download_voices", voice,
"--download-dir", str(download_dir)],
capture_output=True, text=True, timeout=300,
)
except subprocess.TimeoutExpired as exc:
raise RuntimeError(
f"Piper voice download timed out after 300s for '{voice}'"
) from exc
if result.returncode != 0:
stderr = (result.stderr or "").strip() or "no stderr output"
raise RuntimeError(
f"Piper voice download failed for '{voice}': {stderr[:400]}"
)
if not cached.exists():
raise RuntimeError(
f"Piper voice download completed but {cached} is missing — "
f"check voice name (see: https://github.com/OHF-Voice/piper1-gpl/"
f"blob/main/docs/VOICES.md)"
)
return str(cached)
def _generate_piper_tts(text: str, output_path: str, tts_config: Dict[str, Any]) -> str:
"""Generate speech using the local Piper engine.
Loads the voice model once per process (cached by absolute path) and
writes a WAV file. Caller is responsible for converting to MP3/Opus
via ffmpeg when a different output format is required.
"""
PiperVoice = _import_piper()
import wave
piper_config = tts_config.get("piper", {}) if isinstance(tts_config, dict) else {}
voice_name = piper_config.get("voice") or DEFAULT_PIPER_VOICE
download_dir = Path(piper_config.get("voices_dir") or _get_piper_voices_dir()).expanduser()
download_dir.mkdir(parents=True, exist_ok=True)
use_cuda = bool(piper_config.get("use_cuda", False))
model_path = _resolve_piper_voice_path(voice_name, download_dir)
cache_key = f"{model_path}::cuda={use_cuda}"
global _piper_voice_cache
if cache_key not in _piper_voice_cache:
logger.info("[Piper] Loading voice: %s", model_path)
_piper_voice_cache[cache_key] = PiperVoice.load(model_path, use_cuda=use_cuda)
logger.info("[Piper] Voice loaded")
voice = _piper_voice_cache[cache_key]
# Optional synthesis knobs — only pass a SynthesisConfig when at least
# one advanced knob is configured, so we don't depend on a newer Piper
# version than the user's installed one unless we need to.
syn_config = None
has_advanced = any(
k in piper_config
for k in ("length_scale", "noise_scale", "noise_w_scale", "volume", "normalize_audio")
)
if has_advanced:
try:
from piper import SynthesisConfig # type: ignore
syn_config = SynthesisConfig(
length_scale=float(piper_config.get("length_scale", 1.0)),
noise_scale=float(piper_config.get("noise_scale", 0.667)),
noise_w_scale=float(piper_config.get("noise_w_scale", 0.8)),
volume=float(piper_config.get("volume", 1.0)),
normalize_audio=bool(piper_config.get("normalize_audio", True)),
)
except ImportError:
logger.warning(
"[Piper] SynthesisConfig not available in this piper-tts "
"version — advanced knobs ignored"
)
# Piper outputs WAV. Caller handles downstream MP3/Opus conversion.
wav_path = output_path
if not output_path.endswith(".wav"):
wav_path = output_path.rsplit(".", 1)[0] + ".wav"
with wave.open(wav_path, "wb") as wav_file:
if syn_config is not None:
voice.synthesize_wav(text, wav_file, syn_config=syn_config)
else:
voice.synthesize_wav(text, wav_file)
# Convert to desired format if caller requested mp3/ogg
if wav_path != output_path:
ffmpeg = shutil.which("ffmpeg")
if ffmpeg:
conv_cmd = [ffmpeg, "-i", wav_path, "-y", "-loglevel", "error", output_path]
subprocess.run(conv_cmd, check=True, timeout=30)
try:
os.remove(wav_path)
except OSError:
pass
else:
# No ffmpeg — keep WAV and return that path
os.rename(wav_path, output_path)
return output_path
# ===========================================================================
# Provider: KittenTTS (local, lightweight)
# ===========================================================================
@@ -1517,6 +1694,19 @@ def text_to_speech_tool(
logger.info("Generating speech with KittenTTS (local, ~25MB)...")
_generate_kittentts(text, file_str, tts_config)
elif provider == "piper":
try:
_import_piper()
except ImportError:
return json.dumps({
"success": False,
"error": "Piper provider selected but 'piper-tts' package not installed. "
"Run 'hermes tools' and select Piper under TTS, or install manually: "
"pip install piper-tts",
}, ensure_ascii=False)
logger.info("Generating speech with Piper (local)...")
_generate_piper_tts(text, file_str, tts_config)
else:
# Default: Edge TTS (free), with NeuTTS as local fallback
edge_available = True
@@ -1566,7 +1756,7 @@ def text_to_speech_tool(
if opus_path:
file_str = opus_path
voice_compatible = file_str.endswith(".ogg")
-elif provider in ("edge", "neutts", "minimax", "xai", "kittentts") and not file_str.endswith(".ogg"):
+elif provider in ("edge", "neutts", "minimax", "xai", "kittentts", "piper") and not file_str.endswith(".ogg"):
opus_path = _convert_to_opus(file_str)
if opus_path:
file_str = opus_path
@@ -1657,6 +1847,8 @@ def check_tts_requirements() -> bool:
return True
if _check_kittentts_available():
return True
if _check_piper_available():
return True
return False
@@ -1954,6 +2146,7 @@ if __name__ == "__main__":
f"{'set' if resolve_openai_audio_api_key() else 'not set (VOICE_TOOLS_OPENAI_KEY or OPENAI_API_KEY)'}"
)
print(f" MiniMax: {'API key set' if get_env_value('MINIMAX_API_KEY') else 'not set (MINIMAX_API_KEY)'}")
print(f" Piper: {'installed' if _check_piper_available() else 'not installed (pip install piper-tts)'}")
print(f" ffmpeg: {'✅ found' if _has_ffmpeg() else '❌ not found (needed for Telegram Opus)'}")
print(f"\n Output dir: {DEFAULT_OUTPUT_DIR}")


@@ -31,7 +31,7 @@ Hermes Agent includes a rich set of capabilities that extend far beyond basic ch
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Nine models supported (FLUX 2 Klein/Pro, GPT-Image 1.5/2, Nano Banana Pro, Ideogram V3, Recraft V4 Pro, Qwen, Z-Image Turbo); pick one via `hermes tools`.
-- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with nine provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, Mistral Voxtral, Google Gemini, xAI, NeuTTS, and KittenTTS.
+- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with ten native provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, Mistral Voxtral, Google Gemini, xAI, NeuTTS, KittenTTS, and Piper — plus custom command providers for any local TTS CLI.
## Integrations


@@ -14,7 +14,7 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription,
## Text-to-Speech
-Convert text to speech with nine providers:
+Convert text to speech with ten providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
@@ -27,6 +27,7 @@ Convert text to speech with nine providers:
| **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
| **NeuTTS** | Good | Free (local) | None needed |
| **KittenTTS** | Good | Free (local) | None needed |
| **Piper** | Good | Free (local) | None needed |
### Platform Delivery
@@ -42,7 +43,7 @@ Convert text to speech with nine providers:
```yaml
# In ~/.hermes/config.yaml
tts:
-  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
+  provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts" | "piper"
  speed: 1.0 # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
@@ -83,6 +84,15 @@ tts:
    voice: Jasper # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
    speed: 1.0 # 0.5 - 2.0
    clean_text: true # Expand numbers, currencies, units
  piper:
    voice: en_US-lessac-medium # voice name (auto-downloaded) OR absolute path to .onnx
    # voices_dir: '' # default: ~/.hermes/cache/piper-voices/
    # use_cuda: false # requires onnxruntime-gpu
    # length_scale: 1.0 # 2.0 = twice as slow
    # noise_scale: 0.667
    # noise_w_scale: 0.8
    # volume: 1.0 # 0.5 = half as loud
    # normalize_audio: true
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
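Provider-specific speed overriding the global multiplier can be illustrated with a minimal config (the values here are arbitrary examples):

```yaml
tts:
  provider: openai
  speed: 1.25        # global multiplier, used by providers without their own
  openai:
    speed: 1.5       # overrides the global 1.25 for OpenAI TTS only
  edge:
    voice: "en-US-AriaNeural"   # no edge.speed set, so Edge falls back to 1.25
```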
@@ -98,6 +108,7 @@ Telegram voice bubbles require Opus/OGG audio format:
- **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **Piper** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
```bash
# Ubuntu/Debian
@@ -110,27 +121,51 @@ brew install ffmpeg
sudo dnf install ffmpeg
```
-Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, and KittenTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
+Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, KittenTTS, and Piper audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
:::
### Piper (local, 44 languages)
Piper is a fast, local neural TTS engine from the Open Home Foundation (the Home Assistant maintainers). It runs entirely on-device (CPU by default, GPU optional via `use_cuda`), supports **44 languages** with pre-trained voices, and needs no API key.
**Install via `hermes tools`** → Voice & TTS → Piper — Hermes runs `pip install piper-tts` for you. Or install manually: `pip install piper-tts`.
**Switch to Piper:**
```yaml
tts:
  provider: piper
  piper:
    voice: en_US-lessac-medium
```
On the first TTS call for a voice that isn't cached locally, Hermes runs `python -m piper.download_voices <name>` and downloads the model (~20-90MB depending on quality tier) into `~/.hermes/cache/piper-voices/`. Subsequent calls reuse the cached model.
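The name-or-path resolution above can be sketched roughly as follows; the function name, cache layout, and exact download invocation are illustrative assumptions rather than Hermes's actual code:

```python
import os
import subprocess
import sys

def resolve_piper_voice(voice: str, voices_dir: str) -> str:
    """Map tts.piper.voice to an .onnx model path (hypothetical sketch).

    Absolute .onnx paths are used as-is; bare voice names resolve into
    the cache dir and are downloaded on first use.
    """
    if voice.endswith(".onnx"):
        return voice  # user supplied a pre-downloaded model
    model_path = os.path.join(voices_dir, f"{voice}.onnx")
    if not os.path.exists(model_path):
        # First use: fetch the voice model into the cache directory
        subprocess.run(
            [sys.executable, "-m", "piper.download_voices",
             "--download-dir", voices_dir, voice],
            check=True,
        )
    return model_path
```

Because the check is only on the cached file's existence, every later call for the same voice skips the download entirely.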
**Picking a voice.** The [full voice catalog](https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md) covers English, Spanish, French, German, Italian, Dutch, Portuguese, Russian, Polish, Turkish, Chinese, Arabic, Hindi, and more — each with `x_low` / `low` / `medium` / `high` quality tiers. Sample voices at [rhasspy.github.io/piper-samples](https://rhasspy.github.io/piper-samples/).
**Using a pre-downloaded voice.** Set `tts.piper.voice` to an absolute path ending in `.onnx`:
```yaml
tts:
  piper:
    voice: /path/to/my-custom-voice.onnx
```
**Advanced knobs** (`tts.piper.length_scale` / `noise_scale` / `noise_w_scale` / `volume` / `normalize_audio`, plus `use_cuda`) correspond 1:1 to Piper's `SynthesisConfig` and model-loading options. Knobs your installed `piper-tts` version doesn't recognize are silently skipped, so older versions keep working.
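One way to get that backward compatibility is to forward only the knobs the user actually set, so unset ones never reach the synthesis call at all; a hypothetical sketch:

```python
def collect_synthesis_kwargs(piper_cfg: dict) -> dict:
    """Keep only the SynthesisConfig knobs present in the user's config.

    Hypothetical helper: unset knobs are omitted entirely, so Piper's
    own defaults apply and older piper-tts versions never see fields
    they don't know about.
    """
    knobs = ("length_scale", "noise_scale", "noise_w_scale",
             "volume", "normalize_audio")
    return {k: piper_cfg[k] for k in knobs if k in piper_cfg}
```

A caller would then build `SynthesisConfig(**collect_synthesis_kwargs(cfg))` only when the resulting dict is non-empty.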
### Custom command providers
-If a TTS engine you want isn't natively supported (Piper, VoxCPM, MLX-Kokoro, XTTS CLI, a voice-cloning script, anything else that exposes a CLI), you can wire it in as a **command-type provider** without writing any Python. Hermes writes the input text to a temp UTF-8 file, runs your shell command, and reads the audio file the command produced.
+If a TTS engine you want isn't natively supported (VoxCPM, MLX-Kokoro, XTTS CLI, a voice-cloning script, anything else that exposes a CLI), you can wire it in as a **command-type provider** without writing any Python. Hermes writes the input text to a temp UTF-8 file, runs your shell command, and reads the audio file the command produced.
Declare one or more providers under `tts.providers.<name>` and switch between them with `tts.provider: <name>` — the same way you switch between built-ins like `edge` and `openai`.
```yaml
tts:
-  provider: piper-en # pick any name under tts.providers
+  provider: voxcpm # pick any name under tts.providers
  providers:
-    piper-en:
-      type: command
-      command: "piper -m ~/models/en_US-amy.onnx -f {output_path} < {input_path}"
-      output_format: wav
    voxcpm:
      type: command
      command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
@@ -143,6 +178,11 @@ tts:
      command: "python -m mlx_kokoro --in {input_path} --out {output_path} --voice {voice}"
      voice: af_sky
      output_format: wav
    piper-custom: # native Piper also supports custom .onnx via tts.piper.voice
      type: command
      command: "piper -m /path/to/custom.onnx -f {output_path} < {input_path}"
      output_format: wav
```
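The run cycle described above (text to a temp UTF-8 file, shell command, audio file out) could look roughly like this; the function name and timeout are illustrative assumptions, not Hermes's actual implementation:

```python
import os
import subprocess
import tempfile

def run_command_provider(command_template: str, text: str, output_path: str) -> str:
    """Hypothetical sketch of a command-type TTS provider run:
    write the text to a temp UTF-8 file, substitute the {input_path}
    and {output_path} placeholders, and run the command via the shell.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as f:
        f.write(text)
        input_path = f.name
    try:
        cmd = command_template.format(
            input_path=input_path, output_path=output_path
        )
        subprocess.run(cmd, shell=True, check=True, timeout=120)
    finally:
        os.unlink(input_path)  # always clean up the temp text file
    return output_path
```

Routing the text through a file rather than a shell argument sidesteps quoting and length limits, which is why the `< {input_path}` / `--text-file {input_path}` style works for arbitrary input.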
#### Placeholders