Resources | repository | 2026-06-18 | Safety reviewed
Clone Any Voice on Your Own Laptop
The private, on-device answer to ElevenLabs: seven TTS engines and voice cloning that never leaves your machine.
- 33.3K stars
- TypeScript/Rust
- MIT
- 7 TTS engines
- 23 languages

Voicebox is a local-first voice studio: seven synthesis engines, 23 languages, voice cloning, a global dictation hotkey, and an MCP bridge so agents can talk back in a cloned voice.
Everything runs on-device: models, voice data, and captures stay local. A multi-track stories editor handles podcasts and conversations end to end.
What it does
- Clone voices from seconds of audio, generate speech instantly
- 7 TTS engines, including Qwen3-TTS, LuxTTS, Chatterbox Turbo, and Kokoro
- 23 languages across every engine
- Global dictation hotkey: speak anywhere, auto-paste
- Effects: pitch shift, reverb, delay, compression, filters
- Multi-track Stories editor for podcasts and conversations
- Unlimited length with auto-chunking and crossfades
- Expressive tags like [laugh], [sigh], and [gasp]
- Voice personalities: attach traits, rewrite in-character
- Agents speak in cloned voices over MCP
- Whisper speech-to-text with a Turbo model
- Captures tab keeps every dictation and recording
- REST API plus a built-in MCP server
- Complete privacy: everything runs on your machine
- GPU acceleration for Apple Silicon, NVIDIA, AMD, Intel
- Versioned takes with provenance tracking
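
The "auto-chunking and crossfades" item in the list above can be illustrated with a small sketch. This is not Voicebox's actual implementation: the function names `chunkText`, `crossfade`, and `joinChunks`, the sentence-boundary splitting rule, and the linear fade curve are all assumptions chosen for illustration. The idea is to split long text into sentence-aligned chunks, synthesize each chunk separately, and blend the resulting audio buffers over a short overlap so the seams are less audible.

```typescript
// Hypothetical helpers for illustration only; not part of Voicebox's API.

/**
 * Split text into chunks that aim to stay under maxChars,
 * breaking only at sentence boundaries. A single sentence longer
 * than maxChars is kept whole as its own oversized chunk.
 */
function chunkText(text: string, maxChars: number): string[] {
  const sentences: string[] = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the budget.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}

/**
 * Join two mono sample buffers with a linear crossfade over `overlap`
 * samples: the tail of `a` fades out while the head of `b` fades in.
 */
function crossfade(a: Float32Array, b: Float32Array, overlap: number): Float32Array {
  const n = Math.max(0, Math.min(overlap, a.length, b.length));
  const out = new Float32Array(a.length + b.length - n);
  const fadeStart = a.length - n;
  out.set(a.subarray(0, fadeStart), 0); // untouched part of a
  for (let i = 0; i < n; i++) {
    const t = (i + 1) / (n + 1); // fade position, strictly between 0 and 1
    out[fadeStart + i] = a[fadeStart + i] * (1 - t) + b[i] * t;
  }
  out.set(b.subarray(n), a.length); // untouched part of b
  return out;
}

/** Crossfade a sequence of synthesized chunk buffers into one buffer. */
function joinChunks(buffers: Float32Array[], overlap: number): Float32Array {
  let out: Float32Array = new Float32Array(0);
  for (const buffer of buffers) {
    out = crossfade(out, buffer, overlap);
  }
  return out;
}
```

In a sketch like this, each string returned by `chunkText` would be passed to whichever engine is selected, and the returned sample buffers handed to `joinChunks`. A linear fade is the simplest choice; an equal-power curve is a common alternative when the joined segments are uncorrelated.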
TTS engines (8 total, counting both Qwen3-TTS sizes)

- Qwen3-TTS (0.6B)
- Qwen3-TTS (1.7B)
- Qwen CustomVoice
- LuxTTS
- Chatterbox Multilingual
- Chatterbox Turbo
- TADA (HumeAI)
- Kokoro
Supported languages (23 total)

- English
- Arabic
- Danish
- Finnish
- Greek
- Hebrew
- Hindi
- Japanese
- Malay
- Norwegian
- Polish
- Swahili
- Swedish
- Turkish
- Chinese
- Korean
- Spanish
- French
- German
- Italian
- Portuguese
- Russian
- Dutch
Audio effects (9 total)

- Pitch Shift
- Reverb
- Delay
- Chorus
- Flanger
- Compressor
- Gain
- High-Pass Filter
- Low-Pass Filter