Kokoro TTS Studio

A polished desktop GUI for Kokoro TTS (hexgrad/Kokoro-82M), built with Python + pywebview. Runs fully offline after the initial model download.

Features

Three-column interface: text input · voice & generation settings · audio post-processing
All Kokoro languages and voices with gender filtering
Voice blending (up to 3 voices with weighted mixing)
Audio post-processing: normalize, trim silence, noise gate, fade in/out
Output formats: WAV, FLAC, MP3
Chunked generation with progress bar — handles long texts reliably
Plays preview audio without saving; plays last output on demand
Outputs are DAW-ready (DaVinci Resolve, Logic Pro, etc.)
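
The chunked generation mentioned above can be illustrated with a small sketch. This is not the app's actual chunking code, which this README does not describe; the function name `split_into_chunks` and the `max_chars` limit are hypothetical, and the sentence-boundary strategy is an assumption.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    (Hypothetical helper; the limit of 400 is an illustrative assumption.)
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    chunks = split_into_chunks("First sentence. Second one! A third?", max_chars=20)
    for i, chunk in enumerate(chunks, start=1):
        # A real app would synthesize each chunk here and advance the progress bar.
        print(f"{i}/{len(chunks)}: {chunk}")
```

Splitting on sentence boundaries keeps each chunk a natural unit for speech, and the chunk count gives a simple denominator for a progress indicator.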

System Requirements

espeak-ng (required)

Kokoro uses espeak-ng for phoneme generation. Install it before running the app:

macOS

brew install espeak-ng

Linux (Debian/Ubuntu)

sudo apt-get install espeak-ng

Windows

Download the MSI installer from: https://github.com/espeak-ng/espeak-ng/releases

Python Setup

Clone / download this project folder.

Create a virtual environment (Python 3.10+):

python3 -m venv .venv
source .venv/bin/activate      # macOS/Linux
.venv\Scripts\activate         # Windows

Install Python dependencies:
```
pip install -r requirements.txt
```
Run the app:
```
python main.py
```

First Run

On first launch, Kokoro will download model weights (~330 MB) from Hugging Face and cache them locally (~/.cache/huggingface/hub/). Subsequent launches are fully offline.

Debug Mode

Set the environment variable TTS_DEBUG=1 to open the pywebview developer tools:

TTS_DEBUG=1 python main.py
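
For readers curious how such a flag is typically wired up, here is a minimal sketch. It is an assumption about the approach, not a copy of `main.py`; the helper name `debug_enabled` is hypothetical.

```python
import os


def debug_enabled(environ=None) -> bool:
    """Return True only when TTS_DEBUG is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("TTS_DEBUG", "") == "1"


if __name__ == "__main__":
    # In a pywebview app, the result could be passed on as
    # webview.start(debug=debug_enabled()) to open the developer tools.
    print("Debug mode:", debug_enabled())
```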

Output Files

Generated audio is saved to ./outputs/ by default (configurable in the UI). Files are named output_YYYYMMDD_HHMMSS.wav unless you set a custom filename.
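
The default naming scheme can be reproduced with a short sketch like the one below. The function name `default_output_path` and its parameters are hypothetical; only the `outputs` folder and the `output_YYYYMMDD_HHMMSS` pattern come from the description above.

```python
from datetime import datetime
from pathlib import Path


def default_output_path(output_dir="outputs", ext="wav", now=None) -> Path:
    """Build a timestamped output path such as outputs/output_20240102_030405.wav."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"output_{stamp}.{ext}"


if __name__ == "__main__":
    print(default_output_path())
```

A second-resolution timestamp keeps names sortable by creation time, though two files generated within the same second would collide.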

WAV and FLAC outputs are lossless and import cleanly into:

DaVinci Resolve — File → Import → Media
Logic Pro — drag into timeline or browser
Any other DAW that accepts PCM or FLAC

MP3 export requires both pydub (pip install pydub) and ffmpeg. ffmpeg is a separate system install, for example brew install ffmpeg on macOS.
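
To see in advance whether MP3 export will work, you could run a check along these lines. It is a sketch under assumptions: it only tests that the `pydub` package is importable and that an `ffmpeg` executable is on `PATH`, and the function names are hypothetical rather than taken from the app.

```python
import importlib.util
import shutil


def mp3_export_available() -> bool:
    """True if pydub is importable and an ffmpeg executable is on PATH."""
    has_pydub = importlib.util.find_spec("pydub") is not None
    has_ffmpeg = shutil.which("ffmpeg") is not None
    return has_pydub and has_ffmpeg


def choose_format(requested: str) -> str:
    """Fall back to WAV when MP3 is requested but its dependencies are missing."""
    fmt = requested.lower()
    if fmt == "mp3" and not mp3_export_available():
        return "wav"
    return fmt


if __name__ == "__main__":
    print("MP3 export available:", mp3_export_available())
    print("Format used for an MP3 request:", choose_format("mp3"))
```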

Voice Blending

Enable blending in the Voice panel to mix up to 3 voices by weighted average of their style tensors. The resulting blend spec (e.g. af_heart:60,am_echo:40) is displayed in real time. This requires the voice .pt tensor files to be present in the Kokoro installation.
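
As an illustration of the weighted average, the sketch below parses a blend spec and mixes plain Python lists standing in for the style tensors. The function names are hypothetical, and normalizing the weights so they sum to 1 is an assumption; the app's own implementation operates on the voice .pt tensors and may differ.

```python
def parse_blend_spec(spec: str) -> dict[str, float]:
    """Parse a spec like "af_heart:60,am_echo:40" into weights that sum to 1."""
    weights: dict[str, float] = {}
    for part in spec.split(","):
        name, _, raw = part.strip().partition(":")
        weights[name] = float(raw)
    if not 1 <= len(weights) <= 3:
        raise ValueError("a blend uses between 1 and 3 voices")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


def blend(styles: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Element-wise weighted average of equally sized style vectors."""
    length = len(styles[next(iter(weights))])
    return [sum(weights[n] * styles[n][i] for n in weights) for i in range(length)]


if __name__ == "__main__":
    weights = parse_blend_spec("af_heart:60,am_echo:40")
    # Toy two-element vectors stand in for the real style tensors.
    styles = {"af_heart": [1.0, 0.0], "am_echo": [0.0, 1.0]}
    print(weights)
    print(blend(styles, weights))
```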

Troubleshooting

| Symptom | Fix |
| --- | --- |
| "espeak-ng not found" dialog | Install espeak-ng per the instructions above |
| Voice shows "(unavailable)" | That voice's tensor file wasn't found; try reinstalling kokoro |
| MP3 export falls back to WAV | Install pydub (`pip install pydub`) and ffmpeg |
| Black window on launch | Ensure pywebview ≥ 5.0 is installed |

Building from Source

Produces a self-contained installer — no Python or espeak-ng required by end users. Model weights (~330 MB) are downloaded on first launch.

macOS (.dmg — arm64)

brew install espeak-ng create-dmg
pip install -r requirements.txt
bash build_mac.sh
# Output: dist/KokoroTTSStudio-mac.dmg

Icons: Drop assets/icon.icns (1024×1024) before building for a polished result. The build script generates a teal placeholder if absent.

Windows (.exe installer — x64)

choco install espeak innosetup
pip install -r requirements.txt
build_windows.bat
:: Output: dist\KokoroTTSStudio-Setup.exe

Icons: Drop assets/icon.ico (256×256) before building.

GitHub Actions (automated)

Push to main or trigger Build Installers manually from the Actions tab. Both .dmg and .exe are uploaded as 30-day artifacts on each run.

PyInstaller notes

Torch is large (~2 GB unpacked in the bundle) — first build takes time.
numba is excluded; librosa works without it for the operations used here.
The espeakng_loader pip package bundles pre-built espeak-ng binaries, so end users don't need to install it separately.
Model weights are not bundled — they download to ~/Library/Application Support/KokoroTTSStudio/models/ (macOS) or %APPDATA%\KokoroTTSStudio\models\ (Windows) on first launch.
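
The per-platform model locations listed above can be resolved with a sketch like this one. The function name `models_dir` is hypothetical, and the project's own `app_paths.py` may work differently; only the two documented platforms are handled.

```python
import os
import sys
from pathlib import Path


def models_dir(platform: str = sys.platform, environ=None) -> Path:
    """Return the model folder used by the packaged app on macOS or Windows."""
    env = os.environ if environ is None else environ
    if platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "KokoroTTSStudio" / "models")
    if platform == "win32":
        return Path(env["APPDATA"]) / "KokoroTTSStudio" / "models"
    # Other platforms are not documented for the packaged app.
    raise NotImplementedError(f"no documented models folder for {platform!r}")


if __name__ == "__main__":
    try:
        print(models_dir())
    except NotImplementedError as exc:
        print(exc)
```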

Attributions

See LICENSE.
