Features
Local voice providers
Run local OpenAI-compatible STT/TTS servers for Local Voice.
Local Voice is a turn-based pipeline that uses OpenAI-compatible endpoints that you host yourself:
- STT:
POST /v1/audio/transcriptions - TTS:
POST /v1/audio/speech
In the app, configure Settings → Voice → Local Voice:
- STT Base URL (typically
http://<host>:<port>/v1) - TTS Base URL (typically
http://<host>:<port>/v1) - Optional API keys (stored encrypted), model/voice fields, and output format
Optional experimental toggles:
- Device STT: use on-device speech recognition (no STT HTTP endpoint required).
- Device TTS: use on-device speech synthesis (no TTS HTTP endpoint required).
What you need (minimum)
- STT is required to use voice input (talking into the mic).
- TTS is optional (it is only required if you want spoken replies).
- STT and TTS can be hosted on the same server (if it supports both endpoints), or on two separate servers.
Direct-to-session vs voice agent mode
Local Voice supports two conversation modes:
- Direct-to-session: each time you speak, the transcribed text is sent into the Happier session as a normal message.
- Voice agent mode: each time you speak, the transcribed text is sent into an ephemeral multi-turn “voice agent” chat. The voice agent does not write to the session transcript unless it explicitly calls
sendSessionMessage(for example, when you ask it to apply a decision to the session).
Voice agent mode can use:
- Daemon voice agent (recommended): uses the daemon’s per-session process (no extra HTTP endpoints required).
- OpenAI-compatible voice agent: calls a user-configured chat endpoint (
POST /v1/chat/completions). This is useful if you want the entire voice stack (STT+TTS+chat) to be local/HTTP-based.
Important networking notes
- On phones,
localhost/127.0.0.1usually refers to the phone itself, not your computer.- Use your computer’s LAN IP (e.g.
http://192.168.1.10:8000/v1) or a tunnel.
- Use your computer’s LAN IP (e.g.
- If you expose a server beyond your LAN, add authentication and HTTPS.
- On web, you may need to handle CORS.
Compatible servers (examples)
Happier doesn’t bundle these servers — you run them separately. The goal is interoperability via OpenAI-compatible APIs.
STT (speech-to-text)
etalab-ia/faster-whisper-servermatatonic/openedai-whisper(unmaintained)
TTS (text-to-speech)
remsky/Kokoro-FastAPItravisvn/chatterbox-tts-api(AGPL-3.0)travisvn/openai-edge-tts(uses Microsoft Edge TTS; requires network access)matatonic/openedai-speech(archived; OpenAI-compatible; supports local backends like piper/XTTS)
Troubleshooting
- Can’t connect from mobile: verify the server is bound to
0.0.0.0, allow the port in firewall, and use your LAN IP. - 403/401: check that your API key (if needed) is configured in the app and the server.
- Bad audio / wrong voice: confirm
modelandvoicevalues supported by your TTS server.