Happier Docs
Features

Local voice providers

Run local OpenAI-compatible STT/TTS servers for Local Voice.

Local Voice is a turn-based pipeline that uses OpenAI-compatible endpoints that you host yourself:

  • STT: POST /v1/audio/transcriptions
  • TTS: POST /v1/audio/speech

In the app, configure Settings → Voice → Local Voice:

  • STT Base URL (typically http://<host>:<port>/v1)
  • TTS Base URL (typically http://<host>:<port>/v1)
  • Optional API keys (stored encrypted), model/voice fields, and output format

Optional experimental toggles:

  • Device STT: use on-device speech recognition (no STT HTTP endpoint required).
  • Device TTS: use on-device speech synthesis (no TTS HTTP endpoint required).

What you need (minimum)

  • STT is required to use voice input (talking into the mic).
  • TTS is optional (it is only required if you want spoken replies).
  • STT and TTS can be hosted on the same server (if it supports both endpoints), or on two separate servers.

Direct-to-session vs voice agent mode

Local Voice supports two conversation modes:

  • Direct-to-session: each time you speak, the transcribed text is sent into the Happier session as a normal message.
  • Voice agent mode: each time you speak, the transcribed text is sent into an ephemeral multi-turn “voice agent” chat. The voice agent does not write to the session transcript unless it explicitly calls sendSessionMessage (for example, when you ask it to apply a decision to the session).

Voice agent mode can use:

  • Daemon voice agent (recommended): uses the daemon’s per-session process (no extra HTTP endpoints required).
  • OpenAI-compatible voice agent: calls a user-configured chat endpoint (POST /v1/chat/completions). This is useful if you want the entire voice stack (STT+TTS+chat) to be local/HTTP-based.

Important networking notes

  • On phones, localhost / 127.0.0.1 usually refers to the phone itself, not your computer.
    • Use your computer’s LAN IP (e.g. http://192.168.1.10:8000/v1) or a tunnel.
  • If you expose a server beyond your LAN, add authentication and HTTPS.
  • On web, you may need to handle CORS.

Compatible servers (examples)

Happier doesn’t bundle these servers — you run them separately. The goal is interoperability via OpenAI-compatible APIs.

STT (speech-to-text)

TTS (text-to-speech)

Troubleshooting

  • Can’t connect from mobile: verify the server is bound to 0.0.0.0, allow the port in firewall, and use your LAN IP.
  • 403/401: check that your API key (if needed) is configured in the app and the server.
  • Bad audio / wrong voice: confirm model and voice values supported by your TTS server.

On this page