Voice

Choose a voice mode, understand how voice conversations map to sessions, and configure ElevenLabs or local voice in Happier.

Happier supports four voice modes in Settings → Voice:

  • Off: disables all voice features.
  • Happier Voice: realtime ElevenLabs voice using Happier-managed credentials and server-side quotas.
  • Use my ElevenLabs: realtime ElevenLabs voice using your own API key and agent.
  • Local voice: a configurable speech-to-text (STT) and text-to-speech (TTS) pipeline with device, self-hosted, Google, and local-neural providers.

Start here

If you only need one rule of thumb:

  • choose Happier Voice for the simplest “just works” realtime voice experience
  • choose Use my ElevenLabs if you want full control over your own ElevenLabs account and agent
  • choose Local voice if you want device speech, self-hosted endpoints, or on-device neural speech

Common settings

All voice modes share a few top-level settings:

Preferred language

Preferred language sets the language Happier should prefer for spoken responses and voice-side interactions.

Use this when:

  • you want the assistant to answer in a specific language
  • your speech provider supports multiple languages
  • you want local voice and ElevenLabs voice to stay aligned with the same preference

Voice UI defaults

Voice settings also include UI behavior such as:

  • whether voice defaults to global or session scope
  • whether the voice surface prefers the sidebar, the session screen, or auto
  • whether the activity feed is shown and auto-expands

These are app-level UX settings. They do not change how a provider itself speaks or reasons.

Voice activity feed

The Enable voice activity feed setting shows recent voice events directly in the UI while voice is active.

You can also enable Auto-expand on start if you want the activity feed to open automatically whenever a voice session starts.

Privacy

Happier includes privacy controls for what voice providers can see.

Defaults are intentionally conservative for sensitive data:

  • session summaries and recent messages can be shared
  • tool names and permission requests can be shared
  • file paths and tool arguments are off by default

If you need stricter privacy, start by disabling:

  • Share recent messages
  • Share file paths
  • Share tool arguments
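
As a rough illustration, the defaults and the stricter profile described above could be modeled as a small settings object. This is a sketch only: the field names (`shareSessionSummaries`, `shareRecentMessages`, and so on) and the helper `stricterVoicePrivacy` are hypothetical and are not Happier's actual setting keys.

```typescript
// Hypothetical shape: field names are illustrative, not Happier's real setting keys.
interface VoicePrivacySettings {
  shareSessionSummaries: boolean;
  shareRecentMessages: boolean;
  shareToolNames: boolean;
  sharePermissionRequests: boolean;
  shareFilePaths: boolean;
  shareToolArguments: boolean;
}

// Defaults as described above: summaries, recent messages, tool names, and
// permission requests can be shared; file paths and tool arguments are off.
const defaultVoicePrivacy: VoicePrivacySettings = {
  shareSessionSummaries: true,
  shareRecentMessages: true,
  shareToolNames: true,
  sharePermissionRequests: true,
  shareFilePaths: false,
  shareToolArguments: false,
};

// Stricter profile: turn off the three toggles listed above, whatever the
// starting point was, and leave everything else untouched.
function stricterVoicePrivacy(base: VoicePrivacySettings): VoicePrivacySettings {
  return {
    ...base,
    shareRecentMessages: false,
    shareFilePaths: false,
    shareToolArguments: false,
  };
}
```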

Voice conversations and session targeting

Voice in Happier is more than a floating microphone button. It has its own conversation state and can act on real sessions.

Hidden voice conversation

When voice is active, Happier keeps a hidden voice conversation session.

Open it when you want to:

  • review what the assistant heard
  • read voice replies as text
  • continue the same conversation by typing
  • inspect request announcements, tool calls, or tool results

The hidden voice conversation does not replace your main coding session. It is the voice-side conversation layer.

Target session

Voice actions still need a real session to act on.

That target session is the session where voice can:

  • send messages
  • answer requests
  • switch modes when supported
  • continue work on your behalf

This is why voice can behave like a real assistant instead of just transcribing speech into a random text box.

Where the voice starts

For Local voice in Agent mode, the start location depends on where you launch voice and on your directory policy:

  • Start from the sidebar / global voice surface: starts from voice home
  • Start from a session surface: starts from that session’s project root
  • If “Stay in voice home” is enabled: voice started from a session surface also stays in voice home

This gives you a neutral default for global voice use, while still letting session-started voice work directly against the current project when you want that behavior.
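
The three rules above can be sketched as a single decision function. The names here (`VoiceSurface`, `StartContext`, `resolveStartDirectory`) are hypothetical and only mirror the documented behavior; falling back to voice home when no project root is known is an assumption of this sketch.

```typescript
// Hypothetical names throughout: this mirrors the rules above, not Happier's source.
type VoiceSurface = "global" | "session";

interface StartContext {
  surface: VoiceSurface;        // where voice was launched
  voiceHome: string;            // neutral directory used for global voice
  sessionProjectRoot?: string;  // known when voice is launched from a session
  stayInVoiceHome: boolean;     // the "Stay in voice home" policy
}

function resolveStartDirectory(ctx: StartContext): string {
  // The policy and the global surface both pin voice to voice home.
  if (ctx.stayInVoiceHome || ctx.surface === "global") {
    return ctx.voiceHome;
  }
  // Session start: use the session's project root.
  // Assumption: fall back to voice home if the root is unknown.
  return ctx.sessionProjectRoot ?? ctx.voiceHome;
}
```

For example, a session-surface start with `stayInVoiceHome: false` resolves to the session's project root, while the same start with the policy enabled resolves to voice home.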

Happier Voice

Happier Voice is the managed realtime option.

Use it when you want:

  • the least setup
  • a true realtime voice session
  • server-managed access control, quota enforcement, and subscription rules

Happier Voice is available only when the connected server advertises it through /v1/features.
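
A client-side availability check might look like the sketch below. Only the /v1/features path comes from this page. The response shape (`features.voice.happierVoice.enabled`) is a hypothetical placeholder, so check the real payload from your server before relying on it.

```typescript
// Assumption: this payload shape is hypothetical. The docs only state that the
// server advertises Happier Voice through /v1/features, not what the response contains.
interface FeaturesPayload {
  features?: {
    voice?: {
      happierVoice?: { enabled?: boolean };
    };
  };
}

// Build the features URL for a server base URL.
// Note: the leading "/" resolves against the origin, so any path prefix
// on serverUrl is dropped.
function featuresUrl(serverUrl: string): string {
  return new URL("/v1/features", serverUrl).toString();
}

// Treat anything other than an explicit `true` as "not advertised".
function isHappierVoiceAdvertised(payload: FeaturesPayload): boolean {
  return payload.features?.voice?.happierVoice?.enabled === true;
}
```

Defaulting to "not advertised" when a field is missing keeps the managed mode hidden on servers that do not mention it at all.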

Native background audio

On native builds, active realtime ElevenLabs calls use a call-style audio mode so the conversation can continue more reliably when the app is backgrounded or the screen locks.

This background-call behavior applies to the realtime ElevenLabs path, not to Local voice.

Use my ElevenLabs

Use my ElevenLabs also runs a realtime ElevenLabs conversation, but it uses your own ElevenLabs account settings.

This mode is useful when you want:

  • your own ElevenLabs billing
  • your own ElevenLabs agent
  • direct control over voice, model, and voice tuning

What you configure

In Settings → Voice → Use my ElevenLabs, you can configure:

  • API key
  • Agent ID
  • Voice
  • Realtime model
  • Speaker boost
  • Voice tuning such as stability, similarity, style, and speed
  • Welcome mode

The voice picker is searchable and supports inline preview playback, so you can audition voices before choosing one.

API key permissions

When creating a restricted ElevenLabs API key for Happier, enable:

  • Text to Speech → Access
  • Voices → Read
  • Conversational AI / Agents → Read & Write
  • optional: User → Read

Saved credentials

Your ElevenLabs API key is saved in Happier as an encrypted secret setting rather than a plaintext field, so you do not need to re-enter it every time.

Auto-provisioning a Happier agent

If you do not want to create or update the ElevenLabs agent manually, Happier can help provision it for you.

From the ElevenLabs section you can:

  • create a new Happier-compatible agent
  • update an existing Happier agent template
  • reuse an existing agent if you already have one

This is the easiest way to keep your ElevenLabs agent aligned with Happier’s current tool and prompt wiring.

Local voice

Local voice is the configurable voice pipeline.

It supports:

  • STT from the device, OpenAI-compatible endpoints, Google Gemini, or local neural STT
  • TTS from the device, OpenAI-compatible endpoints, Google Cloud, or local neural TTS

Local voice is the right choice when you want:

  • a self-hosted speech stack
  • device speech services
  • a hybrid setup such as device STT + cloud TTS
  • on-device neural speech models such as Kokoro TTS or Sherpa STT

Local voice settings also include voice-agent behavior such as backend selection, machine targeting, resumability, and working-directory policy when you use Agent mode.

Backend and model lists are machine-aware

When you use Agent mode, Happier does not invent a model list locally. Instead:

  • the Voice agent backend list follows your enabled backend toggles
  • the Agent machine setting decides which machine Happier probes for backend capabilities
  • the chat model and commit model lists come from that selected machine when the backend supports dynamic model probing

This means the exact model list can differ between machines.

If the selected machine cannot provide a dynamic list for the chosen backend, Happier falls back to the safe options that are always valid:

  • Use CLI settings
  • Custom…

In practice, this usually means one of these is true on the selected machine/account:

  • the backend is not installed there
  • the backend is installed but not authenticated there
  • the backend does not expose a dynamic model probe for that machine context
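
The fallback behavior and its usual causes can be sketched as follows. The types and names (`ProbeResult`, `modelOptionsFor`) are hypothetical. The sketch also assumes that the two always-valid options are listed ahead of any probed models, which this page does not state explicitly.

```typescript
// Hypothetical sketch: names are illustrative, not Happier's internals.
const FALLBACK_MODEL_OPTIONS = ["Use CLI settings", "Custom…"] as const;

type ProbeResult =
  | { kind: "models"; models: string[] }
  | {
      kind: "unavailable";
      // The three usual causes listed above.
      reason: "not-installed" | "not-authenticated" | "no-dynamic-probe";
    };

function modelOptionsFor(probe: ProbeResult): string[] {
  if (probe.kind === "models" && probe.models.length > 0) {
    // Assumption: the always-valid options stay available next to probed models.
    return [...FALLBACK_MODEL_OPTIONS, ...probe.models];
  }
  // No dynamic list from the selected machine: only the safe options remain.
  return [...FALLBACK_MODEL_OPTIONS];
}
```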

If you expect a richer model list, first check:

  • Settings → Voice → Agent machine
  • the selected backend is actually available on that machine
  • that machine is signed in for the backend you selected

Conversation modes

Local voice supports two conversation modes:

  • Direct to session: your transcribed speech is sent straight into the session
  • Agent: your speech goes through a dedicated voice agent first

Use Direct to session when you want speech input to behave like direct dictation into a session.

Use Agent when you want a colleague-style voice layer that can ask follow-up questions, summarize, and use structured actions before writing anything back.

TTS still matters in direct-to-session mode

Even in Direct to session, TTS can still be enabled so Happier can read agent replies back to you.

Hands-free mode

When you use Device STT, Happier can also expose hands-free endpointing controls such as silence timeout and minimum speech duration.
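
To make the two controls concrete, here is a minimal sketch of how such endpointing is commonly decided. The semantics are an assumption, not confirmed behavior: an utterance ends once enough speech has been heard and the trailing silence reaches the timeout. The names `EndpointingConfig` and `shouldEndUtterance` are hypothetical.

```typescript
// Assumed semantics, not confirmed by the docs.
interface EndpointingConfig {
  silenceTimeoutMs: number; // how long the user must be silent before we stop listening
  minSpeechMs: number;      // ignore blips shorter than this (coughs, clicks)
}

function shouldEndUtterance(
  config: EndpointingConfig,
  speechMs: number,
  trailingSilenceMs: number,
): boolean {
  const heardEnoughSpeech = speechMs >= config.minSpeechMs;
  const silentLongEnough = trailingSilenceMs >= config.silenceTimeoutMs;
  return heardEnoughSpeech && silentLongEnough;
}
```

Under these assumptions, a longer silence timeout tolerates pauses mid-sentence at the cost of slower turn-taking, and a higher minimum speech duration filters out short background noises.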

Test TTS

Use Test TTS to verify the currently selected local TTS provider.

That single button is the canonical end-to-end test for local voice output, regardless of whether you are using device TTS, OpenAI-compatible TTS, Google Cloud TTS, or local neural TTS.

See the full setup guide: Local voice providers.

What voice can do

Voice uses the same structured action system as the rest of Happier.

Common voice actions include:

  • sending a message to a session
  • answering permission requests
  • answering user-action requests
  • changing the active target session
  • changing tracked sessions
  • searching or resolving supported actions
  • switching session mode when supported

This is why voice is more reliable than a plain free-form speech interface: it can call typed actions instead of guessing.
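
The difference between typed actions and free-form guessing can be illustrated with a small discriminated union. The action names and fields below are hypothetical examples and are not Happier's real action schema. The point is only that a malformed request is rejected instead of being interpreted loosely.

```typescript
// Hypothetical action shapes, for illustration only.
type VoiceAction =
  | { type: "sendMessage"; sessionId: string; text: string }
  | { type: "answerPermissionRequest"; requestId: string; allow: boolean }
  | { type: "setTargetSession"; sessionId: string };

// Validate an untrusted payload (for example, a tool call from a voice agent).
// Returns null instead of guessing when the payload does not match a known action.
function parseVoiceAction(raw: unknown): VoiceAction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { type, sessionId, text, requestId, allow } = raw as Record<string, unknown>;

  switch (type) {
    case "sendMessage":
      return typeof sessionId === "string" && typeof text === "string"
        ? { type: "sendMessage", sessionId, text }
        : null;
    case "answerPermissionRequest":
      return typeof requestId === "string" && typeof allow === "boolean"
        ? { type: "answerPermissionRequest", requestId, allow }
        : null;
    case "setTargetSession":
      return typeof sessionId === "string" ? { type: "setTargetSession", sessionId } : null;
    default:
      return null;
  }
}
```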

Practical setup recipes

I want the simplest realtime voice setup

Use Happier Voice.

I want ElevenLabs, but with my own billing and voices

Choose Use my ElevenLabs.

I want local or self-hosted speech services

Use Local voice.

I want speech input without turning every utterance into a session message

Use Local voice → Conversation mode → Agent.

Troubleshooting

Voice is connected, but nothing is spoken back

Check:

  • that TTS is enabled for the selected provider
  • that the correct TTS provider is selected
  • that Test TTS succeeds in the Local voice section

My ElevenLabs voice list is empty

Check:

  • your API key is present
  • the key has Voices → Read permission
  • the selected agent and voice still exist in ElevenLabs

Local voice does not reach my self-hosted server

On mobile, localhost usually points to the phone itself, not your computer. Use your computer’s LAN IP or a tunnel instead.
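
A quick way to catch this mistake is to check whether a configured endpoint points at a loopback host before using it from a phone. The helpers below are an illustrative sketch with hypothetical names, and the LAN address in the usage note is a placeholder.

```typescript
// Returns true when the endpoint would resolve to the device itself.
function isLoopbackEndpoint(endpoint: string): boolean {
  const host = new URL(endpoint).hostname.toLowerCase();
  // WHATWG URL keeps the brackets around IPv6 hostnames, hence "[::1]".
  return host === "localhost" || host === "[::1]" || host.startsWith("127.");
}

// Swap the host for a LAN-reachable one, keeping scheme, port, and path.
function withLanHost(endpoint: string, lanHost: string): string {
  const url = new URL(endpoint);
  url.hostname = lanHost;
  return url.toString();
}
```

For example, `withLanHost("http://localhost:8000/v1", "192.168.1.20")` yields `http://192.168.1.20:8000/v1`, where `192.168.1.20` stands in for your computer's actual LAN IP.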
