Voice

Choose a voice mode, understand how voice conversations map to sessions, and configure ElevenLabs or local voice in Happier.

Happier supports four voice modes in Settings → Voice:

Off: disables all voice features.
Happier Voice: realtime ElevenLabs voice using Happier-managed credentials and server-side quotas.
Use my ElevenLabs: realtime ElevenLabs voice using your own API key and agent.
Local voice: a configurable STT/TTS pipeline with device, self-hosted, Google, and local-neural providers.

Start here

As a rule of thumb:

choose Happier Voice for the simplest “just works” realtime voice experience
choose Use my ElevenLabs if you want full control over your own ElevenLabs account and agent
choose Local voice if you want device speech, self-hosted endpoints, or on-device neural speech

Common settings

All voice modes share a few top-level settings:

Preferred language

Preferred language sets the language Happier should prefer for spoken responses and voice-side interactions.

Use this when:

you want the assistant to answer in a specific language
your speech provider supports multiple languages
you want local voice and ElevenLabs voice to stay aligned with the same preference

Voice UI defaults

Voice settings also include UI behavior such as:

whether voice defaults to global or session scope
whether the voice surface prefers the sidebar, the session screen, or auto
whether the activity feed is shown and auto-expands

These are app-level UX settings. They do not change how a provider itself speaks or reasons.

Voice activity feed

The Enable voice activity feed setting shows recent voice events directly in the UI while voice is active.

You can also enable Auto-expand on start if you want the activity feed to open automatically whenever a voice session starts.

Privacy

Happier includes privacy controls for what voice providers can see.

Defaults are intentionally conservative for sensitive data:

session summaries and recent messages can be shared
tool names and permission requests can be shared
file paths and tool arguments are off by default

If you need stricter privacy, start by disabling:

Share recent messages
Share file paths
Share tool arguments
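
The defaults and the stricter profile described above can be pictured as a small settings object. This is a minimal sketch, not Happier's actual settings schema: the `VoicePrivacySettings` type, its field names, and `stricterVoicePrivacy` are hypothetical and only mirror the labels in this section. It also assumes that "can be shared" means the toggle is on by default.

```typescript
// Hypothetical shape: field names mirror the labels above, not Happier's real schema.
interface VoicePrivacySettings {
  shareSessionSummaries: boolean;
  shareRecentMessages: boolean;
  shareToolNames: boolean;
  sharePermissionRequests: boolean;
  shareFilePaths: boolean;
  shareToolArguments: boolean;
}

// Assumed defaults: conversational context is shared, file paths and tool arguments are not.
const defaultVoicePrivacy: VoicePrivacySettings = {
  shareSessionSummaries: true,
  shareRecentMessages: true,
  shareToolNames: true,
  sharePermissionRequests: true,
  shareFilePaths: false,
  shareToolArguments: false,
};

// Stricter profile: turn off the three toggles listed above and leave the rest unchanged.
function stricterVoicePrivacy(base: VoicePrivacySettings): VoicePrivacySettings {
  return {
    ...base,
    shareRecentMessages: false,
    shareFilePaths: false,
    shareToolArguments: false,
  };
}
```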

Voice conversations and session targeting

Voice in Happier is more than a floating microphone button. It has its own conversation state and can act on real sessions.

Hidden voice conversation

When voice is active, Happier keeps a hidden voice conversation session.

Open it when you want to:

review what the assistant heard
read voice replies as text
continue the same conversation by typing
inspect request announcements, tool calls, or tool results

The hidden voice conversation does not replace your main coding session. It is the voice-side conversation layer.

Target session

Voice actions still need a real session to act on.

That target session is the session where voice can:

send messages
answer requests
switch modes when supported
continue work on your behalf

This is why voice can behave like a real assistant instead of just transcribing speech into a random text box.

Where the voice starts

In Local voice Agent mode, the start location depends on where you launch voice and on your working-directory policy:

Start from the sidebar / global voice surface: starts from voice home
Start from a session surface: starts from that session’s project root
If “Stay in voice home” is enabled: voice started from a session also stays in voice home

This gives you a neutral default for global voice use, while still letting session-started voice work directly against the current project when you want that behavior.
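
The start-location rules above reduce to a small decision function. The sketch below only illustrates that logic: `VoiceSurface`, `StartLocation`, and `resolveStartLocation` are hypothetical names, and Happier's internal implementation may differ.

```typescript
// Hypothetical types for illustration only.
type VoiceSurface =
  | { kind: "global" } // sidebar / global voice surface
  | { kind: "session"; projectRoot: string }; // launched from a session surface

type StartLocation =
  | { kind: "voiceHome" }
  | { kind: "projectRoot"; path: string };

function resolveStartLocation(surface: VoiceSurface, stayInVoiceHome: boolean): StartLocation {
  // "Stay in voice home" wins over the session's project root.
  if (stayInVoiceHome) {
    return { kind: "voiceHome" };
  }
  // Global starts always begin in voice home.
  if (surface.kind === "global") {
    return { kind: "voiceHome" };
  }
  // Session starts work against that session's project root.
  return { kind: "projectRoot", path: surface.projectRoot };
}
```

For example, `resolveStartLocation({ kind: "session", projectRoot: "/repo" }, false)` resolves to the project root, while the same call with `true` resolves to voice home.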

Happier Voice

Happier Voice is the managed realtime option.

Use it when you want:

the least setup
a true realtime voice session
server-managed access control, quota enforcement, and subscription rules

Happier Voice is available only when the connected server advertises it through /v1/features.
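
If you want to check this from a script, the idea looks roughly like the sketch below. Only the `/v1/features` path comes from this page. The response shape (`voice.happierVoice.enabled`) and both function names are assumptions made for illustration, so inspect your server's real response before relying on any field.

```typescript
// ASSUMED payload shape, for illustration only. The real /v1/features response may differ.
interface FeaturesPayload {
  voice?: { happierVoice?: { enabled?: boolean } };
}

// Pure check: anything other than an explicit `true` counts as "not advertised".
function isHappierVoiceAdvertised(payload: FeaturesPayload): boolean {
  return payload.voice?.happierVoice?.enabled === true;
}

// Fetches the feature list from a server and applies the check above.
async function checkHappierVoice(serverUrl: string): Promise<boolean> {
  const response = await fetch(new URL("/v1/features", serverUrl));
  if (!response.ok) {
    return false;
  }
  return isHappierVoiceAdvertised((await response.json()) as FeaturesPayload);
}
```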

Native background audio

On native builds, active realtime ElevenLabs calls use a call-style audio mode so the conversation can continue more reliably when the app is backgrounded or the screen locks.

This background-call behavior applies to the realtime ElevenLabs path, not to Local voice.

Use my ElevenLabs

Use my ElevenLabs also runs a realtime ElevenLabs conversation, but it uses your own ElevenLabs account settings.

This mode is useful when you want:

your own ElevenLabs billing
your own ElevenLabs agent
direct control over voice, model, and voice tuning

What you configure

In Settings → Voice → Use my ElevenLabs, you can configure:

API key
Agent ID
Voice
Realtime model
Speaker boost
Voice tuning such as stability, similarity, style, and speed
Welcome mode

The voice picker is searchable and supports inline preview playback, so you can audition voices before choosing one.

API key permissions

When creating a restricted ElevenLabs API key for Happier, enable:

Text to Speech → Access
Voices → Read
Conversational AI / Agents → Read & Write
optional: User → Read

Saved credentials

Your ElevenLabs API key is saved in Happier as an encrypted secret setting rather than a plaintext field, so you do not need to re-enter it every time.

Auto-provisioning a Happier agent

If you do not want to create or update the ElevenLabs agent manually, Happier can help provision it for you.

From the ElevenLabs section you can:

create a new Happier-compatible agent
update an existing Happier agent template
reuse an existing agent if you already have one

This is the easiest way to keep your ElevenLabs agent aligned with Happier’s current tool and prompt wiring.

Local voice

Local voice is the configurable voice pipeline.

It supports:

STT from the device, OpenAI-compatible endpoints, Google Gemini, or local neural STT
TTS from the device, OpenAI-compatible endpoints, Google Cloud, or local neural TTS

Local voice is the right choice when you want:

a self-hosted speech stack
device speech services
a hybrid setup such as device STT + cloud TTS
on-device neural speech models such as Kokoro TTS or Sherpa STT

Local voice settings also include voice-agent behavior such as backend selection, machine targeting, resumability, and working-directory policy when you use Agent mode.

Backend and model lists are machine-aware

When you use Agent mode, Happier does not invent a model list locally. Instead:

the Voice agent backend list follows your enabled backend toggles
the Agent machine setting decides which machine Happier probes for backend capabilities
the chat model and commit model lists come from that selected machine when the backend supports dynamic model probing

This means the exact model list can differ between machines.

If the selected machine cannot provide a dynamic list for the chosen backend, Happier falls back to the safe options that are always valid:

Use CLI settings
Custom…

In practice, this usually means one of these is true on the selected machine/account:

the backend is not installed there
the backend is installed but not authenticated there
the backend does not expose a dynamic model probe for that machine context

If you expect a richer model list, first check:

Settings → Voice → Agent machine
the selected backend is actually available on that machine
that machine is signed in for the backend you selected
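
The fallback behavior can be sketched as follows. The two option labels come from this section; `buildModelOptions` and its `probedModels` parameter are hypothetical, and the sketch assumes the always-valid options stay in the list alongside any probed models.

```typescript
// Labels taken from this section.
const FALLBACK_MODEL_OPTIONS: readonly string[] = ["Use CLI settings", "Custom…"];

// `probedModels` is null when the selected machine cannot supply a dynamic list
// (backend missing, not authenticated, or no dynamic probe available).
function buildModelOptions(probedModels: readonly string[] | null): string[] {
  if (probedModels === null || probedModels.length === 0) {
    return [...FALLBACK_MODEL_OPTIONS];
  }
  // Assumption: the safe options remain available next to the probed models.
  return [...FALLBACK_MODEL_OPTIONS, ...probedModels];
}
```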

Conversation modes

Local voice supports two conversation modes:

Direct to session: your transcribed speech is sent straight into the session
Agent: your speech goes through a dedicated voice agent first

Use Direct to session when you want speech input to behave like direct dictation into a session.

Use Agent when you want a colleague-style voice layer that can ask follow-up questions, summarize, and use structured actions before writing anything back.

TTS still matters in direct-to-session mode

Even in Direct to session, TTS can still be enabled so Happier can read agent replies back to you.

Hands-free mode

When you use Device STT, Happier can also expose hands-free endpointing controls such as silence timeout and minimum speech duration.
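
To see how these two controls typically interact, here is a generic endpointing sketch. It is not Happier's or the device's actual implementation: the type names, field names, and the assumption that a voice-activity detector has already labeled each frame as speech or silence are all illustrative.

```typescript
interface EndpointingConfig {
  silenceTimeoutMs: number; // how long silence must last before the utterance ends
  minSpeechDurationMs: number; // shorter bursts of sound are treated as noise
}

interface AudioFrame {
  durationMs: number;
  isSpeech: boolean; // assumed to come from a voice-activity detector (not shown)
}

// Returns the index of the frame where the utterance ends, or -1 if it is still open.
function findUtteranceEnd(frames: AudioFrame[], config: EndpointingConfig): number {
  let speechMs = 0;
  let silenceMs = 0;
  for (let i = 0; i < frames.length; i++) {
    const frame = frames[i];
    if (frame.isSpeech) {
      speechMs += frame.durationMs;
      silenceMs = 0; // any speech resets the silence timer
      continue;
    }
    silenceMs += frame.durationMs;
    if (silenceMs >= config.silenceTimeoutMs) {
      if (speechMs >= config.minSpeechDurationMs) {
        return i; // enough speech followed by a long enough pause
      }
      // Too little speech before the pause: discard it and keep listening.
      speechMs = 0;
      silenceMs = 0;
    }
  }
  return -1;
}
```

A longer silence timeout tolerates pauses mid-sentence at the cost of slower turn-taking. A higher minimum speech duration filters out coughs and clicks but can drop very short commands.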

Test TTS

Use Test TTS to verify the currently selected local TTS provider.

That single button is the canonical end-to-end test for local voice output, regardless of whether you are using device TTS, OpenAI-compatible TTS, Google Cloud TTS, or local neural TTS.

See the full setup guide: Local voice providers.

What voice can do

Voice uses the same structured action system as the rest of Happier.

Common voice actions include:

sending a message to a session
answering permission requests
answering user-action requests
changing the active target session
changing tracked sessions
searching or resolving supported actions
switching session mode when supported

This is why voice is more reliable than a plain free-form speech interface: it can call typed actions instead of guessing.
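
As an illustration of what "typed actions" means in practice, the sketch below models a few of the actions listed above as a discriminated union and rejects anything that does not match. The action names, fields, and `parseVoiceAction` are hypothetical and are not Happier's actual action schema.

```typescript
// Hypothetical action shapes based on the list above.
type VoiceAction =
  | { type: "sendMessage"; sessionId: string; text: string }
  | { type: "answerPermissionRequest"; requestId: string; allow: boolean }
  | { type: "setTargetSession"; sessionId: string };

// Accepts untrusted input and returns a typed action, or null if it does not match.
function parseVoiceAction(raw: unknown): VoiceAction | null {
  if (typeof raw !== "object" || raw === null) {
    return null;
  }
  const { type, sessionId, text, requestId, allow } = raw as Record<string, unknown>;
  if (type === "sendMessage" && typeof sessionId === "string" && typeof text === "string") {
    return { type: "sendMessage", sessionId, text };
  }
  if (type === "answerPermissionRequest" && typeof requestId === "string" && typeof allow === "boolean") {
    return { type: "answerPermissionRequest", requestId, allow };
  }
  if (type === "setTargetSession" && typeof sessionId === "string") {
    return { type: "setTargetSession", sessionId };
  }
  // Unknown or malformed actions are rejected instead of being guessed at.
  return null;
}
```

The point of the design is that a misheard or malformed request fails validation and never reaches a session, which free-form text injection cannot guarantee.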

If you do not hear spoken replies from Local voice, check:

that TTS is enabled for the selected provider
that the correct TTS provider is selected
that Test TTS succeeds in the Local voice section

My ElevenLabs voice list is empty

Check:

your API key is present
the key has Voices → Read permission
the selected agent and voice still exist in ElevenLabs

Local voice does not reach my self-hosted server

On mobile, localhost usually points to the phone itself, not your computer. Use your computer’s LAN IP or a tunnel instead.
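
A quick way to catch this mistake is to check whether an endpoint URL points at a loopback host before you save it. The helpers below are a hypothetical sketch using the standard `URL` class; the function names and the `192.168.1.20` address are examples, not part of Happier.

```typescript
// True when the endpoint's host only resolves on the device itself.
function isLoopbackEndpoint(endpoint: string): boolean {
  const host = new URL(endpoint).hostname.toLowerCase();
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}

// Swaps the host while keeping the scheme, port, and path.
function withHost(endpoint: string, host: string): string {
  const url = new URL(endpoint);
  url.hostname = host;
  return url.toString();
}
```

For example, `withHost("http://localhost:8000/v1", "192.168.1.20")` produces a URL that a phone on the same network can reach, provided the server listens on that interface.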