Voice
Configure voice providers (Happier Voice, BYO ElevenLabs, Local Voice).
Happier supports optional voice conversations with these providers:
- Off: disables voice.
- Happier Voice (ElevenLabs): your server mints short-lived ElevenLabs conversation tokens and can enforce subscription/quota server-side.
- Use my ElevenLabs: the client uses your own ElevenLabs xi-api-key (and agent id) and calls ElevenLabs directly (Happier never sees your key).
- Local Voice (OpenAI-compatible + on-device): a mic → STT → voice agent → optional TTS pipeline using OpenAI-compatible endpoints you run yourself, with optional on-device STT/TTS.
Glossary (connection vs target vs STT/TTS)
Happier voice is designed as an account-scoped voice connection with session targeting:
- Voice connection (account-scoped): a single active voice connection that can stay running while you navigate across sessions.
- Primary action session (target): the session the voice assistant should act on when it sends messages or approves permissions.
- Tracked sessions: additional sessions the voice assistant is allowed to monitor more closely for background updates (configurable).
- STT (speech-to-text): converts microphone audio into text. Without STT, you cannot "talk to Happier".
- TTS (text-to-speech): converts text into spoken audio. Without TTS, you can still use voice input, but replies won't be spoken aloud.
- Voice agent mode (local voice): your spoken input goes to an ephemeral "colleague" voice agent.
  - The voice agent does not automatically write to the session transcript.
  - When you want something applied to the session, the voice agent uses tools (e.g. sendSessionMessage) to send a single, explicit message into the target session.
App settings
In the app, open Settings → Voice and choose one of:
- Off: disables voice and hides the mic button.
- Happier Voice: uses your server to mint a conversation token (account-scoped).
- Use my ElevenLabs: stores your ElevenLabs credentials on-device and calls ElevenLabs directly.
- Local Voice: stores STT/TTS endpoint URLs (and optional keys) on-device and uses them for local turn-based voice chat.
Secure storage (BYO)
When using Use my ElevenLabs (or Local OSS endpoints with API keys), the app stores secrets using Happier’s encrypted-at-rest secret container:
- The key is encrypted locally using libsodium secretbox.
- Plaintext is not persisted; only ciphertext is stored.
ElevenLabs API key permissions (BYO)
When creating a restricted ElevenLabs API key (ElevenLabs → Developers → API Keys → Create API key), enable:
- Text to Speech → Access
- Voices → Read
- Conversational AI / Agents → Read & Write (required for Create Happier Agent / Update Agent)
- Optional: User → Read
Privacy controls (all voice providers)
In Settings → Voice → Privacy, you can control what session context gets sent to voice providers.
The defaults are privacy-hardened:
- Session summaries and selected updates can be shared.
- Tool names can be shared.
- Tool arguments and local file paths are off by default (enable only if you want full context, since they can include sensitive content).
If you have privacy concerns, disable these toggles first:
- Share recent messages
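As a rough illustration, toggles like these could gate what leaves the device. This is only a sketch: the setting names (shareToolNames, shareToolArgs, shareFilePaths) and the ToolEvent shape are hypothetical and are not Happier's actual schema.

```typescript
// Hypothetical privacy settings; field names are illustrative only.
interface VoicePrivacy {
  shareToolNames: boolean;
  shareToolArgs: boolean;
  shareFilePaths: boolean;
}

// Hypothetical description of one tool call in a session.
interface ToolEvent {
  tool: string;
  args: Record<string, unknown>;
  filePath?: string;
}

// Defaults mirror the text: tool names on, arguments and file paths off.
const defaultPrivacy: VoicePrivacy = {
  shareToolNames: true,
  shareToolArgs: false,
  shareFilePaths: false,
};

// Returns only the fields the settings allow to be sent to a voice provider.
function redactToolEvent(event: ToolEvent, privacy: VoicePrivacy): Partial<ToolEvent> {
  const out: Partial<ToolEvent> = {};
  if (privacy.shareToolNames) out.tool = event.tool;
  if (privacy.shareToolArgs) out.args = event.args;
  if (privacy.shareFilePaths && event.filePath !== undefined) out.filePath = event.filePath;
  return out;
}
```

With the defaults, a tool event is reduced to its tool name before it is shared.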
Use my ElevenLabs: optional auto-provisioning
If you don’t want to configure an agent manually in the ElevenLabs dashboard, you can use:
- Create Happier Agent: uses your xi-api-key to create and configure a Conversational AI agent in your ElevenLabs account.
- Update Agent: applies the latest Happier template to your existing agent.
This is optional — you can still manually enter your agent id + API key.
Local Voice (OpenAI-compatible STT/TTS + device options)
Local Voice is a turn-based pipeline:
- Record audio
- Send to your STT server (POST /v1/audio/transcriptions)
- Send the transcribed text to a local voice engine (direct or voice agent conversation mode)
- Optionally synthesize the next reply via TTS (POST /v1/audio/speech) and play it
In Settings → Voice → Local Voice, configure:
- STT Base URL (typically ends with /v1) or enable Device STT
- TTS Base URL (typically ends with /v1) or enable Device TTS
- Optional API keys, model/voice fields, and format (mp3/wav)
- Optional hands-free / endpointing controls (when using Device STT)
Minimum configuration notes:
- To use the mic and send spoken input, you must configure STT Base URL (unless Device STT is enabled).
- To hear spoken replies, configure TTS Base URL and enable auto-speak (unless Device TTS is enabled).
- Voice agent mode is optional. If you don't enable it, Local Voice uses direct-to-session mode.
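One way to read these notes is as a small capability check. The settings shape and field names below are hypothetical. The sketch also assumes that Device TTS alone is enough to hear replies, which is one interpretation of the note above.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface LocalVoiceSettings {
  sttBaseUrl?: string;
  ttsBaseUrl?: string;
  deviceStt: boolean;
  deviceTts: boolean;
  autoSpeak: boolean;
  voiceAgentEnabled: boolean;
}

interface LocalVoiceCapabilities {
  canListen: boolean;
  canSpeak: boolean;
  mode: "voice-agent" | "direct-to-session";
}

// Derives what Local Voice can do from the configured fields.
function resolveCapabilities(s: LocalVoiceSettings): LocalVoiceCapabilities {
  const hasUrl = (u?: string): boolean => u !== undefined && u.trim() !== "";
  return {
    // Spoken input needs an STT endpoint or on-device STT.
    canListen: s.deviceStt || hasUrl(s.sttBaseUrl),
    // Spoken replies need a TTS endpoint plus auto-speak, or on-device TTS (assumed reading).
    canSpeak: s.deviceTts || (hasUrl(s.ttsBaseUrl) && s.autoSpeak),
    // Without voice agent mode, Local Voice falls back to direct-to-session.
    mode: s.voiceAgentEnabled ? "voice-agent" : "direct-to-session",
  };
}
```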
Voice agent mode (local voice)
Voice agent mode is designed to feel closer to a "colleague" experience, while keeping the session transcript clean unless the voice agent explicitly sends a session message via tools.
How it works:
- Each time you speak, Happier transcribes your audio and sends it to the voice agent.
- The voice agent replies (and Happier can speak the reply via TTS).
- When the user asks, the voice agent calls sendSessionMessage to send one explicit message into the target session.
The voice agent can run against:
- a daemon-backed voice agent (recommended when available), or
- a user-configured OpenAI-compatible POST /v1/chat/completions endpoint.
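For the OpenAI-compatible option, a request could carry sendSessionMessage as a function tool. The sketch below builds such a body in the widely used chat-completions tool format. The model name, the tool description, and whether Happier sends exactly this payload are all assumptions.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Tool definition in the OpenAI chat-completions "function" tool format.
// The description text is illustrative.
const sendSessionMessageTool = {
  type: "function" as const,
  function: {
    name: "sendSessionMessage",
    description: "Send one explicit message into the target session.",
    parameters: {
      type: "object",
      properties: { message: { type: "string" } },
      required: ["message"],
    },
  },
};

// Builds the JSON body for POST {baseUrl}/chat/completions.
// `model` is whatever your endpoint serves; no particular name is implied.
function buildAgentRequest(model: string, history: ChatMessage[], userText: string) {
  const userMessage: ChatMessage = { role: "user", content: userText };
  return {
    model,
    messages: history.concat([userMessage]),
    tools: [sendSessionMessageTool],
  };
}
```

The caller passes in the conversation history, which matches the idea that the voice agent's exchange stays outside the session transcript until the tool is called.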
Important: on phones, localhost usually points to the phone, not your computer. Use your computer’s LAN IP (e.g. http://192.168.x.x:PORT/v1) or a tunnel.
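A settings screen could warn about this case with a simple check. The helper below is a hypothetical sketch that flags base URLs whose host is a loopback address.

```typescript
// Returns true when a base URL points at the device itself (loopback),
// which on a phone means the phone rather than your computer.
function isLoopbackUrl(baseUrl: string): boolean {
  // Capture the host part: either a bracketed IPv6 literal or text up to ':' or '/'.
  const match = /^https?:\/\/(\[[^\]]+\]|[^/:]+)/i.exec(baseUrl.trim());
  if (match === null) return false;
  const host = match[1].toLowerCase();
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}
```

For example, http://localhost:8000/v1 is flagged while http://192.168.1.20:8000/v1 is not.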
See the dedicated setup guide: Local voice providers.
Server requirements (Happier Voice)
Happier Voice is considered available when the server reports it via GET /v1/features (features.voice.enabled=true and features.voice.happierVoice.enabled=true).
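A client-side check for these two flags might look like the following sketch. The interface covers only the fields named above; the rest of the response shape is not specified here and is left out.

```typescript
// Partial shape of the GET /v1/features response: only the fields named in the text.
interface FeaturesResponse {
  features?: {
    voice?: {
      enabled?: boolean;
      happierVoice?: { enabled?: boolean };
    };
  };
}

// Happier Voice counts as available only when both flags are explicitly true.
function isHappierVoiceAvailable(res: FeaturesResponse): boolean {
  const voice = res.features?.voice;
  return voice?.enabled === true && voice?.happierVoice?.enabled === true;
}
```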
To enable it in production, set:
- ELEVENLABS_API_KEY
- ELEVENLABS_AGENT_ID_PROD
Optional controls:
- HAPPIER_FEATURE_VOICE__ENABLED (true/false, default true)
- HAPPIER_FEATURE_VOICE__REQUIRE_SUBSCRIPTION (true/false, defaults to true when NODE_ENV=production)
- VOICE_FREE_SESSIONS_PER_MONTH (default 0)
- VOICE_FREE_MINUTES_PER_MONTH (default 0)
- VOICE_MAX_CONCURRENT_SESSIONS (default 1)
- VOICE_MAX_SESSION_SECONDS (default 1200, min 30)
- VOICE_MAX_MINUTES_PER_DAY (default 0 = unlimited)
- VOICE_TOKEN_MAX_PER_MINUTE (default 10, 0 disables rate limiting)
- VOICE_COMPLETE_MAX_PER_MINUTE (default 60, 0 disables rate limiting)
- VOICE_LEASE_CLEANUP (true/false, default false)
- VOICE_LEASE_RETENTION_DAYS (default 30, clamp 7–365)
- VOICE_LEASE_CLEANUP_INTERVAL_MS (default 21600000 = 6h, min 10000)
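To show how defaults, minimums, and clamps of this kind combine, here is a sketch that reads a few of these variables from a plain key/value map. The helper names are hypothetical, and the server's real parsing may differ. In particular, the sketch assumes that unparseable values fall back to the default and that REQUIRE_SUBSCRIPTION defaults to false outside production.

```typescript
type Env = Record<string, string | undefined>;

// Parses an integer, falling back to the default when unset or invalid, then clamps it.
function intFromEnv(env: Env, name: string, fallback: number, min?: number, max?: number): number {
  const raw = env[name];
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  let value = isNaN(parsed) ? fallback : parsed;
  if (min !== undefined) value = Math.max(min, value);
  if (max !== undefined) value = Math.min(max, value);
  return value;
}

// Accepts "true"/"false" (case-insensitive); anything else keeps the default.
function boolFromEnv(env: Env, name: string, fallback: boolean): boolean {
  const raw = (env[name] || "").trim().toLowerCase();
  if (raw === "true") return true;
  if (raw === "false") return false;
  return fallback;
}

function loadVoiceLimits(env: Env) {
  return {
    enabled: boolFromEnv(env, "HAPPIER_FEATURE_VOICE__ENABLED", true),
    // Assumption: the default is false when NODE_ENV is not "production".
    requireSubscription: boolFromEnv(
      env,
      "HAPPIER_FEATURE_VOICE__REQUIRE_SUBSCRIPTION",
      env["NODE_ENV"] === "production"
    ),
    maxSessionSeconds: intFromEnv(env, "VOICE_MAX_SESSION_SECONDS", 1200, 30),
    leaseRetentionDays: intFromEnv(env, "VOICE_LEASE_RETENTION_DAYS", 30, 7, 365),
    leaseCleanupIntervalMs: intFromEnv(env, "VOICE_LEASE_CLEANUP_INTERVAL_MS", 21600000, 10000),
  };
}
```

On a Node server this could be called as loadVoiceLimits(process.env).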
Usage accounting (minutes)
When using Happier Voice, the app starts a session using /v1/voice/token and (best-effort) reports completion via:
- POST /v1/voice/session/complete with { leaseId, providerConversationId }
The server fetches the provider’s conversation metadata to determine duration (the client does not self-report minutes).
Note: the server also exposes an account-scoped mint alias route at POST /v1/voice/lease/mint. Both routes mint the same kind of short-lived conversation token. The request body may omit sessionId (it is treated as optional correlation only).
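The best-effort completion report could be sketched as below. The PostJson transport is a hypothetical stand-in for the app's HTTP client. Only the route and the two body fields come from the description above.

```typescript
// Hypothetical transport: posts JSON to a path and resolves with the parsed response.
type PostJson = (path: string, body: unknown) => Promise<unknown>;

interface CompletionReport {
  leaseId: string;
  providerConversationId: string;
}

// Reports that a voice session ended. Failures are swallowed (best-effort),
// so a flaky network never blocks the user from ending a conversation.
async function reportVoiceSessionComplete(post: PostJson, report: CompletionReport): Promise<boolean> {
  try {
    await post("/v1/voice/session/complete", report);
    return true;
  } catch {
    return false;
  }
}
```

The client sends no duration field, which matches the server-side accounting described above.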
ElevenLabs agent setup (manual BYO)
Both ElevenLabs modes require an ElevenLabs Conversational AI agent (agent_id).
If you’re using Create Happier Agent, Happier configures the agent for you.
If you configure the agent manually, set up the following for the best Happier experience (controlling Claude Code from voice):
Dynamic variables
Happier provides these at session start:
- sessionId
- initialConversationContext (a plaintext summary + recent history string from the current Happier session)
In ElevenLabs, reference these dynamic variables in your agent prompt/template so the voice assistant actually uses them.
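ElevenLabs prompts reference dynamic variables with double curly braces, e.g. {{sessionId}}. The sketch below pairs an example prompt with a small local preview helper. The prompt wording and the renderPrompt helper are illustrative only and are not part of Happier or ElevenLabs.

```typescript
// Example agent prompt using {{variable}} placeholders for the two
// dynamic variables Happier provides at session start.
const promptTemplate =
  "You are a voice assistant for Happier session {{sessionId}}.\n" +
  "Current session context:\n{{initialConversationContext}}";

// Hypothetical local preview: fills known placeholders and leaves unknown ones untouched.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : whole
  );
}
```

Previewing the rendered text locally is a quick way to confirm that both variables are referenced in the prompt.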
Client tools
Happier exposes these client tools at runtime (the agent can call them if you enable tools in the agent configuration):
- sendSessionMessage: parameters { "message": "string" }
- processPermissionRequest: parameters { "decision": "allow" | "deny" }
- setPrimaryActionSession: parameters { "sessionId": "string|null" }
- setTrackedSessions: parameters { "sessionIds": ["string", "..."] }
- listSessions: parameters { "limit"?: number, "cursor"?: "string|null" }
- getSessionActivity: parameters { "sessionId": "string" }
- getSessionRecentMessages: parameters { "sessionId": "string", "limit"?: number, "cursor"?: "string|null" }
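To make the parameter shapes concrete, here is a typed sketch of a subset of these tools with a name-checked dispatcher. The handler signatures and the string return values are assumptions, and the dispatcher does not validate parameters.

```typescript
// Parameter shapes for a subset of the tools listed above.
interface ToolParams {
  sendSessionMessage: { message: string };
  processPermissionRequest: { decision: "allow" | "deny" };
  setPrimaryActionSession: { sessionId: string | null };
  setTrackedSessions: { sessionIds: string[] };
}

type ToolName = keyof ToolParams;

// Hypothetical handler table: one function per tool, each returning a short status string.
type ToolHandlers = { [K in ToolName]: (params: ToolParams[K]) => string };

// Dispatches a tool call by name; unknown names yield an error string instead of throwing.
function callClientTool(handlers: ToolHandlers, name: string, params: unknown): string {
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    return "error: unknown tool " + name;
  }
  // The name is known at this point; parameter validation is omitted in this sketch.
  const handler = handlers[name as ToolName] as (p: unknown) => string;
  return handler(params);
}
```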
Pagination (tools)
Some tools return a nextCursor for pagination:
- listSessions returns { sessions, nextCursor }
- getSessionRecentMessages returns { messages, nextCursor }
To paginate, call the tool again with cursor: nextCursor until nextCursor is null.
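The cursor loop can be sketched generically. FetchPage is a hypothetical adapter around a tool call such as listSessions, mapping its sessions or messages field to items. The maxPages guard is an added safety limit that the text above does not mention.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical adapter around one paginated tool call.
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

// Follows nextCursor until it is null, or until maxPages pages have been read.
async function collectAll<T>(fetchPage: FetchPage<T>, maxPages: number = 50): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page: Page<T> = await fetchPage(cursor);
    for (const item of page.items) all.push(item);
    cursor = page.nextCursor;
    if (cursor === null) break; // no more pages
  }
  return all;
}
```

The first call passes a null cursor, and each later call passes the previous page's nextCursor.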
Tool names and parameter shapes must match what you configure in ElevenLabs.
Prompt guidelines
If you configure manually, a good starting point:
- Use initialConversationContext as the source of truth about the current coding session.
- When the user asks to “do X”, call sendSessionMessage with a single, concrete instruction.
- When the user asks to approve/deny a permission request, call processPermissionRequest.
- Keep responses short; confirm tool actions (“sent”, “done”) and then wait.