Testing & CI
Test tiers, how to run them locally, and how they are gated in CI/releases.
Test tiers (repo-wide model)
We use these lanes consistently across scripts and GitHub Actions.
1) Unit
Fast deterministic tests for pure logic/component behavior.
Root command:
yarn test
This runs:
yarn workspace @happier-dev/protocol test
yarn workspace @happier-dev/transfers test
yarn workspace @happier-dev/agents test
yarn workspace @happier-dev/cli-common test
yarn workspace @happier-dev/connection-supervisor test
yarn workspace @happier-dev/app test
yarn workspace @happier-dev/cli test:unit
yarn --cwd apps/server test:unit
yarn --cwd packages/relay-server test
yarn --cwd apps/stack test:unit
Note: the UI workspace package name is @happier-dev/app and its directory is apps/ui.
2) Integration
Real process/filesystem/network orchestration tests at app level.
Naming convention:
- *.integration.test.*
- *.integration.spec.*
- *.real.integration.test.*
Root command:
yarn test:integration
This runs:
yarn workspace @happier-dev/app test:integration
yarn workspace @happier-dev/cli test:integration
yarn --cwd apps/server test:integration
yarn --cwd apps/stack test:integration
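The integration naming convention can be read as a filename classifier. The sketch below assumes lane selection depends only on the filename suffix; classifyTestLane is a hypothetical helper for illustration, not the repo's actual runner configuration.

```typescript
// Hypothetical helper: assign a test file to the unit or integration lane using only
// the documented filename conventions. Assumes the input path is already a test file.
type Lane = "unit" | "integration";

// Each pattern matches the documented suffix followed by any extension (.ts, .tsx, .mjs, ...).
const INTEGRATION_PATTERNS: RegExp[] = [
  /\.integration\.test\.[^/]+$/, // *.integration.test.* (also covers *.real.integration.test.*)
  /\.integration\.spec\.[^/]+$/, // *.integration.spec.*
];

function classifyTestLane(filePath: string): Lane {
  return INTEGRATION_PATTERNS.some((re) => re.test(filePath)) ? "integration" : "unit";
}
```

Note that *.real.integration.test.* is a special case of *.integration.test.*, so it needs no pattern of its own; the "real" marker only matters for the env-gated suites described later.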
3) Server DB contract
Server-only contract suite against explicit DB providers (Postgres/MySQL in Docker lanes).
Root command:
yarn test:db-contract:docker
Package-local command:
yarn --cwd apps/server test:db-contract
4) Core E2E
End-to-end suites in packages/tests against real server/socket contracts.
Root commands:
yarn test:e2e
yarn test:e2e:core:fast
yarn test:e2e:core:slow
yarn test:e2e:core:embedded
Lane intent:
- test:e2e / test:e2e:core = full core gate.
- test:e2e:core:fast = default local loop (excludes the longest process-orchestration scenarios).
- test:e2e:core:slow = long-running scenarios (switching/materialization/mode orchestration).
Core E2E slow naming convention:
- *.slow.e2e.test.ts files run only in test:e2e:core:slow.
- All other suites/core-e2e/**/*.test.ts files run in test:e2e:core:fast.
- test:e2e still runs the full core suite (fast + slow together).
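The fast/slow split above amounts to a partition of the core suite by filename. This is a minimal sketch, assuming the slow files also live under suites/core-e2e/; coreLaneFor and selectCoreSuites are hypothetical names, not the actual packages/tests scripts.

```typescript
// Hypothetical helpers: decide which core E2E lane runs a suite file, then select files per lane.
type CoreLane = "fast" | "slow";

const CORE_E2E_ROOT = "suites/core-e2e/";

function coreLaneFor(filePath: string): CoreLane | null {
  // Only *.test.ts files under suites/core-e2e/ belong to the core lanes.
  if (!filePath.includes(CORE_E2E_ROOT) || !filePath.endsWith(".test.ts")) return null;
  // *.slow.e2e.test.ts files are reserved for test:e2e:core:slow; everything else is fast.
  return filePath.endsWith(".slow.e2e.test.ts") ? "slow" : "fast";
}

// "full" models test:e2e, which runs the union of the fast and slow lanes.
function selectCoreSuites(files: string[], lane: CoreLane | "full"): string[] {
  return files.filter((f) => {
    const assigned = coreLaneFor(f);
    return assigned !== null && (lane === "full" || assigned === lane);
  });
}
```

Because every core file lands in exactly one of the two lanes, running fast and then slow covers the same set as the full gate.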
4b) UI E2E (Playwright)
Browser-driven end-to-end tests against the Expo web UI, using the same server-light + daemon harness used by core E2E.
Root command:
yarn test:e2e:ui
Notes:
- These tests run Playwright against expo start --web and require Playwright browsers to be installed (see the Playwright install docs).
- UI E2E should stay small and focus on flows that are uniquely UI-driven (auth/login/connect, navigation, and key cross-app wiring). Lower-level server/daemon invariants belong in core E2E.
- If you suspect stale Metro transforms, you can opt into clearing the Expo/Metro cache with HAPPIER_E2E_EXPO_CLEAR=1 (off by default because --clear can occasionally crash Metro).
- Manual QA tip: if concurrent code changes cause the UI to keep reloading, you can disable Expo web Fast Refresh/HMR per browser tab by opening the UI with ?happier_hmr=0 (re-enable with ?happier_hmr=1). This is web-only and dev-only.
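The cache-clearing opt-in can be pictured as a single conditional flag on the Expo command line. This is a sketch only: buildExpoWebArgs is hypothetical, and it assumes the harness treats exactly "1" as enabled.

```typescript
// Hypothetical helper: build the argv the UI E2E harness might pass when starting Expo web.
type Env = Record<string, string | undefined>;

function buildExpoWebArgs(env: Env): string[] {
  const args = ["expo", "start", "--web"];
  // Opt-in only: --clear is skipped by default because it can occasionally crash Metro.
  if (env["HAPPIER_E2E_EXPO_CLEAR"] === "1") args.push("--clear");
  return args;
}
```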
4c) Native E2E (Maestro) (iOS/Android)
Native end-to-end tests using Maestro. These are intended to cover mobile-specific regressions (touch/keyboard/back/gesture/popup rendering).
Root commands:
yarn test:e2e:mobile
yarn test:e2e:mobile:android
yarn test:e2e:mobile:ios
Notes:
- These lanes are not PR checks by default; they're intended for manual dispatch / release validation while we stabilize CI emulator runs.
- You need Java 17+ and an iOS simulator / Android emulator.
- Start Metro for the Expo Dev Client (Maestro targets React Native testID via native view identifiers).
- Set the target server URL (host): HAPPIER_E2E_SERVER_URL=http://127.0.0.1:<port>.
- Android emulator runs automatically map 127.0.0.1/localhost to 10.0.2.2 for device-visible networking.
- If the server/Metro run on your host machine, enable adb reverse so Android can reach them: HAPPIER_E2E_ANDROID_ADB_REVERSE=1.
- Artifacts are written under packages/tests/.project/logs/e2e/mobile-maestro/.
- Override the Maestro binary (if needed): HAPPIER_E2E_MAESTRO_BIN=/path/to/maestro.
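The Android host mapping works because 10.0.2.2 is the emulator's alias for the host machine's loopback interface. A minimal sketch of the rewrite, assuming only the hostname changes; deviceVisibleServerUrl is a hypothetical name and does not model the adb reverse path.

```typescript
// Hypothetical helper: rewrite a host-loopback server URL into one the device can reach.
const ANDROID_HOST_ALIAS = "10.0.2.2"; // Android emulator alias for the host's 127.0.0.1

function deviceVisibleServerUrl(serverUrl: string, platform: "android" | "ios"): string {
  // The iOS simulator shares the host network stack, so loopback URLs work unchanged.
  if (platform !== "android") return serverUrl;
  const url = new URL(serverUrl);
  if (url.hostname === "127.0.0.1" || url.hostname === "localhost") {
    url.hostname = ANDROID_HOST_ALIAS;
  }
  // Note: URL serialization normalizes an empty path to "/".
  return url.toString();
}
```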
4d) WSREPL Lima matrix (macOS/Linux hosts)
The Lima host↔guest WSREPL matrix is exposed as an opt-in tests-owned lane.
Root command:
yarn test:e2e:ui:wsrepl:lima -- happier-wsrepl-qa
Package-local command:
yarn workspace @happier-dev/tests test:ui:e2e:wsrepl:lima -- happier-wsrepl-qa
Harness self-tests:
yarn test:e2e:ui:wsrepl:lima:self
Notes:
- This lane is intentionally not part of the default GitHub-hosted PR matrix yet. Use it locally or on a self-hosted runner with Lima available.
- The tests workspace now owns the lane entrypoint, the raw WSREPL Lima harness, and the Lima bootstrap helper: packages/tests/scripts/run-wsrepl-lima-matrix.mjs, packages/tests/scripts/wsrepl-lima-matrix.sh, and packages/tests/scripts/lima-vm.sh.
- The stack copies under apps/stack/scripts/provision/ are compatibility shims only and are excluded from the published stack package.
- Pass VM names and existing WSREPL env overrides after --. The wrapper always runs the underlying harness from apps/stack/, so existing report paths and relative script calls remain stable.
5) Providers
Provider contract/baseline suites in packages/tests (real provider CLIs).
Root commands:
yarn test:providers
yarn test:providers:all:smoke
yarn test:providers:claude:extended
yarn test:providers:codex:smoke
yarn test:providers:opencode:extended
6) Stress
Repeat/chaos reliability suites in packages/tests.
Root command:
yarn test:stress
7) Test governance
Repo-level governance checks for test file naming/lane placement and feature-tag validity.
Root commands:
yarn test:wiring:self
yarn test:wiring
yarn test:policy:self
yarn test:policy
yarn test:inventory
yarn test:migration:inventory
Notes:
- test:wiring:self and test:policy:self are fast self-tests for the governance validators themselves.
- test:wiring fails on invalid feature tags, obvious lane naming mistakes, and root-script/workflow/docs parity drift.
- test:policy enforces only the low-noise rules that are safe today; noisier rules stay report-only until the repo cleanup catches up.
- test:inventory and test:migration:inventory are report-only. They print lane/package-local coverage and write governance inventory artifacts under .project/testing/reports/governance/.
- The current rollout is:
  - enforce: .only, forbidden runtime imports from @happier-dev/tests internals, root-script/workflow/docs parity
  - report-only: direct test console.*, direct .skip/.todo, hidden skip aliases, deprecated helper imports, duplicate-pattern inventory
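The enforce vs report-only split means a finding's severity, not its mere presence, decides whether the run fails. The sketch below is illustrative only: the rule names, the regexes, and checkTestSource are hypothetical, and a real validator would more likely parse the AST than scan lines.

```typescript
// Hypothetical sketch: only "enforce" findings fail the check; "report-only" ones are listed.
type Severity = "enforce" | "report-only";

interface Rule {
  name: string;
  severity: Severity;
  pattern: RegExp; // non-global, so .test() keeps no state between calls
}

interface Finding {
  rule: string;
  severity: Severity;
  line: number; // 1-based
}

const RULES: Rule[] = [
  { name: "focused-test", severity: "enforce", pattern: /\b(?:describe|it|test)\.only\s*\(/ },
  { name: "skip-or-todo", severity: "report-only", pattern: /\b(?:describe|it|test)\.(?:skip|todo)\s*\(/ },
  { name: "direct-console", severity: "report-only", pattern: /\bconsole\.[a-z]+\s*\(/ },
];

function checkTestSource(source: string): { failed: boolean; findings: Finding[] } {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ rule: rule.name, severity: rule.severity, line: index + 1 });
      }
    }
  });
  return { failed: findings.some((f) => f.severity === "enforce"), findings };
}
```

Promoting a rule from report-only to enforced then becomes a one-field change, which fits the staged rollout described above.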
7b) Package-local-only lanes
These lanes are intentionally not exposed as canonical root scripts:
- CLI slow lane: yarn --cwd apps/cli test:slow
- Website tests: package-local only under apps/website
- Release runtime tests: yarn --cwd packages/release-runtime test
- Stack unit/integration/native real-integration lanes: yarn --cwd apps/stack test:unit, yarn --cwd apps/stack test:integration, and the guarded real-integration files in apps/stack
8) Typecheck
TypeScript typechecking for the main runtime packages (UI/CLI/server).
Root command:
yarn typecheck
9) Release contracts
Security/release-pipeline contract checks for workflow/install/signing invariants.
Root command:
yarn test:release:contracts
This runs scripts/release/*.test.mjs, including:
- installer sync/security/publication invariants
- workflow policy guards such as the production Tauri signing/notarization fail gate
10) Extended DB matrix (Postgres/MySQL)
Optional validation that core E2E and DB contract tests behave on non-embedded engines. In CI this runs via a dedicated workflow using service containers.
Local commands:
yarn test:e2e:core:docker
yarn test:db-contract:docker
Package-level lane scripts
UI (apps/ui, workspace @happier-dev/app)
- Unit: yarn --cwd apps/ui test:unit
- Integration: yarn --cwd apps/ui test:integration
CLI (apps/cli)
- Unit: yarn --cwd apps/cli test:unit
- Integration: yarn --cwd apps/cli test:integration
- Slow-only lane: yarn --cwd apps/cli test:slow
Server (apps/server)
- Unit: yarn --cwd apps/server test:unit
- Integration: yarn --cwd apps/server test:integration
- DB contract: yarn --cwd apps/server test:db-contract
Stack (apps/stack)
- Unit: yarn --cwd apps/stack test:unit
- Integration: yarn --cwd apps/stack test:integration
E2E workspace (packages/tests, workspace @happier-dev/tests)
- Core (full): yarn workspace @happier-dev/tests test
- Core fast: yarn workspace @happier-dev/tests test:core:fast
- Core slow: yarn workspace @happier-dev/tests test:core:slow
- UI WSREPL Lima matrix (macOS/Linux host): yarn workspace @happier-dev/tests test:ui:e2e:wsrepl:lima -- happier-wsrepl-qa
- Providers: yarn workspace @happier-dev/tests test:providers
- Stress: yarn workspace @happier-dev/tests test:stress
Env-gated integration suites
Some integration suites intentionally require explicit environment flags:
- CLI daemon reattach/pid safety: HAPPIER_CLI_DAEMON_REATTACH_INTEGRATION=1
- CLI tmux integration paths in CI: HAPPIER_CLI_TMUX_INTEGRATION=1
CI sets these for the relevant jobs. Set them manually when you need those suites locally.
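An env-gated suite checks its flag once and skips itself when the flag is absent, so a plain local integration run stays hermetic. A minimal sketch, assuming the gate is the literal value "1"; GATES and isGateEnabled are hypothetical names, not the CLI package's real helpers.

```typescript
// Hypothetical helper: map each documented gate to its env var and test for the literal "1".
type Env = Record<string, string | undefined>;

const GATES = {
  daemonReattach: "HAPPIER_CLI_DAEMON_REATTACH_INTEGRATION",
  tmux: "HAPPIER_CLI_TMUX_INTEGRATION",
} as const;

function isGateEnabled(env: Env, gate: keyof typeof GATES): boolean {
  // Values such as "true" or "yes" are deliberately not accepted in this sketch.
  return env[GATES[gate]] === "1";
}
```

The tmux gate alone is not sufficient: those suites also need tmux installed, which this sketch does not check.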
Prerequisites matrix
Unit (yarn test)
- Requires: Node + Yarn only.
- Should not require tmux, Docker, provider CLIs, or API keys.
Integration (yarn test:integration)
- UI integration:
  - Requires: Node + Yarn.
- CLI integration:
  - Requires: Node + Yarn.
  - Optional gated suites:
    - HAPPIER_CLI_TMUX_INTEGRATION=1 and tmux installed for *.real.integration.test.ts tmux coverage.
    - HAPPIER_CLI_DAEMON_REATTACH_INTEGRATION=1 for real PID/reattach process inspection suites.
- Server integration:
  - Requires: Node + Yarn.
  - Does not require Docker for the regular integration lane.
- Stack integration:
  - Requires: Node + Yarn + git available in PATH.
  - Spawns real child processes and validates process ownership/sweep behavior.
DB contract (yarn --cwd apps/server test:db-contract)
- Requires: a DB provider runtime (postgres/mysql) and matching connection env.
- Docker-backed helper command: yarn test:db-contract:docker
Core E2E (yarn test:e2e, yarn test:e2e:core:embedded)
- Requires: Node + Yarn.
- Uses real server/socket/runtime contracts via packages/tests.
- Does not require provider API keys.
- Recommended local loop:
  - yarn test:e2e:core:fast during iteration.
  - yarn test:e2e:core:slow before handoff for orchestration-heavy changes.
  - yarn test:e2e for the full core gate.
Provider suites (yarn test:providers, yarn test:providers:*)
- yarn test:providers by itself runs provider-suite checks with provider execution disabled by default (HAPPIER_E2E_PROVIDERS unset), so it does not require provider CLIs or API keys.
- yarn test:providers:all:smoke / yarn test:providers:claude:extended (and similar wrappers) enable real provider execution and then require provider CLIs/binaries for the selected providers.
- Direct workspace command (advanced): yarn workspace @happier-dev/tests providers:run <preset> <lane>.
- Auth mode is provider-spec driven:
  - host mode: uses the existing local provider CLI login/session (can run without API keys).
  - env mode: requires the provider keys/secrets declared by the provider spec.
- Practical guidance:
  - smoke: minimal scenario tier, but still real provider execution when run via test:providers:* wrappers or providers:run.
  - extended: broader scenario tier with more scenarios; in CI this typically runs with env-key/secrets overlays.
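The gating rules above reduce to two checks: is real execution enabled at all, and, for env-mode providers, are the declared keys present. This is a sketch under stated assumptions: the ProviderSpec shape (authMode, requiredEnv), planProviderRun, and treating any non-empty HAPPIER_E2E_PROVIDERS value as "enabled" are all hypothetical, not the real provider spec format.

```typescript
// Hypothetical sketch of provider-suite gating; the spec shape is assumed for illustration.
type Env = Record<string, string | undefined>;

interface ProviderSpec {
  id: string;
  authMode: "host" | "env";
  requiredEnv: string[]; // only consulted in env mode
}

type Plan = { run: true } | { run: false; reason: string };

function planProviderRun(spec: ProviderSpec, env: Env): Plan {
  // Bare `yarn test:providers` leaves HAPPIER_E2E_PROVIDERS unset, so nothing real executes.
  if (!env["HAPPIER_E2E_PROVIDERS"]) {
    return { run: false, reason: "provider execution disabled" };
  }
  // host mode reuses the local CLI login/session, so no keys are checked.
  if (spec.authMode === "env") {
    const missing = spec.requiredEnv.filter((name) => !env[name]);
    if (missing.length > 0) {
      return { run: false, reason: `${spec.id}: missing ${missing.join(", ")}` };
    }
  }
  return { run: true };
}
```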
Real Claude Agent Teams probe (opt-in)
Happier includes a small opt-in probe that runs your locally installed claude CLI to discover the exact Agent Teams tool names and capture representative payload shapes:
- Test file: packages/tests/suites/providers/claude.agentTeams.toolNames.realProbe.test.ts
Run it locally (requires Claude Code installed and already authenticated on the host):
HAPPIER_TEST_REAL_CLAUDE=1 yarn -s workspace @happier-dev/tests test:providers claude.agentTeams.toolNames.realProbe.test.ts
Extended signal (spawns teammates and sends messages; may take longer):
HAPPIER_TEST_REAL_CLAUDE=1 HAPPIER_TEST_REAL_CLAUDE_FULL=1 yarn -s workspace @happier-dev/tests test:providers claude.agentTeams.toolNames.realProbe.test.ts
Stress (yarn test:stress)
- Requires: Node + Yarn.
- Longer-running reliability loops; keep them out of the default local PR loop.
Artifacts
Core E2E / Providers / Stress write per-test logs under:
.project/logs/e2e/...
CI uploads this directory as an artifact on failure.
GitHub Actions (what runs where)
Default PR/push gate
Workflow: CI — Tests (.github/workflows/tests.yml)
Default gate includes:
- Unit (shared workspace packages + app/unit lanes)
- Integration (UI/CLI/Server/Stack app lanes)
- Governance validators (test:wiring, test:policy, plus report-only inventory commands)
- Typecheck
- Release contracts (release/workflow/install/signing guards)
- Daemon integration E2E (CLI + Server (light/sqlite) E2E)
- Core E2E fast (sqlite)
The daemon integration E2E job is intentionally separate from app unit/integration lanes because it verifies cross-package boot/auth/daemon behavior against a real light server process.
On-demand provider contracts
Workflow: CI — Provider Contracts (.github/workflows/providers-contracts.yml)
Required secrets:
- OPENAI_API_KEY (required for codex and opencode)
- ANTHROPIC_API_KEY (required for claude)
- CODEX_API_KEY (optional alternative to OPENAI_API_KEY for Codex)
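The secret requirements above, including the Codex alternative, can be expressed as groups of interchangeable keys. A pre-flight sketch; REQUIRED_SECRETS and missingSecrets are hypothetical names and are not part of the providers-contracts.yml workflow.

```typescript
// Hypothetical pre-flight check: each inner array lists alternatives, and any one
// non-empty secret in a group satisfies that group.
type Env = Record<string, string | undefined>;

const REQUIRED_SECRETS: Record<string, string[][]> = {
  codex: [["OPENAI_API_KEY", "CODEX_API_KEY"]], // CODEX_API_KEY may stand in for OPENAI_API_KEY
  opencode: [["OPENAI_API_KEY"]],
  claude: [["ANTHROPIC_API_KEY"]],
};

function missingSecrets(provider: string, env: Env): string[] {
  // Unknown providers have no declared requirements in this sketch.
  const groups = REQUIRED_SECRETS[provider] ?? [];
  return groups
    .filter((alternatives) => !alternatives.some((name) => Boolean(env[name])))
    .map((alternatives) => alternatives.join(" or "));
}
```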
Nightly stress
Workflow: CI — Stress Tests (.github/workflows/stress-tests.yml)
Nightly extended DB matrix
Workflow: CI — Extended DB Matrix (.github/workflows/extended-db-tests.yml)
Contributor guidance
- Default local loop: run the smallest unit target first.
- If your test needs real process/filesystem/network orchestration, name it as an integration test and keep it out of the unit lane.
- Before handoff, run the relevant app/unit and app/integration lanes for the areas you touched.
- For protocol/runtime/socket/database contract changes, also run core E2E.