Clara Voice — Swarm Acceptance Criteria

Set by: Amen Ra (CTO)
Deadline: Friday March 28, 2026 (demo)
Repo: claraagents/desktop/

Business AC (A. Philip)

Voice IS the prompt — Mo speaks into Clara Desktop, transcript lands in the Claude Code session via Channels MCP. No typing required.
Opus IS the brain — This Claude Code session (with full vault context) responds. NOT Groq, NOT a separate LLM. The SAME session.
Agent responds in cloned voice — Reply goes through speak.py → MiniMax TTS → Mac speakers in the agent’s cloned voice.
Clara Desktop IS the product — replaces Slack/Zoom/Teams. Not a developer tool.
Huddle feature — Stream-powered voice/video calls with human + AI agent participants.

Technical AC (Granville)

1. Clara Desktop mic → Deepgram STT → POST to MCP channel (port 8789)
2. Channel delivers transcript to warm Claude Code session
3. Opus responds in-session (1-3 sentences, conversational)
4. Reply tool → speak.py → MiniMax TTS → speakers (cloned voice)
5. Latency (updated March 24 by Mo):
   - STT + POST to 8789: <5s (already achieved ~1.5-2s)
   - Session turn time: minimize via watcher/notify/blocking-wait; no hard <5s guarantee while the Claude Code turn model is request-response
   - NO fake instant replies via Messages API or separate LLM; this session IS the brain
   - Metric: median time-to-first-listen improved via discipline, not by changing who the brain is
   - If product later needs marketing-grade “always <5s voice,” re-open as separate AC decision (hybrid A+B)
6. Locked build for Quik (CLARA_LOCKED=true, .dmg packaged)
7. Huddle: Stream call, multiple humans + agents in same room
8. desktop/ committed to claraagents repo on GitHub
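
The latency metric above (median time-to-first-listen) could be tracked with a small harness like the one below. This is a minimal sketch, not the project's actual test code: the names time_round_trip and median_time_to_first_listen are hypothetical, and the round trip itself is passed in as a callable because how it is triggered end to end is not specified here.

```python
import statistics
import time
from typing import Callable, List


def time_round_trip(round_trip: Callable[[], None]) -> float:
    """Return the wall-clock seconds one round trip takes.

    `round_trip` is a hypothetical callable that blocks from
    "utterance sent" until "first audio is audible".
    """
    start = time.perf_counter()
    round_trip()
    return time.perf_counter() - start


def median_time_to_first_listen(samples: List[float]) -> float:
    """Median of the collected round-trip samples, in seconds."""
    if not samples:
        raise ValueError("need at least one sample")
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in round trip (a short sleep) so the sketch runs on its own.
    samples = [time_round_trip(lambda: time.sleep(0.01)) for _ in range(5)]
    print(f"median time-to-first-listen: {median_time_to_first_listen(samples):.3f}s")
```

The median is used rather than the mean because a single slow session turn would otherwise dominate the figure, which matches the criterion's wording.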

Decision Record: Real-time voice in Claude Code NOT POSSIBLE (March 24, 2026)

Set by: Amen Ra
Finding: Claude Code’s turn model (request-response) prevents real-time voice conversation. MCP channel notifications don’t reliably trigger turns. 4 days spent confirming this (Sessions 19-23).
Resolution: Real-time voice requires Messages API (same Opus model, vault injected as system prompt). Claude Code stays the command center for coding, tools, swarm, vault, not for voice conversation.
Feature request to Anthropic: Event-driven turns from MCP channel notifications would solve this. Third-party integrations (Telegram, Slack) already do real-time with Claude via API; Claude Code should too.

What’s Already Done (Session 19)

What’s Left for Swarm

AC #1-4: Wire Clara Desktop → STT only → POST to 8789 → Opus responds → reply tool → speak.py
AC #5: Test round trip timing
AC #6: Package locked .dmg
AC #7: Stream Huddle with multiple participants
AC #8: Push desktop/ to claraagents GitHub

Architecture (CORRECT — from vault)

Clara Desktop mic → Deepgram STT → voice server /voice-channel
  → POST to MCP channel (port 8789) → notifications/claude/channel
  → THIS Claude Code session responds → reply tool → speak.py → speakers

Voice server does STT + TTS ONLY. Opus is the brain. NOT Groq.
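
The hand-off from the voice server to the MCP channel could look roughly like this. It is a sketch under stated assumptions: only port 8789 comes from this document. The host, the URL path, the JSON payload shape ({"text": ...}), the timeout, and the function names are all hypothetical. Note that the voice server only forwards the transcript and makes no LLM call of its own.

```python
import json
import urllib.request

# Port 8789 is from the AC; host and path are assumptions.
CHANNEL_URL = "http://127.0.0.1:8789/"


def build_transcript_request(
    transcript: str, url: str = CHANNEL_URL
) -> urllib.request.Request:
    """Build the POST that forwards an STT transcript to the channel."""
    text = transcript.strip()
    if not text:
        raise ValueError("empty transcript")
    # Hypothetical payload shape; the real channel schema is not given here.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_transcript(transcript: str, timeout: float = 5.0) -> int:
    """Send the transcript and return the HTTP status code."""
    request = build_transcript_request(transcript)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending keeps the payload checkable without a running channel; post_transcript("hello Clara") would only succeed with the channel listening on 8789.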
