Clara Voice — The Real Product

What Quik tells clients: “Talk to Clara. She knows your business, your vault, your agents. She responds in your agent’s cloned voice in under 5 seconds. Every conversation syncs to the development team automatically.”

Architecture: One Brain, Two Speeds

Fast Path (Voice Conversation) — Sub-5s

Mo speaks → Clara Desktop mic → Deepgram STT (~1.5s)
  → Bun watcher → Anthropic Messages API (Opus, vault as system prompt) (~2s streaming)
  → MiniMax TTS (cloned voice) (~1.5s) → speakers
  → POST transcript + response to MCP channel (async, non-blocking)
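
The stage estimates in the diagram above can be added up as a quick sanity check on the sub-5s target. The sketch below is only arithmetic on those nominal figures; the names Stage, FAST_PATH, totalMs and headroomMs are illustrative and not part of the existing code. The nominal figures sum to exactly 5000 ms, so staying under 5 seconds presumably relies on stages overlapping (for example, starting TTS while the API response is still streaming) or on the faster end of the measured STT range.

```typescript
// Nominal per-stage latencies copied from the fast-path diagram (milliseconds).
interface Stage {
  name: string;
  nominalMs: number;
}

const FAST_PATH: Stage[] = [
  { name: "Deepgram STT", nominalMs: 1500 },
  { name: "Anthropic Messages API (streaming)", nominalMs: 2000 },
  { name: "MiniMax TTS (cloned voice)", nominalMs: 1500 },
];

// The "under 5 seconds" promise, expressed as a budget.
const TARGET_MS = 5000;

// Sum of the stages if they run strictly one after another.
function totalMs(stages: Stage[]): number {
  return stages.reduce((sum, stage) => sum + stage.nominalMs, 0);
}

// Positive means slack, zero or negative means the budget is already spent.
function headroomMs(stages: Stage[], targetMs: number): number {
  return targetMs - totalMs(stages);
}
```

With the figures as written, headroomMs(FAST_PATH, TARGET_MS) is 0, which is worth keeping in mind when quoting the latency to clients.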

Deep Path (Development) — This Claude Code Session

Claude Code session ← MCP inbox (transcripts + responses from voice)
  → Full context: conversation history, tools, vault, memory, 145 commands, 22 agents
  → Code generation, swarm, file editing, git, agent management

How They Connect

  • The voice watcher posts every transcript and response to http://127.0.0.1:8789/
  • Claude Code reads them through the listen tool, so it is aware of every voice conversation
  • Mo can reference voice conversations in Claude Code: “remember what I told Mary about FMO?”
  • Claude Code is the command center. Voice is the conversation channel.
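
A minimal sketch of the watcher-side post, assuming a JSON payload. The notes give only the URL; the payload fields (transcript, response, timestamp) and the names VoiceExchange, ChannelRequest, buildChannelRequest and postNonBlocking are hypothetical. The sender is passed in as a function so the voice path can fire the request and move on without awaiting it.

```typescript
// Hypothetical payload shape: the channel's real JSON schema is not specified in these notes.
interface VoiceExchange {
  transcript: string; // what the speaker said (STT output)
  response: string;   // what Clara answered
  timestamp: string;  // ISO 8601, added here for ordering (assumption)
}

interface ChannelRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const CHANNEL_URL = "http://127.0.0.1:8789/"; // MCP channel address from the notes

// Pure builder: turns one exchange into a ready-to-send request description.
function buildChannelRequest(
  transcript: string,
  response: string,
  now: Date = new Date(),
): ChannelRequest {
  const payload: VoiceExchange = { transcript, response, timestamp: now.toISOString() };
  return {
    url: CHANNEL_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Fire-and-forget: start the send and swallow failures so the voice loop never waits on it.
function postNonBlocking(
  req: ChannelRequest,
  send: (req: ChannelRequest) => Promise<unknown>,
): void {
  send(req).catch(() => {
    // Logging to the channel is best-effort; a failed post must not break the conversation.
  });
}
```

In the Bun watcher, send could be a thin wrapper such as (r) => fetch(r.url, { method: r.method, headers: r.headers, body: r.body }). Injecting it also makes the posting logic testable without a running channel.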

What’s Already Built

  • Deepgram STT pipeline (1.2-1.6s) ✅
  • MiniMax TTS with 22 cloned agent voices ✅
  • MCP channel (voice-channel.ts on port 8789) ✅
  • Echo suppression (3 belts: pre-Deepgram lock, post-STT lock, similarity discard) ✅
  • File-backed inbox with fs.watch ✅
  • Clara Desktop Electron app ✅
  • speak.py with agent voice routing ✅
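
The third echo-suppression belt (similarity discard) can be illustrated as follows. This is a sketch under stated assumptions, not the shipped implementation: it uses word-level Jaccard similarity and a 0.8 threshold, both chosen here for illustration, and the names tokens, jaccard and isLikelyEcho are hypothetical. The idea is to drop an STT transcript when it closely matches something Clara recently spoke, since that usually means the mic picked up the speakers.

```typescript
// Lowercase, strip punctuation, and split into a set of distinct words.
// Note: this simple normalizer keeps only ASCII letters and digits.
function tokens(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
  return new Set(words);
}

// Jaccard similarity: shared words divided by total distinct words, in [0, 1].
function jaccard(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  let shared = 0;
  ta.forEach((word) => {
    if (tb.has(word)) shared += 1;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// True when the transcript is close enough to any recently spoken TTS line.
// The 0.8 default is an assumed threshold and would need tuning on real audio.
function isLikelyEcho(
  transcript: string,
  recentlySpoken: string[],
  threshold: number = 0.8,
): boolean {
  return recentlySpoken.some((line) => jaccard(transcript, line) >= threshold);
}
```

For example, a transcript of "Sure, I can help with that." checked against a recent TTS line "Sure I can help with that" shares every word and would be discarded, while an unrelated question would pass through to the API call.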

What Needs to Be Built

  • Bun watcher: /wait-for-transcript → Messages API call → speak.py → POST response to channel
  • Vault snapshot injector: reads vault context, builds system prompt for API call
  • Conversation memory: last N exchanges stored for API context window
  • Locked .dmg for Quik
  • Push desktop/ to claraagents GitHub
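
The conversation memory and vault snapshot injector could fit together roughly as sketched below. The request fields (model, max_tokens, system, messages, stream) follow the Anthropic Messages API; everything else is an assumption for illustration, including the names ConversationMemory and buildMessagesRequest, the default of 512 output tokens, and the idea that the vault snapshot arrives as a single string. The model identifier is left as a parameter because the notes say only "Opus".

```typescript
interface Exchange {
  user: string;      // transcript from STT
  assistant: string; // Clara's reply
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Keeps only the most recent N exchanges so the API context stays bounded.
class ConversationMemory {
  private exchanges: Exchange[] = [];
  private maxExchanges: number;

  constructor(maxExchanges: number) {
    this.maxExchanges = maxExchanges;
  }

  add(exchange: Exchange): void {
    this.exchanges.push(exchange);
    while (this.exchanges.length > this.maxExchanges) {
      this.exchanges.shift(); // drop the oldest exchange
    }
  }

  // Flatten stored exchanges into alternating user/assistant messages.
  toMessages(): Message[] {
    const messages: Message[] = [];
    for (const e of this.exchanges) {
      messages.push({ role: "user", content: e.user });
      messages.push({ role: "assistant", content: e.assistant });
    }
    return messages;
  }
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: Message[];
  stream: boolean;
}

// Vault snapshot becomes the system prompt; history plus the new transcript become the messages.
function buildMessagesRequest(
  model: string,
  vaultSnapshot: string,
  memory: ConversationMemory,
  transcript: string,
  maxTokens: number = 512, // assumed cap to keep spoken replies short
): MessagesRequest {
  const messages = memory.toMessages();
  messages.push({ role: "user", content: transcript });
  return { model, max_tokens: maxTokens, system: vaultSnapshot, messages, stream: true };
}
```

After each reply, the watcher would call memory.add with the transcript and the response so the next request carries it. Choosing N is a trade-off: a larger N gives Clara more context but adds input tokens, which can lengthen the API stage of the fast path.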

Why This Is Honest

  • Claude Code’s turn model cannot do real-time voice. That’s a product limitation, not a bug.
  • Third-party integrations (Telegram, Slack) already do real-time with Claude via API.
  • The tech may catch up: Anthropic could add event-driven turns from MCP channels.
  • Until then, this architecture delivers a real product that Quik can sell.
  • Feature request sent to Anthropic for real-time MCP channel events.

What NOT to Promise

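  • “Claude Code responds to your voice in real time”: FALSE. Voice replies come from the fast path, not from the Claude Code session.
  • “The AI remembers everything from your session”: FALSE as stated. The voice path sees only the vault snapshot, not the full session history.
  • “Sub-1 second responses”: unrealistic. The STT + API + TTS chain takes several seconds.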

What TO Promise

  • “Talk to Clara. She responds in under 5 seconds in her own voice.”
  • “Every conversation is logged and available to the development team”
  • “Clara knows your business from the vault”
  • “22 AI agents, each with their own personality and cloned voice”