Clara Voice — The Real Product

What Quik tells clients: “Talk to Clara. She knows your business, your vault, your agents. She responds in your agent’s cloned voice in under 5 seconds. Every conversation syncs to the development team automatically.”

Architecture: One Brain, Two Speeds

Fast Path (Voice Conversation) — Sub-5s

Mo speaks → Clara Desktop mic → Deepgram STT (~1.5s)
  → Bun watcher → Anthropic Messages API (Opus, vault as system prompt) (~2s streaming)
  → MiniMax TTS (cloned voice) (~1.5s) → speakers
  → POST transcript + response to MCP channel (async, non-blocking)
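
The stage estimates in the flow above can be checked against the 5-second target with a small latency budget. This is a minimal sketch, assuming the three stages run strictly one after another; the names `Stage`, `serialLatency`, and `headroom` are hypothetical. The transcript POST is excluded because it is async and non-blocking.

```typescript
// One stage of the fast path with its estimated duration in seconds.
interface Stage {
  name: string;
  seconds: number;
}

// Estimates copied from the flow above (not measurements).
const fastPath: Stage[] = [
  { name: "Deepgram STT", seconds: 1.5 },
  { name: "Anthropic Messages API (streaming)", seconds: 2.0 },
  { name: "MiniMax TTS", seconds: 1.5 },
];

// Total latency if every stage waits for the previous one to finish.
function serialLatency(stages: Stage[]): number {
  return stages.reduce((sum, stage) => sum + stage.seconds, 0);
}

// Seconds left against the target; zero or negative means no slack.
function headroom(stages: Stage[], targetSeconds: number): number {
  return targetSeconds - serialLatency(stages);
}

const total = serialLatency(fastPath); // 5
const slack = headroom(fastPath, 5); // 0
```

With these estimates the serial total is exactly 5 seconds, so the target has no slack unless stages overlap, for example by starting TTS on the first streamed sentence rather than the full response. That overlap is an assumption here, not something the flow above specifies.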

Deep Path (Development) — This Claude Code Session

Claude Code session ← MCP inbox (transcripts + responses from voice)
  → Full context: conversation history, tools, vault, memory, 145 commands, 22 agents
  → Code generation, swarm, file editing, git, agent management

How They Connect

Voice watcher posts every transcript + response to http://127.0.0.1:8789/
Claude Code sees them via the listen tool, which gives it full awareness of every voice conversation
Mo can reference voice conversations in Claude Code: “remember what I told Mary about FMO?”
Claude Code is the command center. Voice is the conversation channel.
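
The hand-off from the voice watcher to the channel could look roughly like this. It is a sketch only: the payload fields in `VoiceExchange`, the `Sender` type, and both function names are hypothetical, since the channel's real schema is not described here. Only the URL comes from the text above.

```typescript
// Hypothetical payload for one voice exchange; the real channel schema may differ.
interface VoiceExchange {
  transcript: string; // what the user said (STT output)
  response: string; // what Clara answered
  agent: string; // which agent voice spoke
  timestamp: string; // ISO 8601
}

// A plain description of the HTTP request, independent of any HTTP client.
interface PostRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const CHANNEL_URL = "http://127.0.0.1:8789/";

// Build the JSON POST for one exchange.
function buildChannelPost(exchange: VoiceExchange, url: string = CHANNEL_URL): PostRequest {
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(exchange),
  };
}

// The HTTP client is injected so the voice loop does not depend on one.
type Sender = (request: PostRequest) => Promise<unknown>;

// Fire-and-forget: returns immediately and swallows failures,
// so a slow or offline channel never delays speech.
function postNonBlocking(exchange: VoiceExchange, send: Sender): void {
  Promise.resolve()
    .then(() => send(buildChannelPost(exchange)))
    .catch(() => {
      // Logging is best-effort; a failed POST is dropped.
    });
}
```

In a Bun process, a `Sender` could be a thin wrapper around the global `fetch`, passing `request.url` and the remaining fields as options. Dropping failed posts silently is a design choice for this sketch; a real watcher might queue and retry them instead.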

What’s Already Built

Deepgram STT pipeline (1.2-1.6s) ✅
MiniMax TTS with 22 cloned agent voices ✅
MCP channel (voice-channel.ts on port 8789) ✅
Echo suppression (3 belts: pre-Deepgram lock, post-STT lock, similarity discard) ✅
File-backed inbox with fs.watch ✅
Clara Desktop Electron app ✅
speak.py with agent voice routing ✅
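
To illustrate the third echo-suppression belt (similarity discard), here is one way such a check could work. The existing implementation's metric is not described here, so this sketch swaps in word-level Jaccard similarity with a hypothetical 0.8 threshold; `tokenize`, `jaccard`, and `isEcho` are illustrative names.

```typescript
// Lowercase, replace punctuation with spaces, and split into unique words.
// ASCII-only on purpose; other scripts would need a different tokenizer.
function tokenize(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
  return new Set(words);
}

// Jaccard similarity: shared words divided by all distinct words, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// True when the transcript closely matches something Clara just said,
// meaning the microphone probably picked up the speakers.
function isEcho(transcript: string, recentlySpoken: string[], threshold: number = 0.8): boolean {
  const heard = tokenize(transcript);
  if (heard.size === 0) return false;
  return recentlySpoken.some((spoken) => jaccard(heard, tokenize(spoken)) >= threshold);
}
```

For example, `isEcho("Your FMO report is ready.", ["your FMO report is ready"])` returns true, while an unrelated question that shares only a single common word falls well below the threshold and passes through.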

What Needs to Be Built

Bun watcher: /wait-for-transcript → Messages API call → speak.py → POST response to channel
Vault snapshot injector: reads vault context, builds system prompt for API call
Conversation memory: last N exchanges stored for API context window
Locked .dmg for Quik
Push desktop/ to claraagents GitHub
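
The vault snapshot injector and conversation memory items above could combine into one request builder. This is a sketch under assumptions: `Exchange`, `remember`, and `buildRequest` are hypothetical names, and the vault snapshot is taken as an already-loaded string. The body fields `model`, `max_tokens`, `system`, and `messages` follow the Anthropic Messages API; the model ID is left to the caller because the text only says "Opus".

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// One completed voice turn: what was heard and what Clara answered.
interface Exchange {
  transcript: string;
  response: string;
}

// Request body fields used by the Anthropic Messages API.
interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: Message[];
}

// Append an exchange and keep only the most recent `maxExchanges`.
function remember(history: Exchange[], next: Exchange, maxExchanges: number): Exchange[] {
  if (maxExchanges <= 0) return [];
  return history.concat([next]).slice(-maxExchanges);
}

// Vault snapshot becomes the system prompt; past exchanges become
// alternating user/assistant messages, followed by the new transcript.
function buildRequest(
  vaultSnapshot: string,
  history: Exchange[],
  transcript: string,
  model: string,
  maxTokens: number = 512,
): MessagesRequest {
  const messages: Message[] = [];
  for (const past of history) {
    messages.push({ role: "user", content: past.transcript });
    messages.push({ role: "assistant", content: past.response });
  }
  messages.push({ role: "user", content: transcript });
  return { model, max_tokens: maxTokens, system: vaultSnapshot, messages };
}
```

Storing whole exchanges rather than single messages keeps the user/assistant alternation intact when old turns are dropped. The 512-token default is an arbitrary placeholder chosen to keep spoken replies short; the right value depends on how long Clara's answers should be.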

Why This Is Honest

Claude Code’s turn model cannot do real-time voice. That’s a product limitation, not a bug.
Third-party integrations (Telegram, Slack) already do real-time with Claude via API.
The tech may catch up: Anthropic could add event-driven turns from MCP channels.
Until then, this architecture delivers a real product that Quik can sell.
Feature request sent to Anthropic for real-time MCP channel events.

What NOT to Promise

“Claude Code responds to your voice in real time” — FALSE
“The AI remembers everything from your session” — only via vault snapshot, not full session history
“Sub-1 second responses” — unrealistic with STT + API + TTS chain

What TO Promise

“Talk to Clara. She responds in under 5 seconds in her own voice.”
“Every conversation is logged and available to the development team”
“Clara knows your business from the vault”
“22 AI agents, each with their own personality and cloned voice”
