Clara Voice — The Real Product
What Quik tells clients: “Talk to Clara. She knows your business, your vault, your agents. She responds in your agent’s cloned voice in under 5 seconds. Every conversation syncs to the development team automatically.”
Architecture: One Brain, Two Speeds
Fast Path (Voice Conversation) — Sub-5s
Mo speaks → Clara Desktop mic → Deepgram STT (~1.5s)
→ Bun watcher → Anthropic Messages API (Opus, vault as system prompt) (~2s streaming)
→ MiniMax TTS (cloned voice) (~1.5s) → speakers
→ POST transcript + response to MCP channel (async, non-blocking)
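A quick way to sanity-check the sub-5s target is to add up the stage estimates. The sketch below uses the approximate figures from the flow above plus the 1.2-1.6s STT range reported for the built pipeline. It assumes the stages run strictly one after another and leaves out the async channel POST, which is non-blocking. The names `StageEstimate`, `FAST_PATH`, and `totalLatency` are illustrative only.

```typescript
// Stage latency estimates in milliseconds, taken from the figures quoted in this note.
interface StageEstimate {
  name: string;
  minMs: number;
  maxMs: number;
}

const FAST_PATH: StageEstimate[] = [
  { name: "Deepgram STT", minMs: 1200, maxMs: 1600 },             // 1.2-1.6s
  { name: "Messages API (streaming)", minMs: 2000, maxMs: 2000 }, // ~2s
  { name: "MiniMax TTS", minMs: 1500, maxMs: 1500 },              // ~1.5s
];

// Sums best-case and worst-case latency, assuming no overlap between stages.
function totalLatency(stages: StageEstimate[]): { minMs: number; maxMs: number } {
  return stages.reduce(
    (acc, s) => ({ minMs: acc.minMs + s.minMs, maxMs: acc.maxMs + s.maxMs }),
    { minMs: 0, maxMs: 0 },
  );
}

const TARGET_MS = 5000;
const budget = totalLatency(FAST_PATH);
const headroomMs = TARGET_MS - budget.maxMs;

console.log(`fast path: ${budget.minMs}-${budget.maxMs} ms, worst-case headroom ${headroomMs} ms`);
// prints "fast path: 4700-5100 ms, worst-case headroom -100 ms"
```

Under these assumptions the chain lands between 4.7s and 5.1s, so the target has little slack. Overlapping streamed API output with TTS would be one way to recover headroom.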
Deep Path (Development) — This Claude Code Session
Claude Code session ← MCP inbox (transcripts + responses from voice)
→ Full context: conversation history, tools, vault, memory, 145 commands, 22 agents
→ Code generation, swarm, file editing, git, agent management
How They Connect
- Voice watcher posts every transcript + response to http://127.0.0.1:8789/
- Claude Code sees them via the listen tool, giving it full awareness of every voice conversation
- Mo can reference voice conversations in Claude Code: “remember what I told Mary about FMO?”
- Claude Code is the command center. Voice is the conversation channel.
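The hand-off from voice to Claude Code could look roughly like this. It is a minimal sketch, not the actual voice-channel.ts contract: the `VoiceExchange` payload shape and the function names are assumptions, and only the URL comes from this note. The POST is fire-and-forget so the voice loop never waits on it.

```typescript
// Hypothetical payload shape; the real channel schema is not specified in this note.
interface VoiceExchange {
  agent: string;
  transcript: string;
  response: string;
  timestamp: string; // ISO 8601
}

interface ChannelRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const CHANNEL_URL = "http://127.0.0.1:8789/";

// Builds the HTTP request separately so it can be inspected without a network call.
function buildChannelRequest(exchange: VoiceExchange): ChannelRequest {
  return {
    url: CHANNEL_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(exchange),
    },
  };
}

// Fire-and-forget: errors are logged, and the caller does not await the result.
function postToChannel(exchange: VoiceExchange): void {
  const { url, init } = buildChannelRequest(exchange);
  fetch(url, init).catch((err: unknown) => console.error("channel post failed:", err));
}
```

Keeping the request builder pure makes the payload easy to test even when the channel is not running.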
What’s Already Built
- Deepgram STT pipeline (1.2-1.6s) ✅
- MiniMax TTS with 22 cloned agent voices ✅
- MCP channel (voice-channel.ts on port 8789) ✅
- Echo suppression (3 belts: pre-Deepgram lock, post-STT lock, similarity discard) ✅
- File-backed inbox with fs.watch ✅
- Clara Desktop Electron app ✅
- speak.py with agent voice routing ✅
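For readers unfamiliar with the third echo-suppression belt, here is one way a similarity discard can work. This is an illustrative sketch, not the shipped implementation: it assumes word-level Jaccard similarity against recently spoken TTS text, and the 0.7 threshold is a placeholder that would need tuning.

```typescript
// Normalize to lowercase word tokens so punctuation and casing do not matter.
function tokens(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set(words);
}

// Jaccard similarity: shared tokens divided by total distinct tokens, in [0, 1].
function jaccard(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Hypothetical threshold; not taken from the real pipeline.
const ECHO_THRESHOLD = 0.7;

// True when the STT transcript closely matches something the agent just said,
// which suggests the mic picked up the speakers rather than the user.
function isEcho(
  transcript: string,
  recentSpoken: string[],
  threshold: number = ECHO_THRESHOLD,
): boolean {
  return recentSpoken.some((spoken) => jaccard(transcript, spoken) >= threshold);
}
```

A transcript flagged by `isEcho` would be dropped before it reaches the API call.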
What Needs to Be Built
- Bun watcher: /wait-for-transcript → Messages API call → speak.py → POST response to channel
- Vault snapshot injector: reads vault context, builds system prompt for API call
- Conversation memory: last N exchanges stored for API context window
- Locked .dmg for Quik
- Push desktop/ to claraagents GitHub
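The conversation memory and vault injector could feed the Messages API call along these lines. This is a sketch under assumptions: `ConversationMemory` and `buildMessagesBody` are hypothetical names, the `max_tokens` value is a placeholder, and the model ID is passed in rather than hard-coded. The body fields (`model`, `max_tokens`, `system`, `messages`, `stream`) follow the Anthropic Messages API request format.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Keeps only the last N exchanges (one exchange = user turn + assistant turn).
class ConversationMemory {
  private messages: Message[] = [];
  private readonly maxExchanges: number;

  constructor(maxExchanges: number) {
    this.maxExchanges = maxExchanges;
  }

  addExchange(userText: string, assistantText: string): void {
    this.messages.push(
      { role: "user", content: userText },
      { role: "assistant", content: assistantText },
    );
    // Drop the oldest turns once the window is exceeded.
    const overflow = this.messages.length - this.maxExchanges * 2;
    if (overflow > 0) this.messages.splice(0, overflow);
  }

  history(): Message[] {
    return this.messages.slice();
  }
}

// Builds the JSON body for POST https://api.anthropic.com/v1/messages.
// The request also needs x-api-key, anthropic-version, and content-type headers.
function buildMessagesBody(
  model: string,
  vaultSnapshot: string,
  memory: ConversationMemory,
  transcript: string,
) {
  const current: Message = { role: "user", content: transcript };
  return {
    model,
    max_tokens: 512, // assumed cap to keep spoken replies short
    stream: true, // stream so TTS can start as early as possible
    system: vaultSnapshot, // vault context as the system prompt
    messages: [...memory.history(), current],
  };
}
```

After each reply, the watcher would call `addExchange` with the transcript and the response, so the next request carries the trimmed history.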
Why This Is Honest
- Claude Code’s turn model cannot do real-time voice. That’s a product limitation, not a bug.
- Third-party integrations (Telegram, Slack) already do real-time with Claude via API.
- The tech may catch up: Anthropic could add event-driven turns from MCP channels.
- Until then, this architecture delivers a real product that Quik can sell.
- Feature request sent to Anthropic for real-time MCP channel events.
What NOT to Promise
- “Claude Code responds to your voice in real time” — FALSE
- “The AI remembers everything from your session” — only via vault snapshot, not full session history
- “Sub-1 second responses” — unrealistic with STT + API + TTS chain
What TO Promise
- “Talk to Clara, she responds in under 5 seconds in her own voice”
- “Every conversation is logged and available to the development team”
- “Clara knows your business from the vault”
- “22 AI agents, each with their own personality and cloned voice”