Clara Desktop — Real-Time Voice Requirements
Date: 2026-03-22
Author: Granville (Architect)
Priority: P0, the most important piece of functionality
What Mo Wants
- Open Clara Desktop → talk to agents → real conversation (1-5 sec latency)
- Quik gets the same experience on his machine
- Conference mode — Mo and Quik both talking to agents
- 5-4-3-2-1 countdown audio while AI processes (no dead air)
- Locked build for Quik (no DevTools, no source access)
- Local server for Mo (zero network latency), EC2 for everyone else
Current State
- Electron app EXISTS at /Volumes/X10-Pro/Native-Projects/apps/clara-desktop/
- Voice server EXISTS at infrastructure/voice/server/
- /voice-direct endpoint works (HTTP batch: upload audio → process → return)
- Pipecat streaming pipeline in bot.py (switched to Groq, not deployed yet)
- CSP blocking localhost connections (needs fix)
- Dev mode works but has security warnings
Architecture
Voice Flow (Mo — Local)
Mic → Electron app → localhost:7860/voice-direct → Deepgram STT → Groq LLM → MiniMax TTS → Speakers
(5, 4, 3, 2, 1 countdown plays while waiting for the response)
Voice Flow (Quik — Remote)
Mic → Electron app → EC2 voice server /voice-direct → Deepgram STT → Groq LLM → MiniMax TTS → Speakers
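The local/remote split in the two flows above could be captured in one small helper. This is a minimal sketch, not the app's actual code: the function name, the VoiceTarget shape, and the idea of passing the EC2 address in through configuration are assumptions. Only port 7860 and the /voice-direct path come from this document.

```typescript
// Hypothetical helper: choose the /voice-direct URL for a client.
// Only port 7860 and the /voice-direct path come from this document.
const LOCAL_VOICE_URL = "http://localhost:7860";

interface VoiceTarget {
  useLocalServer: boolean; // true on Mo's machine
  remoteBaseUrl?: string;  // EC2 address, supplied by config (not specified here)
}

function voiceDirectUrl(target: VoiceTarget): string {
  if (target.useLocalServer) {
    return `${LOCAL_VOICE_URL}/voice-direct`;
  }
  if (!target.remoteBaseUrl) {
    throw new Error("remoteBaseUrl is required when not using the local server");
  }
  // Strip trailing slashes so the path is joined exactly once.
  return `${target.remoteBaseUrl.replace(/\/+$/, "")}/voice-direct`;
}
```

Keeping the choice in one function means the rest of the voice code never needs to know which machine it is running on.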
Conference Mode
Mo's mic ──→ Daily.co room ←── Quik's mic
                   ↓
            Pipecat pipeline
       (Deepgram → Groq → MiniMax)
                   ↓
         Agents respond in room
                   ↓
Mo's speakers ←──────────→ Quik's speakers
TDD Approach — Test Before Code
Test 1: CSP allows localhost
- Fix CSP in index.html AND forge.config.ts devContentSecurityPolicy
- Verify: fetch('http://localhost:7860/agents') succeeds in renderer
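One way to keep the two CSP locations in sync is to generate the policy string once. The sketch below is an assumption about how that could look: the buildDevCsp helper and the specific directive values are illustrative, not the project's current policy, and they should be tightened for a production build.

```typescript
// Sketch: build a dev CSP string that permits calls to the local voice server.
// Directive values are assumptions chosen for illustration.
function buildDevCsp(extraConnectSources: string[]): string {
  const directives: Record<string, string[]> = {
    "default-src": ["'self'", "'unsafe-inline'", "data:"],
    "script-src": ["'self'", "'unsafe-eval'", "'unsafe-inline'", "data:"],
    // connect-src governs fetch/XHR/WebSocket targets, including localhost:7860.
    "connect-src": ["'self'", ...extraConnectSources],
    // blob: lets the renderer play audio built from a decoded Blob.
    "media-src": ["'self'", "blob:", "data:"],
  };
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

const devContentSecurityPolicy = buildDevCsp(["http://localhost:*", "ws://localhost:*"]);
```

The resulting string would be passed as devContentSecurityPolicy in the Forge webpack plugin config, and the same value used in the index.html meta tag, so the two cannot drift apart.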
Test 2: Mic capture works
- Verify: MediaRecorder captures audio chunks
- Verify: Blob size > 500 bytes after 3 seconds
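The size check in Test 2 can be expressed as a pure function over the recorded chunks. This is a sketch under assumptions: the helper names are hypothetical, and chunks are typed structurally (anything with a size) so the same code accepts the Blobs that MediaRecorder delivers in its dataavailable events.

```typescript
// Sketch of the Test 2 size check; names are hypothetical.
interface SizedChunk {
  size: number; // bytes, as exposed by Blob.size
}

const MIN_CAPTURE_BYTES = 500; // threshold from Test 2

function totalBytes(chunks: SizedChunk[]): number {
  return chunks.reduce((sum, chunk) => sum + chunk.size, 0);
}

// True when the capture is strictly larger than the 500-byte threshold.
function captureLooksValid(chunks: SizedChunk[]): boolean {
  return totalBytes(chunks) > MIN_CAPTURE_BYTES;
}
```

In the renderer, each event.data from the recorder would be pushed into the chunks array, and the check run after the 3-second capture window.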
Test 3: Voice-direct round trip
- Send audio blob to /voice-direct
- Verify: Response has transcript + response + audio_base64
- Measure: Total latency < 5 seconds
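The response check and latency budget in Test 3 might be sketched as follows. The three field names come from this document; treating all of them as strings, and the helper names themselves, are assumptions.

```typescript
// Sketch of the Test 3 assertions. Field names are from this document;
// the string types are an assumption about the server's JSON.
interface VoiceDirectResponse {
  transcript: string;
  response: string;
  audio_base64: string;
}

function parseVoiceDirectResponse(payload: unknown): VoiceDirectResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("voice-direct response is not an object");
  }
  const record = payload as Record<string, unknown>;
  for (const key of ["transcript", "response", "audio_base64"] as const) {
    if (typeof record[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  return {
    transcript: record.transcript as string,
    response: record.response as string,
    audio_base64: record.audio_base64 as string,
  };
}

const LATENCY_BUDGET_MS = 5000; // "Total latency < 5 seconds"

// Timestamps are milliseconds, e.g. from Date.now() before and after the request.
function withinLatencyBudget(startedAtMs: number, finishedAtMs: number): boolean {
  return finishedAtMs - startedAtMs < LATENCY_BUDGET_MS;
}
```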
Test 4: Audio playback
- Decode base64 audio → Blob → Audio element
- Verify: Audio plays without errors
- Verify: Mic pauses during playback, resumes after
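The first step of Test 4, turning audio_base64 into raw bytes, could look like the sketch below. The function name is hypothetical; atob is the standard base64 decoder available in the renderer.

```typescript
// Sketch: decode a base64 string into bytes suitable for building a Blob.
function base64ToBytes(audioBase64: string): Uint8Array {
  const binary = atob(audioBase64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

In the renderer, the bytes would then be wrapped in a Blob, turned into an object URL with URL.createObjectURL, and assigned to an Audio element. The MIME type to use for the Blob depends on what the TTS step returns and is not specified in this document.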
Test 5: Countdown plays during wait
- Verify: Web Speech API starts “Got it. 5, 4, 3, 2, 1” immediately
- Verify: Countdown cancels when response arrives
- Verify: No overlap between countdown and response audio
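The start/cancel behaviour that Test 5 verifies can be modelled as a small controller with the speech calls injected, which makes it testable without a browser. This is a sketch, assuming the renderer wires the two callbacks to speechSynthesis.speak and speechSynthesis.cancel; the createCountdown name and SpeechOutput interface are hypothetical.

```typescript
// Sketch of the countdown controller. SpeechOutput stands in for the
// Web Speech API so the logic can be tested without a browser.
interface SpeechOutput {
  speak(text: string): void;
  cancel(): void;
}

const COUNTDOWN_PHRASE = "Got it. 5, 4, 3, 2, 1"; // phrase from Test 5

function createCountdown(speech: SpeechOutput) {
  let active = false;
  return {
    // Call as soon as the audio upload starts, so there is no dead air.
    start(): void {
      if (active) return; // never queue the phrase twice
      active = true;
      speech.speak(COUNTDOWN_PHRASE);
    },
    // Call when the response arrives, BEFORE playing the response audio,
    // so the countdown and the response never overlap.
    stop(): void {
      if (!active) return;
      active = false;
      speech.cancel();
    },
    isActive(): boolean {
      return active;
    },
  };
}
```

In the renderer, speak would wrap the text in a SpeechSynthesisUtterance before handing it to speechSynthesis.speak. Calling stop after the phrase has already finished is harmless, since cancelling an empty speech queue does nothing.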
Test 6: Locked mode
- Build with CLARA_LOCKED=true
- Verify: DevTools cannot open (Cmd+Shift+I, F12 blocked)
- Verify: Right-click disabled
- Verify: Source code not visible in .app bundle
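The shortcut blocking in Test 6 might be built around a pure predicate like the one below. This is a sketch under assumptions: the helper names are hypothetical, the KeyInput shape mirrors only the fields of Electron's before-input-event input object that are used here, and reading CLARA_LOCKED from the environment at runtime is an assumption (the document only says the build is made with that variable set).

```typescript
// Sketch: decide whether a key press is a DevTools shortcut.
// KeyInput mirrors the relevant fields of Electron's before-input-event input.
interface KeyInput {
  code: string; // physical key, e.g. "KeyI" or "F12"
  control: boolean;
  meta: boolean; // Cmd on macOS
  shift: boolean;
  alt: boolean;
}

function isDevToolsShortcut(input: KeyInput): boolean {
  if (input.code === "F12") return true;
  if (input.code !== "KeyI") return false;
  return (
    (input.meta && input.shift) ||  // Cmd+Shift+I, as listed in Test 6
    (input.meta && input.alt) ||    // Cmd+Option+I, the usual macOS DevTools shortcut
    (input.control && input.shift)  // Ctrl+Shift+I on Windows/Linux
  );
}

// Assumption: the locked flag is visible to the main process as an env value.
function isLockedBuild(env: Record<string, string | undefined>): boolean {
  return env.CLARA_LOCKED === "true";
}
```

In the main process, a before-input-event listener on the window's webContents would call event.preventDefault() when the build is locked and the predicate returns true. Setting devTools: false in the window's webPreferences is a complementary guard, since it disables DevTools regardless of how they are invoked.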
Test 7: Conference mode
- Two clients join same Daily.co room
- Agent responds to both participants
- Both hear the response
Known Issues to Fix
- CSP in index.html needs http://localhost:* (already there but Forge webpack overrides it)
- forge.config.ts devContentSecurityPolicy needs updating
- Python 3.14 can't build Pipecat (llvmlite); use a 3.12 venv
- Voice server .env needs all API keys (added: Groq, Deepgram, MiniMax)
- Renderer.ts only logs a wave emoji; either add the actual voice logic there or let index.html handle it
Files to Modify
- apps/clara-desktop/forge.config.ts: CSP fix
- apps/clara-desktop/src/index.html: Voice UI + countdown
- apps/clara-desktop/src/index.ts: Locked mode
- infrastructure/voice/server/bot.py: Groq + thinking phrases (DONE)
- infrastructure/voice/server/server.py: No changes needed
- infrastructure/voice/server/requirements.txt: Updated (DONE)
Deploy Checklist
- Fix CSP (forge.config.ts + index.html)
- Test voice-direct round trip locally
- Test countdown audio
- Test locked mode build
- Deploy updated bot.py + requirements to EC2
- Package locked build for Quik
- Test conference mode with Daily.co