Clara Desktop — Real-Time Voice Requirements

Date: 2026-03-22
Author: Granville (Architect)
Priority: P0 (most important piece of functionality)

What Mo Wants

  1. Open Clara Desktop → talk to agents → real conversation (1-5 sec latency)
  2. Quik gets the same experience on his machine
  3. Conference mode — Mo and Quik both talking to agents
  4. 5-4-3-2-1 countdown audio while AI processes (no dead air)
  5. Locked build for Quik (no DevTools, no source access)
  6. Local server for Mo (zero network latency), EC2 for everyone else

Current State

  • Electron app EXISTS at /Volumes/X10-Pro/Native-Projects/apps/clara-desktop/
  • Voice server EXISTS at infrastructure/voice/server/
  • /voice-direct endpoint works (HTTP batch: upload audio → process → return)
  • Pipecat streaming pipeline in bot.py (switched to Groq, not deployed yet)
  • CSP blocks localhost connections (needs fix)
  • Dev mode works but has security warnings

Architecture

Voice Flow (Mo — Local)

Mic → Electron app → localhost:7860/voice-direct → Deepgram STT → Groq LLM → MiniMax TTS → Speakers
         ^                                                                                    |
         |_________________________ 5,4,3,2,1 countdown plays while waiting __________________|

Voice Flow (Quik — Remote)

Mic → Electron app → EC2 voice server (/voice-direct) → Deepgram STT → Groq LLM → MiniMax TTS → Speakers
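
The two flows differ only in which server receives the audio. A minimal sketch of that choice follows. VoiceTarget and voiceDirectUrl are hypothetical names. Port 7860 and the /voice-direct path come from the flows above. The EC2 address is not given in this document, so the caller has to supply it.

```typescript
// Hypothetical type: "local" is Mo's machine, "remote" is the EC2 voice server.
type VoiceTarget =
  | { kind: "local" }
  | { kind: "remote"; baseUrl: string };

// Returns the full /voice-direct URL for the chosen target.
function voiceDirectUrl(target: VoiceTarget): string {
  const base =
    target.kind === "local" ? "http://localhost:7860" : target.baseUrl;
  // Strip trailing slashes so the result never contains "//voice-direct".
  return base.replace(/\/+$/, "") + "/voice-direct";
}
```

Keeping the choice in one function means the renderer code path is identical for Mo and Quik. Only the target differs between the two builds.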

Conference Mode

Mo's mic ──→ Daily.co room ←── Quik's mic
                 ↓
           Pipecat pipeline
      (Deepgram → Groq → MiniMax)
                 ↓
           Agents respond in room
                 ↓
Mo's speakers ←──────────→ Quik's speakers

TDD Approach — Test Before Code

Test 1: CSP allows localhost

  • Fix CSP in index.html AND forge.config.ts devContentSecurityPolicy
  • Verify: fetch('http://localhost:7860/agents') succeeds in renderer
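
One possible shape for the CSP fix is to build the policy string in one place and use it for both index.html and devContentSecurityPolicy. This is a sketch, not the project's actual policy. buildDevCsp and connectSrcAllows are hypothetical helpers, and every directive value other than http://localhost:* is an assumption to be tuned.

```typescript
// Builds a development CSP string. All directive values here are assumptions.
function buildDevCsp(extraConnectSrc: string[] = []): string {
  const directives: Array<[string, string[]]> = [
    ["default-src", ["'self'", "'unsafe-inline'", "data:"]],
    ["script-src", ["'self'", "'unsafe-eval'", "'unsafe-inline'"]],
    // Allows fetch() to the local voice server on any localhost port.
    ["connect-src", ["'self'", "http://localhost:*", ...extraConnectSrc]],
    // Assumed to be needed for playing decoded audio from a Blob URL (Test 4).
    ["media-src", ["'self'", "blob:", "data:"]],
  ];
  return directives
    .map(([name, values]) => `${name} ${values.join(" ")}`)
    .join("; ");
}

// Literal check that a source expression is listed under connect-src.
// This is a string comparison, not the browser's CSP matching algorithm.
function connectSrcAllows(csp: string, source: string): boolean {
  for (const directive of csp.split(";")) {
    const [name, ...values] = directive.trim().split(/\s+/);
    if (name === "connect-src") return values.indexOf(source) !== -1;
  }
  return false;
}
```

The string check catches a missing localhost entry at build time. The fetch verification above still has to run in the renderer, because only the browser applies the effective policy.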

Test 2: Mic capture works

  • Verify: MediaRecorder captures audio chunks
  • Verify: Blob size > 500 bytes after 3 seconds

Test 3: Voice-direct round trip

  • Send audio blob to /voice-direct
  • Verify: Response has transcript + response + audio_base64
  • Measure: Total latency < 5 seconds
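
The two checks in Test 3 can be sketched as pure functions so they run without a live server. The field names come from the list above. Treating all three fields as strings is an assumption, as are the helper names isVoiceDirectResponse and withinLatencyBudget.

```typescript
// Assumed response shape: three string fields, as named in Test 3.
interface VoiceDirectResponse {
  transcript: string;
  response: string;
  audio_base64: string;
}

// Type guard: true only if all three fields are present and are strings,
// and the audio payload is non-empty.
function isVoiceDirectResponse(value: unknown): value is VoiceDirectResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.transcript === "string" &&
    typeof v.response === "string" &&
    typeof v.audio_base64 === "string" &&
    v.audio_base64.length > 0
  );
}

// True if the round trip finished in strictly less than the budget (5 s default).
function withinLatencyBudget(
  startMs: number,
  endMs: number,
  budgetMs: number = 5000
): boolean {
  return endMs - startMs < budgetMs;
}
```

In the renderer, startMs would be taken just before the upload and endMs after the JSON body has been parsed, so the measurement covers the full round trip.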

Test 4: Audio playback

  • Decode base64 audio → Blob → Audio element
  • Verify: Audio plays without errors
  • Verify: Mic pauses during playback, resumes after
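
The mic pause/resume rule can be isolated from the audio element. This sketch assumes the mic is exposed as an object with pause() and resume() (MediaRecorder has both) and that playback signals completion through a callback. MicControl and playWithMicPaused are hypothetical names.

```typescript
// Minimal mic surface; a MediaRecorder instance satisfies this shape.
interface MicControl {
  pause(): void;
  resume(): void;
}

// Pauses the mic, starts playback, and resumes the mic exactly once when
// playback reports that it is done (normally or with an error).
function playWithMicPaused(
  mic: MicControl,
  startPlayback: (done: () => void) => void
): void {
  mic.pause();
  let finished = false;
  const done = (): void => {
    if (finished) return; // guard against both "ended" and "error" firing
    finished = true;
    mic.resume();
  };
  try {
    startPlayback(done);
  } catch (err) {
    done(); // never leave the mic paused if playback fails to start
    throw err;
  }
}
```

With an Audio element, startPlayback would attach done to both the "ended" and "error" events before calling play(), so the mic resumes even when decoding fails.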

Test 5: Countdown plays during wait

  • Verify: Web Speech API starts “Got it. 5, 4, 3, 2, 1” immediately
  • Verify: Countdown cancels when response arrives
  • Verify: No overlap between countdown and response audio
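
The three countdown checks describe a small state machine: idle, counting, responding. A sketch under stated assumptions follows. CountdownGate and Speaker are hypothetical names. In the renderer, speak and cancel would wrap speechSynthesis.speak and speechSynthesis.cancel from the Web Speech API.

```typescript
// Injected speech surface so the logic can be tested without a browser.
interface Speaker {
  speak(text: string): void;
  cancel(): void;
}

class CountdownGate {
  private state: "idle" | "counting" | "responding" = "idle";
  private speaker: Speaker;
  private playResponse: (audio: string) => void;

  constructor(speaker: Speaker, playResponse: (audio: string) => void) {
    this.speaker = speaker;
    this.playResponse = playResponse;
  }

  // Called as soon as the user's audio has been sent.
  start(): void {
    if (this.state !== "idle") return;
    this.state = "counting";
    this.speaker.speak("Got it. 5, 4, 3, 2, 1");
  }

  // Called when the server response arrives. The countdown is cancelled
  // before the response plays, so the two never overlap.
  responseArrived(audio: string): void {
    if (this.state === "responding") return;
    if (this.state === "counting") this.speaker.cancel();
    this.state = "responding";
    this.playResponse(audio);
  }

  // Called when response playback ends; the next turn can start.
  responseFinished(): void {
    this.state = "idle";
  }
}
```

Because cancel always runs before playResponse, the no-overlap check reduces to asserting call order on the injected functions.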

Test 6: Locked mode

  • Build with CLARA_LOCKED=true
  • Verify: DevTools cannot open (Cmd+Option+I, Cmd+Shift+I, Ctrl+Shift+I, and F12 all blocked)
  • Verify: Right-click disabled
  • Verify: Source code not visible in .app bundle
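
The shortcut-blocking check can be sketched as a pure predicate over the key fields Electron passes to a before-input-event handler. isDevToolsShortcut and KeyInput are hypothetical names, and the set of shortcuts covered is an assumption that goes slightly beyond the ones listed in Test 6.

```typescript
// Subset of the fields on Electron's before-input-event input object.
interface KeyInput {
  code: string; // physical key, e.g. "KeyI" or "F12"
  control: boolean;
  shift: boolean;
  alt: boolean;
  meta: boolean; // Cmd on macOS
}

// True if the key combination is one that should be blocked in locked mode.
function isDevToolsShortcut(input: KeyInput): boolean {
  if (input.code === "F12") return true;
  if (input.code !== "KeyI") return false;
  const cmdOptionI = input.meta && input.alt; // Electron's default on macOS
  const cmdShiftI = input.meta && input.shift; // as listed in Test 6
  const ctrlShiftI = input.control && input.shift; // default on Windows/Linux
  return cmdOptionI || cmdShiftI || ctrlShiftI;
}
```

In index.ts, a locked build could call event.preventDefault() in a before-input-event handler whenever this returns true. Setting webPreferences.devTools to false on the BrowserWindow is a second layer that does not depend on catching every shortcut.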

Test 7: Conference mode

  • Two clients join same Daily.co room
  • Agent responds to both participants
  • Both hear the response

Known Issues to Fix

  1. CSP in index.html already allows http://localhost:*, but Forge's webpack build overrides it
  2. forge.config.ts devContentSecurityPolicy needs updating
  3. Python 3.14 can’t build Pipecat (llvmlite) — use 3.12 venv
  4. Voice server .env needs all API keys (added: Groq, Deepgram, MiniMax)
  5. renderer.ts only logs a wave emoji; it needs the actual voice logic, unless index.html handles it

Files to Modify

  • apps/clara-desktop/forge.config.ts — CSP fix
  • apps/clara-desktop/src/index.html — Voice UI + countdown
  • apps/clara-desktop/src/index.ts — Locked mode
  • infrastructure/voice/server/bot.py — Groq + thinking phrases (DONE)
  • infrastructure/voice/server/server.py — No changes needed
  • infrastructure/voice/server/requirements.txt — Updated (DONE)

Deploy Checklist

  • Fix CSP (forge.config.ts + index.html)
  • Test voice-direct round trip locally
  • Test countdown audio
  • Test locked mode build
  • Deploy updated bot.py + requirements to EC2
  • Package locked build for Quik
  • Test conference mode with Daily.co