Grand Diomande Research · Full HTML Reader

Stage 1 Path B: The Spoken Mesh — Agents That Talk Back

> Grounded in: Stage 0 finding that agents return results as text to Discord/terminal. No mesh-level TTS exists. The mesh is mute.

Core Thesis

A voice-first architecture isn't just about listening — it's about speaking. When a Prefect flow completes, when a pane finishes a pulse task, when Evolution World triggers a mutation, when a security alert fires — these events should be spoken aloud. Not all of them. The critical ones. The mesh should have a voice.

The Mechanism

1. Event-to-Speech Bridge:

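```python
import asyncio

# NumuEvent, elevenlabs_tts, and play_audio are assumed to be provided
# elsewhere in the mesh codebase; they are not defined in this note.


class MeshVoice:
    """Subscribes to NUMU events and speaks important ones."""

    # Maps event type to a spoken template filled from the event payload.
    VOICE_EVENTS = {
        "pulse.complete": "Pulse {task} completed on {pane}.",
        "pane.spawn": "New pane spawned for {project}.",
        "pane.absorbing": "Warning: {pane} is absorbing. Unstick recommended.",
        "security.alert": "Security alert: {message}.",
        "build.success": "{app} build succeeded. Ready for upload.",
        "build.failure": "{app} build failed: {error}.",
        "ew.mutation": "Evolution World mutated {target}.",
        "flow.error": "Prefect flow {name} failed: {error}.",
        "creator_shield.escalation": "Creator Shield escalation: {severity}.",
    }

    async def on_event(self, event: "NumuEvent") -> None:
        template = self.VOICE_EVENTS.get(event.type)
        if template is None:
            return  # Not a voiced event type.
        try:
            text = template.format(**event.data)
        except KeyError:
            # The payload lacks a field the template needs; stay silent
            # rather than crash the event handler.
            return
        await self.speak(text, priority=event.priority)

    async def speak(self, text: str, priority: str = "normal") -> None:
        if priority == "critical":
            # ElevenLabs for critical events (higher quality, slight latency).
            audio = await elevenlabs_tts(text)
            play_audio(audio)
        else:
            # macOS `say` for routine events (local, no API call). Run it as
            # an async subprocess so speech does not block the event loop.
            proc = await asyncio.create_subprocess_exec(
                "say", "-v", "Samantha", text
            )
            await proc.wait()
```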

2. Priority and Suppression:
- Critical events (security, build failure, absorbing panes) always spoken
- Normal events spoken only when no active voice conversation
- Suppress during "focus mode" (user says "quiet" or "mute mesh")
- Rate limiting: max 1 spoken event per 30 seconds for non-critical
- Queue events during suppression, summarize when un-muted: "While you were focused, 3 builds completed and 1 flow failed"
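The rules above can be sketched as a small gate that sits in front of `speak`. This is a minimal sketch, not the mesh's actual implementation: `SpeechGate`, its field names, and the summary wording per event type are all hypothetical. It also assumes that rate-limited events are queued for the later summary rather than dropped, which the rules above do not specify.

```python
import time
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Optional


@dataclass
class SpeechGate:
    """Decides whether an event is spoken now or queued (hypothetical helper)."""

    min_interval: float = 30.0                     # seconds between non-critical events
    clock: Callable[[], float] = time.monotonic    # injectable for testing
    muted: bool = False                            # "focus mode"
    in_conversation: bool = False                  # active voice conversation
    _last_spoken: Optional[float] = None
    _queued: Deque[str] = field(default_factory=deque)

    def admit(self, event_type: str, priority: str = "normal") -> bool:
        """Return True if the event should be spoken immediately."""
        if priority == "critical":
            return True  # Critical events bypass every suppression rule.
        now = self.clock()
        rate_limited = (
            self._last_spoken is not None
            and now - self._last_spoken < self.min_interval
        )
        if self.muted or self.in_conversation or rate_limited:
            # Assumption: every suppressed event is kept for the summary.
            self._queued.append(event_type)
            return False
        self._last_spoken = now
        return True

    def unmute(self) -> Optional[str]:
        """Leave focus mode and return a summary of queued events, if any."""
        self.muted = False
        if not self._queued:
            return None
        counts = Counter(self._queued)
        self._queued.clear()
        parts = [
            f"{n} {etype} event{'s' if n != 1 else ''}"
            for etype, n in counts.items()
        ]
        return "While you were focused: " + ", ".join(parts) + "."
```

In use, `on_event` would call `gate.admit(event.type, event.priority)` before `speak`, and the handler for the "unmute" voice command would speak whatever `gate.unmute()` returns. Turning raw event types into friendlier phrases ("3 builds completed") would need a second template table, which is left out here.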

3. Voice Personality:
- ElevenLabs voice ID already configured: `TmSgyk1vGAD9YzdtJV3V`
- Consistent voice across all mesh announcements
- Tone varies by severity: neutral for status, urgent cadence for alerts
- Optional: different voices for different subsystems (infra = deep voice, creative = lighter)
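One way to keep the voice consistent while still varying tone is to resolve a small profile per announcement. The sketch below is illustrative only: `VoiceProfile`, `select_voice`, and the `SUBSYSTEM_VOICES` table are hypothetical, the placeholder IDs are not real voices, and the assignment of event prefixes to "infra" or "creative" is a guess. How a tone label maps onto actual ElevenLabs settings is left open.

```python
from dataclasses import dataclass

# Voice ID quoted in this note.
DEFAULT_VOICE_ID = "TmSgyk1vGAD9YzdtJV3V"

# Hypothetical per-subsystem overrides, keyed by event-type prefix.
# Placeholder values; substitute voice IDs that are actually configured.
SUBSYSTEM_VOICES = {
    "build": "<infra-voice-id>",
    "flow": "<infra-voice-id>",
    "ew": "<creative-voice-id>",
}


@dataclass(frozen=True)
class VoiceProfile:
    voice_id: str
    tone: str  # "neutral" for status, "urgent" for alerts


def select_voice(
    event_type: str, priority: str = "normal", per_subsystem: bool = False
) -> VoiceProfile:
    """Pick a voice and tone for one announcement."""
    voice_id = DEFAULT_VOICE_ID
    if per_subsystem:
        # "build.failure" -> "build"; unknown prefixes keep the default voice.
        prefix = event_type.split(".", 1)[0]
        voice_id = SUBSYSTEM_VOICES.get(prefix, DEFAULT_VOICE_ID)
    tone = "urgent" if priority == "critical" else "neutral"
    return VoiceProfile(voice_id=voice_id, tone=tone)
```

With `per_subsystem=False` (the default) every announcement uses the single configured voice, which matches the "consistent voice" goal. The optional per-subsystem voices are opt-in.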

4. Spatial Audio (future):
- Mac speakers output stereo, so announcements can be panned left or right
- Left speaker = build/deploy events, right speaker = creative/content events
- Or: volume correlates with priority (whisper for info, full volume for critical)
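Both ideas reduce to choosing a left and right channel gain per announcement. The sketch below uses constant-power panning, a standard audio technique that this note does not itself prescribe. The function name and the `PRIORITY_VOLUME` values are illustrative assumptions, and applying the gains to actual audio samples is out of scope.

```python
import math
from typing import Tuple

# Illustrative volume levels per priority (whisper for info, full for critical).
PRIORITY_VOLUME = {"info": 0.2, "normal": 0.6, "critical": 1.0}


def stereo_gains(pan: float, volume: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) using constant-power panning.

    pan = -1.0 is hard left, 0.0 is centre, +1.0 is hard right.
    """
    pan = max(-1.0, min(1.0, pan))           # clamp out-of-range input
    angle = (pan + 1.0) * math.pi / 4.0      # map [-1, 1] onto [0, pi/2]
    return volume * math.cos(angle), volume * math.sin(angle)


# Example: a build event on the left at normal volume,
# a critical alert centred at full volume.
build_left, build_right = stereo_gains(-1.0, PRIORITY_VOLUME["normal"])
alert_left, alert_right = stereo_gains(0.0, PRIORITY_VOLUME["critical"])
```

Constant-power panning keeps perceived loudness roughly steady as a sound moves between speakers, so a centred alert does not sound quieter than a hard-panned one.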

What This Solves

  • Ambient awareness without checking Discord/terminal
  • Critical alerts reach you even when not looking at a screen
  • Summarized status updates reduce context-switching
  • The mesh "personality" becomes tangible — it has a voice
  • Builds on existing ElevenLabs integration (voice ID, API key configured)

What This Risks

  • Annoying if poorly tuned — constant spoken interruptions are worse than silence
  • ElevenLabs API calls add cost (~$0.30 per 1,000 characters); restricting them to critical events keeps this low
  • System `say` voice quality is mediocre for non-English text
  • Audio output conflicts with music, calls, video editing
  • Privacy: spoken agent output audible to anyone nearby
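To put the cost risk in perspective, the per-character rate quoted above can be turned into a rough estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, the rate is the approximate figure from this note rather than a quoted price, and the sample alert texts are invented for illustration.

```python
from typing import Iterable

COST_PER_1000_CHARS = 0.30  # approximate rate quoted in this note (USD)


def estimate_tts_cost(
    texts: Iterable[str], rate: float = COST_PER_1000_CHARS
) -> float:
    """Estimate the TTS cost in USD for a batch of announcement texts."""
    total_chars = sum(len(text) for text in texts)
    return total_chars / 1000 * rate


# Invented sample of critical announcements for one day.
sample_alerts = [
    "Security alert: unexpected login.",
    "Warning: pane-3 is absorbing. Unstick recommended.",
]
daily_cost = estimate_tts_cost(sample_alerts)
```

Because critical announcements are short, cost scales with how often they fire rather than with their length. At the quoted rate, 1,000 characters of speech cost about $0.30.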

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

evo-cube-output/voice-first-agent-architecture/stage1-path-b.md
