Grand Diomande Research · Full HTML Reader

speakd v1.0 — Architecture



Fallback Chain

                    ┌─────────────────────────────────┐
                    │         Fn Key Pressed          │
                    │   (CGEvent tap + poll detect)   │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │   96kHz Audio Capture (cpal)    │
                    │   Built-in mic preferred over   │
                    │   Continuity/iPhone             │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │         Fn Key Released         │
                    │   (poll detects within 100ms)   │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │   WAV Encoding (hound crate)    │
                    │    Native rate, 16-bit mono     │
                    └────────────────┬────────────────┘
                                     │
              ┌──────────────────────▼──────────────────────┐
              │            TRANSCRIPTION ROUTER             │
              │  Tries each tier, first success wins        │
              │  3s timeout on mesh, 10s on MLX, 15s cloud  │
              └──────────────────────┬──────────────────────┘
                                     │
        ┌────────────────────────────┼────────────────────────────┐
        │                            │                            │
   ┌────▼────┐                 ┌─────▼─────┐                 ┌────▼────┐
   │ TIER 1  │  fail/timeout   │  TIER 2   │  fail/timeout   │ TIER 3  │
   │  MESH   │ ──────────────► │   CLOUD   │ ──────────────► │  LOCAL  │
   │  (LAN)  │                 │ (Internet)│                 │(Offline)│
   └────┬────┘                 └─────┬─────┘                 └────┬────┘
        │                            │                            │
   ┌────▼──────────────┐       ┌─────▼──────────┐           ┌─────▼──────────┐
   │ 1a. Mac4 :9530    │       │ 4. OpenAI      │           │ 5. Local macOS │
   │ SFSpeechRec +     │       │ Whisper API    │           │ SFSpeechRec    │
   │ SpeechTranscriber │       │ whisper-1      │           │ on-device only │
   │ (0.7s, free)      │       │ (2-4s, $0.006) │           │ (1-3s, free)   │
   └────┬──────────────┘       └────────────────┘           └────────────────┘
        │ fail
   ┌────▼──────────────┐
   │ 1b. Mac2 :9530    │
   │ SpeechRec relay   │
   │ (not yet built)   │
   └────┬──────────────┘
        │ fail
   ┌────▼──────────────┐
   │ 2. Mac5 :8100     │
   │ MLX Whisper       │
   │ (not configured)  │
   └───────────────────┘
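The "first success wins" rule in the router box can be sketched in a few lines of Rust. This is a minimal illustration that uses only the standard library, not the code in transcribe.rs: the `Tier` struct, the `route` function, and the use of a worker thread per attempt are all assumptions made for the example. Each tier gets its own time budget, and a tier that errors or overruns its budget hands control to the next one.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One entry in the fallback chain (hypothetical type, for illustration).
pub struct Tier {
    pub name: &'static str,
    pub timeout: Duration,
    pub run: fn(&[u8]) -> Result<String, String>,
}

/// Try each tier in order; the first Ok result inside its time budget wins.
pub fn route(tiers: &[Tier], wav: &[u8]) -> Result<(&'static str, String), String> {
    let mut last_err = String::from("no tiers configured");
    for tier in tiers {
        let (tx, rx) = mpsc::channel();
        let run = tier.run;
        let audio = wav.to_vec();
        // Run the attempt on a worker thread so a hung tier cannot stall the chain.
        thread::spawn(move || {
            // The receiver may already be gone after a timeout; ignore that.
            let _ = tx.send(run(&audio));
        });
        match rx.recv_timeout(tier.timeout) {
            Ok(Ok(text)) => return Ok((tier.name, text)),
            Ok(Err(e)) => last_err = format!("{}: {}", tier.name, e),
            Err(_) => last_err = format!("{}: timed out or worker died", tier.name),
        }
    }
    Err(last_err)
}
```

With the budgets from the diagram, the mesh tiers would carry `Duration::from_secs(3)`, MLX 10 seconds, and cloud 15 seconds. Note that a timed-out worker thread keeps running in the background in this sketch; a real implementation would more likely rely on socket-level timeouts.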

Recording Modes

| Mode | Activation | Stop | Use Case |
|------|------------|------|----------|
| Hold | Hold Fn | Release Fn | Quick dictation (<30s) |
| Toggle | Fn + Space | Press Fn again | Long dictation, meetings |
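One way to read the two modes is as a small state machine driven by key events. The sketch below is an assumption about how hotkey.rs could model it, not a copy of it; the `State`, `Key`, `Action`, and `step` names are invented for the example. The point it illustrates is that pressing Space during a hold converts the recording to toggle mode, after which releasing Fn is ignored and only the next Fn press stops recording.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Hold,
    Toggle,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    FnDown,
    FnUp,
    SpaceDown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Nothing,
    StartRecording,
    SwitchToToggle,
    StopRecording,
}

/// Advance the recording state machine by one key event.
pub fn step(state: State, key: Key) -> (State, Action) {
    match (state, key) {
        (State::Idle, Key::FnDown) => (State::Hold, Action::StartRecording),
        (State::Hold, Key::FnUp) => (State::Idle, Action::StopRecording),
        (State::Hold, Key::SpaceDown) => (State::Toggle, Action::SwitchToToggle),
        // In toggle mode the Fn release after Fn + Space is ignored;
        // only the next Fn press stops the recording.
        (State::Toggle, Key::FnDown) => (State::Idle, Action::StopRecording),
        (s, _) => (s, Action::Nothing),
    }
}
```

Keeping the transition function pure makes it straightforward to test without a CGEvent tap attached.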

Post-Transcription Pipeline

Transcription text
    │
    ├── Punctuation pass (if source lacks punctuation)
    │   └── GPT-5.4-nano API call (~200ms, <$0.001)
    │       Only for: mac4-speech-analyzer, local-on-device
    │       Skipped for: openai-whisper (already punctuated)
    │
    ├── pbcopy + Cmd+V paste (immediate)
    │
    ├── SQLite history save ([home-path])
    │   └── Fields: text, duration, source, timestamp
    │
    └── Background post-processing (non-blocking thread)
        ├── Personal knowledge chain (Desktop/Speak/core/)
        └── Smart notes → Apple Notes
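The ordering above can be expressed as a small planning function. This is a hedged sketch: the source labels (`mac4-speech-analyzer`, `local-on-device`, `openai-whisper`) come from the tree above, while the `Step` enum and the `plan` function are hypothetical names. Treating any unlisted source as already punctuated is also an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Punctuate,
    Paste,
    SaveHistory,
    BackgroundPostProcess,
}

/// Sources whose output arrives without punctuation, per the pipeline above.
pub fn needs_punctuation(source: &str) -> bool {
    matches!(source, "mac4-speech-analyzer" | "local-on-device")
}

/// Ordered steps to run after a transcription from `source` arrives.
pub fn plan(source: &str) -> Vec<Step> {
    let mut steps = Vec::new();
    if needs_punctuation(source) {
        // The punctuation pass runs before the paste so the pasted text is final.
        steps.push(Step::Punctuate);
    }
    steps.extend_from_slice(&[
        Step::Paste,
        Step::SaveHistory,
        Step::BackgroundPostProcess,
    ]);
    steps
}
```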

Audio Feedback

| Event | Sound | Timing |
|-------|-------|--------|
| Recording starts | Tink.aiff | Immediate on Fn press |
| Mode switch (hold→toggle) | Morse.aiff | On Space press during hold |
| Recording stops | Pop.aiff | On Fn release / toggle stop |
| Transcription complete | macOS notification | After paste |

File Layout

[home-path]          — 3.8MB arm64 Rust binary
[home-path]
    history.db               — SQLite transcription log
    transcribe-local         — Compiled Swift binary (offline fallback)
    transcribe-local.swift   — Source (auto-compiled on first use)
Desktop/speakd/            — Source code (Cargo project)
    src/
        main.rs              — CLI args, startup, key loading
        hotkey.rs            — CGEvent tap, Fn detection, hold/toggle modes
        audio.rs             — cpal capture, AudioBuffer, mic selection
        transcribe.rs        — 5-tier routing, WAV encoding, punctuation
        paste.rs             — pbcopy + Cmd+V
        history.rs           — SQLite CRUD
        postprocess.rs       — Python knowledge chain (stdin, non-blocking)
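For readers unfamiliar with the WAV step that transcribe.rs performs, the following sketch shows what "native rate, 16-bit mono" means at the byte level. The project uses the hound crate for this; the version below is hand-rolled with the standard library only so that it stays dependency-free, and the assumption that captured samples arrive as `f32` in the range -1.0 to 1.0 is mine. The function name `encode_wav_mono16` is hypothetical.

```rust
/// Encode mono f32 samples as a 16-bit PCM WAV file held in memory.
pub fn encode_wav_mono16(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF container header.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");

    // Format chunk: PCM, 1 channel, 16 bits per sample.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channel count
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    // Data chunk: clamp each sample and scale it to the i16 range.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Calling `encode_wav_mono16(&buffer, 96_000)` keeps the capture rate unchanged, which matches the "native rate" note in the diagram.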

Relay Architecture (Mac4)

Desktop/speech-relay-app/          — Swift Package (macOS app)
    SpeechRelayApp/main.swift        — NSApplication.accessory
        ├── TCP server on :9530
        ├── GET  /health → engine info
        └── POST /transcribe → WAV in, JSON out
            ├── SFSpeechRecognizer (primary, stable)
            └── SpeechTranscriber (fallback, Neural Engine)

LaunchAgent: com.speakd.relay (auto-start, KeepAlive)
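From the client side, talking to the relay amounts to one HTTP POST over a TCP connection with a short connect timeout. The sketch below uses only the standard library. The `/transcribe` path and the 3 second connect budget come from this document; the `Content-Type: audio/wav` header, the read timeout, and both function names are assumptions. Because the JSON response schema is not specified here, the function returns the raw response body without parsing it.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Build the request head for a WAV upload (header choices are assumptions).
pub fn build_transcribe_request(host: &str, wav_len: usize) -> String {
    format!(
        "POST /transcribe HTTP/1.1\r\nHost: {}\r\nContent-Type: audio/wav\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
        host, wav_len
    )
}

/// Send a WAV buffer to the relay and return the raw response body.
pub fn transcribe_via_relay(addr: SocketAddr, wav: &[u8]) -> std::io::Result<String> {
    // Fail fast if the relay is down, so the router can move to the next tier.
    let mut stream = TcpStream::connect_timeout(&addr, Duration::from_secs(3))?;
    stream.set_read_timeout(Some(Duration::from_secs(3)))?;

    let head = build_transcribe_request(&addr.to_string(), wav.len());
    stream.write_all(head.as_bytes())?;
    stream.write_all(wav)?;

    // "Connection: close" lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response
        .split_once("\r\n\r\n")
        .map(|(_, body)| body.to_string())
        .unwrap_or_default())
}
```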

Failure Modes & Recovery

| Failure | Detection | Recovery |
|---------|-----------|----------|
| Mac4 relay down | 3s connect timeout | Skip to next tier |
| Mac4 transcription crash | HTTP connection reset | Skip to cloud |
| OpenAI API key invalid | 401 response | Skip to local |
| OpenAI rate limited | 429 response | Skip to local |
| No internet | All cloud timeouts | Local on-device |
| Tailscale down | All mesh timeouts | Cloud → local |
| Mic disconnected | Empty audio buffer | "Too short" message |
| Audio device change | cpal error callback | Needs restart (future: auto-reconnect) |
| Local binary missing | File check | Auto-compile from embedded Swift source |
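The table maps naturally onto a pair of enums and one `match`. The sketch below is illustrative only: the `Failure` and `Recovery` types and the `recover` function are hypothetical names, and sending every non-success cloud status to the local tier (not just 401 and 429) is an assumption that goes beyond what the table states.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    RelayConnectTimeout,  // Mac4 relay down
    RelayConnectionReset, // Mac4 transcription crash
    CloudStatus(u16),     // e.g. 401 invalid key, 429 rate limited
    CloudTimeout,         // no internet
    EmptyAudioBuffer,     // mic disconnected
    LocalBinaryMissing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    NextTier,
    SkipToCloud,
    SkipToLocal,
    ShowTooShort,
    CompileLocalBinary,
}

/// Pick the recovery action for a detected failure.
pub fn recover(failure: Failure) -> Recovery {
    match failure {
        Failure::RelayConnectTimeout => Recovery::NextTier,
        Failure::RelayConnectionReset => Recovery::SkipToCloud,
        // Assumption: any cloud error status falls through to the local tier.
        Failure::CloudStatus(_) | Failure::CloudTimeout => Recovery::SkipToLocal,
        Failure::EmptyAudioBuffer => Recovery::ShowTooShort,
        Failure::LocalBinaryMissing => Recovery::CompileLocalBinary,
    }
}
```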

Latency Budget (typical 5s recording)

| Step | Hold Mode | Toggle Mode |
|------|-----------|-------------|
| Fn detection | <1ms | <1ms |
| Audio capture | real-time | real-time |
| Fn release detection | ~100ms (poll) | ~100ms |
| WAV encoding | ~10ms | ~10ms |
| Mac4 transcription | ~700ms | ~700ms |
| Punctuation (nano) | ~200ms | ~200ms |
| Paste (pbcopy+Cmd+V) | ~50ms | ~50ms |
| Total (Mac4 path) | ~1.1s | ~1.1s |
| Total (OpenAI path) | ~3.5s | ~3.5s |
| Total (local path) | ~2.0s | ~2.0s |
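The Mac4 total can be checked by summing the per-step figures. The sketch below does that arithmetic; the constant and function names are hypothetical, and "<1ms" is counted as its 1 ms upper bound. Audio capture is left out because it runs in real time during the recording rather than adding delay after Fn is released.

```rust
/// Post-release latency steps on the Mac4 path, in milliseconds.
pub const MAC4_PATH: [(&str, u32); 6] = [
    ("Fn detection", 1), // "<1ms", counted at its upper bound
    ("Fn release detection", 100),
    ("WAV encoding", 10),
    ("Mac4 transcription", 700),
    ("Punctuation (nano)", 200),
    ("Paste (pbcopy+Cmd+V)", 50),
];

/// Sum the millisecond figures of a list of labelled steps.
pub fn total_ms(steps: &[(&str, u32)]) -> u32 {
    steps.iter().map(|(_, ms)| ms).sum()
}
```

The sum is 1061 ms, which rounds to the ~1.1s shown in the table.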

What's Built vs Pending

| Component | Status | Notes |
|-----------|--------|-------|
| Fn hold-to-record | LIVE | Polling fallback for Apple Silicon |
| Fn+Space toggle | LIVE | For long dictation |
| Mac4 SFSpeechRecognizer relay | LIVE | 0.7s, free, LaunchAgent |
| Mac4 SpeechTranscriber relay | WIRED | Needs model download on Mac4 |
| Mac2 relay | NOT BUILT | Same architecture, deploy when needed |
| Mac5 MLX Whisper | NOT CONFIGURED | Need whisper endpoint at :8100 |
| OpenAI Whisper | LIVE | Fallback, $0.006/min |
| Local on-device | LIVE | Auto-compiles Swift binary |
| GPT-5.4-nano punctuation | LIVE | For unpunctuated sources |
| History DB | LIVE | SQLite, CLI search |
| Post-processing | LIVE | Knowledge chain + smart notes |
| MenuBar UI | PENDING | Next major feature |
| Voice isolation (noise cancel) | PENDING | Needs AVAudioEngine in Swift layer |

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

speakd/ARCHITECTURE.md

Detected Structure

Method · Evaluation · Figures · Code Anchors · Architecture