Grand Diomande Research · Full HTML Reader
speakd v1.0 — Architecture
``` ┌─────────────────────────────────┐ │ Fn Key Pressed │ │ (CGEvent tap + poll detect) │ └────────────────┬────────────────┘ │ ┌────────────────▼────────────────┐ │ 96kHz Audio Capture (cpal) │ │ Built-in mic preferred over │ │ Continuity/iPhone │ └────────────────┬────────────────┘ │ ┌────────────────▼────────────────┐ │ Fn Key Released │ │ (poll detects within 100ms) │ └────────────────┬────────────────┘ │ ┌────────────────▼────────────────┐ │ WAV Encoding (hound crate) │ │ Native rate, 16-bit mono │ └─────────
Full Public Reader
speakd v1.0 — Architecture
Fallback Chain
┌─────────────────────────────────┐
│ Fn Key Pressed │
│ (CGEvent tap + poll detect) │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ 96kHz Audio Capture (cpal) │
│ Built-in mic preferred over │
│ Continuity/iPhone │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ Fn Key Released │
│ (poll detects within 100ms) │
└────────────────┬────────────────┘
│
┌────────────────▼────────────────┐
│ WAV Encoding (hound crate) │
│ Native rate, 16-bit mono │
└────────────────┬────────────────┘
│
┌─────────────────────▼─────────────────────┐
│ TRANSCRIPTION ROUTER │
│ Tries each tier, first success wins │
│ 3s timeout on mesh, 10s on MLX, 15s cloud │
└─────────────────────┬─────────────────────┘
│
┌────────────────────────────┼────────────────────────────┐
│ │ │
┌────▼────┐ ┌─────▼─────┐ ┌────▼────┐
│ TIER 1 │ fail/timeout │ TIER 2 │ fail/timeout │ TIER 3 │
│ MESH │ ──────────────► │ CLOUD │ ─────────────► │ LOCAL │
│ (LAN) │ │ (Internet) │ │(Offline)│
└────┬────┘ └─────┬─────┘ └────┬────┘
│ │ │
┌────▼────────────┐ ┌─────▼──────────┐ ┌─────▼──────────┐
│ 1a. Mac4 :9530 │ │ 4. OpenAI │ │ 5. Local macOS │
│ SFSpeechRec + │ │ Whisper API │ │ SFSpeechRec │
│ SpeechTranscriber│ │ whisper-1 │ │ on-device only │
│ (0.7s, free) │ │ (2-4s, $0.006) │ │ (1-3s, free) │
└────┬─────────────┘ └────────────────┘ └────────────────┘
│ fail
┌────▼────────────┐
│ 1b. Mac2 :9530 │
│ SpeechRec relay │
│ (not yet built) │
└────┬─────────────┘
│ fail
┌────▼────────────┐
│ 2. Mac5 :8100 │
│ MLX Whisper │
│ (not configured)│
└──────────────────┘Recording Modes
| Mode | Activation | Stop | Use Case |
|---|---|---|---|
| Hold | Hold Fn | Release Fn | Quick dictation (<30s) |
| Toggle | Fn + Space | Press Fn again | Long dictation, meetings |
Post-Transcription Pipeline
Transcription text
│
├── Punctuation pass (if source lacks punctuation)
│ └── GPT-5.4-nano API call (~200ms, <$0.001)
│ Only for: mac4-speech-analyzer, local-on-device
│ Skipped for: openai-whisper (already punctuated)
│
├── pbcopy + Cmd+V paste (immediate)
│
├── SQLite history save ([home-path])
│ └── Fields: text, duration, source, timestamp
│
└── Background post-processing (non-blocking thread)
├── Personal knowledge chain (Desktop/Speak/core/)
└── Smart notes → Apple NotesAudio Feedback
| Event | Sound | Timing |
|---|---|---|
| Recording starts | Tink.aiff | Immediate on Fn press |
| Mode switch (hold→toggle) | Morse.aiff | On Space press during hold |
| Recording stops | Pop.aiff | On Fn release / toggle stop |
| Transcription complete | macOS notification | After paste |
File Layout
[home-path] — 3.8MB arm64 Rust binary
[home-path]
history.db — SQLite transcription log
transcribe-local — Compiled Swift binary (offline fallback)
transcribe-local.swift — Source (auto-compiled on first use)
Desktop/speakd/ — Source code (Cargo project)
src/
main.rs — CLI args, startup, key loading
hotkey.rs — CGEvent tap, Fn detection, hold/toggle modes
audio.rs — cpal capture, AudioBuffer, mic selection
transcribe.rs — 5-tier routing, WAV encoding, punctuation
paste.rs — pbcopy + Cmd+V
history.rs — SQLite CRUD
postprocess.rs — Python knowledge chain (stdin, non-blocking)Relay Architecture (Mac4)
Desktop/speech-relay-app/ — Swift Package (macOS app)
SpeechRelayApp/main.swift — NSApplication.accessory
├── TCP server on :9530
├── GET /health → engine info
└── POST /transcribe → WAV in, JSON out
├── SFSpeechRecognizer (primary, stable)
└── SpeechTranscriber (fallback, Neural Engine)
LaunchAgent: com.speakd.relay (auto-start, KeepAlive)Failure Modes & Recovery
| Failure | Detection | Recovery |
|---|---|---|
| Mac4 relay down | 3s connect timeout | Skip to next tier |
| Mac4 transcription crash | HTTP connection reset | Skip to cloud |
| OpenAI API key invalid | 401 response | Skip to local |
| OpenAI rate limited | 429 response | Skip to local |
| No internet | All cloud timeouts | Local on-device |
| Tailscale down | All mesh timeouts | Cloud → local |
| Mic disconnected | Empty audio buffer | "Too short" message |
| Audio device change | cpal error callback | Needs restart (future: auto-reconnect) |
| Local binary missing | File check | Auto-compile from embedded Swift source |
Latency Budget (typical 5s recording)
| Step | Hold Mode | Toggle Mode |
|---|---|---|
| Fn detection | <1ms | <1ms |
| Audio capture | real-time | real-time |
| Fn release detection | ~100ms (poll) | ~100ms |
| WAV encoding | ~10ms | ~10ms |
| Mac4 transcription | ~700ms | ~700ms |
| Punctuation (nano) | ~200ms | ~200ms |
| Paste (pbcopy+Cmd+V) | ~50ms | ~50ms |
| Total (Mac4 path) | ~1.1s | ~1.1s |
| Total (OpenAI path) | ~3.5s | ~3.5s |
| Total (local path) | ~2.0s | ~2.0s |
What's Built vs Pending
| Component | Status | Notes |
|---|---|---|
| Fn hold-to-record | LIVE | Polling fallback for Apple Silicon |
| Fn+Space toggle | LIVE | For long dictation |
| Mac4 SFSpeechRecognizer relay | LIVE | 0.7s, free, LaunchAgent |
| Mac4 SpeechTranscriber relay | WIRED | Needs model download on Mac4 |
| Mac2 relay | NOT BUILT | Same architecture, deploy when needed |
| Mac5 MLX Whisper | NOT CONFIGURED | Need whisper endpoint at :8100 |
| OpenAI Whisper | LIVE | Fallback, $0.006/min |
| Local on-device | LIVE | Auto-compiles Swift binary |
| GPT-5.4-nano punctuation | LIVE | For unpunctuated sources |
| History DB | LIVE | SQLite, CLI search |
| Post-processing | LIVE | Knowledge chain + smart notes |
| MenuBar UI | PENDING | Next major feature |
| Voice isolation (noise cancel) | PENDING | Needs AVAudioEngine in Swift layer |
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
speakd/ARCHITECTURE.md
Detected Structure
Method · Evaluation · Figures · Code Anchors · Architecture