Grand Diomande Research · Full HTML Reader

CC-Speak


High-performance Rust audio capture engine for voice-to-text workflows.

Features

  • Lock-free audio capture using CPAL
  • Real-time RMS/peak metering for UI feedback
  • WAV encoding with quality metrics (SNR, clipping detection)
  • Python bindings via PyO3
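The metering feature can be illustrated with a minimal sketch. It is not the CC-Speak implementation: the function name `meter_block` and the block size are assumptions made for illustration, and only the Python standard library is used.

```python
import math
from typing import Sequence, Tuple


def meter_block(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (rms, peak) for one block of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0, 0.0
    sum_squares = sum(s * s for s in samples)
    rms = math.sqrt(sum_squares / len(samples))
    peak = max(abs(s) for s in samples)
    return rms, peak


# Hypothetical input: one 10 ms block of a 440 Hz sine at half amplitude, 48 kHz.
block = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(480)]
rms, peak = meter_block(block)
print(f"RMS: {rms:.3f}, peak: {peak:.3f}")
```

A UI would typically call a function like this once per captured block and draw the two values as a level meter.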

Performance Targets

| Metric | Python (PyAudio) | Rust (CC-Speak) |
|---|---|---|
| Recording start latency | ~100 ms | <10 ms |
| WAV encoding time | ~50 ms | <5 ms |
| Total to clipboard | ~300 ms | <50 ms |
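To check a figure such as WAV encoding time on your own machine, a small timing harness is enough. The sketch below is an assumption-laden stand-in, not the project's benchmark: `encode_wav` and `time_ms` are hypothetical helpers built on the standard `wave` module, and the input is a synthetic tone rather than recorded speech.

```python
import io
import math
import struct
import time
import wave


def encode_wav(samples, sample_rate=16_000):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def time_ms(fn, *args, repeats=5):
    """Return the best wall-clock time of fn(*args) in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


# Five seconds of a 220 Hz tone as a stand-in for recorded speech.
tone = [0.3 * math.sin(2 * math.pi * 220 * n / 16_000) for n in range(5 * 16_000)]
elapsed_ms = time_ms(encode_wav, tone)
print(f"WAV encoding: {elapsed_ms:.2f} ms")
```

Taking the best of several runs reduces noise from other processes. The same `time_ms` helper could wrap a call into the Python extension to compare against the targets above.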

Building

Rust Library Only

```bash
cd core/cc-speak
cargo build --release
```

Python Extension

```bash
cd core/cc-speak

# Development build (for testing)
maturin develop --features python

# Release build (optimized)
maturin build --release --features python
```

Usage

Rust

```rust
use cc_speak::{CaptureSession, CaptureConfig};

let config = CaptureConfig::default();
let session = CaptureSession::new(config)?;

// Start recording
let handle = session.start_manual()?;

// ... wait for stop signal ...

// Stop and get result
let result = handle.stop();
println!("Duration: {}ms, Quality: {:.2}",
    result.duration_ms,
    result.audio_meta.quality_score
);

// WAV bytes ready for Whisper API
let wav_data = result.wav_data;
```

Python

```python
from cc_speak import SpeakEngine

engine = SpeakEngine()
handle = engine.start_manual()

# Wait for user to stop
input("Press Enter to stop recording...")

result = handle.stop()
print(f"Duration: {result.duration_ms}ms")
print(f"Quality: {result.audio_meta.quality_score:.2f}")

# Get WAV bytes for Whisper API
wav_bytes = result.wav_bytes()
```
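Before sending the bytes anywhere, it can be useful to confirm their format. The sketch below assumes that `wav_bytes()` returns a complete WAV file in integer PCM, which the standard `wave` module can read. `describe_wav` is a hypothetical helper, and the example bytes are generated locally so the snippet runs without a microphone.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Parse WAV bytes in memory and report basic format information."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_ms": round(1000 * frames / rate),
        }


# Stand-in for result.wav_bytes(): 0.5 s of 16-bit mono silence at 16 kHz.
_buffer = io.BytesIO()
with wave.open(_buffer, "wb") as _wav:
    _wav.setnchannels(1)
    _wav.setsampwidth(2)
    _wav.setframerate(16_000)
    _wav.writeframes(b"\x00\x00" * 8_000)
example_bytes = _buffer.getvalue()

info = describe_wav(example_bytes)
print(info)
```

If the engine writes 32-bit float WAV instead, `wave.open` raises an error and a different parser would be needed.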

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         CC-SPEAK AUDIO ENGINE                                │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────────────────┐
│  CPAL Backend   │────▶│  Audio Thread   │────▶│  Sample Buffer              │
│  (Mic Input)    │     │  (Real-time)    │     │  (Mutex<Vec<f32>>)          │
└─────────────────┘     └─────────────────┘     └──────────────┬──────────────┘
                                                               │
                        ┌──────────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CAPTURE COORDINATOR                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                         │
│  │   Metering  │  │ WAV Encoder │  │   Quality   │                         │
│  │  (RMS, Peak)│  │  (hound)    │  │  Analyzer   │                         │
│  └─────────────┘  └─────────────┘  └─────────────┘                         │
└─────────────────────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         OUTPUT                                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │ WAV Bytes       │  │ Duration        │  │ AudioMetadata               │  │
│  │ (for Whisper)   │  │ (ms)            │  │ (RMS, SNR, quality_score)   │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
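The data flow in the diagram can be sketched in a few lines. This is an illustrative model only: a thread stands in for the real-time audio callback, a lock-guarded list stands in for the `Mutex<Vec<f32>>` sample buffer, and every class and function name here is hypothetical.

```python
import math
import threading


class SampleBuffer:
    """Shared sample store guarded by a lock, like Mutex<Vec<f32>> in the diagram."""

    def __init__(self):
        self._lock = threading.Lock()
        self._samples = []

    def push(self, block):
        with self._lock:
            self._samples.extend(block)

    def take(self):
        """Return all samples collected so far and clear the buffer."""
        with self._lock:
            samples, self._samples = self._samples, []
        return samples


def audio_thread(buffer, blocks, block_size, sample_rate):
    """Stand-in for the capture callback: pushes synthetic sine blocks."""
    for b in range(blocks):
        start = b * block_size
        buffer.push(
            [0.25 * math.sin(2 * math.pi * 440 * (start + n) / sample_rate)
             for n in range(block_size)]
        )


def capture(blocks=10, block_size=160, sample_rate=16_000):
    """Coordinator: run the producer, then summarize what was captured."""
    buffer = SampleBuffer()
    worker = threading.Thread(
        target=audio_thread, args=(buffer, blocks, block_size, sample_rate)
    )
    worker.start()
    worker.join()  # a real session would wait for a stop signal instead
    samples = buffer.take()
    return {
        "sample_count": len(samples),
        "duration_ms": round(1000 * len(samples) / sample_rate),
        "peak": max((abs(s) for s in samples), default=0.0),
    }


summary = capture()
print(summary)
```

The coordinator only touches the samples after taking them out of the shared buffer, so metering, encoding, and quality analysis never hold the lock while the producer is writing.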

Quality Metrics

| Metric | Description |
|---|---|
| `rms_energy` | Root mean square (average loudness), 0.0-1.0 |
| `peak_amplitude` | Maximum sample value, 0.0-1.0 |
| `snr_db` | Estimated signal-to-noise ratio in dB |
| `has_clipping` | Whether audio clipping was detected |
| `quality_score` | Composite quality score, 0.0-1.0 |
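The README does not say how `snr_db` is estimated or how clipping is detected. The sketch below shows one common heuristic for each, under stated assumptions: SNR compares the loudest and quietest tenth of short frames, and clipping means several consecutive samples at full scale. Function names, the frame size, and the thresholds are all hypothetical.

```python
import math


def frame_rms(samples, frame_size):
    """RMS of each full frame of frame_size samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def estimate_snr_db(samples, frame_size=160):
    """Estimate SNR by comparing the loudest and quietest tenth of frames."""
    levels = sorted(frame_rms(samples, frame_size))
    if not levels:
        return 0.0
    tenth = max(1, len(levels) // 10)
    noise = sum(levels[:tenth]) / tenth
    signal = sum(levels[-tenth:]) / tenth
    floor = 1e-6  # avoids taking the log of zero on digital silence
    return 20.0 * math.log10(max(signal, floor) / max(noise, floor))


def has_clipping(samples, threshold=0.999, min_run=3):
    """True if min_run or more consecutive samples sit at or above threshold."""
    run = 0
    for s in samples:
        run = run + 1 if abs(s) >= threshold else 0
        if run >= min_run:
            return True
    return False


quiet = [0.001] * 1600  # stand-in for background noise
loud = [0.1] * 1600     # stand-in for speech
snr_db = estimate_snr_db(quiet + loud)  # about 40 dB
print(f"Estimated SNR: {snr_db:.1f} dB")
```

Requiring a run of consecutive full-scale samples avoids flagging a single legitimate peak as clipping.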

Quality Score Interpretation

| Score | Level | Description |
|---|---|---|
| 0.9+ | Excellent | Studio quality, clear speech |
| 0.7-0.9 | Good | Clear speech, minor noise |
| 0.5-0.7 | Acceptable | Some noise, speech intelligible |
| 0.3-0.5 | Poor | Significant noise |
| <0.3 | Reject | Too noisy for training |
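The table translates directly into a lookup. In this sketch `quality_level` is a hypothetical helper, not part of the `cc_speak` API, and it assumes each range includes its lower bound, which the table leaves ambiguous at 0.9, 0.7, 0.5, and 0.3.

```python
def quality_level(score: float) -> str:
    """Map a composite quality score in [0.0, 1.0] to its level name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within 0.0-1.0, got {score!r}")
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Acceptable"
    if score >= 0.3:
        return "Poor"
    return "Reject"


for example in (0.95, 0.72, 0.55, 0.31, 0.10):
    print(f"{example:.2f} -> {quality_level(example)}")
```

A pipeline collecting training audio could drop any recording whose level is `Reject` and flag `Poor` recordings for review.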

Dependencies

  • cpal - Cross-platform audio I/O
  • hound - WAV encoding
  • pyo3 - Python bindings (optional)
  • cc-core-rs - Core infrastructure

License

UNLICENSED - Internal use only

Source Anchor

Comp-Core/core/audio-media/cc-speak/README.md
