Beat Tracking Cache Optimization

The `build_phrase_database_incremental.py` script includes a **beat tracking cache** that significantly speeds up reprocessing of audio files. Beat tracking is the most computationally expensive step in the phrase database building pipeline (~270s per file).

The cache stores:

- Beat times
- Downbeat locations
- BPM information

Cache behavior:

- The cache is automatically invalidated when a file is modified.
- Corrupted cache files are handled gracefully by recomputing them.
- The cache persists across script runs.

| Stage | Without Cache | With Cache | Speedup |
|-------|---------------|------------|---------|
| **Beat Tracking** | ~270s | <0.5s | **~500x faster** ⚡ |
| **Total per file** | ~280-310s | ~10-20s | **~15-30x faster** ⚡ |
| **10 files** | ~50 min | ~2-4 min | **~15-25x faster** ⚡ |
| **200 files** | ~16-17 hours | ~30-60 min | **~15-30x faster** ⚡ |

Use `--no-cache` when:

- The beat tracking algorithm has changed
- You suspect cached results are incorrect
- You are debugging beat tracking issues
- It is first-time processing (no cache exists yet anyway)
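
The stored fields and the invalidation and recovery behavior described above can be sketched as a small file-based cache. This is a minimal sketch, not the script's actual implementation. It makes the following assumptions:

- The cache key is derived from the file's resolved path, modification time, and size.
- Entries are JSON files in a cache directory.
- The beat tracker is passed in as a function returning a dict with the hypothetical keys `beat_times`, `downbeats`, and `bpm`.

The names `get_beats`, `cache_entry_path`, and `track_beats` are illustrative. The `use_cache=False` path stands in for what `--no-cache` presumably does.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, Union

# Hypothetical result shape: the three quantities the cache is described as storing.
BeatResult = Dict[str, Any]
REQUIRED_KEYS = ("beat_times", "downbeats", "bpm")


def cache_entry_path(cache_dir: Path, audio_path: Path) -> Path:
    """Return the cache file for audio_path in its current on-disk state."""
    st = audio_path.stat()
    # Assumption: path + modification time + size identify a file version.
    # Editing the file changes the key, so the old entry is simply never read again.
    fingerprint = f"{audio_path.resolve()}|{st.st_mtime_ns}|{st.st_size}"
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def get_beats(
    audio_path: Union[str, Path],
    track_beats: Callable[[Path], BeatResult],
    cache_dir: Union[str, Path],
    use_cache: bool = True,
) -> BeatResult:
    """Return beat data for audio_path, reusing a cached result when valid."""
    audio_path = Path(audio_path)
    if not use_cache:
        # Assumed --no-cache behavior: skip both reading and writing the cache.
        return track_beats(audio_path)

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_entry_path(cache_dir, audio_path)

    if entry.exists():
        try:
            with entry.open("r", encoding="utf-8") as f:
                data = json.load(f)
            if isinstance(data, dict) and all(k in data for k in REQUIRED_KEYS):
                return data  # cache hit
        except (OSError, ValueError):
            pass  # unreadable or corrupted entry: fall through and recompute

    result = track_beats(audio_path)
    # Write to a temporary file, then atomically swap it in, so an interrupted
    # run never leaves a half-written entry behind.
    tmp = entry.with_suffix(".json.tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(result, f)
    os.replace(tmp, entry)
    return result
```

Encoding the file's state in the key means a modified file never reads a stale entry, and no explicit invalidation step is needed. The trade-off is that old entries are never deleted, so a longer-lived implementation might prune them periodically.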

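The batch rows of the table appear to follow from the per-file totals multiplied by the number of files. The sketch below reproduces that arithmetic, assuming files are processed sequentially with no parallelism. `batch_range` and `speedup_range` are illustrative helpers, not part of `build_phrase_database_incremental.py`.

```python
from typing import Tuple

# (low, high) estimate in seconds.
Range = Tuple[float, float]

# Per-file totals taken from the table above.
UNCACHED_PER_FILE: Range = (280.0, 310.0)
CACHED_PER_FILE: Range = (10.0, 20.0)


def batch_range(n_files: int, per_file: Range) -> Range:
    """Total time for n_files, assuming strictly sequential processing."""
    low, high = per_file
    return (n_files * low, n_files * high)


def speedup_range(uncached: Range, cached: Range) -> Range:
    """Pessimistic bound pairs the fastest uncached run with the slowest cached
    one; the optimistic bound does the reverse."""
    return (uncached[0] / cached[1], uncached[1] / cached[0])


if __name__ == "__main__":
    for n in (10, 200):
        lo, hi = batch_range(n, UNCACHED_PER_FILE)
        clo, chi = batch_range(n, CACHED_PER_FILE)
        print(f"{n} files: {lo / 3600:.1f}-{hi / 3600:.1f} h uncached, "
              f"{clo / 60:.0f}-{chi / 60:.0f} min cached")
    s_lo, s_hi = speedup_range(UNCACHED_PER_FILE, CACHED_PER_FILE)
    print(f"per-file speedup: {s_lo:.0f}x-{s_hi:.0f}x")
```

Running the arithmetic gives these figures, roughly in line with the table's approximate values:

- 10 files: about 2,800-3,100 s uncached (roughly 47-52 min).
- 200 files: about 15.6-17.2 hours uncached versus 33-67 minutes cached.
- Per-file speedup: 14x-31x.
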