Beat Tracking Cache Optimization
The `build_phrase_database_incremental.py` script includes a **beat tracking cache** that significantly speeds up reprocessing of audio files.
Extracted abstract or opening context
Beat tracking is the most computationally expensive step in the phrase database building pipeline (~270s per file). The cache stores:

- Beat times
- Downbeat locations
- BPM information

Cache behavior:

- Invalidates automatically when a file is modified
- Handles corrupted cache files gracefully by recomputing
- Persists across script runs
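
The behavior listed above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function name `load_or_compute_beats`, the JSON cache layout, the `.beats.json` suffix, and the use of modification time plus file size to detect changes are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_or_compute_beats(audio_path, cache_dir, compute_beats):
    """Return beat data for audio_path, reusing a cached result when valid.

    compute_beats is a caller-supplied (hypothetical) function that takes a
    path and returns a JSON-serializable dict, for example
    {"beat_times": [...], "downbeats": [...], "bpm": 120.0}.
    """
    audio_path = Path(audio_path)
    cache_dir = Path(cache_dir)

    # Assumed invalidation rule: a change in mtime or size means the file
    # was modified and the cached entry is stale.
    stat = audio_path.stat()
    fingerprint = {"mtime_ns": stat.st_mtime_ns, "size": stat.st_size}
    cache_file = cache_dir / (audio_path.name + ".beats.json")

    if cache_file.exists():
        try:
            entry = json.loads(cache_file.read_text())
            if entry["fingerprint"] == fingerprint:
                return entry["beats"]  # cache hit
        except (OSError, ValueError, KeyError, TypeError):
            # Unreadable or corrupted cache file: fall through and recompute.
            pass

    # Cache miss, stale entry, or corrupted file: run the expensive step.
    beats = compute_beats(audio_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"fingerprint": fingerprint, "beats": beats}))
    return beats
```

Because the entry lives on disk, a second run of the script finds it again, which is what makes the cache persist across runs. Hashing the file contents would be a stricter staleness check than mtime and size, at the cost of reading every file on each run.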

| Stage | Without Cache | With Cache | Speedup |
|-------|--------------|------------|---------|
| **Beat Tracking** | ~270s | <0.5s | **~500x faster** ⚡ |
| **Total per file** | ~280-310s | ~10-20s | **~15-30x faster** ⚡ |
| **10 files** | ~50 min | ~2-4 min | **~15-25x faster** ⚡ |
| **200 files** | ~16-17 hours | ~30-60 min | **~15-30x faster** ⚡ |
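
The batch rows follow from the per-file figures by simple multiplication. The short sketch below reproduces that arithmetic; the helper names `batch_minutes` and `speedup` are hypothetical, and the inputs are the approximate timings from the table rather than new measurements.

```python
def batch_minutes(n_files, per_file_seconds):
    """Scale a (low, high) per-file time range in seconds to a batch, in minutes."""
    low, high = per_file_seconds
    return (n_files * low / 60, n_files * high / 60)


def speedup(uncached_seconds, cached_seconds):
    """Ratio of uncached to cached time."""
    return uncached_seconds / cached_seconds


# 10 files at ~280-310 s each: roughly 47 to 52 minutes without the cache.
print(batch_minutes(10, (280, 310)))

# 200 files at ~10-20 s each: roughly 33 to 67 minutes with the cache.
print(batch_minutes(200, (10, 20)))

# Beat tracking alone: 270 s against an upper bound of 0.5 s.
print(speedup(270, 0.5))  # 540.0
```

The per-file speedup is much smaller than the beat tracking speedup because the remaining pipeline stages (about 10-20s per file) are not cached.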

Use `--no-cache` when:

- Beat tracking algorithm has changed
- You suspect cached results are incorrect
- Debugging beat tracking issues
- First-time processing (no cache exists yet anyway)
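
One common way to wire such a flag is a boolean `argparse` option. The sketch below is an assumption about how this could look, not the script's actual argument parser: only the `--no-cache` flag name comes from the note, while `build_parser` and the help text are illustrative.

```python
import argparse


def build_parser():
    """Build a minimal parser exposing only the cache toggle (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a cache toggle for a phrase database build."
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="ignore cached beat tracking results and recompute them",
    )
    return parser


# argparse exposes --no-cache as the attribute no_cache.
args = build_parser().parse_args(["--no-cache"])
use_cache = not args.no_cache
print(use_cache)  # False
```

With `action="store_true"` the cache stays enabled by default, so the slow path runs only when it is explicitly requested.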
Promotion decision
What has to happen next
Attach run IDs, datasets, metrics, and reproduction commands.
Why this is not always a full paper yet
Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.