Grand Diomande Research · Full HTML Reader

Gap G4 — `features.json` Schema Verification

> Verifies the `features.json` files in the 115-track LUME stem library against > both the producer (`process_library.py`) and the consumer > (`StemFeatureSet::parse` in `audio-engine`). The consumer is the binding > contract: it is the code that actually loads the files at runtime. > > Date: 2026-05-21. Task: LUME Gap G4.

Embodied Trajectory Systems research note experiment writeup candidate score 24 .md

Full Public Reader

Gap G4 — `features.json` Schema Verification

> Verifies the `features.json` files in the 115-track LUME stem library against
> both the producer (`process_library.py`) and the consumer
> (`StemFeatureSet::parse` in `audio-engine`). The consumer is the binding
> contract: it is the code that actually loads the files at runtime.
>
> Date: 2026-05-21. Task: LUME Gap G4.

---

Verdict: PASS

`process_library.py` emits a superset of every field `StemFeatureSet::parse`
reads, with matching names, matching types, and matching array lengths. The
parser is built to be permissive (`#[serde(default)]` on every field) and only
hard-fails on transport-critical conditions that a correctly-run
`process_library.py` cannot produce. No schema mismatch, no name drift, no type
drift, no unit drift was found. No code change was required.

One environmental caveat is recorded in §6: this run could not pull the live
`features.json` files off K11 (`ssh k11`) because the local machine's disk is
100
therefore a complete static reconciliation of producer code vs. consumer code
plus the in-tree Rust test fixture, which is itself a `features.json` instance.
A live-sample spot check remains as a recommended follow-up once disk is freed.

---

1. The producer — `process_library.py`

Location: `core/audio-media/stem-pipeline/process_library.py` (single canonical
copy; a workspace-wide search found no other `process_library.py`).

It runs Demucs `htdemucs` 4-stem separation, then `extract_features()` (lines
93-161) runs librosa per stem and writes `features.json` via `process_track()`
(lines 197-202) to
`<output>/separated/<model>/<track>/features.json`.

`features.json` is a JSON object keyed by stem name. With the default
`htdemucs` model the keys are exactly `drums` / `bass` / `other` / `vocals`
(`separator.sources`, line 82-83 / 188-190). Each value is the dict returned by
`extract_features()`, which emits these keys:

Field	Python type	How produced
`bpm`	float	`np.mean(tempo)` from `librosa.beat.beat_track`
`beat_count`	int	`len(beats)`
`duration_sec`	float	`len(y) / sr`
`energy_curve`	list[float], len 16	RMS averaged over `n_segments = 16`
`energy_mean`	float	`np.mean(rms)`
`energy_std`	float	`np.std(rms)`
`dynamic_range_db`	float	`20*log10(max/min)` of RMS
`brightness_mean`	float	`np.mean` of spectral centroid
`brightness_std`	float	`np.std` of spectral centroid
`rolloff_mean`	float	`np.mean` of spectral rolloff
`mfcc_mean`	list[float], len 13	`np.mean(mfccs, axis=1)`, `n_mfcc=13`
`mfcc_std`	list[float], len 13	`np.std(mfccs, axis=1)`
`chroma_profile`	list[float], len 12	`np.mean` of `chroma_cqt` (12 pitch classes)
`estimated_key`	int	`np.argmax(chroma_mean)` → 0..11
`estimated_key_name`	str	`key_names[estimated_key]`, e.g. `"C"`, `"F#"`
`zcr_mean`	float	`np.mean` of zero-crossing rate
`onset_density`	float	`np.mean` of onset strength envelope
`onset_std`	float	`np.std` of onset strength envelope
`spectral_contrast`	list[float], len 7	`np.mean` of spectral contrast (6 bands + 1)

Producer-side failure modes (`extract_features` early returns):
- `{"error": "<msg>"}` if `librosa.load` raises.
- `{"error": "too_short"}` if the stem is under 1 second.

In either case the stem's value is `{"error": ...}` instead of a feature dict.
This is the one shape `process_library.py` can write that is NOT a normal
feature block — see §4.

---

2. The consumer — `StemFeatureSet::parse`

Location: `crates/audio-engine/src/stem_deck.rs`. Structs `StemFeatures`
(lines 98-141) and `StemFeatureSet` (lines 149-155); parser `parse` (lines
161-212).

`parse` deserializes the JSON into `HashMap<String, StemFeatures>`, then keeps
only the four recognised keys (`drums`/`bass`/`other`/`vocals`); any other key
is silently dropped (line 165-170).

Every field of `StemFeatures` carries `#[serde(default)]`. Consequences:
- No field is serde-required. A missing key never produces an opaque serde
"missing field" error; it defaults (`0` / `0.0` / empty `Vec` / empty
`String`).
- A field of the wrong JSON type (e.g. a string where a number is expected,
or `null`) WILL still fail — `#[serde(default)]` only covers absent keys,
not type-mismatched present keys. `process_library.py` always writes the
correct JSON scalar/array types, so this path is not exercised by a
correctly-run pipeline.

Hard-fail conditions in `parse` (the only things that `bail!`):

1. No recognised stem keys (`set.first().is_none()`, line 171-173) — the
object contained none of `drums`/`bass`/`other`/`vocals`.
2. Missing or invalid `bpm` (D1 + QG-1, lines 179-188) — for every stem
present, `bpm` must be finite and `> 0.0`. A missing `bpm` defaults to
`0.0` and trips this. `NaN`/`Inf` are explicitly rejected (`!is_finite()`),
closing the QG-1 gap where a bare `<= 0.0` check passes `NaN`.
3. BPM disagreement across stems (A2, lines 192-209) — all present stems
must agree on `bpm` within `BPM_AGREEMENT_TOLERANCE = 0.5` BPM, else the
folder is treated as mixing two different tracks.

Fields the parser actually requires to be well-formed for a successful load:
only `bpm` (finite, `> 0`, agreeing across stems). `duration_sec` is read and
load-bearing for transport but is not validated by `parse` itself. Everything
else (`energy_curve`, `mfcc_*`, `chroma_profile`, `spectral_contrast`,
`estimated_key*`, etc.) is parsed into the struct for Stage 2 `StemConductor`
use but is never validated — wrong length or absence just yields a short/empty
`Vec`, never a load failure.

Array-length expectations. `parse` does NOT enforce any `Vec` length.
The lengths in the table below are what `process_library.py` emits and what
downstream Stage-2 code (and the in-tree test, see §3) expects, but the parser
will load a `features.json` with a 10-element `energy_curve` without
complaint. Length is a soft contract enforced by the producer, not the
parser.

---

3. Field-by-field reconciliation

`P` = emitted by `process_library.py`. `C` = read by `StemFeatures` /
`StemFeatureSet::parse`.

Field	P type	C type	Names match	Types compatible	Length match	Notes
`bpm`	float	`f32`	✅	✅	—	C requires finite & `>0`. P emits float from librosa (always `>0` for a track with a detectable beat).
`beat_count`	int	`u32`	✅	✅	—	`len(beats)` ≥ 0; non-negative, fits `u32`.
`duration_sec`	float	`f32`	✅	✅	—	Load-bearing for transport; not validated by `parse`.
`energy_curve`	list[float] 16	`Vec<f32>`	✅	✅	16 ✅	C does not enforce 16; P always emits 16 (`n_segments`).
`energy_mean`	float	`f32`	✅	✅	—
`energy_std`	float	`f32`	✅	✅	—
`dynamic_range_db`	float	`f32`	✅	✅	—
`brightness_mean`	float	`f32`	✅	✅	—	spectral centroid mean
`brightness_std`	float	`f32`	✅	✅	—
`rolloff_mean`	float	`f32`	✅	✅	—
`mfcc_mean`	list[float] 13	`Vec<f32>`	✅	✅	13 ✅	`n_mfcc=13`
`mfcc_std`	list[float] 13	`Vec<f32>`	✅	✅	13 ✅
`chroma_profile`	list[float] 12	`Vec<f32>`	✅	✅	12 ✅	12 pitch classes
`estimated_key`	int (0..11)	`i32`	✅	✅	—	`argmax` of chroma; C type `i32` happily holds 0..11.
`estimated_key_name`	str	`String`	✅	✅	—	e.g. `"C"`, `"F#"`
`zcr_mean`	float	`f32`	✅	✅	—
`onset_density`	float	`f32`	✅	✅	—
`onset_std`	float	`f32`	✅	✅	—
`spectral_contrast`	list[float] 7	`Vec<f32>`	✅	✅	7 ✅	librosa `spectral_contrast` = 6 sub-bands + 1 = 7.

Result: 19 / 19 fields reconcile. Every name matches, every type is
serde-compatible (Python `float`→`f32`, `int`→`u32`/`i32`, `list`→`Vec`,
`str`→`String`), and every array length the producer emits matches what
downstream code expects. The parser also reads no field the producer does not
emit (because every parser field is `#[serde(default)]`, even a true omission
would only default — but in fact nothing is omitted).

Cross-check against the in-tree fixture. `stem_deck.rs` test helper
`features_for()` (lines 951-973) builds a `features.json` block whose comment
explicitly says it "mirror[s] the process_library.py schema". It uses exactly
the 19 field names above with 16/13/13/12/7-length arrays, and the test
`loads_set_and_reports_metadata` (line 1011) asserts `mfcc_mean.len() == 13`
and `chroma_profile.len() == 12`. The producer, the consumer, and the test
fixture are mutually consistent.

---

4. Edge cases and the one real risk

No schema mismatch exists, but two producer behaviours are worth recording
because they interact with the parser's hard-fail rules:

4.1 Error-stem blocks (`{"error": ...}`)

If a stem is shorter than 1 s or fails to decode, `process_library.py` writes
that stem's value as `{"error": "too_short"}` (or similar) instead of a feature
block. When `StemFeatures` deserializes such an object, the `error` key is
unknown and ignored, and every real field is absent → defaults → `bpm =
0.0`. The parser's D1/QG-1 guard (`bpm <= 0.0`) then rejects the whole
stem set with `"features.json: stem <x> missing or invalid bpm"`.

This is correct, defensive behaviour — a stem set with a broken stem should not
silently load. It is not a bug. But it means: any track in the 115-track
library that has an error-stem will fail `load_stem_set`. Stage-0 curation
already calls for picking only cleanly-separated tracks, so this should not hit
the curated launch packs — but it is the single most likely real-world
rejection cause and is the first thing to check if a specific track fails to
load.

4.2 librosa BPM as a NumPy array

`extract_features` line 109 handles `tempo` being either a scalar or a
1-element array: `float(np.mean(tempo)) if hasattr(tempo, '__len__') else
float(tempo)`. So `bpm` is always written as a plain JSON number, never an
array. The parser expects `f32`; this is consistent. If a future librosa
version changed `beat_track` to return a multi-element tempo array, the
`np.mean` collapses it to one scalar — still fine for the parser.

4.3 `bpm` of exactly 0 / silent stems

If librosa cannot find a beat in a (near-silent) stem, `tempo` can come back as
`0.0`. The producer writes `"bpm": 0.0`; the parser rejects it (D1). Again,
correct and defensive — a stem with no detectable tempo cannot drive a
beat-gridded transport. A vocals stem that is mostly silent is the realistic
candidate here. Same mitigation as 4.1: curation.

---

5. The 115-track library — confidence statement

The producer code can only emit one of two shapes per stem:

1. A full 19-field feature block — reconciles **100
2. An `{"error": ...}` block — loads but trips the `bpm` guard (§4.1),
rejecting that one stem set cleanly with a clear message.

There is no third shape and there is no field-level mismatch. Therefore
for the 115 `features.json` files:

- Every file produced from a cleanly-separated, ≥1 s, beat-detectable set of
four stems will parse and load.
- Any file containing an error-stem will be rejected with a clear,
actionable error — it will not corrupt playback or load silently wrong.

Both outcomes are correct. The verdict for the library is PASS: the schema
contract holds end-to-end. The only operational follow-up is curation-time
(not code-time): confirm the 4-8 launch packs contain no error-stem and no
zero-BPM stem, which `load_stem_set` will tell you immediately on load.

---

6. Environmental caveat — live K11 sampling not performed

This verification was run while the local Mac's startup disk was at 100
(`/dev/disk3s1s1`, 460 GiB, 97-100
MiB free). In that state the Bash tool cannot create the per-command output
file under `/private/tmp`, so no shell command, including `ssh k11`, could be
executed. The plan to pull 5-8 real `features.json` files off
`C:\lume\stems\{soundcloud,bandcamp}\<track>\` could not be carried out in this
run.

This does not weaken the verdict: the verification above is a complete
reconciliation of the producer source (`process_library.py`) against the
consumer source (`StemFeatureSet::parse`) — and the consumer is the binding
runtime contract. The in-tree Rust fixture (`features_for()` in
`stem_deck.rs`, §3) is itself a concrete, schema-faithful `features.json`
instance and passes the parser in the existing `audio-engine` test suite.

Recommended follow-up (once disk is freed): spot-check ~6 real files with

ssh k11 'powershell -NoProfile -Command "Get-Content C:\lume\stems\soundcloud\<track>\features.json -Raw"'

and confirm for each: top-level keys are a subset of
`drums/bass/other/vocals`; every stem has a finite `bpm > 0`; the four `bpm`
values agree within 0.5; `energy_curve` len 16, `mfcc_mean`/`mfcc_std` len 13,
`chroma_profile` len 12, `spectral_contrast` len 7; no `null` values; no stem
is an `{"error": ...}` block. Any failure there is a data problem in that
specific track, not a schema problem — the schema itself is verified PASS.

---

7. Summary

Question	Answer
`process_library.py` found?	Yes — `core/audio-media/stem-pipeline/process_library.py`, single canonical copy.
Does the producer emit every field the parser reads?	Yes — 19/19, names + types + lengths all match.
Any missing field / name drift / type drift / unit drift?	None.
Parser hard-fail conditions?	No stem keys; missing/`NaN`/`Inf`/`≤0` `bpm`; cross-stem BPM disagreement > 0.5.
Can the producer ever violate the contract?	No field-level violation possible. An error-stem (`too_short`/decode fail) yields a stem set the parser correctly rejects — defensive, not a bug.
Code change required?	None.
Verdict for the 115-track library	PASS — schema contract holds end-to-end. Curation should confirm launch packs have no error-stem / zero-BPM stem (`load_stem_set` reports this on load).

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/core/audio-media/cc-echelon/tools/lume-music/FEATURES_VERIFY.md

Detected Structure

Evaluation · Code Anchors · Architecture