LUME Music Direction — Conclusive Architecture Decision

> Research report, 2026-05-20. Author: research-engine.
> Question settled: how should the LUME bar produce music that responds to body
> motion AND sounds genuinely good (venue quality)?
>
> Decision: stem-based interactive playback. Body motion controls real,
> professionally produced 4-stem Demucs sets — layering, crossfading, filtering,
> FX, beat-synced triggering — instead of synthesizing music from scratch.

---

## 1. Executive summary

The live Strudel path is failing for a structural reason, not a tuning reason.
Generating good-sounding music live from scratch is the hardest possible way to
solve this problem, and two sound-design passes have already confirmed it does
not get there. The offline `motion_score_composer.py` only sounds good because it
is painstakingly handcrafted additive synthesis with no real-time constraints.
Porting that to a streaming synth ("Path B" in the memory file) inherits all of
that handcrafting cost and still produces synthesized, not produced, audio.

The breakthrough is on HD1. Mohamed already has a professionally produced music
library that has already been stem-separated with Demucs htdemucs into clean
4-stem sets. The source material is professional. The separation work is done.
The instant any stem plays, it sounds like real music — because it is real
music. The body's job is no longer to invent music (impossible to do well live)
but to perform music that is already good (a solved, DJ-shaped problem). Mohamed
is a DJ; rekordbox is on K11; this is the natural fit.

The Comp-Core `audio-engine` Rust crate already contains the runtime primitives
for this: a `DeckPlayer` buffer-playback node with seek + looping, a `hound` WAV
loader, `FilterFx` / `DelayFx` / `ReverbFx` with beat-synced delay, a `MotionSynth`
mod-matrix, a `VoiceManager`, and `link-clock` (Ableton Link). The `cc-dj-auto`
crate has a full automatic-mixer (`mixer.rs`, `transition.rs`, `analyzer.rs`).
We are not building a music engine from zero — we are wiring an existing one to
the existing 128-dynamics body signal.

---

## 2. The key finding: HD1

HD1 is a 460 GB drive, **97** […] (not Mac1). SSH shells cannot read it: macOS
Full Disk Access blocks `sshd` and
`osascript do shell script`. The directory listing is readable via Finder
AppleScript over SSH (Finder has FDA); file contents are not, until a human
copies them off HD1 in Terminal.app (which has FDA). This is the same gate
recorded in `san-training-data-2026-05-19.md`.

What is on HD1 (verified by directory listing):

| Path | What it is | Why it matters |
| --- | --- | --- |
| `Bandcamp-Downloads/` | ~53 produced tracks as `.wav` (high quality) + `stem-output/` | Source material: bass/club/electro, Mohamed's actual taste |
| `Bandcamp-Downloads/stem-output/separated/htdemucs/` | ~53 tracks, each split into `drums.wav` + `bass.wav` + `other.wav` + `vocals.wav` + `features.json` | THE ASSET. Stem separation already done. |
| `SoundCloud/` | ~60+ produced tracks (`.mp3`) + `stem-output/` | A second, larger stem-output set (ghetto-tech / footwork / bass) |
| `SoundCloud/stem-output/separated/htdemucs/` | Another full htdemucs 4-stem set + `library_profile.json` | More stem inventory |
| `motionmix-training/` | 4 rated playlists, per-track `.npz` librosa features, `ratings.json` | A curated, Mohamed-rated DJ library with extracted features |
| `stem_search.py`, `stem_phrase_factory.py` (HD1 root) | Existing stem-search + phrase-factory tooling | Prior stem work already exists; this is not greenfield |
| `bandcamp_stem_pipeline.py` (in stem-output) | The pipeline that produced the stems | The stem pipeline is reproducible |

The single most important finding: the expensive, slow part of a stem-based
system — running Demucs over a whole library — is already done. HD1 holds
~100+ professionally produced tracks already separated into clean 4-stem Demucs
sets, each with a `features.json` (BPM, key, energy curve, MFCC, chroma, onset
density — see `process_library.py` for the exact schema). A stem-driven LUME does
not start from zero; it starts from a finished, feature-tagged stem library.

The local Comp-Core copy of the pipeline is
`core/audio-media/stem-pipeline/process_library.py` — it documents the exact
per-stem `features.json` schema and supports `htdemucs_6s` (6-stem: adds separate
guitar + piano) if a future pass wants finer control.
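
As a rough illustration of how the feature-tagged library could be surveyed, the sketch below lists stem sets by tempo. It is a sketch under stated assumptions, not the pipeline's own code: the helper names `load_bpm` and `sets_by_bpm` are hypothetical, and a top-level `bpm` key in `features.json` is assumed. The authoritative schema is whatever `process_library.py` documents, so the key lookup may need adjusting.

```python
import json
from pathlib import Path
from typing import List, Tuple


def load_bpm(set_dir: Path) -> float:
    """Read the tempo of one stem set from its features.json.

    ASSUMPTION: features.json exposes a top-level "bpm" key.
    Check process_library.py for the real schema before relying on this.
    """
    with open(set_dir / "features.json", encoding="utf-8") as fh:
        features = json.load(fh)
    return float(features["bpm"])


def sets_by_bpm(root: Path) -> List[Tuple[str, float]]:
    """List (set name, bpm) pairs for every stem set under `root`, slowest first."""
    pairs = [
        (child.name, load_bpm(child))
        for child in root.iterdir()
        if (child / "features.json").is_file()
    ]
    return sorted(pairs, key=lambda pair: pair[1])
```

Sorting by tempo is only one way to look at the library; the same loop could just as easily collect key or energy once the schema is confirmed.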

---

## 3. The pipeline as it stands today

K11 Femto Bolt → MediaPipe pose (33 landmarks)
   → raw-33 pose JSON UDP → [ip]:9705
   → femto-bridge encode_jsonl (Rust, stateful, long-lived subprocess)
   → dynamics_128 vector @ 30 fps   ← SOLID. Stays as the input. Do not touch.
   → motion_composer_realtime.py    ← causal feature→music mapping. KEEP the mapping.
   → StrudelDriver → strudel_live.html → browser Web Audio   ← THE WEAK LINK.

The 128-dynamics signal and the causal feature extractor
(`RealtimeMotionComposer` / `StreamingFeatureExtractor`) are good engineering and
are kept verbatim. The failure is purely in the last stage: *what makes
sound*. Strudel synthesizing from raw oscillators in a browser cannot reach venue
quality, and "buzzing / obnoxious" is the expected outcome of live from-scratch
synthesis, not a bug to be tuned away.

---

## 4. Methodology comparison

Scored 1–10 (10 = best). "Sound quality" is the dominant axis — it is the stated
problem.

| Methodology | Sound quality | Build difficulty (10 = easy) | Liveness / latency | Fit: K11 + rekordbox + 128-dynamics | Verdict |
| --- | --- | --- | --- | --- | --- |
| Live synthesis (Strudel), the current path | 3 (synthesized, browser oscillators, two passes failed) | 7 (already built) | 7 (phrase-boundary re-eval, glitch-free but coarse) | 5 (browser, not rekordbox-native; audio leaves the Rust stack) | Reject. Structural ceiling. Proven not good enough. |
| Offline composer → Rust streaming synth ("Path B") | 6 (same handcrafted additive synthesis, still synthesized not produced; risk of regressing the offline quality when made causal) | 3 (reimplement `render_score` as a block callback; hard to keep it sounding as good causally) | 8 (native, low latency) | 6 (native Rust, but still a synth voice into rekordbox) | Reject as primary. Months of work to land below the quality of playing real stems. Keep the mapping, discard the synthesis. |
| Stem-based interactive playback: body controls real produced stems | 9 (it is real produced music; sounds good the instant a stem plays) | 6 (stems already separated; `audio-engine` already has `DeckPlayer` + WAV loader + FX) | 8 (sample playback + parameter smoothing is the lowest-latency thing an audio engine does) | 9 (DJ-shaped, rekordbox-native, consumes 128-dynamics directly as control) | RECOMMENDED. |
| Sample/loop generative (smart-DJ): beat-matched loops triggered by motion | 8 (real audio, but loop-cut artifacts if not beat-locked) | 5 (needs beat grid + loop-slicing + Link sync) | 8 (native) | 8 (very DJ-native; `cc-dj-auto` mixer already exists) | Strong. Folds in as Stage 4 of the recommended path (loop/phrase layer on top of stems). |
| SAN neural (body→music net) | unknown (untrained; V6 still in data-prep) | 1 (not usable now, blocked on training) | n/a | n/a | Defer. Off the launch critical path, exactly as the memory's "gets-smarter track" says. |
| Hybrid: stem bed + generative/synth accents | 9 (produced bed guarantees the floor; accents add motion expressivity) | 4 (both subsystems must exist first) | 8 | 9 | This is the end-state. Reached incrementally: Stages 1–3 are the stem bed, Stages 4–5 layer accents. |

### Why stem-based wins, stated plainly

1. It removes the impossible requirement. "Generate good music live from
scratch" is deleted from the problem. The music is pre-produced. Motion
performs it. This is the root-cause fix; everything else fights the symptom.
2. The asset already exists and the slow work is done. ~100+ tracks,
already Demucs-separated, already feature-tagged, on HD1. No separation
compute on the critical path.
3. The runtime already exists. `audio-engine` has `DeckPlayer`
(buffer playback, seek, loop), `loader::load_wav`, `FilterFx`, `DelayFx`
(BPM-synced), `ReverbFx`, `VoiceManager`, `MotionSynth` mod-matrix, and
`link-clock`. `cc-dj-auto` has `mixer.rs` / `transition.rs` / `analyzer.rs`.
4. It is DJ-shaped. Mohamed is a DJ. Vertical stem layering, filter sweeps,
FX throws, beat-synced crossfades — this is the literal vocabulary of modern
stem DJing (Serato Stems, rekordbox 7 active pads). The control metaphor is
one he already owns.
5. The 128-dynamics signal maps onto it cleanly. The body features are
control signals, and a stem mixer is a control surface. See the mapping table in §5.
6. It degrades gracefully. Worst case, a stem set just plays as a loop —
still produced music, still sounds good. The Strudel path's worst case is
silence or buzzing.

### The honest counterargument (contrarian check)

*"Stem playback is just a fancy loop player — it is not really generative, the
visitor is not really 'creating' music, and it may feel less novel than a neural
body→music net."*

Response: this is true and it does not matter for the launch goal. The stated
problem is "music that responds to movement AND sounds genuinely good — venue
quality." A visitor standing in front of the camera and hearing a club track
build, open up, filter, and drop in response to how they move is
unmistakably interactive and unmistakably good-sounding. The "true generation"
ambition is real but it belongs to SAN — and SAN can later be trained with the
stem system as its quality teacher (clean targets), exactly as the memory's
distillation idea proposes. Stem-based is the launch architecture; neural is the
upgrade. Shipping a venue-quality interactive system now beats shipping a
"truly generative" system that buzzes.

---

## 5. Recommended architecture

Stem-based interactive playback engine, built in `cc-echelon/audio-engine`,
driven by the existing 128-dynamics body signal, running on K11, routed into
rekordbox.

K11 Femto Bolt → MediaPipe → raw-33 pose JSON → :9705
  → femto-bridge encode_jsonl (KEEP — long-lived subprocess)
  → dynamics_128 @ 30 fps (KEEP — solid input)
  → StreamingFeatureExtractor  (KEEP — causal feature mapping from
       motion_composer_realtime.py: energy, wrist, openness, extension,
       verticality, breath, anticipation, release + accent/open/release peaks)
  → StemConductor  (NEW — replaces StrudelDriver)
       maps features → stem mix state: per-stem gain, filter cutoff,
       FX sends, loop region, crossfade target
  → audio-engine StemDeck graph (NEW node built on existing DeckPlayer)
       4 DeckPlayers (drums/bass/other/vocals) sample-locked + beat-aligned
       via link-clock, summed through FilterFx/DelayFx/ReverbFx
  → cpal audio out on K11
  → VB-Audio Virtual Cable → rekordbox external-input deck

Key design rules:

- The 128-dynamics input is frozen. No change to pose capture, the encoder,
or the femto-bridge subprocess. This research does not re-litigate a working
pipeline.
- The causal feature layer is reused verbatim. `StreamingFeatureExtractor`,
`RollingNormalizer`, `OnlinePeakDetector` from `motion_composer_realtime.py`
are kept exactly. They already turn raw 128-floats into smoothed, normalized,
causal features with motion events. That work is done and verified.
- `StrudelDriver` is replaced, not extended. The thing it drove (Strudel)
is the failure. The new sink is `StemConductor` → `StemDeck`.
- The stem playback engine is Rust, in `audio-engine`. It extends the
existing `DeckPlayer` rather than introducing a new audio runtime. Native
`cpal` output, sample-accurate, lowest possible latency. No browser.
- rekordbox stays the front-of-house mixer. The engine outputs finished
stereo audio into a rekordbox deck via VB-Audio Virtual Cable (the routing
already settled in `lume-sensor-capture-architecture-2026-05-19.md`).
rekordbox is not a synth and is not asked to be one.
- Stem sets are content, loaded from disk. A LUME stem pack is a directory
of `{drums,bass,other,vocals}.wav` + the `features.json` (BPM/key/energy) the
existing pipeline already emits. The bar curates a handful of sets; switching
sets is a content decision, not a code change.

### How the body controls the music (the mapping)

The 128-dynamics features become a stem control surface. This is the
`StemConductor` mapping, and it reuses the intent of the offline composer's
feature→music rules — just pointed at stems instead of sine waves:

| Body feature | Stem-engine control | Musical result |
| --- | --- | --- |
| `energy` (smoothed) | Vertical layering: low energy = drums+bass only; rising energy fades in `other` then `vocals` | Stillness = minimal groove; movement = full track builds |
| `openness` / `extension` | Low-pass filter cutoff on the full mix (the existing `FilterFx`) | Wide-open body = bright, open sound; closed = filtered-down |
| `verticality` | Filter resonance + high-shelf; reverb send | Lifted body = brighter, more air |
| `wrist` speed | FX send (`DelayFx` throw, beat-synced) | Fast hands = delay throws / echo trails |
| `breath` | Reverb mix / sub presence (slow) | Slow swell texture |
| `accent` peak event | Beat-quantized loop re-trigger / stutter on the next bar grid | Sharp moves punch the groove on-beat |
| `open` peak event | Crossfade toward a brighter or higher-energy stem set | Big open gestures change the section |
| `release` peak event | FX tail-out / filter sweep down + drop back to the drum+bass core | Settling = the track breathes back down |
| `anticipation` (rising energy gradient) | Pre-load the next layer / build a riser | Build-up before a peak |
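
The first two rows of the table can be sketched in a few lines. This is a minimal Python illustration of the logic only; the planned `StemConductor` is a separate component, and everything numeric here is an assumption: the energy thresholds (0.3, 0.6, 0.9), the 200 Hz to 18 kHz cutoff range, and the reduced `StemMixSketch` type (a stand-in for the fuller `StemMixState`) are all hypothetical. Features are assumed to arrive normalized to 0..1.

```python
from dataclasses import dataclass
from typing import Dict


def clamp01(x: float) -> float:
    """Clamp a value into the 0..1 range."""
    return max(0.0, min(1.0, x))


def ramp(x: float, start: float, end: float) -> float:
    """Linear 0..1 ramp as x moves from `start` to `end`."""
    return clamp01((x - start) / (end - start))


@dataclass
class StemMixSketch:
    """Reduced, hypothetical stand-in for the report's StemMixState."""
    gains: Dict[str, float]
    cutoff_hz: float


def conduct(energy: float, openness: float) -> StemMixSketch:
    """Map two normalized body features to per-stem gains and a filter cutoff."""
    energy = clamp01(energy)
    openness = clamp01(openness)
    gains = {
        "drums": 1.0,                      # the drum+bass core is always on
        "bass": 1.0,
        "other": ramp(energy, 0.3, 0.6),   # illustrative thresholds
        "vocals": ramp(energy, 0.6, 0.9),  # vocals enter last
    }
    # Exponential interpolation so equal body movement feels like equal
    # pitch movement of the cutoff (assumed range: 200 Hz to 18 kHz).
    low_hz, high_hz = 200.0, 18000.0
    cutoff_hz = low_hz * (high_hz / low_hz) ** openness
    return StemMixSketch(gains=gains, cutoff_hz=cutoff_hz)
```

Keeping drums and bass pinned at full gain is what gives the graceful worst case described earlier: a motionless visitor still hears the core groove.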

Two crucial properties this inherits for free:

- It always sounds in time. Stems are beat-gridded; `link-clock` keeps the
4 `DeckPlayer`s sample-locked. Layering and re-triggering happen on the bar
grid, so motion never produces an off-beat mess.
- It always sounds good. Every layer is a professionally produced stem.
The worst a motionless visitor can do is hear a clean drum+bass loop. There
is no "buzzing" failure mode because nothing is synthesized.
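
The "always in time" property comes down to deferring each peak-triggered action to the next bar boundary. A minimal sketch of that arithmetic follows, assuming a constant tempo and a known first-downbeat time. In the planned engine the grid would come from `link-clock`; the function names and the 4/4 default here are hypothetical.

```python
import math


def seconds_per_bar(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar at a constant tempo."""
    return beats_per_bar * 60.0 / bpm


def next_bar_time(
    now_s: float,
    bpm: float,
    first_downbeat_s: float = 0.0,
    beats_per_bar: int = 4,
) -> float:
    """Return the time of the first bar boundary strictly after `now_s`.

    Before the first downbeat, the first downbeat itself is returned.
    """
    bar = seconds_per_bar(bpm, beats_per_bar)
    bars_elapsed = (now_s - first_downbeat_s) / bar
    next_index = max(math.floor(bars_elapsed) + 1, 0)
    return first_downbeat_s + next_index * bar
```

An accent detected at 3.9 s into a 120 BPM track would therefore fire at 4.0 s, the start of the next bar, rather than immediately.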

---

## 6. Staged plan: each stage independently validatable

The end goal is explicit: a LUME visitor stands in front of the K11 Bolt, and
their movement performs a professionally produced track in real time — building
it, opening it up, filtering and throwing FX, dropping it back down — at venue
sound quality, routed through rekordbox.

### Stage 0 — Unlock HD1 and curate the launch stem packs (human-gated, ~30 min)
- Mohamed, in Terminal.app on Mac4 (has Full Disk Access), copies the stem
library off HD1 to a non-FDA-restricted location, then onto K11:

  cp -R "/Volumes/HD1/Bandcamp-Downloads/stem-output/separated/htdemucs" ~/Desktop/lume-stems-bandcamp
  cp -R "/Volumes/HD1/SoundCloud/stem-output/separated/htdemucs"        ~/Desktop/lume-stems-soundcloud

(Also grab the `features.json` files — they ride along inside each track dir.)
- Pick 4–8 stem sets for launch. Selection criteria: tracks Mohamed loves,
clean separation, a spread of energy/BPM. The `motionmix-training/ratings.json`
and the per-track `features.json` (BPM, key, energy curve) inform the picks.
- Validates when: the chosen `{drums,bass,other,vocals}.wav` sets + their
`features.json` are on K11 at a known path, e.g. `C:\lume\stems\<set>\`.
- *This is the only step that requires Mohamed and it is the gate everything
else depends on. It is identical in nature to the existing
`build_v5_pairs.py` HD1 gate already documented in memory.*
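
The Stage 0 "validates when" condition can be checked mechanically once the files land. The sketch below assumes the layout described above (one directory per set holding the four stem WAVs plus `features.json`); the function names are hypothetical and it only checks presence, not audio integrity.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the report's description of an htdemucs 4-stem set.
REQUIRED_FILES = ("drums.wav", "bass.wav", "other.wav", "vocals.wav", "features.json")


def missing_files(set_dir: Path) -> List[str]:
    """Return the required file names that are absent from one stem-set directory."""
    return [name for name in REQUIRED_FILES if not (set_dir / name).is_file()]


def audit_stem_root(root: Path) -> Dict[str, List[str]]:
    """Map each stem-set directory under `root` to its list of missing files.

    An empty list means the set is complete and ready to load.
    """
    return {
        child.name: missing_files(child)
        for child in sorted(root.iterdir())
        if child.is_dir()
    }
```

On K11 this might be called as `audit_stem_root(Path(r"C:\lume\stems"))`, using the example path above; any set with a non-empty list is not ready for Stage 1.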

### Stage 1 — `StemDeck` node in `audio-engine` (Rust, ~2–3 days)
- New `audio-engine` node: `StemDeck` = 4 internal `DeckPlayer`s
(drums/bass/other/vocals) loaded from one stem set via the existing
`loader::load_wav`, sample-locked, summed.
- Per-stem gain with `param_control::SmoothParameter` (no zipper noise).
- One shared `FilterFx` + `DelayFx` + `ReverbFx` on the sum (all already exist).
- Looping over a bar-quantized region using `DeckCommand::SetLooping` /
`Seek` (already in `nodes.rs`).
- Validates when: an `audio-engine` example (`examples/stem_deck.rs`) loads
one stem set, plays it, and a scripted gain/filter automation produces an
audible build-up + filter sweep on real hardware. Pure audio test, no motion.
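
To illustrate why smoothed gain avoids zipper noise, here is a one-pole exponential smoother in Python. It shows the general technique only; it is not a description of how `param_control::SmoothParameter` is implemented, and the class name and the 10 ms time constant in the usage note are assumptions.

```python
import math


class OnePoleSmoother:
    """Per-sample exponential glide toward a target value.

    Instead of jumping when a control changes (which is audible as a click
    or "zipper"), the value moves a fixed fraction of the remaining distance
    on every sample.
    """

    def __init__(self, sample_rate: float, time_constant_s: float, initial: float = 0.0):
        # After `time_constant_s` seconds about 63% of a step has been covered.
        self.coeff = math.exp(-1.0 / (sample_rate * time_constant_s))
        self.value = initial
        self.target = initial

    def set_target(self, target: float) -> None:
        """Set the value to glide toward (called at control rate, e.g. 30 fps)."""
        self.target = target

    def next(self) -> float:
        """Advance one audio sample and return the smoothed value."""
        self.value = self.target + (self.value - self.target) * self.coeff
        return self.value
```

With `OnePoleSmoother(48000.0, 0.01)`, a gain step from 0 to 1 moves only about 0.002 on the first sample, so a 30 fps control stream turns into a continuous curve at audio rate.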

### Stage 2 — `StemConductor`: 128-dynamics → stem mix state (~2–3 days)
- New module that consumes `StreamingFeatureExtractor` output (the kept causal
feature layer) and emits a `StemMixState` (per-stem gain, filter cutoff, FX
sends, loop region, crossfade target) per the §5 mapping table.
- Continuous features → smoothed parameters every frame; peak events
(accent/open/release) → bar-quantized actions via `link-clock`.
- Validates when: replaying a recorded `dynamics_128.npy` (the IMG_4241 /
IMG_4243 files already on HD2) through `StemConductor` → `StemDeck` renders an
offline WAV where the music demonstrably builds with energy and filters with
openness. Same replay-verification discipline already used for the realtime
composer — bit-reproducible, no live hardware needed.

### Stage 3 — Live on K11, end to end (~2–3 days)
- Wire `StemConductor` to the live `:9705` pose feed (the bridge already exists
in `motion_to_music_live.py`; swap the `StrudelDriver` sink for a thin client
that feeds the Rust `StemConductor`, or move the bridge into the Rust binary).
- `cpal` output on K11 → VB-Audio Virtual Cable → rekordbox external deck.
- Validates when: Mohamed stands in front of the K11 Bolt, moves, and hears
a produced track build/open/filter/drop in response — venue quality, in
rekordbox. This is the launch acceptance test for the music half.
- Install a `LUME-Music` NSSM autostart service (mirrors `LUME-BoltSkeleton`),
replacing the Strudel-based service plan in the memory file.

### Stage 4 — Crossfade + multi-set + loop layer (smart-DJ, ~1 week)
- `open` peak events crossfade between stem sets, beat- and key-matched using
each set's `features.json` (BPM/key) and the existing `cc-dj-auto` mixer /
`transition.rs` logic.
- Add a loop/phrase layer: motion can trigger beat-matched loops/phrases sliced
from stems (this is the "sample/loop generative" methodology folded in).
- Validates when: sustained interaction moves through multiple tracks with
smooth, in-key, on-beat transitions — no audible seams.
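
Two small pieces of arithmetic sit underneath a seamless set-to-set transition: a crossfade curve that keeps perceived loudness steady, and the playback-rate change needed to match tempos. The sketch below uses the standard equal-power (sine/cosine) crossfade; it does not describe what `cc-dj-auto`'s `transition.rs` actually does, and both function names are hypothetical. Key matching is not covered.

```python
import math
from typing import Tuple


def equal_power_gains(position: float) -> Tuple[float, float]:
    """Return (outgoing_gain, incoming_gain) for a crossfade position in 0..1.

    The gains satisfy out^2 + in^2 = 1, so summed power stays constant
    through the fade instead of dipping in the middle as a linear fade does.
    """
    p = max(0.0, min(1.0, position))
    angle = p * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def playback_rate_to_match(source_bpm: float, target_bpm: float) -> float:
    """Rate multiplier that makes a `source_bpm` set play at `target_bpm`."""
    return target_bpm / source_bpm
```

Bringing a 120 BPM set in over a 126 BPM set would need a rate of 1.05, a 5% speed-up; how large a stretch is acceptable before a pairing is rejected is a curation decision.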

### Stage 5 — Hybrid accent layer + SAN teacher hook (upgrade track)
- Layer the kept `MotionSynth` (the existing Rust mod-matrix synth) as a thin
accent voice on top of the stem bed — body-triggered stabs/risers that the
produced stems cannot provide. The produced stems guarantee the quality floor;
the synth adds expressivity. This is the hybrid end-state in the comparison
table.
- The `StemConductor`'s motion→mix decisions become clean training targets:
SAN V6 can be distilled to imitate them (the memory's "composer as teacher"
idea, now with a better teacher than the offline composer).
- Validates when: accent layer adds motion expressivity without muddying the
mix; and a SAN training run consumes `StemConductor` decision logs as targets.

---

## 7. What to stop doing

- Stop tuning Strudel. Two sound-design passes have proven the ceiling.
The `strudel_live.html` / `StrudelDriver` path should be retired once Stage 3
lands. (Keep `motion_composer_realtime.py`'s feature layer — that part is good
and is reused.)
- Do not pursue "Path B" (offline composer → Rust streaming synth) as the
primary path. It is months of work to land below stem playback on sound
quality. Its `MotionSynth` engine is still valuable — as the Stage 5 accent
layer, not as the main voice.
- Do not wait for SAN. It is correctly on the gets-smarter track. The bar
launches on stems.

---

## 8. Risks and honest gaps

- HD1 access is a hard human gate (Stage 0). Nothing ships until Mohamed
copies the stems off HD1 in Terminal.app. This is unavoidable — it is a macOS
FDA restriction, the same one already blocking `build_v5_pairs.py`.
- Demucs separation quality varies. htdemucs 4-stem is good but not perfect;
bleed between `other` and `vocals` is common. Mitigation: Stage 0 curation
picks cleanly separated tracks; `htdemucs_6s` is available in
`process_library.py` if a finer re-pass is wanted later.
- Beat-grid accuracy. Layering only sounds tight if each stem set has an
accurate BPM + downbeat. The `features.json` has librosa `bpm` + `beat_count`;
downbeat may need a manual nudge per set during Stage 0 curation. Low effort,
one-time per track.
- Stem count at launch is small. 4–8 sets is a deliberately small launch
catalog. That is fine — depth of interaction per track matters more than
catalog size for a bar installation. More sets are pure content adds later.
- Copyright. The HD1 library is downloaded/produced music. For a public
venue this is a licensing question for Mohamed to resolve (it is a DJ-set
question, not an engineering one) — flagged here, not solved here.
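
On the beat-grid risk specifically: once a set has a BPM and a hand-nudged first-downbeat offset, every bar-quantized loop region follows from arithmetic. A minimal sketch, assuming constant tempo; the function name, the 4/4 default, and the idea of storing the nudge as a per-set offset in seconds are all assumptions.

```python
from typing import Tuple


def bar_loop_region(
    bpm: float,
    sample_rate: int,
    first_downbeat_s: float,
    start_bar: int,
    num_bars: int,
    beats_per_bar: int = 4,
) -> Tuple[int, int]:
    """Return (start_sample, end_sample) of a loop spanning whole bars.

    `first_downbeat_s` is the per-set manual nudge: where bar 0 begins in
    the audio file. Bars are counted from 0.
    """
    samples_per_bar = sample_rate * 60.0 * beats_per_bar / bpm
    origin = first_downbeat_s * sample_rate
    start = round(origin + start_bar * samples_per_bar)
    end = round(origin + (start_bar + num_bars) * samples_per_bar)
    return start, end
```

At 120 BPM and 48 kHz a bar is 96,000 samples, so a 10 ms error in the downbeat offset shifts every loop point by 480 samples, which is why the one-time nudge matters.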

---

## 9. Reusable assets inventory (what already exists)

| Asset | Location | Role in the recommended architecture |
| --- | --- | --- |
| `dynamics_128` encoder | `crates/femto-bridge` (`encode_jsonl`) | KEEP: the body input, unchanged |
| Causal feature extractor + peak detectors | `tools/lume-music/motion_composer_realtime.py` | KEEP: reused verbatim as the feature layer |
| Live pose bridge | `tools/lume-music/motion_to_music_live.py` | KEEP plumbing; swap the Strudel sink for `StemConductor` |
| `DeckPlayer` (buffer playback, seek, loop) | `audio-engine/src/nodes.rs` | Base of the new `StemDeck` node |
| WAV loader | `audio-engine/src/loader.rs` (`hound`) | Loads stem `.wav` files |
| `FilterFx` / `DelayFx` (BPM-synced) / `ReverbFx` | `audio-engine/src/fx.rs` | The motion-controlled FX chain |
| `SmoothParameter` / `ParameterController` | `audio-engine/src/param_control.rs` | Zipper-free per-frame parameter automation |
| `link-clock` (Ableton Link) | `crates/link-clock` | Beat grid / sample-lock across the 4 stem decks |
| `cc-dj-auto` mixer / transition / analyzer | `cc-dj/crates/cc-dj-auto` | Stage 4 beat- and key-matched crossfades |
| `MotionSynth` mod-matrix synth | `audio-engine/src/synth/echelon_integration.rs` | Stage 5 hybrid accent layer |
| rekordbox / serato bridges | `cc-dj/crates/cc-dj-control/src/bridge/` | Reference for the rekordbox routing path |
| Stem pipeline | `core/audio-media/stem-pipeline/process_library.py` | Re-run / `htdemucs_6s` re-pass; defines `features.json` schema |
| Stem library (~100+ tracks, pre-separated) | HD1 `Bandcamp-Downloads` + `SoundCloud` `stem-output/` | The content. The asset that makes this the right architecture. |

---

## 10. Bottom line

Stop synthesizing music live. Mohamed already has a professionally produced,
already-stem-separated music library on HD1 and a DJ's instinct for performing
it. The LUME bar should perform produced stems with the body, not *invent music
from scratch*. The 128-dynamics signal is a control surface; the
`cc-echelon/audio-engine` already has the playback and FX primitives; rekordbox
is the front-of-house mixer. Build the `StemDeck` + `StemConductor`, validate
each stage independently, and the music half of LUME reaches venue quality —
because the source material already is.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

Comp-Core/core/audio-media/cc-echelon/tools/lume-music/MUSIC_DIRECTION.md