LUME Music Direction — Conclusive Architecture Decision
> Research report, 2026-05-20. Author: research-engine.
> Question settled: how should the LUME bar produce music that responds to body
> motion AND sounds genuinely good (venue quality)?
>
> **Decision: stem-based interactive playback.** Body motion controls real,
> professionally produced 4-stem Demucs sets (layering, crossfading, filtering,
> FX, beat-synced triggering) instead of synthesizing music from scratch.
---
## 1. Executive summary
The live Strudel path is failing for a structural reason, not a tuning reason.
Generating good-sounding music live from scratch is the hardest possible way to
solve this problem, and two sound-design passes have already confirmed it does
not get there. The offline `motion_score_composer.py` only sounds good because it
is painstakingly handcrafted additive synthesis with no real-time constraints —
porting that to a streaming synth (its memory file's "Path B") inherits all of
that handcrafting cost and still produces synthesized, not produced, audio.
The breakthrough is on HD1. Mohamed already has a professionally produced music
library that has already been stem-separated with Demucs htdemucs into clean
4-stem sets. The source material is professional. The separation work is done.
The instant any stem plays, it sounds like real music — because it is real
music. The body's job is no longer to invent music (impossible to do well live)
but to perform music that is already good (a solved, DJ-shaped problem). Mohamed
is a DJ; rekordbox is on K11; this is the natural fit.
The Comp-Core `audio-engine` Rust crate already contains the runtime primitives
for this: a `DeckPlayer` buffer-playback node with seek + looping, a `hound` WAV
loader, `FilterFx` / `DelayFx` / `ReverbFx` with beat-synced delay, a `MotionSynth`
mod-matrix, a `VoiceManager`, and `link-clock` (Ableton Link). The `cc-dj-auto`
crate has a full automatic-mixer (`mixer.rs`, `transition.rs`, `analyzer.rs`).
We are not building a music engine from zero — we are wiring an existing one to
the existing 128-dynamics body signal.
---
## 2. The key finding: HD1
HD1 is a 460 GB drive, **97
(not Mac1). SSH shells cannot read it: macOS Full Disk Access blocks `sshd` and
`osascript do shell script`. The directory listing is readable via Finder
AppleScript over SSH (Finder has FDA); file contents are not, until a human
copies them off HD1 in Terminal.app (which has FDA). This is the same gate
recorded in `san-training-data-2026-05-19.md`.
What is on HD1 (verified by directory listing):
| Path | What it is | Why it matters |
|---|---|---|
| `Bandcamp-Downloads/` | ~53 produced tracks as `.wav` (high quality) + `stem-output/` | Source material — bass/club/electro, Mohamed's actual taste |
| `Bandcamp-Downloads/stem-output/separated/htdemucs/` | ~53 tracks, each split into `drums.wav` + `bass.wav` + `other.wav` + `vocals.wav` + `features.json` | THE ASSET. Stem separation already done. |
| `SoundCloud/` | ~60+ produced tracks (`.mp3`) + `stem-output/` | A second, larger stem-output set (ghetto-tech / footwork / bass) |
| `SoundCloud/stem-output/separated/htdemucs/` | Another full htdemucs 4-stem set + `library_profile.json` | More stem inventory |
| `motionmix-training/` | 4 rated playlists, per-track `.npz` librosa features, `ratings.json` | A curated, Mohamed-rated DJ library with extracted features |
| `stem_search.py`, `stem_phrase_factory.py` (HD1 root) | Existing stem-search + phrase-factory tooling | Prior stem work already exists — not greenfield |
| `bandcamp_stem_pipeline.py` (in stem-output) | The pipeline that produced the stems | The stem pipeline is reproducible |
The single most important finding: the expensive, slow part of a stem-based
system — running Demucs over a whole library — is already done. HD1 holds
~100+ professionally produced tracks already separated into clean 4-stem Demucs
sets, each with a `features.json` (BPM, key, energy curve, MFCC, chroma, onset
density — see `process_library.py` for the exact schema). A stem-driven LUME does
not start from zero; it starts from a finished, feature-tagged stem library.
The local Comp-Core copy of the pipeline is
`core/audio-media/stem-pipeline/process_library.py` — it documents the exact
per-stem `features.json` schema and supports `htdemucs_6s` (6-stem: adds separate
guitar + piano) if a future pass wants finer control.
---
## 3. The pipeline as it stands today

    K11 Femto Bolt → MediaPipe pose (33 landmarks)
      → raw-33 pose JSON UDP → [ip]:9705
      → femto-bridge encode_jsonl (Rust, stateful, long-lived subprocess)
      → dynamics_128 vector @ 30 fps     ← SOLID. Stays as the input. Do not touch.
      → motion_composer_realtime.py      ← causal feature→music mapping. KEEP the mapping.
      → StrudelDriver → strudel_live.html → browser Web Audio   ← THE WEAK LINK.

The 128-dynamics signal and the causal feature extractor
(`RealtimeMotionComposer` / `StreamingFeatureExtractor`) are good engineering and
are kept verbatim. The failure is purely in the last stage: *what makes
sound*. Strudel synthesizing from raw oscillators in a browser cannot reach venue
quality, and "buzzing / obnoxious" is the expected outcome of live from-scratch
synthesis, not a bug to be tuned away.
---
## 4. Methodology comparison
Scored 1–10 (10 = best). "Sound quality" is the dominant axis — it is the stated
problem.
| Methodology | Sound quality | Build difficulty (10=easy) | Liveness / latency | Fit: K11 + rekordbox + 128-dynamics | Verdict |
|---|---|---|---|---|---|
| Live synthesis (Strudel) — current path | 3 — synthesized, browser oscillators, two passes failed | 7 — already built | 7 — phrase-boundary re-eval, glitch-free but coarse | 5 — browser, not rekordbox-native; audio leaves the Rust stack | Reject. Structural ceiling. Proven not good enough. |
| Offline composer → Rust streaming synth ("Path B") | 6 — same handcrafted additive synthesis, still synthesized not produced; risk of regressing the offline quality when made causal | 3 — reimplement `render_score` as a block callback; hard to keep it sounding as good causally | 8 — native, low latency | 6 — native Rust, but still a synth voice into rekordbox | Reject as primary. Months of work to land below the quality of playing real stems. Keep the mapping, discard the synthesis. |
| Stem-based interactive playback — body controls real produced stems | 9 — it is real produced music; sounds good the instant a stem plays | 6 — stems already separated; `audio-engine` already has `DeckPlayer` + WAV loader + FX | 8 — sample playback + parameter smoothing is the lowest-latency thing an audio engine does | 9 — DJ-shaped, rekordbox-native, consumes 128-dynamics directly as control | RECOMMENDED. |
| Sample/loop generative (smart-DJ) — beat-matched loops triggered by motion | 8 — real audio, but loop-cut artifacts if not beat-locked | 5 — needs beat grid + loop-slicing + Link sync | 8 — native | 8 — very DJ-native; `cc-dj-auto` mixer already exists | Strong. Folds in as Stage 4 of the recommended path (loop/phrase layer on top of stems). |
| SAN neural (body→music net) | unknown — untrained; V6 still in data-prep | 1 — not usable now, blocked on training | n/a | n/a | Defer. Off the launch critical path, exactly as the memory's "gets-smarter track" says. |
| Hybrid: stem bed + generative/synth accents | 9 — produced bed guarantees the floor; accents add motion expressivity | 4 — both subsystems must exist first | 8 | 9 | This is the end-state. Reached incrementally — Stages 1–3 are the stem bed, Stages 4–5 layer accents. |
### Why stem-based wins, stated plainly
1. It removes the impossible requirement. "Generate good music live from
scratch" is deleted from the problem. The music is pre-produced. Motion
performs it. This is the root-cause fix; everything else fights the symptom.
2. The asset already exists and the slow work is done. ~100+ tracks,
already Demucs-separated, already feature-tagged, on HD1. No separation
compute on the critical path.
3. The runtime already exists. `audio-engine` has `DeckPlayer`
(buffer playback, seek, loop), `loader::load_wav`, `FilterFx`, `DelayFx`
(BPM-synced), `ReverbFx`, `VoiceManager`, `MotionSynth` mod-matrix, and
`link-clock`. `cc-dj-auto` has `mixer.rs` / `transition.rs` / `analyzer.rs`.
4. It is DJ-shaped. Mohamed is a DJ. Vertical stem layering, filter sweeps,
FX throws, beat-synced crossfades — this is the literal vocabulary of modern
stem DJing (Serato Stems, rekordbox 7 active pads). The control metaphor is
one he already owns.
5. The 128-dynamics signal maps onto it cleanly. The body features are
control signals, and a stem mixer is a control surface. See the mapping table in §5.
6. It degrades gracefully. Worst case, a stem set just plays as a loop —
still produced music, still sounds good. The Strudel path's worst case is
silence or buzzing.
### The honest counterargument (contrarian check)
*"Stem playback is just a fancy loop player — it is not really generative, the
visitor is not really 'creating' music, and it may feel less novel than a neural
body→music net."*
Response: this is true and it does not matter for the launch goal. The stated
problem is "music that responds to movement AND sounds genuinely good — venue
quality." A visitor standing in front of the camera and hearing a club track
build, open up, filter, and drop in response to how they move is
unmistakably interactive and unmistakably good-sounding. The "true generation"
ambition is real but it belongs to SAN — and SAN can later be trained with the
stem system as its quality teacher (clean targets), exactly as the memory's
distillation idea proposes. Stem-based is the launch architecture; neural is the
upgrade. Shipping a venue-quality interactive system now beats shipping a
"truly generative" system that buzzes.
---
## 5. Recommended architecture
Stem-based interactive playback engine, built in `cc-echelon/audio-engine`,
driven by the existing 128-dynamics body signal, running on K11, routed into
rekordbox.

    K11 Femto Bolt → MediaPipe → raw-33 pose JSON → :9705
      → femto-bridge encode_jsonl        (KEEP: long-lived subprocess)
      → dynamics_128 @ 30 fps            (KEEP: solid input)
      → StreamingFeatureExtractor        (KEEP: causal feature mapping from
            motion_composer_realtime.py: energy, wrist, openness, extension,
            verticality, breath, anticipation, release + accent/open/release peaks)
      → StemConductor                    (NEW: replaces StrudelDriver)
            maps features → stem mix state: per-stem gain, filter cutoff,
            FX sends, loop region, crossfade target
      → audio-engine StemDeck graph      (NEW node built on existing DeckPlayer)
            4 DeckPlayers (drums/bass/other/vocals) sample-locked + beat-aligned
            via link-clock, summed through FilterFx/DelayFx/ReverbFx
      → cpal audio out on K11
      → VB-Audio Virtual Cable → rekordbox external-input deck

Key design rules:
- The 128-dynamics input is frozen. No change to pose capture, the encoder,
or the femto-bridge subprocess. This research does not re-litigate a working
pipeline.
- The causal feature layer is reused verbatim. `StreamingFeatureExtractor`,
`RollingNormalizer`, `OnlinePeakDetector` from `motion_composer_realtime.py`
are kept exactly. They already turn raw 128-floats into smoothed, normalized,
causal features with motion events. That work is done and verified.
- `StrudelDriver` is replaced, not extended. The thing it drove (Strudel)
is the failure. The new sink is `StemConductor` → `StemDeck`.
- The stem playback engine is Rust, in `audio-engine`. It extends the
existing `DeckPlayer` rather than introducing a new audio runtime. Native
`cpal` output, sample-accurate, lowest possible latency. No browser.
- rekordbox stays the front-of-house mixer. The engine outputs finished
stereo audio into a rekordbox deck via VB-Audio Virtual Cable (the routing
already settled in `lume-sensor-capture-architecture-2026-05-19.md`).
rekordbox is not a synth and is not asked to be one.
- Stem sets are content, loaded from disk. A LUME stem pack is a directory
of `{drums,bass,other,vocals}.wav` + the `features.json` (BPM/key/energy) the
existing pipeline already emits. The bar curates a handful of sets; switching
sets is a content decision, not a code change.
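The stem-pack rule above can be checked mechanically before anything is loaded into the engine. The following is a minimal Python sketch, assuming a pack is exactly a directory holding the four stem WAVs plus a `features.json`. The names `load_stem_pack` and `list_stem_packs` are hypothetical helpers, and no particular `features.json` keys are assumed (the real schema lives in `process_library.py`).

```python
import json
from pathlib import Path

# The four htdemucs stems every pack is expected to contain.
STEM_NAMES = ("drums", "bass", "other", "vocals")


def load_stem_pack(pack_dir):
    """Return stem paths and parsed features, or raise if the pack is incomplete."""
    pack = Path(pack_dir)
    stems = {name: pack / f"{name}.wav" for name in STEM_NAMES}
    missing = [name for name, path in stems.items() if not path.is_file()]
    features_path = pack / "features.json"
    if not features_path.is_file():
        missing.append("features.json")
    if missing:
        raise FileNotFoundError(f"{pack.name}: missing {', '.join(missing)}")
    with features_path.open(encoding="utf-8") as fh:
        features = json.load(fh)  # schema is not interpreted here
    return {"name": pack.name, "stems": stems, "features": features}


def list_stem_packs(root):
    """Collect every complete pack directly under root, skipping incomplete ones."""
    packs = []
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue
        try:
            packs.append(load_stem_pack(child))
        except FileNotFoundError:
            continue  # incomplete directory: not a usable pack
    return packs
```

Pointing `list_stem_packs` at the stems root on K11 would then give the set of packs the bar can switch between, with incomplete directories silently excluded rather than failing at playback time.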
### How the body controls the music (the mapping)
The 128-dynamics features become a stem control surface. This is the
`StemConductor` mapping, and it reuses the intent of the offline composer's
feature→music rules — just pointed at stems instead of sine waves:
| Body feature | Stem-engine control | Musical result |
|---|---|---|
| `energy` (smoothed) | Vertical layering: low energy = drums+bass only; rising energy fades in `other` then `vocals` | Stillness = minimal groove; movement = full track builds |
| `openness` / `extension` | Low-pass filter cutoff on the full mix (the existing `FilterFx`) | Wide-open body = bright, open sound; closed = filtered-down |
| `verticality` | Filter resonance + high-shelf; reverb send | Lifted body = brighter, more air |
| `wrist` speed | FX send (`DelayFx` throw, beat-synced) | Fast hands = delay throws / echo trails |
| `breath` | Reverb mix / sub presence (slow) | Slow swell texture |
| `accent` peak event | Beat-quantized loop re-trigger / stutter on the next bar grid | Sharp moves punch the groove on-beat |
| `open` peak event | Crossfade toward a brighter or higher-energy stem set | Big open gestures change the section |
| `release` peak event | FX tail-out / filter sweep down + drop back to the drum+bass core | Settling = the track breathes back down |
| `anticipation` (rising energy gradient) | Pre-load the next layer / build a riser | Build-up before a peak |
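The continuous rows of the table can be prototyped as a pure function from features to mix state. This is a sketch only, assuming every feature arrives normalized to the range 0 to 1. The `StemMixState` fields, the energy thresholds (0.3 / 0.6 / 0.9), the 200 Hz to 18 kHz cutoff range, and the send scaling are all illustrative assumptions, not values taken from the existing composer.

```python
from dataclasses import dataclass

MIN_CUTOFF_HZ = 200.0     # assumed filter floor for a fully closed body
MAX_CUTOFF_HZ = 18000.0   # assumed filter ceiling for a fully open body


def clamp01(x):
    """Clamp a value into the 0..1 range."""
    return max(0.0, min(1.0, x))


def ramp(x, lo, hi):
    """Return 0 below lo, 1 above hi, and a linear fade in between."""
    return clamp01((x - lo) / (hi - lo))


@dataclass
class StemMixState:
    drums_gain: float
    bass_gain: float
    other_gain: float
    vocals_gain: float
    filter_cutoff_hz: float
    delay_send: float
    reverb_send: float


def map_features(energy, openness, wrist, verticality):
    """Map normalized body features onto stem-mixer controls."""
    # Exponential cutoff mapping, so equal changes in openness sound like equal
    # steps in brightness.
    cutoff = MIN_CUTOFF_HZ * (MAX_CUTOFF_HZ / MIN_CUTOFF_HZ) ** clamp01(openness)
    return StemMixState(
        drums_gain=1.0,                      # drums + bass are the always-on core
        bass_gain=1.0,
        other_gain=ramp(energy, 0.3, 0.6),   # `other` fades in first
        vocals_gain=ramp(energy, 0.6, 0.9),  # `vocals` fades in last
        filter_cutoff_hz=cutoff,
        delay_send=clamp01(wrist),           # fast hands push the delay throw
        reverb_send=0.5 * clamp01(verticality),
    )
```

Keeping the mapping stateless makes it easy to test in isolation: smoothing and bar quantization can live in separate layers, so the table itself stays readable and tunable.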
Two crucial properties this inherits for free:
- It always sounds in time. Stems are beat-gridded; `link-clock` keeps the
4 `DeckPlayer`s sample-locked. Layering and re-triggering happen on the bar
grid, so motion never produces an off-beat mess.
- It always sounds good. Every layer is a professionally produced stem.
The worst a motionless visitor can do is hear a clean drum+bass loop. There
is no "buzzing" failure mode because nothing is synthesized.
---
## 6. Staged plan: each stage independently validatable
The end goal is explicit: a LUME visitor stands in front of the K11 Bolt, and
their movement performs a professionally produced track in real time — building
it, opening it up, filtering and throwing FX, dropping it back down — at venue
sound quality, routed through rekordbox.
### Stage 0 — Unlock HD1 and curate the launch stem packs (human-gated, ~30 min)
- Mohamed, in Terminal.app on Mac4 (has Full Disk Access), copies the stem
  library off HD1 to a non-FDA-restricted location, then onto K11:

      cp -R "/Volumes/HD1/Bandcamp-Downloads/stem-output/separated/htdemucs" Desktop/lume-stems-bandcamp
      cp -R "/Volumes/HD1/SoundCloud/stem-output/separated/htdemucs" Desktop/lume-stems-soundcloud

  (The `features.json` files ride along inside each track dir, so this copy includes them.)
- Pick 4–8 stem sets for launch. Selection criteria: tracks Mohamed loves,
clean separation, a spread of energy/BPM. The `motionmix-training/ratings.json`
and the per-track `features.json` (BPM, key, energy curve) inform the picks.
- Validates when: the chosen `{drums,bass,other,vocals}.wav` sets + their
`features.json` are on K11 at a known path, e.g. `C:\lume\stems\<set>\`.
- *This is the only step that requires Mohamed and it is the gate everything
else depends on. It is identical in nature to the existing
`build_v5_pairs.py` HD1 gate already documented in memory.*
### Stage 1 — `StemDeck` node in `audio-engine` (Rust, ~2–3 days)
- New `audio-engine` node: `StemDeck` = 4 internal `DeckPlayer`s
(drums/bass/other/vocals) loaded from one stem set via the existing
`loader::load_wav`, sample-locked, summed.
- Per-stem gain with `param_control::SmoothParameter` (no zipper noise).
- One shared `FilterFx` + `DelayFx` + `ReverbFx` on the sum (all already exist).
- Looping over a bar-quantized region using `DeckCommand::SetLooping` /
`Seek` (already in `nodes.rs`).
- Validates when: an `audio-engine` example (`examples/stem_deck.rs`) loads
one stem set, plays it, and a scripted gain/filter automation produces an
audible build-up + filter sweep on real hardware. Pure audio test, no motion.
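The core of Stage 1, summing four stems under smoothed per-stem gains, can be prototyped outside Rust before the node is written. The sketch below is a language-neutral Python illustration, not the `audio-engine` implementation: `SmoothedGain` is a hypothetical one-pole smoother standing in for `SmoothParameter` (whose actual API is not reproduced here), and stems are plain lists of mono samples.

```python
class SmoothedGain:
    """One-pole smoother: each sample moves a fraction of the way to the target."""

    def __init__(self, initial, coeff=0.001):
        self.current = float(initial)
        self.target = float(initial)
        self.coeff = coeff  # larger = faster response, smaller = smoother

    def next(self):
        self.current += self.coeff * (self.target - self.current)
        return self.current


def mix_block(stems, gains):
    """Sum sample-aligned stem buffers, applying a smoothed gain per sample."""
    length = min(len(buf) for buf in stems.values())
    out = [0.0] * length
    for name, buf in stems.items():
        gain = gains[name]
        for i in range(length):
            # The gain advances once per sample, so a target change becomes a
            # short ramp instead of a step (a step is what causes zipper noise).
            out[i] += buf[i] * gain.next()
    return out
```

Setting `gains["vocals"].target = 1.0` mid-playback then produces a fade-in rather than a click, which is the behavior the scripted gain automation in the Stage 1 validation is meant to demonstrate.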
### Stage 2 — `StemConductor`: 128-dynamics → stem mix state (~2–3 days)
- New module that consumes `StreamingFeatureExtractor` output (the kept causal
feature layer) and emits a `StemMixState` (per-stem gain, filter cutoff, FX
sends, loop region, crossfade target) per the §5 mapping table.
- Continuous features → smoothed parameters every frame; peak events
(accent/open/release) → bar-quantized actions via `link-clock`.
- Validates when: replaying a recorded `dynamics_128.npy` (the IMG_4241 /
IMG_4243 files already on HD2) through `StemConductor` → `StemDeck` renders an
offline WAV where the music demonstrably builds with energy and filters with
openness. Same replay-verification discipline already used for the realtime
composer — bit-reproducible, no live hardware needed.
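The split between per-frame updates and bar-quantized events can be sketched as a small deterministic replay loop. This is an illustration under stated assumptions: `BarQuantizer` and `replay` are hypothetical names, the bar grid is assumed to start at time zero with four beats per bar, and in the real engine the grid would come from `link-clock` rather than from a fixed BPM.

```python
import math


class BarQuantizer:
    """Hold peak events and release them on the next bar boundary."""

    def __init__(self, bpm, beats_per_bar=4):
        self.bar_seconds = 60.0 / bpm * beats_per_bar
        self.pending = []  # list of (due_time_seconds, event)

    def submit(self, event, now_s):
        # Always schedule on the NEXT boundary; an event that lands exactly
        # on a boundary waits for the following bar.
        next_bar = math.floor(now_s / self.bar_seconds) + 1
        self.pending.append((next_bar * self.bar_seconds, event))

    def pop_due(self, now_s):
        due = [event for t, event in self.pending if t <= now_s]
        self.pending = [(t, e) for t, e in self.pending if t > now_s]
        return due


def replay(events_by_frame, n_frames, bpm, fps=30.0):
    """Replay recorded peak events; return (frame_index, event) when each fires."""
    quantizer = BarQuantizer(bpm)
    fired = []
    for i in range(n_frames):
        now_s = i / fps
        for event in quantizer.pop_due(now_s):
            fired.append((i, event))
        if i in events_by_frame:
            quantizer.submit(events_by_frame[i], now_s)
    return fired
```

Because the loop depends only on frame indices and the BPM, running it twice on the same input gives the same firing log, which is the property the replay-verification step relies on.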
### Stage 3 — Live on K11, end to end (~2–3 days)
- Wire `StemConductor` to the live `:9705` pose feed (the bridge already exists
in `motion_to_music_live.py`; swap the `StrudelDriver` sink for a thin client
that feeds the Rust `StemConductor`, or move the bridge into the Rust binary).
- `cpal` output on K11 → VB-Audio Virtual Cable → rekordbox external deck.
- Validates when: Mohamed stands in front of the K11 Bolt, moves, and hears
a produced track build/open/filter/drop in response — venue quality, in
rekordbox. This is the launch acceptance test for the music half.
- Install a `LUME-Music` NSSM autostart service (mirrors `LUME-BoltSkeleton`),
replacing the Strudel-based service plan in the memory file.
### Stage 4 — Crossfade + multi-set + loop layer (smart-DJ, ~1 week)
- `open` peak events crossfade between stem sets, beat- and key-matched using
each set's `features.json` (BPM/key) and the existing `cc-dj-auto` mixer /
`transition.rs` logic.
- Add a loop/phrase layer: motion can trigger beat-matched loops/phrases sliced
from stems (this is the "sample/loop generative" methodology folded in).
- Validates when: sustained interaction moves through multiple tracks with
smooth, in-key, on-beat transitions — no audible seams.
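Two pieces of arithmetic sit underneath a beat-matched crossfade: the gain curve and the playback-rate change needed to align tempos. The sketch below shows a standard equal-power curve and a simple BPM ratio check. It does not reproduce the `cc-dj-auto` transition logic, key matching is left out because the `features.json` key format is not assumed here, and the 8% stretch limit is an illustrative threshold.

```python
import math


def equal_power_gains(progress):
    """Return (outgoing_gain, incoming_gain) for crossfade progress in 0..1.

    The squared gains always sum to 1, so perceived loudness stays roughly
    constant through the fade instead of dipping in the middle.
    """
    p = max(0.0, min(1.0, progress))
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)


def playback_rate(incoming_bpm, current_bpm):
    """Rate to apply to the incoming set so it plays at the current tempo."""
    return current_bpm / incoming_bpm


def can_beatmatch(incoming_bpm, current_bpm, max_stretch=0.08):
    """True if the tempo change stays within the allowed stretch."""
    return abs(playback_rate(incoming_bpm, current_bpm) - 1.0) <= max_stretch
```

A conductor could use `can_beatmatch` to filter candidate sets when an `open` peak arrives, then drive `equal_power_gains` from bar position over the length of the transition.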
### Stage 5 — Hybrid accent layer + SAN teacher hook (upgrade track)
- Layer the kept `MotionSynth` (the existing Rust mod-matrix synth) as a thin
accent voice on top of the stem bed — body-triggered stabs/risers that the
produced stems cannot provide. The produced stems guarantee the quality floor;
the synth adds expressivity. This is the hybrid end-state in the comparison
table.
- The `StemConductor`'s motion→mix decisions become clean training targets:
SAN V6 can be distilled to imitate them (the memory's "composer as teacher"
idea, now with a better teacher than the offline composer).
- Validates when: accent layer adds motion expressivity without muddying the
mix; and a SAN training run consumes `StemConductor` decision logs as targets.
---
## 7. What to stop doing
- Stop tuning Strudel. Two sound-design passes have proven the ceiling.
The `strudel_live.html` / `StrudelDriver` path should be retired once Stage 3
lands. (Keep `motion_composer_realtime.py`'s feature layer — that part is good
and is reused.)
- Do not pursue "Path B" (offline composer → Rust streaming synth) as the
primary path. It is months of work to land below stem playback on sound
quality. Its `MotionSynth` engine is still valuable — as the Stage 5 accent
layer, not as the main voice.
- Do not wait for SAN. It is correctly on the gets-smarter track. The bar
launches on stems.
## 8. Risks and honest gaps
- HD1 access is a hard human gate (Stage 0). Nothing ships until Mohamed
copies the stems off HD1 in Terminal.app. This is unavoidable — it is a macOS
FDA restriction, the same one already blocking `build_v5_pairs.py`.
- Demucs separation quality varies. htdemucs 4-stem is good but not perfect;
bleed between `other` and `vocals` is common. Mitigation: Stage 0 curation
picks cleanly separated tracks; `htdemucs_6s` is available in
`process_library.py` if a finer re-pass is wanted later.
- Beat-grid accuracy. Layering only sounds tight if each stem set has an
accurate BPM + downbeat. The `features.json` has librosa `bpm` + `beat_count`;
downbeat may need a manual nudge per set during Stage 0 curation. Low effort,
one-time per track.
- Stem count at launch is small. 4–8 sets is a deliberately small launch
catalog. That is fine — depth of interaction per track matters more than
catalog size for a bar installation. More sets are pure content adds later.
- Copyright. The HD1 library is downloaded/produced music. For a public
venue this is a licensing question for Mohamed to resolve (it is a DJ-set
question, not an engineering one) — flagged here, not solved here.
---
## 9. Reusable assets inventory (what already exists)
| Asset | Location | Role in the recommended architecture |
|---|---|---|
| `dynamics_128` encoder | `crates/femto-bridge` (`encode_jsonl`) | KEEP — the body input, unchanged |
| Causal feature extractor + peak detectors | `tools/lume-music/motion_composer_realtime.py` | KEEP — reused verbatim as the feature layer |
| Live pose bridge | `tools/lume-music/motion_to_music_live.py` | KEEP plumbing; swap the Strudel sink for `StemConductor` |
| `DeckPlayer` (buffer playback, seek, loop) | `audio-engine/src/nodes.rs` | Base of the new `StemDeck` node |
| WAV loader | `audio-engine/src/loader.rs` (`hound`) | Loads stem `.wav` files |
| `FilterFx` / `DelayFx` (BPM-synced) / `ReverbFx` | `audio-engine/src/fx.rs` | The motion-controlled FX chain |
| `SmoothParameter` / `ParameterController` | `audio-engine/src/param_control.rs` | Zipper-free per-frame parameter automation |
| `link-clock` (Ableton Link) | `crates/link-clock` | Beat grid / sample-lock across the 4 stem decks |
| `cc-dj-auto` mixer / transition / analyzer | `cc-dj/crates/cc-dj-auto` | Stage 4 beat- and key-matched crossfades |
| `MotionSynth` mod-matrix synth | `audio-engine/src/synth/echelon_integration.rs` | Stage 5 hybrid accent layer |
| rekordbox / serato bridges | `cc-dj/crates/cc-dj-control/src/bridge/` | Reference for the rekordbox routing path |
| Stem pipeline | `core/audio-media/stem-pipeline/process_library.py` | Re-run / `htdemucs_6s` re-pass; defines `features.json` schema |
| Stem library (~100+ tracks, pre-separated) | HD1 `Bandcamp-Downloads` + `SoundCloud` `stem-output/` | The content. The asset that makes this the right architecture. |
---
## 10. Bottom line
Stop synthesizing music live. Mohamed already has a professionally produced,
already-stem-separated music library on HD1 and a DJ's instinct for performing
it. The LUME bar should perform produced stems with the body, not *invent music
from scratch*. The 128-dynamics signal is a control surface; the
`cc-echelon/audio-engine` already has the playback and FX primitives; rekordbox
is the front-of-house mixer. Build the `StemDeck` + `StemConductor`, validate
each stage independently, and the music half of LUME reaches venue quality —
because the source material already is.
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
Comp-Core/core/audio-media/cc-echelon/tools/lume-music/MUSIC_DIRECTION.md