K11 Music Production Stack — Where Everything Lives

Date: 2026-05-12. Authority: verified against the actual code in `Desktop/Comp-Core/`.

---

The short answer to your question

> "Unity vs Rekordbox vs Ableton. Are these things that I would need?"

| Tool | Role | Need on K11? | Why |
|---|---|---|---|
| **Unity** | Bar product **visuals output** | ✅ Yes (already there) | Renders the 1920×440 bar display from LUMA UDP. This is your "show." |
| **Rekordbox** | **DJ performance** (live mixing, loops, FX, hot cues) | ✅ Yes (next install) | Has the most expressive keyboard shortcuts. cc-dj-control already has a Rekordbox bridge built. |
| **Serato** | Same role as Rekordbox, competitor | ⚠️ Optional alternative | cc-dj-control supports it too. Pick one; running both at once doesn't help. Default to Rekordbox per your preference. |
| **Ableton Live** | Music production (write, record, sequence, sound-design) | ⚠️ Eventually, not day one | DAW for making tracks. Rekordbox plays existing tracks. Until you need to produce new music from motion in real time, you can skip Ableton entirely. See "Where Ableton fits" below. |

Net for the first install pass: Unity + Rekordbox. Add Ableton later if and when the motion-to-music generation needs a DAW host.

---

What's already built (and where it lives)

You already have the full DJ agent stack. Stop treating it as TBD. The code is in `Desktop/Comp-Core/core/audio-media/cc-dj/`.

`cc-dj` — six crates

core/audio-media/cc-dj/
├── crates/
│   ├── cc-dj-types/     ← Command, Action, Tier, DeckState, DJConfig types
│   ├── cc-dj-voice/     ← Voice control via Gemini Live API
│   ├── cc-dj-gesture/   ← Gesture recognition + training pipeline
│   ├── cc-dj-control/   ← Rekordbox + Serato bridges, action execution
│   ├── cc-dj-auto/      ← Auto-DJ: mixing, transitions, analysis
│   └── cc-dj-python/    ← Python bindings via PyO3
├── configs/
│   ├── commands.yaml         ← 3,381 commands (yes, three thousand)
│   ├── dj.yaml               ← agent config, tier rules, deck setup
│   └── gesture_mappings.yaml ← gesture → keyboard shortcut per deck
└── README.md

What lives in each config

`configs/commands.yaml` (3,381 lines). Every keyboard shortcut Rekordbox and Serato expose, with metadata. Sample entry:

```yaml
- id: '3014'
  canonical: '4-Beat Loop'
  category: loop
  deck: left
  action_type: loop
  shortcut: 'command + 4'
  safety:
    destructive: false
    requires_idle: false
    confirm_if_playing: false
    cooldown_ms: 800
```

`safety.cooldown_ms` and `confirm_if_playing` are what stop gestures from accidentally nuking your set. The "tier" system in `cc-dj-types` (Transport → Cue → Loop → FX → Blend) is the progressive-unlock model: low-tier gestures fire fast, high-tier ones require longer commitment.
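
To make the cooldown idea concrete, here is a minimal sketch of how a per-command `cooldown_ms` could suppress repeated triggers. `CooldownGate` and `try_fire` are hypothetical names for illustration, not the cc-dj API, and timestamps are passed in as plain milliseconds so the logic stays easy to test.

```rust
use std::collections::HashMap;

/// Hypothetical per-command cooldown gate (illustration only, not cc-dj code).
#[derive(Default)]
pub struct CooldownGate {
    /// Timestamp in ms of the last accepted trigger, keyed by command id.
    last_fired_ms: HashMap<String, u64>,
}

impl CooldownGate {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true and records the trigger when at least `cooldown_ms`
    /// has elapsed since the last accepted trigger of `command_id`.
    /// Returns false (and records nothing) while the command is cooling down.
    pub fn try_fire(&mut self, command_id: &str, now_ms: u64, cooldown_ms: u64) -> bool {
        if let Some(&last) = self.last_fired_ms.get(command_id) {
            if now_ms.saturating_sub(last) < cooldown_ms {
                return false;
            }
        }
        self.last_fired_ms.insert(command_id.to_string(), now_ms);
        true
    }
}
```

With the sample entry above (`cooldown_ms: 800`), a second `circular_motion` detected 500 ms after the first would be dropped, while one detected 800 ms later would fire. Each command id cools down independently.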

`configs/gesture_mappings.yaml` (162 lines). Direct gesture-to-shortcut mapping per deck. Ten gesture primitives shipped:
- `flick_forward`, `flick_back`
- `shake`
- `double_tap`
- `scratch_motion`
- `circular_motion`
- `punch`
- `snap`
- `wrist_twist_left`
- `tilt_control`
- `up_down_motion`

Sample mapping (left deck):

```yaml
left_deck:
  flick_forward:
    command_id: "3006"
    shortcut: "Z"
    description: "Play/Pause Left Deck"
    category: transport
  circular_motion:
    command_id: "3014"
    shortcut: "command + 4"
    description: "4-Beat Loop Left Deck"
    category: loop
```
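
As a rough picture of what the bridge does with this file, the sketch below mirrors the deck → gesture nesting as two nested maps and resolves a gesture to its shortcut. The type and function names (`GestureMapping`, `sample_mappings`, `resolve`) are assumptions for illustration. The real crate presumably parses the YAML; the two rows here are hard-coded from the sample above to keep the example std-only.

```rust
use std::collections::HashMap;

/// One row of the mapping file (hypothetical struct, fields taken from the sample).
#[derive(Debug, Clone, PartialEq)]
pub struct GestureMapping {
    pub command_id: &'static str,
    pub shortcut: &'static str,
    pub category: &'static str,
}

/// Gesture name -> mapping, for a single deck.
pub type DeckMappings = HashMap<&'static str, GestureMapping>;

/// Builds a table containing only the two left-deck sample rows.
pub fn sample_mappings() -> HashMap<&'static str, DeckMappings> {
    let mut left = DeckMappings::new();
    left.insert(
        "flick_forward",
        GestureMapping { command_id: "3006", shortcut: "Z", category: "transport" },
    );
    left.insert(
        "circular_motion",
        GestureMapping { command_id: "3014", shortcut: "command + 4", category: "loop" },
    );

    let mut decks = HashMap::new();
    decks.insert("left_deck", left);
    decks
}

/// Looks up the mapping for `gesture` on `deck`; None if either is unknown.
pub fn resolve<'a>(
    decks: &'a HashMap<&'static str, DeckMappings>,
    deck: &str,
    gesture: &str,
) -> Option<&'a GestureMapping> {
    decks.get(deck)?.get(gesture)
}
```

An unmapped gesture returns `None`, which is the safe default: nothing is sent to Rekordbox unless the file explicitly maps the motion.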

`configs/dj.yaml` (217 lines). Agent configuration: deck definitions, tier thresholds, cooldown defaults, Gemini Live model + voice settings.

Plus: the keyboard primitive

Separate from cc-dj, `core/audio-media/cc-echelon/crates/midi-osc/src/keyboard.rs` (550 lines) is the low-level keyboard event emitter. cc-dj-control sits on top and calls into it.

Plus: the paper

`papers/computational-choreography/paper.md` is the academic write-up of the whole motion → command pipeline. That's where the keyboard-mapping rationale you remembered actually lives.

---

The pipeline (motion → music → DJ set)

Body motion (Femto Bolt RGB → MediaPipe)
    │
    ▼
cc-gesture / cc-dj-gesture
    │  classifies the motion into one of 10 primitive gestures
    │  + reads tier state and cooldowns
    ▼
cc-dj-voice (Gemini Live)
    │  optional voice channel: "lock the loop", "swap decks", spoken intent
    │  fuses with gesture stream → final Action
    ▼
cc-dj-control
    │  resolves Action → Command from commands.yaml
    │  emits keyboard shortcut via cc-echelon midi-osc/keyboard.rs
    ▼
Rekordbox  (or Serato, depending on your choice)
    │  fires the actual play / cue / loop / fx
    │  the audio output of Rekordbox feeds the room
    │
    ├──▶ Room speakers (audience hears it)
    └──▶ echelon-bar Rust receives gesture/latent state in parallel
                                │
                                ▼
                          LUMA UDP :9703
                                │
                                ▼
                            Unity Bar Visuals (1920×440 strip)

Two outputs from one body. The audience hears music from Rekordbox. The bar shows visuals in Unity. Both are driven by the same gesture stream, with Gemini Live as a spoken-intent layer that can override or refine.
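
The "override" part of that fusion step can be pictured as a simple priority rule. This is a sketch under the assumption that a voice action, when present, wins over a gesture action from the same window. `Action`, `Source`, and `fuse` are illustrative names, and the real fusion in cc-dj-voice may be richer than this.

```rust
/// Where an action came from (hypothetical enum for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    Gesture,
    Voice,
}

/// A resolved intent that points at an entry in commands.yaml.
#[derive(Debug, Clone, PartialEq)]
pub struct Action {
    pub command_id: String,
    pub source: Source,
}

/// Picks the action to execute for one decision window:
/// voice overrides gesture; with no voice input the gesture passes through.
pub fn fuse(gesture: Option<Action>, voice: Option<Action>) -> Option<Action> {
    voice.or(gesture)
}
```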

---

How "training the motion" works (your specific phrase)

You said: "we'd train the particular motion that we have."

The training path is already wired in `cc-dj-gesture`:

1. Perform the motion N times.
2. cc-dj-gesture records the landmark trajectory and writes to `gestures.json` (the GestureDatabase).
3. Recognizer learns the new gesture's pattern.
4. You assign it a command_id from `commands.yaml` in `gesture_mappings.yaml`.
5. Next time you perform the motion, the recognizer matches, the bridge fires the keyboard shortcut, Rekordbox responds.

The README shows the API:

```rust
let db = GestureDatabase::with_storage("gestures.json");
let mut recognizer = GestureRecognizer::with_database(db);
let mut trainer = GestureTrainer::new(/* arguments elided in the README excerpt */);
trainer.record_sample(motion_data);
// Schematic: the command_id ("3015") links the gesture to an entry in commands.yaml.
trainer.save_gesture("body_wave_loop_8", "3015");
```

No new code is required to add a custom motion, only new entries in `gestures.json` and `gesture_mappings.yaml`.
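
This note doesn't say how the recognizer compares a new motion against recorded samples. As one common approach (an assumption, not necessarily what cc-dj-gesture does), here is a nearest-template matcher: each stored gesture is a trajectory, and the closest one within a distance threshold wins. All names are hypothetical, and a single 2D point per frame stands in for the full landmark set.

```rust
/// A stored gesture: name, linked command, and a recorded 2D path.
pub struct Template {
    pub name: String,
    pub command_id: String,
    pub path: Vec<(f32, f32)>,
}

/// Mean point-wise Euclidean distance between two paths.
/// Returns None when the paths are empty or differ in length.
pub fn mean_distance(a: &[(f32, f32)], b: &[(f32, f32)]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let total: f32 = a
        .iter()
        .zip(b)
        .map(|(p, q)| ((p.0 - q.0).powi(2) + (p.1 - q.1).powi(2)).sqrt())
        .sum();
    Some(total / a.len() as f32)
}

/// Returns the closest template whose distance is at most `max_distance`.
pub fn recognize<'a>(
    templates: &'a [Template],
    sample: &[(f32, f32)],
    max_distance: f32,
) -> Option<&'a Template> {
    templates
        .iter()
        .filter_map(|t| mean_distance(&t.path, sample).map(|d| (t, d)))
        .filter(|(_, d)| *d <= max_distance)
        .min_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(t, _)| t)
}
```

The threshold is what keeps an unrelated motion from firing the nearest gesture anyway. A production recognizer would also resample paths to a common length and normalize for position and scale before comparing.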

---

How motion-generated audio gets into Rekordbox

You said: "we will take those inputs and then essentially add them to rekordbox so we can mix them and mix it with other songs."

This is where it splits into two paths, and the choice is the whole "do I need Ableton" question.

Path A — Render once, DJ many times (no Ableton needed)

1. echelon-bar generates audio in real time from body motion.
2. Capture that audio to a WAV file (echelon-bar already does this for replay; the writer side exists).
3. Drop the WAV into Rekordbox's library.
4. Rekordbox analyzes it (beat grid, key, waveform).
5. You DJ it against other tracks like any other source.

Pros: simple, fast, no extra software. The motion-generated track becomes part of your library.
Cons: the audio is fixed once rendered. You can't re-shape it live.
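
For reference, the file Rekordbox imports in step 3 is an ordinary PCM WAV. echelon-bar already has its own writer, so the sketch below only illustrates the layout: a 44-byte RIFF header followed by interleaved 16-bit samples. `encode_wav_pcm16` is a hypothetical helper name.

```rust
/// Encodes interleaved 16-bit PCM samples as a WAV file image
/// (canonical 44-byte header, little-endian fields).
pub fn encode_wav_pcm16(samples: &[i16], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * (bits_per_sample / 8); // bytes per frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

Writing the result with `std::fs::write("take.wav", bytes)` yields a file any DJ library can analyze. Stereo at 48 kHz would be `encode_wav_pcm16(&samples, 48_000, 2)`.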

Path B — Live motion → live audio → live DJ (Ableton lives here)

1. echelon-bar emits MIDI (notes + CC) instead of audio.
2. Ableton receives the MIDI, drives synths/samplers, generates audio live.
3. Ableton Link syncs tempo to Rekordbox.
4. Rekordbox runs the rest of your set; Ableton's master output is fed in as a fourth deck (via an audio interface loopback or sub-mix).

Pros: the body literally plays the track in real time. Sound design is malleable mid-performance.
Cons: Ableton license, more setup, more failure modes. Most performers don't do this Day 1.

My recommendation: Path A first. Run a session. Capture the motion-generated audio. Cue it up in Rekordbox alongside your usual set. See if you actually want Path B before paying for Ableton.

---

Where Ableton specifically fits, if you go that direction

Ableton is a DAW (Digital Audio Workstation). Its strengths:
- Session View: clips arranged in a grid, launched live. Perfect for gesture-driven clip triggering.
- Max for Live: drag-and-drop programmable devices. You can build a "motion-to-CC mapper" in M4L without writing Rust.
- Ableton Link: beat-sync between any Link-aware apps. Rekordbox supports Link natively since 6.0. So Ableton tempo follows Rekordbox tempo automatically.
- VST/AU hosting: every synth and sample library you own loads inside Ableton.

The integration would be:
- `cc-dj-control` adds a `MidiBridge` alongside the keyboard bridge.
- A new mapping config `ableton_mappings.yaml` lists which gestures fire which MIDI notes / CCs.
- Ableton runs in a fixed Session View; M4L receives the MIDI and triggers clips / morphs sounds.
- Rekordbox stays as the master mixer; Ableton's stereo out routes into a Rekordbox channel.

None of this requires Unity to change. The visuals layer is independent.

---

What you'd need to install on K11 (in order)

1. Rekordbox (Pioneer site, free Performance license to start). Install, run once, accept the license.
2. A few free Rekordbox keyboard shortcuts you'll actually use. Open Preferences → Controller → Keyboard mappings. cc-dj-control assumes the default mappings — if you change them, update `commands.yaml`.
3. The cc-dj Rust crates built for Windows. Same procedure as the existing K11 echelon-bar build: `cargo build --release -p cc-dj-control --target x86_64-pc-windows-gnu`. Tests should pass.
4. A Gemini API key in env (`GEMINI_API_KEY`) so cc-dj-voice can talk to Live.
5. An audio interface (you already have one — the USB-C box for the bar). Rekordbox's master output goes through it.
6. (Skip until you decide on Path B) Ableton Live Suite + Max for Live.

---

Documentation files I'm adding to your stack

This doc lives at `Desktop/K11-MUSIC-PRODUCTION-RUNBOOK-2026-05-12.md`.

Existing related docs you should pin:

| Doc | Path | What it covers |
|---|---|---|
| cc-dj README | `Desktop/Comp-Core/core/audio-media/cc-dj/README.md` | API surface, six-crate architecture |
| Comp-Core map | `Desktop/Comp-Core/CLAUDE.md` | 35 projects across 8 domain layers |
| K11 ↔ Mac4 pairing | `Desktop/Comp-Core/Docs/chains/K11-MAC4-PAIRING-PLAN.md` | 543-line pairing plan, 6 phases |
| Femto + MediaPipe | `Desktop/FEMTO-MEGA-VS-BOLT-RUNBOOK-2026-05-12.md` | Camera differentiation + K11 MediaPipe setup |
| Comp Choreography paper | `Desktop/Comp-Core/papers/computational-choreography/paper.md` | The academic write-up, including the keyboard-mapping rationale |
| Gesture mappings | `Desktop/Comp-Core/core/audio-media/cc-dj/configs/gesture_mappings.yaml` | The 10 primitives → keyboard shortcuts |
| Commands library | `Desktop/Comp-Core/core/audio-media/cc-dj/configs/commands.yaml` | 3,381 Rekordbox/Serato commands with safety rules |

---

ADDENDUM 2026-05-12 21:00 — answering the four follow-up questions

Q1: "Use a GPT in real time."

This already exists in your stack. `cc-dj-voice` uses Gemini Live, Google's low-latency voice model. If you want OpenAI's equivalent (the GPT-4o Realtime API), it slots into the same place: `cc-dj-voice` is a thin adapter, so swapping providers means a config change plus a small provider adapter (see "Provider choice" below).

What a real-time LLM does for you in this pipeline:
- Spoken intent → action: "load the next track on the right deck," "drop a 4-beat loop here," "swap decks."
- Conversation overlay: ask the agent what's playing, get a spoken answer, keep mixing.
- Safety net: voice commands can override gesture commands when you want hard control.

What it doesn't do well:
- Tight musical timing. Voice → LLM → response is 300–1000 ms round-trip. That's fine for "play the next track in 4 bars" but unusable for "hit the kick now." Gesture path is for tight timing. Voice path is for intent.

Provider choice:
- Gemini Live — already wired. Free tier exists. Google Cloud project needed.
- GPT-4o Realtime — comparable latency. Requires OpenAI API key and a small adapter in cc-dj-voice (~half a day of work).

Default to Gemini Live. Swap to GPT later if you have a specific reason (better tool use, model preference, billing).

---

Q2: "Implications of plugging into MIDI. We had some talks within computational choreography MIDI."

You're right that the paper and the codebase already cover MIDI. Here's the inventory.

In `papers/computational-choreography/paper.md`:
- Section on Beat Synchronization names `LinkClock` (Ableton Link FFI) + `MidiSyncClock` (24 PPQ MIDI clock) as the two beat-sync mechanisms.
- The `midi-osc` crate is documented as "MIDI/OSC I/O for hardware integration" with `MidiIn`, `MidiOut`, `OscSender`.
- The paper explicitly references the professional TouchDesigner → OSC → Ableton (via Max for Live or virtual MIDI) pipeline used by teamLab, Random International, and Universal Everything.
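
For a sense of what a 24 PPQ clock means in practice: a sender emits 24 timing pulses per quarter note, so the gap between pulses is 60 / (BPM × 24) seconds. A minimal sketch of that arithmetic follows; the function name is hypothetical and unrelated to the actual `MidiSyncClock` API.

```rust
use std::time::Duration;

/// MIDI clock resolution: 24 pulses per quarter note.
pub const PULSES_PER_QUARTER: f64 = 24.0;

/// Interval between two MIDI clock pulses at the given tempo.
/// Returns None for non-positive or non-finite tempos.
pub fn midi_clock_tick(bpm: f64) -> Option<Duration> {
    if !(bpm.is_finite() && bpm > 0.0) {
        return None;
    }
    Some(Duration::from_secs_f64(60.0 / (bpm * PULSES_PER_QUARTER)))
}
```

At 120 BPM that is roughly 20.8 ms between pulses, and at 128 BPM roughly 19.5 ms. Jitter on that interval shows up directly as tempo wobble on the receiving side.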

In code at `core/audio-media/cc-echelon/crates/midi-osc/`:

| File | Role |
|---|---|
| `keyboard.rs` (550 lines) | Keyboard simulation for Rekordbox / Serato / Traktor hotkeys |
| `midi_in.rs` | MIDI input from controllers (DJ surfaces, foot pedals, MIDI keyboards) |
| `midi_out.rs` | MIDI output + ControllerFeedback + LedColor + MidiClock |
| `osc.rs` | OSC for TouchOSC, Resolume, Ableton |
| `mapping.rs` | MidiMapping + MidiMessageType structs |
| `lib.rs` | Top of crate; explicitly states "MIDI for DJ controllers and lighting" |

Practical implications of plugging into MIDI:

1. MIDI Out unlocks Ableton + hardware synths + MIDI-aware FX in one wire. Same gesture stream that fires Rekordbox shortcuts can simultaneously fire MIDI notes into Ableton (or any DAW).
2. MIDI In unlocks hardware control surfaces. Plug in a DJ controller (DDJ, Mixtrack, etc.) and cc-dj-control can read its faders and knobs alongside your gestures — combined input.
3. MIDI Clock + Ableton Link keeps everything tempo-locked. Rekordbox 6+ supports Link; LinkClock crate wraps it.
4. MIDI is the universal language between music software and lighting consoles, MIDI-DMX bridges, smart fixtures, and effects pedals. One protocol, many endpoints.
5. No new code needed for basic MIDI out. `cc-echelon-midi-osc::MidiOutputHandler` already sends notes and CCs. The wiring step is: `cc-dj-control` action → call `MidiOutputHandler.send(MidiOutputMessage)` alongside the keyboard shortcut.
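
For orientation, the messages involved are tiny. The sketch below builds the raw bytes for note-on, note-off, and control-change as defined by MIDI 1.0. It does not use `MidiOutputHandler` or `MidiOutputMessage` (their exact shape isn't shown in this note); the helper names are illustrative.

```rust
/// Note On: status 0x9n, then note and velocity (7 bits each).
/// Out-of-range inputs are masked to their low bits, not clamped.
pub fn note_on(channel: u8, note: u8, velocity: u8) -> [u8; 3] {
    [0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F]
}

/// Note Off: status 0x8n, then note and a release velocity of 0.
pub fn note_off(channel: u8, note: u8) -> [u8; 3] {
    [0x80 | (channel & 0x0F), note & 0x7F, 0]
}

/// Control Change: status 0xBn, then controller number and value.
pub fn control_change(channel: u8, controller: u8, value: u8) -> [u8; 3] {
    [0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F]
}
```

`channel` is zero-based here, so channel 0 is what most software displays as "channel 1". A gesture that morphs a filter would typically stream `control_change` messages, while a gesture that launches a clip would send one `note_on` followed by a `note_off`.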

One config to know: virtual MIDI ports. On Windows you'll need `loopMIDI` (free) to create a virtual port that cc-echelon writes to and Ableton (or whatever) reads from. On macOS the built-in IAC Driver does the same job. Both are zero-cost.

---

Q3: "What does it take to interact with the lighting?"

Two paths. They're not exclusive — start one, add the other later.

Path L1 — MIDI-controlled lighting (works today, immediate)

cc-echelon already speaks MIDI. Get a MIDI-to-DMX bridge and the gesture stream can drive lights with zero new code:

- ENTTEC DMX USB Pro Mk2 ($150ish, industry standard): connects to K11 via USB, MIDI in from cc-echelon's MIDI out, DMX out to your light fixtures.
- Showtec MIDI to DMX 512 ($80): cheaper, fewer features.
- Configure cc-echelon's `MidiMapping` to emit CCs that map to DMX channels. The bridge translates.
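
One detail worth knowing when you map CCs to fixtures: a MIDI CC value is 7-bit (0–127) while a DMX channel level is 8-bit (0–255), so the bridge has to rescale. The sketch below shows one reasonable scaling with rounding. Actual bridges may use a different rule (some simply double the value), and `cc_to_dmx` is a hypothetical name.

```rust
/// Scales a 7-bit MIDI CC value (0..=127) to an 8-bit DMX level (0..=255),
/// rounding to the nearest level. Inputs above 127 are clamped to 127.
pub fn cc_to_dmx(cc_value: u8) -> u8 {
    let v = u32::from(cc_value.min(127));
    // Adding 63 (about half of 127) before dividing rounds to nearest.
    ((v * 255 + 63) / 127) as u8
}
```

The practical consequence is resolution: a gesture-driven dimmer sweep over MIDI has 128 steps rather than 256, which can be visible as stepping on slow fades.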

This is what 80

Path L2 — Native DMX over ArtNet / sACN (real architecture, later)

If you outgrow MIDI-to-DMX (latency, channel count, complex shows), add native DMX:

- ArtNet / sACN are Ethernet protocols carrying DMX universes. No hardware required if your lights speak ArtNet (modern intelligent fixtures do).
- No existing crate in Comp-Core. Would need to add `cc-lighting` (new crate). Rust has libraries: `artnet_protocol`, `dmx`, `artnet-rs`. ~1 week of work to wrap one and define our lighting API.
- It would sit at the same layer as `cc-echelon-midi-osc`: gesture stream → lighting commands → ArtNet broadcast → fixtures respond directly over LAN.
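
To show how small the wire format is, here is a std-only sketch that builds one ArtDmx packet by hand, following the field layout in the Art-Net specification. It is not a proposal for the `cc-lighting` API, and `art_dmx_packet` is a hypothetical helper. Wrapping one of the existing crates is still the more sensible route for real use.

```rust
/// Standard Art-Net UDP port.
pub const ARTNET_PORT: u16 = 6454;

/// Builds an ArtDmx packet carrying `channels` for a 15-bit universe address.
/// Returns None unless the channel count is even and within 2..=512.
pub fn art_dmx_packet(sequence: u8, universe: u16, channels: &[u8]) -> Option<Vec<u8>> {
    if channels.len() < 2 || channels.len() > 512 || channels.len() % 2 != 0 {
        return None;
    }
    let mut p = Vec::with_capacity(18 + channels.len());
    p.extend_from_slice(b"Art-Net\0"); // 8-byte packet ID
    p.extend_from_slice(&0x5000u16.to_le_bytes()); // OpCode ArtDmx, low byte first
    p.extend_from_slice(&[0, 14]); // protocol version 14, high byte first
    p.push(sequence); // 0 disables sequence checking on the receiver
    p.push(0); // physical input port, informational only
    p.extend_from_slice(&(universe & 0x7FFF).to_le_bytes()); // SubUni, then Net
    p.extend_from_slice(&(channels.len() as u16).to_be_bytes()); // length, high byte first
    p.extend_from_slice(channels);
    Some(p)
}
```

Sending it is a single `std::net::UdpSocket::send_to` call to the fixture's (or node's) IP on `ARTNET_PORT`. Fixtures expect the frame to be refreshed continuously, so a real sender runs this on a timer, not once per gesture.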

What "interacting with lighting" actually means in this context

1. One light per deck (mirror Rekordbox state): play = green, pause = red, cue = white flash.
2. Beat-synced strobes: tempo from LinkClock drives strobe rate.
3. Gesture-direct fixture moves: arm raise = wash up, arm sweep = pan across.
4. Color mood: latent state from echelon-bar (calm / motion / openness) maps to RGB.
5. Unity bar visual + lights as one show: same LUMA UDP stream feeds Unity AND a lighting consumer.

The deepest integration is option 5 — treat the 1920×440 bar display and the lighting fixtures as two consumers of the same LUMA stream. Same authoring, same brain, two outputs. That's what makes the room feel coherent instead of "screen plus separate lights."
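
The two simplest items in that list reduce to a lookup and a division. A minimal sketch, assuming a three-state deck indicator and one strobe flash per beat: `DeckStatus`, `deck_colour`, and `strobe_hz` are hypothetical names (deliberately not the `DeckState` type in `cc-dj-types`), and the cue "flash" timing is not modelled here, only its colour.

```rust
/// Simplified deck status for lighting purposes (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeckStatus {
    Playing,
    Paused,
    Cued,
}

/// RGB colour for a per-deck indicator light:
/// play = green, pause = red, cue = white.
pub fn deck_colour(status: DeckStatus) -> (u8, u8, u8) {
    match status {
        DeckStatus::Playing => (0, 255, 0),
        DeckStatus::Paused => (255, 0, 0),
        DeckStatus::Cued => (255, 255, 255),
    }
}

/// Strobe rate in flashes per second for one flash per beat.
pub fn strobe_hz(bpm: f64) -> f64 {
    bpm / 60.0
}
```

The resulting RGB triple is what you would write into three consecutive DMX channels of an RGB fixture, whichever transport (MIDI bridge or ArtNet) carries it.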

---

Q4: "Where does Ableton have a place? Will there be too much latency?"

Where Ableton fits:

The paper names it directly: the industry-standard motion → music workflow is OSC → Ableton via Max for Live. Used by teamLab, Random International, Universal Everything. cc-echelon already emits OSC. So getting motion into Ableton is one config away.

Three scenarios where Ableton earns its keep:

1. Live sound design. You want gestures to morph synth parameters in real time, not just trigger shortcut keys. Ableton hosts the synth, your gestures emit MIDI CC.
2. Sample triggering. Session View as a grid of clips. Each gesture fires a clip slot. The body becomes the launchpad.
3. Recording the motion-music in production quality. Run a session, capture the Ableton master, render to a stem for later DJing in Rekordbox.

If none of those three describe what you're doing tonight, you don't need Ableton tonight.

Latency reality:

| Path | Camera→audio latency | Use case |
|---|---|---|
| Femto → MediaPipe → echelon-bar audio out (cpal direct) | 40–55 ms | Native motion-to-music. Tight. |
| Femto → MediaPipe → MIDI → Ableton → audio interface | 70–95 ms | Motion as MIDI controller for Ableton synths. Still under the 100 ms perceptual threshold. |
| Voice → Gemini Live / GPT Realtime → action | 300–1000 ms | Spoken intent only. Not for musical timing. |
| Gesture → Rekordbox hotkey | 50–70 ms | DJ transport / loops / FX. Plenty tight. |

The 100 ms rule of thumb: under that, the brain perceives motion and sound as a single event. Over that, lag becomes audible. All the gesture-driven paths fit. Only the spoken-LLM path doesn't, and that's fine because nobody's asking Gemini to play a note — they're asking it to change a track.

Practical advice:
- Buffer size on the audio interface matters more than you'd expect. 128 samples at 48 kHz = 2.7 ms of audio buffer; 1024 samples = 21 ms. Run as low as the interface allows without xruns.
- WASAPI exclusive mode on Windows beats shared mode for latency. Set that in Rekordbox / Ableton audio prefs.
- USB hub between K11 and Femto Bolt = bad. Direct USB-C port = good.
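
The buffer numbers above come from a one-line formula: latency in ms = frames × 1000 / sample rate. A small helper (hypothetical name) makes it easy to check other settings before you change them in the driver panel.

```rust
/// Duration of one audio buffer in milliseconds.
/// `sample_rate_hz` must be non-zero for a meaningful result.
pub fn buffer_latency_ms(frames: u32, sample_rate_hz: u32) -> f64 {
    f64::from(frames) * 1000.0 / f64::from(sample_rate_hz)
}
```

`buffer_latency_ms(128, 48_000)` is about 2.7 ms and `buffer_latency_ms(1024, 48_000)` about 21.3 ms, matching the figures above. This is the duration of a single buffer; drivers usually queue more than one, so the latency you actually hear is some multiple of it.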

Verdict on Ableton: worth it once you're past Path A (render once, DJ many times). Latency is not the reason to avoid it. Setup complexity and license cost are.

---

Open questions to resolve before install day

1. Serato or Rekordbox? You leaned Rekordbox in chat. Confirm so we lock the bridge target.
2. Which audio interface routes Rekordbox master to the bar speakers? Need the model number to pin WASAPI settings on K11.
3. Gemini Live billing. API key tied to which Google Cloud project?
4. Do you want the gesture training UI on K11 or on Mac4? cc-dj-gesture training records gestures; Mac4 is more comfortable for the recording session. The trained `gestures.json` then syncs to K11.

---

TL;DR

You don't need to invent anything new. The whole motion → keyboard → Rekordbox path is built. Six crates, 3,381 commands, 10 gesture primitives, voice via Gemini Live, training pipeline included.

For K11 you need: Unity (have it), Rekordbox (install it), cc-dj Rust build (cargo it).

You can postpone Ableton until you specifically need a DAW for live sound design.

Where does each piece live? Right above this line.
