Grand Diomande Research · Full HTML Reader

LUME / DEMON Architecture Comparison

Date: 2026-05-28

Summary

DEMON is not a pose system, not a sensor fusion system, and not a DJ command
system.

DEMON is a real-time controllable music diffusion runtime. It turns source
audio, text prompts, LoRAs, references, and live control curves into generated
or transformed music.

LUME is an embodied capture, truth, choreography, visual, and DJ-control system.
It turns cameras, mocopi, watches, phones, labels, and pose evidence into
BodyTruth, gesture templates, visual responses, training bundles, and guarded
Rekordbox commands.

The two systems do not integrate at the raw-sensor level. They integrate through the rehearsal bundle and the control curves derived from it:

text
LUME rehearsal bundle
  -> motion templates / NKo motion lexicon
  -> DEMON control curves
  -> generated or transformed audio

DEMON Architecture

DEMON is built around ACE-Step v1.5 and a StreamDiffusion-style ring buffer for
audio. The runtime keeps several in-flight music generations alive at different
denoising stages. Each tick advances the active slots with a batched decoder
forward pass. After warmup, completed song latents stream out steadily.

Important pieces:

text
source audio / prompt / references / LoRAs
  -> ACE-Step conditioning and source latents
  -> StreamPipeline ring buffer
  -> per-slot SlotRequest state
  -> batched decoder tick
  -> windowed VAE decode
  -> streamed generated audio
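
The steady-state behavior of the ring buffer can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not DEMON's StreamPipeline: `ToySlot`, `run_ring`, and the integer step counter are hypothetical stand-ins for latents and the batched decoder pass, and one new submission per tick is assumed.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Hypothetical stand-in for one in-flight generation."""
    slot_id: int
    step: int = 0  # number of denoising steps already applied


def run_ring(num_steps: int, num_ticks: int) -> list[tuple[int, int]]:
    """Simulate a ring that accepts one new submission per tick.

    Returns (tick, slot_id) pairs, one for each slot that completed.
    """
    in_flight: list[ToySlot] = []
    completed: list[tuple[int, int]] = []
    next_id = 0
    for tick in range(num_ticks):
        # Submit one generation per tick so slots sit at staggered stages.
        in_flight.append(ToySlot(slot_id=next_id))
        next_id += 1
        # One "batched" pass: every active slot advances by one step.
        for slot in in_flight:
            slot.step += 1
        # Slots that reached the final step leave the ring.
        done = [s for s in in_flight if s.step >= num_steps]
        in_flight = [s for s in in_flight if s.step < num_steps]
        completed.extend((tick, s.slot_id) for s in done)
    return completed


finished = run_ring(num_steps=4, num_ticks=8)
print(finished)
```

With four steps per generation, nothing completes during the first three ticks (warmup). From tick 3 onward, one slot completes on every tick.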

Each slot can carry its own:

text
seed
denoise strength
timestep schedule
source latent
conditioning
per-frame curves
x0 target
latent mask
CFG mode
LoRA state
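
The per-slot state listed above could be modeled as a plain record. The sketch below is illustrative only: the class name `SlotRequestSketch`, the field names, the Python types (lists in place of tensors), and the [0, 1] range for denoise strength are all assumptions, not DEMON's actual `SlotRequest` definition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotRequestSketch:
    """Illustrative per-slot state; names and types are assumptions."""
    seed: int
    denoise_strength: float              # assumed to lie in [0, 1]
    timestep_schedule: list[float]
    source_latent: Optional[list[float]] = None
    conditioning: dict[str, object] = field(default_factory=dict)
    per_frame_curves: dict[str, list[float]] = field(default_factory=dict)
    x0_target: Optional[list[float]] = None
    latent_mask: Optional[list[float]] = None
    cfg_mode: str = "none"
    lora_state: dict[str, float] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject obviously malformed requests before they enter the ring."""
        if not 0.0 <= self.denoise_strength <= 1.0:
            raise ValueError("denoise_strength must be in [0, 1]")
        if not self.timestep_schedule:
            raise ValueError("timestep_schedule must not be empty")


# Two slots with independent seeds, strengths, and schedules.
slot_a = SlotRequestSketch(seed=1, denoise_strength=0.35,
                           timestep_schedule=[1.0, 0.5, 0.0])
slot_b = SlotRequestSketch(seed=2, denoise_strength=0.9,
                           timestep_schedule=[1.0, 0.75, 0.5, 0.25, 0.0])
slot_a.validate()
slot_b.validate()
```

`default_factory` gives each slot its own dictionaries, so mutating one slot's curves or LoRA state does not leak into another slot.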

DEMON has two control lanes:

text
submission-time controls
  prompt / source audio / denoise / conditioning
  affect new or draining slots
  convergence depends on ring depth

step-time controls
  shared mutable curves
  read by every in-flight slot every solver step
  next-tick effect
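
The difference between the two lanes can be shown with a toy ring. This is a sketch under stated assumptions: `TwoLaneRing`, `LaneSlot`, and the `guidance` key are hypothetical names, and the real runtime operates on tensors rather than floats.

```python
from dataclasses import dataclass


@dataclass
class LaneSlot:
    prompt: str   # submission-time control: frozen when the slot is created
    step: int = 0


class TwoLaneRing:
    """Toy ring with a submission-time lane and a step-time lane."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.slots: list[LaneSlot] = []
        self.pending_prompt = "ambient"          # submission-time lane
        self.shared_curves = {"guidance": 1.0}   # step-time lane (mutable)
        self.log: list[tuple[str, float]] = []

    def tick(self) -> None:
        # Only newly created slots pick up the latest submission-time value.
        if len(self.slots) < self.depth:
            self.slots.append(LaneSlot(prompt=self.pending_prompt))
        for slot in self.slots:
            # Every in-flight slot reads the shared curve on every step.
            self.log.append((slot.prompt, self.shared_curves["guidance"]))
            slot.step += 1
        # Slots that finished all steps drain out of the ring.
        self.slots = [s for s in self.slots if s.step < self.depth]


ring = TwoLaneRing(depth=3)
ring.tick()
ring.tick()
ring.pending_prompt = "techno"         # submission-time change
ring.shared_curves["guidance"] = 2.5   # step-time change
ring.log.clear()
ring.tick()
print(ring.log)
```

On the tick after both changes, all three slots already see the new guidance value, but only the newly submitted slot carries the new prompt. The older slots keep their original prompt until they drain, which is why submission-time convergence depends on ring depth.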

The control surfaces include:

text
per-frame source preservation
velocity scaling
ODE noise injection
CFG / guidance curves
x0 target morphing
channel gain
LoRA refit
prompt blend
timbre / structure references

Hardware/runtime:

text
Python 3.11
ACE-Step checkpoints
NVIDIA CUDA GPU for local runtime
TensorRT for the intended fast path
Next.js/web demo optional
MCP and MIDI control surfaces optional

Our current K11/Mac4/Mac5 mesh does not have a suitable local NVIDIA GPU. LUME
should therefore produce DEMON requests now, and DEMON itself should run later
on a remote/cloud GPU or a future CUDA machine.

LUME Architecture

LUME starts from the body, not from audio.

text
performer
  -> cameras / mocopi / phones / watches
  -> BodyTruth
  -> movement primitives
  -> gesture templates
  -> bounded outputs
  -> rehearsal bundle

Current machine split:

text
Mac4
  long-take camera capture
  Unity / DYK visuals
  optional mocopi-to-Unity feed

K11
  durable rehearsal storage
  Pose Coach
  AirDeck viewer and self-play
  Rekordbox command safety gate

Mac5
  offline SAM3D reconstruction
  body analysis
  derived reconstruction artifacts

Current bundle outputs:

text
bundle.json
derived/nko/motion_lexicon.json
derived/sam3d/request.json
derived/demon/request.json
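
A bundle can be checked against the list above before handoff. The sketch below assumes the four paths are relative to a bundle root directory; the helper name `missing_bundle_files` is hypothetical and is not one of the existing tools.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the bundle output list above.
EXPECTED_BUNDLE_FILES = (
    "bundle.json",
    "derived/nko/motion_lexicon.json",
    "derived/sam3d/request.json",
    "derived/demon/request.json",
)


def missing_bundle_files(bundle_root: Path) -> list[str]:
    """Return the expected relative paths that are absent under bundle_root."""
    return [rel for rel in EXPECTED_BUNDLE_FILES
            if not (bundle_root / rel).is_file()]


# Demo: build a bundle that lacks the DEMON request and report the gap.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in EXPECTED_BUNDLE_FILES[:3]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    missing = missing_bundle_files(root)

print(missing)
```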

LUME's core safety rule remains:

text
Only K11 sends Rekordbox commands.

DEMON must not bypass this. DEMON can generate audio, transform audio, or become
a creative output target. It cannot replace the AirDeck bridge.
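
The rule can be expressed as a single predicate at the point where commands leave the system. This is a sketch only: `CommandIntent`, its fields, and the 0.5 s freshness threshold are illustrative assumptions, not the actual AirDeck safety gate.

```python
from dataclasses import dataclass

ALLOWED_SENDER_HOST = "K11"
MAX_BODY_TRUTH_AGE_S = 0.5  # illustrative threshold, not a measured value


@dataclass(frozen=True)
class CommandIntent:
    """Hypothetical description of a command that wants to reach Rekordbox."""
    host: str               # machine attempting to send
    source_system: str      # "LUME" or "DEMON"
    gesture_promoted: bool  # whether the gesture passed promotion
    body_truth_age_s: float
    command: str


def may_send_to_rekordbox(intent: CommandIntent) -> bool:
    """Allow only fresh, promoted LUME gestures sent from K11."""
    return (
        intent.host == ALLOWED_SENDER_HOST
        and intent.source_system == "LUME"
        and intent.gesture_promoted
        and intent.body_truth_age_s <= MAX_BODY_TRUTH_AGE_S
    )


ok = CommandIntent("K11", "LUME", True, 0.1, "play_pause")
from_demon = CommandIntent("K11", "DEMON", True, 0.1, "play_pause")
from_mac4 = CommandIntent("Mac4", "LUME", True, 0.1, "play_pause")
stale = CommandIntent("K11", "LUME", True, 3.0, "play_pause")
print([may_send_to_rekordbox(i) for i in (ok, from_demon, from_mac4, stale)])
```

Keeping the check in one place means a DEMON-originated request fails the same way as a stale or unpromoted gesture, with no separate code path to maintain.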

How DEMON Differs From LUME

Input

DEMON input:

text
source audio
text prompt
LoRA
timbre reference
structure reference
automation curves
MIDI / MCP / UI knobs

LUME input:

text
camera frames
pose landmarks
body boxes
mocopi skeleton
watch motion
SensorLogger telemetry
manual labels
Pose Coach clips
Rekordbox command logs

State

DEMON state:

text
in-flight latent slots
timestep schedules
conditioning tensors
shared mutable solver curves
decoder/VAE engine state
LoRA refit state

LUME state:

text
BodyTruth
source freshness
gesture evidence
label windows
bundle manifests
promotion status
visual template state
Rekordbox safety state

Output

DEMON output:

text
generated or transformed music audio
streamed decoded windows
control-session recordings

LUME output:

text
recorded training bundles
movement templates
Unity visuals
Pose Coach review clips
NKo motion lexicon
SAM3D reconstruction requests
guarded Rekordbox commands

Risk Boundary

DEMON risk:

text
bad generated audio
latency
GPU/runtime instability
style mismatch
copyright/source-audio review questions

LUME AirDeck risk:

text
wrong gesture fires a live DJ command
false positive play/pause/scratch/next-track
camera tracking loss
stale BodyTruth

That is why DEMON can be looser and more experimental than AirDeck command
promotion. A bad DEMON take can be ignored. A bad Rekordbox command interrupts
the performance.

Correct Integration

The correct integration is:

text
K11 bundle hub
  -> derived/nko/motion_lexicon.json
  -> derived/templates/gesture_templates.json
  -> derived/demon/request.json
  -> DEMON control-curve renderer
  -> DEMON runtime on CUDA/cloud
  -> generated audio candidate
  -> audition / compare / archive

Examples:

text
wave_color
  -> source-preservation or prompt-blend curve

burst_high_energy
  -> denoise / guidance / transformation envelope

weighted_slow_power_hold
  -> source-preservation hold / lower transformation

airdeck_platter_spin
  -> audio curve/LFO candidate
  -> not a Rekordbox scratch command
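
The example mappings above can be held in a small lookup table. The sketch below is an assumption about shape only: the table name, the curve-target identifiers, and `plan_for_template` are hypothetical, and each label is mapped to the curve families named in the examples.

```python
# Template label -> DEMON curve families it may drive (from the examples above).
TEMPLATE_TO_CURVES: dict[str, tuple[str, ...]] = {
    "wave_color": ("source_preservation", "prompt_blend"),
    "burst_high_energy": ("denoise", "guidance"),
    "weighted_slow_power_hold": ("source_preservation",),
    "airdeck_platter_spin": ("lfo_candidate",),
}


def plan_for_template(label: str) -> dict:
    """Return the curve targets for a template label.

    The plan never carries a Rekordbox command: templates routed to DEMON
    only shape audio curves.
    """
    targets = TEMPLATE_TO_CURVES.get(label, ())
    return {
        "label": label,
        "curve_targets": list(targets),
        "rekordbox_command": None,
    }


spin_plan = plan_for_template("airdeck_platter_spin")
burst_plan = plan_for_template("burst_high_energy")
unknown_plan = plan_for_template("not_a_template")
print(spin_plan)
```

Unknown labels produce an empty target list rather than an error, so an unmapped gesture leaves the audio untouched.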

Wrong Integration

Do not do this:

text
camera -> DEMON -> Rekordbox
mocopi -> DEMON -> Rekordbox
DEMON MCP -> K11 keyboard keys
DEMON generated confidence -> BodyTruth

DEMON does not know whether the performer is truly present, whether a gesture is
promoted, or whether Rekordbox should receive a key. That is LUME/K11's job.
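
The forbidden routes share one pattern: a DEMON stage appears upstream of a guarded LUME/K11 sink. A declared data path can be screened for that pattern. This is a minimal sketch; `GUARDED_SINKS` and `violates_boundary` are hypothetical names, and node names are matched as plain strings.

```python
# Sinks that only LUME/K11 may feed (taken from the wrong-integration list).
GUARDED_SINKS = {"Rekordbox", "K11 keyboard keys", "BodyTruth"}


def violates_boundary(path: list[str]) -> bool:
    """Return True if any DEMON stage appears upstream of a guarded sink."""
    seen_demon = False
    for node in path:
        if node.startswith("DEMON"):
            seen_demon = True
        elif seen_demon and node in GUARDED_SINKS:
            return True
    return False


wrong_paths = [
    ["camera", "DEMON", "Rekordbox"],
    ["mocopi", "DEMON", "Rekordbox"],
    ["DEMON MCP", "K11 keyboard keys"],
    ["DEMON generated confidence", "BodyTruth"],
]
correct_path = [
    "K11 bundle hub",
    "derived/demon/request.json",
    "DEMON control-curve renderer",
    "DEMON runtime on CUDA/cloud",
    "generated audio candidate",
    "audition / compare / archive",
]
print([violates_boundary(p) for p in wrong_paths], violates_boundary(correct_path))
```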

System Placement

Current placement:

text
K11
  emits derived/demon/request.json
  stores DEMON outputs later
  does not run DEMON locally today

Mac4
  may visualize DEMON control curves later
  should not run the heavy DEMON runtime today

Mac5
  may prepare offline template features
  is still not a DEMON runtime target without CUDA

Future CUDA/cloud host
  runs DEMON
  consumes K11 bundle requests
  returns generated audio and logs

Practical Next Build

Bundle packaging now writes this handoff shape:

text
derived/templates/gesture_templates.json
  -> derived/demon/request.json
  -> derived/demon/control_curves.json

The control-curve renderer should convert LUME template windows into DEMON-friendly curves with these fields:

text
time_s
label
intensity
source_preservation
denoise
prompt_blend
guidance
channel_gain_hint

Then DEMON can be plugged in later without changing the body architecture.
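
A renderer of this kind can be sketched as a function that samples template windows onto a fixed time grid. Everything beyond the eight field names is an assumption: `TemplateWindow`, the 0.5 s step, the 0 to 1 intensity range, the neutral defaults for the prompt-blend, guidance, and gain fields, and the linear rule that raises denoise and lowers source preservation with intensity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateWindow:
    """Hypothetical LUME template window."""
    start_s: float
    end_s: float
    label: str
    intensity: float  # assumed to lie in [0, 1]


def render_curves(windows: list[TemplateWindow], duration_s: float,
                  step_s: float = 0.5) -> list[dict]:
    """Sample windows on a fixed grid and emit one curve row per time step."""
    rows = []
    num_steps = int(round(duration_s / step_s))
    for i in range(num_steps + 1):
        t = i * step_s
        # A window is active on the half-open interval [start_s, end_s).
        active: Optional[TemplateWindow] = next(
            (w for w in windows if w.start_s <= t < w.end_s), None)
        intensity = active.intensity if active else 0.0
        rows.append({
            "time_s": t,
            "label": active.label if active else None,
            "intensity": intensity,
            "source_preservation": 1.0 - intensity,  # assumed linear rule
            "denoise": intensity,                    # assumed linear rule
            "prompt_blend": 0.0,                     # neutral default
            "guidance": 1.0,                         # neutral default
            "channel_gain_hint": 1.0,                # neutral default
        })
    return rows


rows = render_curves(
    [TemplateWindow(1.0, 2.0, "burst_high_energy", 0.75)], duration_s=3.0)
print(len(rows), rows[2])
```

Rows outside any window fall back to full source preservation and zero denoise, so a bundle with no recognized gestures leaves the source audio unchanged under these assumed rules.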

Current implementation:

text
tools/lume-mac5-reconstruction/package_k11_sam3d_first_capture_bundles.py
tools/lume-mac5-reconstruction/verify_k11_mac4_output_artifacts.py
tools/lume-mac5-reconstruction/smoke_test_mac4_output_contract.py

The implemented artifacts remain manifest/control-curve only. They do not run
DEMON locally and do not create a Rekordbox command path.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

lume-commerce/viz/lume-pcloud/Docs/LUME_DEMON_ARCHITECTURE_COMPARISON_2026-05-28.md
