Grand Diomande Research · Full HTML Reader

External Research Fit Registry

This file maps external papers and projects into the current computational choreography stack. It is research guidance, not an implementation claim.

Date: 2026-06-06

Current system invariant:

text
Mac4 -> long-take capture and Unity/live visuals
K11  -> durable rehearsal bundles, Pose Coach, AirDeck, Rekordbox safety gate
Mac5 -> offline reconstruction and heavy body analysis
Mac1 -> transfer/orchestration where trust between machines is needed

K11 remains the only live Rekordbox command gate. Mac4 Unity, Mac5
reconstruction, rented GPUs, generated video, and raw sensor streams must not
send DJ commands directly.

One-Line Fit

| Source | Best role | Runtime lane | Integration priority |
| --- | --- | --- | --- |
| Cosmos + Locate Anything DGX | Synthetic scene and object-grounding forge | Offline or rented GPU | Medium |
| ENTHEA | Browser/WebGL visual instrument for Mac4/OBS/NDI | Live visuals only | High |
| D4RT | 4D reconstruction and all-pixel tracking from video | Offline reconstruction | High research, medium implementation |
| MDA depth ambiguity paper | Boundary-safe depth confidence and flying-point cleanup | Capture/reconstruction QA | High |
| MusePose | Pose-to-avatar/video content generator | Offline content/render | Medium |
| Magenta RealTime 2 | Local Apple Silicon generative music instrument | Live audio, not DJ safety | High |
| DEMON | Controllable music diffusion from LUME control curves | CUDA/cloud audio render, audition before live | Medium-high |
| Kinesis | Physiological motion prior and fatigue/plausibility scorer | Offline training/evaluation | Long-term |
| MAMMA | Multi-person SMPL-X markerless mocap | Offline reconstruction | Highest strategic reconstruction target |

System Placement

text
synthetic rehearsal
  Cosmos + Locate Anything
      -> synthetic scene/label manifests

real capture
  Mac4 cameras + K11 Pose Coach + MotionMix/iPhones
      -> lume.rehearsal_bundle.v1

depth and scene truth
  MDA-style depth boundary confidence
  D4RT 4D scene reconstruction
  MAMMA SMPL-X multi-person reconstruction
      -> derived reconstruction artifacts

body/motor meaning
  Kinesis physiological prior
      -> plausibility, fatigue, motor-intent scores

review and promotion
  K11 SAM3D/AirDeck review workbench
      -> approved non-live artifacts
      -> dry-run manifest
      -> later live manifest only after proof

live output
  Mac4 Unity / ENTHEA / TouchDesigner visuals
  Magenta RealTime 2 accompaniment
  DEMON generated/transformed audio candidates
  K11 Rekordbox bridge

Concrete Tour Through Our System

Think of the system as four buses, not one giant app.

Bus A - Real-Time Body Truth

Current authority:

text
MotionMix / K11 / Mac4 sensors
  -> LumeBodyTruth
  -> Mac4 Unity / visuals
  -> K11 AirDeck as extra safety context

Relevant local docs:

text
[home]/Desktop/MotionMix/LUME-BODY-TRUTH-CONTRACT.md
[home]/Desktop/computational-choreography/04-generative-output/lume-visuals.md
[home]/Desktop/computational-choreography/04-generative-output/motion-lexicon.md

Research that belongs here:

  • MDA as depth-confidence logic.
  • ENTHEA as a visual consumer.
  • Magenta RealTime 2 as an audio/MIDI consumer.
  • DEMON only as a later consumer of approved/template-derived control curves.

Rule:

This bus can react live. It cannot promote new DJ gestures by itself.

Bus B - Rehearsal Bundle Evidence

Current authority:

text
K11 C:\lume\dance-sessions\_processed\<bundle_id>
  -> bundle.json
  -> derived/sam3d/request.json
  -> derived/<research_backend>/...

Relevant local tools:

text
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/package_k11_sam3d_first_capture_bundles.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/orchestrate_k11_to_mac5.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/mac5_k11_bundle_worker.py

Research that belongs here:

  • SAM3D now.
  • MAMMA next.
  • D4RT next.
  • MusePose as preview output.
  • Cosmos + Locate Anything as synthetic fixture output.
  • Kinesis as motion-quality scoring.
  • DEMON request/control-curve manifests for generated audio candidates.

Rule:

This bus writes evidence and review artifacts. It never sends live controls.

Bus C - Review And Promotion

Current authority:

text
derived reconstruction artifacts
  -> sam3d-review-index.json
  -> sam3d-gesture-candidate-library.json
  -> sam3d-approved-gesture-library.json
  -> sam3d-dry-run-promotion-manifest.json
  -> later live manifest only after proof

Relevant local tools:

text
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/build_k11_sam3d_review_index.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/verify_k11_sam3d_airdeck_artifacts.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/run_k11_sam3d_approval_intake.py

Research that belongs here:

  • MAMMA evidence thumbnails and SMPL-X quality scores.
  • D4RT point-cloud/camera-track overlays.
  • MDA depth QA badges.
  • Kinesis plausibility/fatigue badges.
  • MusePose generated preview sidecar.
  • Cosmos synthetic fixture provenance warnings.

Rule:

Research output can raise or lower confidence. Human approval and K11 dry-run
gates still decide promotion.
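
A minimal sketch of that rule, assuming a hypothetical `Candidate` record; the field names and stage labels below are illustrative and not taken from the existing review index. The point it tries to show is that research confidence only orders the review queue, while the stage is decided by human approval and dry-run proof alone.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One gesture candidate (hypothetical fields, for illustration only)."""
    gesture_id: str
    research_confidence: float  # raised or lowered by research backends
    human_approved: bool        # set only by an operator during review
    dry_run_proven: bool        # set only by the K11 dry-run gate


def promotion_stage(c: Candidate) -> str:
    """Return the furthest stage a candidate may reach.

    research_confidence is deliberately ignored here: it is advisory and
    never unlocks a stage on its own.
    """
    if not c.human_approved:
        return "candidate"
    if not c.dry_run_proven:
        return "dry_run"
    return "live_eligible"


def review_queue(candidates: list[Candidate]) -> list[Candidate]:
    """Order unapproved candidates so the strongest evidence is reviewed first."""
    pending = [c for c in candidates if not c.human_approved]
    return sorted(pending, key=lambda c: c.research_confidence, reverse=True)
```

A candidate with very high research confidence but no operator approval still reports `candidate`, which is the behavior the rule asks for.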

Bus D - Performance Output

Current authority:

text
K11 Rekordbox bridge
Mac4 Unity / TouchDesigner / OBS / NDI
Magenta RT2 / audio instruments
DEMON rendered audio candidates
ENTHEA browser visuals

Research that belongs here:

  • ENTHEA for browser visual synthesis.
  • Magenta RealTime 2 for body-steered accompaniment.
  • DEMON for auditioned generated/transformed music, usually from CUDA/cloud.
  • MusePose renders for pre-made visuals, not proof.

Rule:

Output surfaces can be expressive. They are not evidence unless they point back
to a real capture bundle.

Backend Attachment Plan

Every research backend should look like a sibling of `derived/sam3d`, not a
replacement for it.

text
_processed/<bundle_id>/
  bundle.json
  raw/
  derived/
    sam3d/
      summary.json
      sam3d_frames.jsonl
      motion_windows.jsonl
      template_candidates.json
    mamma/
      summary.json
      smplx_params.npz
      person_tracks.jsonl
      contact_events.jsonl
      review_frames/
    d4rt/
      summary.json
      camera_poses.json
      tracks_3d.jsonl
      pointcloud_world.ply
    depth_quality/
      summary.json
      boundary_confidence.png
      flying_point_mask.png
    kinesis/
      summary.json
      motor_plausibility.json
      fatigue_score.json
    musepose/
      summary.json
      generated_avatar_take.mp4
    demon/
      summary.json
      request.json
      control_curves.json
      render_manifest.json
      generated_audio_candidates/
    cosmos_grounding/
      summary.json
      visual_ops_manifest.json
      synthetic_scene.mp4

Minimum required fields for every new backend:

json
{
  "schema": "lume.<backend>_summary.v1",
  "bundle_id": "<id>",
  "source_kind": "real_capture | synthetic | generated_preview | reconstructed",
  "status": "complete | unavailable | failed_explicit | placeholder_explicit",
  "live_control_eligible": false,
  "requires_human_review": true,
  "created_at": "ISO-8601",
  "tool": "<backend name>",
  "inputs": [],
  "outputs": [],
  "warnings": []
}

This lets the existing verifier pattern expand without trusting a model just
because a file exists.
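
One way the verifier could check these fields is sketched below. It assumes the summary has already been parsed into a dict, and it treats `live_control_eligible: false` and `requires_human_review: true` as mandatory values rather than defaults; `validate_summary` is a hypothetical helper, not the existing verifier.

```python
from __future__ import annotations

REQUIRED_FIELDS = (
    "schema", "bundle_id", "source_kind", "status", "live_control_eligible",
    "requires_human_review", "created_at", "tool", "inputs", "outputs", "warnings",
)
SOURCE_KINDS = ("real_capture", "synthetic", "generated_preview", "reconstructed")
STATUSES = ("complete", "unavailable", "failed_explicit", "placeholder_explicit")


def validate_summary(summary: dict) -> list[str]:
    """Return a list of problems; an empty list means the summary passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in summary]

    schema = summary.get("schema")
    if isinstance(schema, str) and not (
        schema.startswith("lume.") and schema.endswith("_summary.v1")
    ):
        problems.append("schema must look like lume.<backend>_summary.v1")
    if "source_kind" in summary and summary["source_kind"] not in SOURCE_KINDS:
        problems.append("unknown source_kind")
    if "status" in summary and summary["status"] not in STATUSES:
        problems.append("unknown status")
    # Research backends may never mark themselves eligible for live control.
    if "live_control_eligible" in summary and summary["live_control_eligible"] is not False:
        problems.append("live_control_eligible must be false")
    if "requires_human_review" in summary and summary["requires_human_review"] is not True:
        problems.append("requires_human_review must be true")
    for name in ("inputs", "outputs", "warnings"):
        if name in summary and not isinstance(summary[name], list):
            problems.append(f"{name} must be a list")
    return problems
```

Returning a list of problems instead of raising lets a review UI show every issue for a backend at once.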

Research-By-Function Map

If the question is "Where is the person?"

Use:

  • MAMMA for SMPL-X body, person identity, contact, occlusion.
  • D4RT for 4D point tracks and camera/world geometry.
  • K11 Pose Coach for local AirDeck zone framing.

Output:

text
derived/mamma/person_tracks.jsonl
derived/d4rt/tracks_3d.jsonl
derived/sam3d/motion_windows.jsonl

System use:

Show stronger evidence in the K11 review console. Do not command Rekordbox.

If the question is "Can I trust this depth/body edge?"

Use:

  • MDA-style multi-hypothesis depth confidence.
  • Existing depth/RGB/mocopi freshness in `LumeBodyTruth`.

Output:

text
derived/depth_quality/boundary_confidence.png
LumeBodyTruth.sources.depth.boundary_confidence

System use:

Lower visual intensity or suppress gesture candidates when depth ambiguity is
high.

If the question is "Can we make visuals right now?"

Use:

  • ENTHEA for browser/WebGL visuals.
  • Mac4 Unity / TouchDesigner for existing show visuals.

Output:

text
ENTHEA browser source -> OBS/NDI
Unity LUMM :9702 -> bar display

System use:

Make the bar look alive immediately. This is the fastest visible win.

If the question is "Can the body play music?"

Use:

  • Magenta RealTime 2 small/base on Apple Silicon.
  • BodyTruth-to-MIDI/OSC bridge.

Output:

text
LumeBodyTruth -> MIDI CC / notes -> MRT2
derived/mrt2/midi_automation.jsonl

System use:

Body motion steers accompaniment or texture. Rekordbox remains separate.

If the question is "Can the body steer heavier music diffusion?"

Use:

- DEMON from LUME templates/control curves.

Output:

text
derived/demon/request.json
derived/demon/control_curves.json
derived/demon/generated_audio_candidates/

System use:

Generate or transform audio candidates from reviewed motion labels and
time-varying control curves. DEMON belongs after bundle/template extraction,
not before BodyTruth. Its output is auditioned, compared, archived, or promoted
as content. It does not command Rekordbox.

If the question is "Can we generate avatar/video assets?"

Use:

- MusePose from a verified pose sequence.

Output:

text
derived/musepose/generated_avatar_take.mp4

System use:

Marketing clips, training previews, visual companion material. Not evidence.

If the question is "Can we rehearse edge cases before recording?"

Use:

- Cosmos + Locate Anything.

Output:

text
derived/cosmos_grounding/visual_ops_manifest.json

System use:

Generate synthetic UI fixtures and negative controls: occluded hand, wrong hand,
phone in hand, cup near deck, no person, two people crossing, false deck label.
Everything stays `source_kind=synthetic`.

If the question is "Is this motion physically meaningful?"

Use:

- Kinesis after reconstruction.

Output:

text
derived/kinesis/motor_plausibility.json
derived/kinesis/fatigue_score.json

System use:

Score motion quality, fatigue, locomotion, and body-energy windows. Later, this
can help choose which gestures feel sustainable enough for performance.

What To Build First

The fastest path is not to install every model. Build the contract first, then
plug models in one by one.

Step 1 - Add Research Backend Slots

Patch the bundle/review docs and eventually the verifier to recognize optional:

text
derived/mamma
derived/d4rt
derived/depth_quality
derived/kinesis
derived/musepose
derived/cosmos_grounding
derived/mrt2
derived/enthea
derived/demon

This is mostly schema and UI plumbing. It should not require running the heavy
models yet.
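
As a rough illustration of how cheap this plumbing can be, the sketch below reports the state of each optional slot for one bundle directory. It assumes each backend keeps a `summary.json` with a `status` field as described in the backend attachment plan; `backend_slots` and its return labels (`absent`, `no_summary`, `unreadable`, `no_status`) are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path

OPTIONAL_BACKENDS = (
    "mamma", "d4rt", "depth_quality", "kinesis", "musepose",
    "cosmos_grounding", "mrt2", "enthea", "demon",
)


def backend_slots(bundle_dir: Path) -> dict[str, str]:
    """Map each optional backend to its declared status or a slot state."""
    report: dict[str, str] = {}
    for name in OPTIONAL_BACKENDS:
        slot = bundle_dir / "derived" / name
        summary_path = slot / "summary.json"
        if not slot.is_dir():
            report[name] = "absent"          # optional slot, nothing to verify
            continue
        if not summary_path.is_file():
            report[name] = "no_summary"      # files without a summary are not trusted
            continue
        try:
            data = json.loads(summary_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            report[name] = "unreadable"
            continue
        status = data.get("status") if isinstance(data, dict) else None
        report[name] = status if isinstance(status, str) else "no_status"
    return report
```

No model runs here; the function only reads what a worker has already declared about itself.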

Step 2 - Ship One Live Visible Win

Run ENTHEA as a Mac4 browser source in OBS/NDI and map one BodyTruth variable to
one visual parameter.

Suggested first mapping:

text
stillness -> ENTHEA calm/slow mode
burst     -> drop/wormhole effect
velocity  -> intensity

This gives immediate "the body affects the room" feedback without touching
Rekordbox.

Step 3 - Ship One Audio Win

Install Magenta RealTime 2 small on the strongest available Apple Silicon Mac.
Use MIDI steering only.

Suggested first mapping:

text
present=false       -> mute / idle drone
present_still       -> sparse pad
active              -> groove density up
burst               -> accent / chaos up briefly
gesture_candidate   -> filter/timbre change, not a DJ command

This proves body-to-music control locally before involving heavier CUDA music
diffusion.
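
The mapping above could be prototyped as raw MIDI Control Change messages before any MIDI library is chosen. This is a sketch under assumptions: the controller numbers (20 to 22) and the values are placeholders, the state name `absent` stands in for `present=false`, and the real assignment depends on how MRT2 is configured to receive MIDI.

```python
from __future__ import annotations

# Hypothetical controller numbers; the real ones depend on the MRT2 setup.
CC_DENSITY, CC_CHAOS, CC_TIMBRE = 20, 21, 22

# Placeholder values (0-127) for each body state named in the mapping above.
STATE_TO_CC = {
    "absent": {CC_DENSITY: 0, CC_CHAOS: 0},          # mute / idle drone
    "present_still": {CC_DENSITY: 32, CC_CHAOS: 0},  # sparse pad
    "active": {CC_DENSITY: 96, CC_CHAOS: 16},        # groove density up
    "burst": {CC_DENSITY: 96, CC_CHAOS: 127},        # accent / chaos up briefly
    "gesture_candidate": {CC_TIMBRE: 100},           # timbre change, not a DJ command
}


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (status byte 0xB0 | channel)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= controller <= 127 or not 0 <= value <= 127:
        raise ValueError("controller and value must be 0-127")
    return bytes([0xB0 | channel, controller, value])


def messages_for_state(state: str, channel: int = 0) -> list[bytes]:
    """Translate one body state into CC messages; unknown states are a no-op."""
    mapping = STATE_TO_CC.get(state, {})
    return [control_change(channel, cc, value) for cc, value in sorted(mapping.items())]
```

Nothing in this bridge knows about Rekordbox, so a wrong state can at worst produce the wrong texture.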

Step 4 - Add DEMON Control-Curve Renderer

Do not start by running DEMON. Start by producing the artifact it wants:

text
derived/templates/gesture_templates.json
  -> derived/demon/request.json
  -> derived/demon/control_curves.json

First fields:

text
time_s
label
intensity
source_preservation
denoise
prompt_blend
guidance
channel_gain_hint

This lets K11 preserve body-to-audio intent now, while the actual DEMON runtime
waits for a CUDA/TensorRT-capable host.
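
A sketch of such a renderer follows. It assumes each template window is a dict with `label`, `start_s`, `end_s`, and `intensity`, which is a guess at the template shape. The formulas that derive `source_preservation`, `denoise`, `prompt_blend`, and `guidance` from intensity are placeholders, not DEMON's actual parameter semantics.

```python
from __future__ import annotations

import math

CURVE_FIELDS = (
    "time_s", "label", "intensity", "source_preservation",
    "denoise", "prompt_blend", "guidance", "channel_gain_hint",
)


def render_control_curve(windows: list[dict], step_s: float = 0.5) -> list[dict]:
    """Sample each template window at a fixed step and emit one row per sample."""
    if step_s <= 0:
        raise ValueError("step_s must be positive")
    rows: list[dict] = []
    for w in windows:
        start_s = float(w["start_s"])
        intensity = min(1.0, max(0.0, float(w["intensity"])))
        duration = max(0.0, float(w["end_s"]) - start_s)
        # The small epsilon keeps the end sample despite floating-point rounding.
        samples = math.floor(duration / step_s + 1e-9) + 1
        for i in range(samples):
            rows.append({
                "time_s": round(start_s + i * step_s, 3),
                "label": w["label"],
                "intensity": round(intensity, 3),
                # Placeholder rule: stronger motion preserves less of the source.
                "source_preservation": round(1.0 - intensity, 3),
                "denoise": round(intensity, 3),
                "prompt_blend": round(intensity, 3),
                "guidance": round(1.0 + 6.0 * intensity, 3),
                "channel_gain_hint": 1.0,
            })
    return rows
```

The rows are plain dicts, so they can be written to `control_curves.json` with `json.dumps` and inspected long before a DEMON runtime exists.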

2026-06-06 implementation checkpoint:

text
package_k11_sam3d_first_capture_bundles.py
  -> derived/templates/gesture_templates.json
  -> derived/enthea/bodytruth_control_map.json
  -> derived/mrt2/control_map.json
  -> derived/demon/request.json
  -> derived/demon/control_curves.json

verify_k11_mac4_output_artifacts.py
smoke_test_mac4_output_contract.py

This is still read-only output plumbing. It creates the Mac4/DEMON contracts
without running ENTHEA, MRT2, DEMON, or Rekordbox.

Step 5 - Upgrade Offline Reconstruction

Prototype MAMMA as `derived/mamma` using one captured K11 bundle. If MAMMA code
or model access blocks us, make the worker write:

text
status=unavailable
warnings=["mamma_backend_not_installed"]

That keeps the contract honest while preserving the lane.
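
A worker could emit that explicit placeholder roughly as follows. The sketch reuses the minimum required fields listed earlier; `write_unavailable_summary` is a hypothetical helper, and `source_kind` is set to `reconstructed` on the assumption that the backend is a reconstruction lane such as MAMMA.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def write_unavailable_summary(
    bundle_dir: Path, bundle_id: str, backend: str, reason: str
) -> Path:
    """Write derived/<backend>/summary.json declaring the backend unavailable."""
    summary = {
        "schema": f"lume.{backend}_summary.v1",
        "bundle_id": bundle_id,
        "source_kind": "reconstructed",  # assumption: a reconstruction backend
        "status": "unavailable",
        "live_control_eligible": False,
        "requires_human_review": True,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "tool": backend,
        "inputs": [],
        "outputs": [],                   # no artifacts are fabricated
        "warnings": [reason],
    }
    out_dir = bundle_dir / "derived" / backend
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "summary.json"
    path.write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_unavailable_summary(bundle_dir, bundle_id, "mamma", "mamma_backend_not_installed")` records the gap without leaving an empty or misleading `derived/mamma` directory.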

Step 6 - Add Depth QA

Use MDA checkpoints if practical; otherwise start with a cheap approximate
boundary-risk pass. The useful product feature is not "MDA is deployed"; it is:

text
gesture candidate suppressed because hand boundary depth is ambiguous

Step 7 - Add Synthetic Testing

Run Cosmos + Locate Anything on a rented GPU or DGX-style host to create
synthetic AirDeck edge-case scenes. Feed the manifests into the review UI as
fixtures, not training truth.

What This Unlocks

1. Better capture confidence: MDA/D4RT/MAMMA tell us when the body evidence is
strong, weak, occluded, or ambiguous.
2. Better training data: Cosmos creates synthetic negative controls; Kinesis
scores plausibility; MAMMA gives richer body/contact labels.
3. Better review: K11 operator console can compare SAM3D, MAMMA, D4RT, depth QA,
and generated preview side by side.
4. Better live show: ENTHEA and Magenta RT2 let the body steer visuals and music
immediately, while DEMON gives a heavier audition/render lane without
compromising command safety.
5. Better product story: the system becomes "capture real motion, reconstruct
meaning, review evidence, then perform with governed outputs."

Hard No Lines

- No external backend writes `send_keys=true`.
- No synthetic or generated video becomes training truth without explicit
provenance.
- No MusePose output is used as recognition evidence.
- No D4RT/MAMMA/MDA result bypasses K11 review.
- No Magenta RT2 or ENTHEA event triggers Rekordbox directly.
- No DEMON MCP, MIDI, or generated confidence triggers Rekordbox directly.
- No rented GPU is a live performance gate.
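
Two of these lines are mechanical enough to check in code. The sketch below walks any parsed JSON manifest and flags `send_keys` set to true or `live_control_eligible` set to anything other than false. `find_violations` is a hypothetical helper, and it assumes those two keys keep the same names at every nesting level.

```python
from __future__ import annotations


def find_violations(node: object, path: str = "$") -> list[str]:
    """Recursively collect hard-no violations from a parsed JSON document."""
    violations: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "send_keys" and value is True:
                violations.append(f"{child}: external backends must not set send_keys=true")
            if key == "live_control_eligible" and value is not False:
                violations.append(f"{child}: live_control_eligible must be false")
            violations.extend(find_violations(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            violations.extend(find_violations(item, f"{path}[{index}]"))
    return violations
```

The remaining lines (provenance, review bypass, rented GPUs as gates) are process rules and still need human review rather than a key scan.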

1. Cosmos + Locate Anything DGX

Source:

- https://github.com/joeynyc/cosmos-locateanything-dgx

Observed source facts:

- The repo generates a short video with `nvidia/Cosmos3-Nano`, then runs
`nvidia/LocateAnything-3B` over sampled frames for user-selected labels.
- It writes a `visual_ops_manifest.json` plus annotated/demo videos.
- It stages models so Cosmos and Locate Anything are not loaded together,
keeping peak memory lower for DGX Spark.
- The repo code is MIT-licensed, but the NVIDIA model weights and model code
follow their upstream terms.

Fit:

This is a synthetic rehearsal forge. It can generate fake-but-structured DJ
booth, dance floor, bar, camera, body, object, and occlusion scenes, then ground
visual labels with boxes. It should not be used as proof of real body tracking.
Its value is creating negative controls, camera-planning examples, and UI test
fixtures before real K11 recordings exist.

Bundle artifact target:

text
derived/cosmos_grounding/
  visual_ops_manifest.json
  generated_scene.mp4
  grounded_frames.jsonl
  label_review.md

Best first use:

Generate 20 synthetic AirDeck scenes with labels such as `left hand`,
`right hand`, `deck`, `mixer`, `person`, `phone`, `cup`, and `occluder`. Use
these to stress-test review UI, label vocabulary, and no-op controls. Keep every
item marked `source_kind=synthetic`.

Do not:

  • Do not train live gesture promotion directly from generated videos.
  • Do not run on Mac5 unless the model footprint is proven practical.
  • Do not mix synthetic detections with real K11 evidence without provenance.

2. ENTHEA

Sources:

  • https://elder-plinius.github.io/ENTHEA/
  • https://github.com/elder-plinius/ENTHEA

Observed source facts:

- ENTHEA is a single-file WebGL2 plus Web Audio visual synthesizer.
- It exposes live music visualizer controls, mic/browser-tab audio, tempo
locking, drop detection, scene snapshots, MIDI learn, flicker, symmetry,
raymarching, trails, OKLab palette controls, and fullscreen capture.
- The GitHub README positions it for VJs and live visualists.

Fit:

This belongs in the Mac4 visual layer, either as a browser/OBS source or as a
shader vocabulary reference for Unity/TouchDesigner. It is immediately useful
because it already behaves like a performance instrument: audio-reactive,
MIDI-learnable, scene-based, and capture-friendly.

Integration options:

1. Run ENTHEA as an isolated browser source on Mac4, then feed it to OBS/NDI.
2. Drive its controls from `LumeBodyTruth` through MIDI/OSC/WebSocket.
3. Borrow concepts for Unity DYK shaders: symmetry order, trails, flicker,
raymarch depth, drop effects, scene snapshots.

Bundle artifact target:

text
derived/enthea/
  enthea_scene_snapshot.json
  bodytruth_control_map.json
  render_capture.mp4

Boundary:

ENTHEA can respond to music/body truth for visuals. It must not infer DJ intent
or dispatch Rekordbox actions. Check the license before embedding its code
directly into commercial or closed-source surfaces; the safest first move is
isolated browser-source use.

3. D4RT

Source:

- https://d4rt-paper.github.io/

Observed source facts:

- D4RT is a CVPR 2026 Best Paper from Google DeepMind, UCL, and Oxford.
- The project page describes a feedforward model that jointly infers depth,
spatio-temporal correspondence, and camera parameters from a single video.
- It supports sparse 3D tracks, depth-projected reconstruction, and all-pixel
tracking in world coordinates.

Fit:

D4RT is a possible next-generation scene reconstruction layer above the current
SAM3DBody path. SAM3D gives body-centric frame evidence. D4RT would give scene
and camera truth over time: camera motion, pixel tracks, depth, and dynamic
scene reconstruction. That matters for dance footage because not every useful
signal is a skeleton. Hands, deck surfaces, cables, occluders, camera motion,
and stage geometry all affect trust.

Bundle artifact target:

text
derived/d4rt/
  camera_poses.json
  tracks_3d.jsonl
  pointcloud_world.ply
  dynamic_scene_summary.json
  source_video_ref.json

Best first use:

Treat D4RT as a research watch and schema design target. When code or a hosted
runner becomes practical, test it on one K11 first-capture bundle and compare:

  • Are deck and mixer surfaces stable?
  • Are hands tracked through partial occlusion?
  • Does camera motion corrupt AirDeck zone estimates?
  • Can it generate better Unity replay cameras?

Do not:

- Do not replace SAM3D until a local or rented-GPU runner produces repeatable
artifacts for the existing `lume.rehearsal_bundle.v1` contract.
- Do not route D4RT output into live AirDeck commands.

4. MDA Depth Ambiguity

Source:

- https://arxiv.org/abs/2606.02552

Observed source facts:

- The paper, submitted 2026-06-01, targets flying-point artifacts near object
boundaries in depth estimation.
- The proposed MDA representation predicts multiple depth hypotheses and
probabilities per pixel instead of a single depth.
- The abstract says this improves boundary reconstruction, handles severe blur,
and extends to transparent objects and sky/finite-depth separation.

Fit:

This is not a standalone LUME module. It is a depth quality principle that
belongs inside `LumeBodyTruth`, Mac4 depth/matte cleanup, and K11 camera safety.
The system already worries about false body truth, shimmer, and false gestures.
Flying points near body boundaries are exactly the kind of artifact that can
make a visualizer ugly and a command gate unsafe.

Bundle artifact target:

text
derived/depth_quality/
  boundary_confidence.png
  depth_hypotheses.exr
  flying_point_mask.png
  depth_quality_report.json

Best first use:

Before implementing MDA itself, add a cheap boundary-risk score to depth-derived
body truth:

  • mark hand/body boundaries as lower confidence;
  • suppress gesture candidates when depth confidence is ambiguous;
  • expose `sources.depth.boundary_confidence` in `LumeBodyTruth`;
  • use the mask to clean Unity silhouettes and particle emission edges.
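
A cheap boundary-risk pass of this kind could look like the sketch below. It is not MDA: it only flags pixels whose depth jumps sharply relative to a 4-neighbor, which is where flying points tend to appear. The depth grid is assumed to be a list of rows in meters, and `max_jump_m` and `min_confidence` are made-up thresholds that would need tuning on real captures.

```python
from __future__ import annotations


def boundary_confidence(depth: list[list[float]], max_jump_m: float = 0.30) -> list[list[float]]:
    """Per-pixel confidence in [0, 1]; low where depth jumps to a 4-neighbor."""
    rows, cols = len(depth), len(depth[0])
    conf = [[1.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            jump = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    jump = max(jump, abs(depth[r][c] - depth[rr][cc]))
            conf[r][c] = 1.0 - min(1.0, jump / max_jump_m)
    return conf


def suppress_candidate(
    conf: list[list[float]], pixels: list[tuple[int, int]], min_confidence: float = 0.5
) -> bool:
    """Suppress a gesture candidate when its hand pixels sit on ambiguous depth."""
    values = [conf[r][c] for r, c in pixels]
    if not values:
        return True  # no depth evidence at all: do not promote
    return sum(values) / len(values) < min_confidence
```

The mean over the hand pixels could be exposed as `sources.depth.boundary_confidence`, and the per-pixel grid reused as a mask for silhouette cleanup.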

Do not:

  • Do not claim MDA is deployed unless a model or adapter exists locally.
  • Do not let prettier depth override K11's command debounce and safety gate.

5. MusePose

Source:

- https://github.com/TMElyralab/MusePose

Observed source facts:

- MusePose is a diffusion-based, pose-guided virtual-human video generation
framework.
- The README says it can generate dance videos of a reference character under
a given pose sequence and includes a pose-align algorithm.
- It requires a CUDA-oriented Python environment with OpenCV, diffusers, mmcv,
mmdet, and mmpose.
- The README lists limitations around detail consistency and flicker/noise in
complex backgrounds.

Fit:

MusePose is a content and preview lane, not a tracking lane. Once K11/Mac5
produce clean pose sequences, MusePose can turn them into stylized avatar
renders for marketing, rehearsal review, album visuals, or "what this gesture
looks like" previews in the AirDeck training library.

Bundle artifact target:

text
derived/musepose/
  pose_sequence_ref.json
  reference_image_ref.json
  render_manifest.json
  generated_avatar_take.mp4

Best first use:

After the five K11 `left_hand_raise` takes are captured and reconstructed,
export a normalized pose sequence and run one offline MusePose render on a
rented GPU. Use it only as a visual companion to the real evidence, never as the
evidence.

Do not:

  • Do not run this on Mac5 unless CUDA requirements are replaced.
  • Do not feed MusePose video into live gesture proof.
  • Do not let avatar consistency problems contaminate gesture labels.

6. Magenta RealTime 2

Source:

- https://magenta.withgoogle.com/mrt2

Observed source facts:

- Magenta RealTime 2 is a local live music model for Apple Silicon.
- It ships standalone apps and AU plugins.
- Features include MIDI steering, text-to-synth, audio cloning, prompt mixing,
sound design, and modulation/gesture control.
- The page says low-latency latent steering can use an LFO, MIDI controller, or
camera.
- The Base model requires M3 Pro or M2 Max or higher; the Small model runs on
any Apple Silicon MacBook.

Fit:

This is the cleanest live audio expansion. It should be a generative
accompaniment instrument driven by body truth, not a replacement for Rekordbox.
It can sit on Mac4 or another Apple Silicon host and respond to K11/MotionMix
gesture states by steering latent music, texture, and timbre.

Integration options:

1. AU plugin in a DAW on Mac4.
2. Standalone MRT2 app controlled through MIDI from MotionMix/K11.
3. BodyTruth-to-MIDI bridge mapping `stillness`, `burst`, `velocity`, and
`gesture_candidate` to MRT2 macro controls.

Bundle artifact target:

text
derived/mrt2/
  control_map.json
  midi_automation.jsonl
  audio_take.wav
  prompt_mix_snapshot.json

Best first use:

Build a read-only BodyTruth-to-MIDI bridge for MRT2:

  • `stillness` -> texture density down;
  • `burst` -> chaos/energy up;
  • `velocity` -> modulation depth;
  • `gesture_confirmed` -> musical accent, not Rekordbox key.

Do not:

  • Do not let MRT2 control AirDeck state.
  • Do not use generated accompaniment as proof that a gesture was recognized.
  • Do not route MRT2 latency into the safety-critical DJ path.

7. DEMON

Sources:

  • `[home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_DEMON_ARCHITECTURE_COMPARISON_2026-05-28.md`
  • `[home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_REHEARSAL_BUNDLE_HUB_IMPLEMENTATION_2026-05-28.md`

Observed local facts:

- DEMON is not a pose system, sensor-fusion system, or DJ command system.
- DEMON is a controllable music diffusion runtime built around ACE-Step v1.5
and a StreamDiffusion-style audio ring buffer.
- It consumes source audio, prompts, references, LoRAs, and time-varying control
curves, then emits generated or transformed audio.
- The intended runtime is Python/CUDA/TensorRT. Current K11/Mac4/Mac5 hardware
should emit manifests now and run DEMON later on CUDA/cloud.
- Existing bundle work already treats `derived/demon/request.json` as
`schema: lume.demon_request.v1` with `status: research_manifest_only`.

Fit:

DEMON is the heavier cousin of the Magenta RT2 lane. Magenta RT2 is the local
Apple Silicon instrument for immediate live play. DEMON is the offline or
cloud-rendered music-diffusion lane for turning reviewed motion templates into
audio candidates. It should consume LUME's motion lexicon and template windows,
not raw cameras.

Correct integration:

text
K11 rehearsal bundle
  -> derived/nko/motion_lexicon.json
  -> derived/templates/gesture_templates.json
  -> derived/demon/request.json
  -> derived/demon/control_curves.json
  -> DEMON runtime on CUDA/cloud
  -> generated/transformed audio candidates
  -> audition / compare / archive

Bundle artifact target:

text
derived/demon/
  summary.json
  request.json
  control_curves.json
  render_manifest.json
  generated_audio_candidates/

Best first use:

Build the control-curve renderer before trying to run the model:

  • `wave_color` -> prompt-blend or source-preservation curve;
  • `burst_high_energy` -> denoise/guidance/transformation envelope;
  • `weighted_slow_power_hold` -> source-preservation hold;
  • `airdeck_platter_spin` -> audio LFO candidate, not a Rekordbox scratch.

Do not:

  • Do not route camera, mocopi, or DEMON MCP output directly into Rekordbox.
  • Do not let DEMON-generated confidence write back into `LumeBodyTruth`.
  • Do not treat generated audio as proof that a gesture was recognized.
  • Do not run DEMON on Mac4/Mac5 until the CUDA/runtime requirement changes.

8. Kinesis

Source:

- https://github.com/amathislab/Kinesis

Observed source facts:

- Kinesis is an ICRA 2026 motion-imitation framework for physiologically
plausible musculoskeletal motor control.
- The README says Kinesis 2.0 supports musculoskeletal embodiments up to 290
muscles, fatigue, and downstream tasks such as football penalty kicks.
- It is trained on 1.8 hours of locomotion data and supports locomotion,
text-to-motion control, and high-level control examples.
- Full-body with arms is listed as coming soon.

Fit:

Kinesis is a long-term motor prior. LUME needs to know not only whether a
motion happened, but also whether it is physically plausible, fatiguing,
expressive, repeatable, or likely to be noise. Kinesis can become an offline
evaluator that scores reconstructed motion windows before they become training
material.

Bundle artifact target:

text
derived/kinesis/
  motor_plausibility.json
  fatigue_score.json
  imitation_error.jsonl
  task_control_summary.json

Best first use:

Do not start with deck-hand gestures. Start with locomotion and body-energy
windows from Mac4 long takes:

  • still recovery baseline;
  • burst high energy;
  • torso lean / weight shift;
  • walking or circling.

Use Kinesis as a scoring layer for motion quality, not as an AirDeck classifier.

Do not:

- Do not claim it handles LUME full-body DJ gestures until the arm/full-body
path is actually available and verified.
- Do not run muscle simulation in live performance.

9. MAMMA

Source:

- https://arxiv.org/abs/2506.13040

Observed source facts:

- MAMMA is a markerless multi-person motion-capture pipeline.
- The abstract says it recovers SMPL-X parameters from multi-view video of
two-person interactions.
- It predicts dense 2D contact-aware surface landmarks conditioned on
segmentation masks, with person-specific correspondences under occlusion.
- The paper constructs a synthetic multi-view dataset with extreme poses, hand
motions, close interactions, SMPL-X ground truth, and dense 2D landmarks.

Fit:

MAMMA is the strongest direct fit for the current Mac5 reconstruction lane. The
existing SAM3DBody path is verified, but it is body/frame evidence. MAMMA points
toward full SMPL-X body parameters, multi-person handling, contact, occlusion,
hands, and close interaction. That is closer to real dance, club, and
performance scenes than single-person sparse skeleton output.

Bundle artifact target:

text
derived/mamma/
  summary.json
  smplx_params.npz
  dense_landmarks_2d.jsonl
  person_tracks.jsonl
  contact_events.jsonl
  reconstruction_review_frames/

Best first use:

Define MAMMA as a second reconstruction backend beside SAM3D:

text
derived/sam3d/...
derived/mamma/...

The worker contract should mirror the existing Mac5 SAM3D contract:

  • never fabricate evidence;
  • explicit placeholder status when unavailable;
  • write review frames and logs;
  • return artifacts to K11;
  • map to AirDeck only through human review and dry-run gates.

Do not:

- Do not document Mac5 as running MAMMA until code/model/data access and a
repeatable local or rented-GPU run are proven.
- Do not collapse MAMMA into live control. It is an offline truth/backfill lane.

Recommended Build Order

Wave 1 - Immediate, low-risk integration

1. Add `external_research_fit` entries to the rehearsal bundle schema.
2. Add depth-boundary confidence fields to `LumeBodyTruth` as a cheap MDA-style
safety layer.
3. Run ENTHEA as an isolated Mac4 browser/OBS visual source and drive it from
audio only.
4. Prototype Magenta RT2 with a read-only BodyTruth-to-MIDI bridge.
5. Add a DEMON control-curve renderer that writes manifest-only artifacts.

Wave 2 - Offline artifact expansion

1. Add optional `derived/mamma`, `derived/d4rt`, `derived/musepose`,
`derived/kinesis`, `derived/cosmos_grounding`, `derived/mrt2`, and
`derived/demon` directories to the bundle contract.
2. Extend the K11 review workbench so each candidate can show multi-backend
evidence: SAM3D, MAMMA, D4RT, depth QA, and visual preview.
3. Add provenance flags:

json
{
  "source_kind": "real_capture | synthetic | generated_preview | reconstructed",
  "live_control_eligible": false,
  "requires_human_review": true
}

Wave 3 - Heavy backend trials

1. Run MAMMA, if available, on one K11 first-capture bundle.
2. Run D4RT, if available, on one Mac4 long take.
3. Run MusePose on one approved reconstructed pose sequence.
4. Run Cosmos + Locate Anything on synthetic AirDeck negative-control scenes.
5. Run Kinesis scoring on locomotion/body-energy windows.
6. Run DEMON on a CUDA/cloud host using one reviewed bundle request.

Decision Rules

- Real capture beats synthetic generation.
- K11 safety gates beat every model.
- Offline reconstruction can enrich review but cannot command.
- Generated avatar/video output is content, not evidence.
- Depth ambiguity should lower confidence, not invent certainty.
- Multi-person and occlusion-aware reconstruction is strategic because club
footage will rarely be clean single-person lab footage.
- Apple Silicon local audio generation is useful because it can run beside the
performance, but it must stay outside the Rekordbox command path.
- CUDA/cloud music diffusion is useful for rendered candidates and offline show
assets, but it must not become a live performance gate.

Primary Sources

  • Cosmos + Locate Anything DGX: https://github.com/joeynyc/cosmos-locateanything-dgx
  • ENTHEA live app: https://elder-plinius.github.io/ENTHEA/
  • ENTHEA source: https://github.com/elder-plinius/ENTHEA
  • D4RT project: https://d4rt-paper.github.io/
  • MDA depth ambiguity: https://arxiv.org/abs/2606.02552
  • MusePose: https://github.com/TMElyralab/MusePose
  • Magenta RealTime 2: https://magenta.withgoogle.com/mrt2
  • DEMON local comparison: [home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_DEMON_ARCHITECTURE_COMPARISON_2026-05-28.md
  • DEMON bundle handoff: [home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_REHEARSAL_BUNDLE_HUB_IMPLEMENTATION_2026-05-28.md
  • Kinesis: https://github.com/amathislab/Kinesis
  • MAMMA: https://arxiv.org/abs/2506.13040

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

computational-choreography/09-reference/external-research.md
