External Research Fit Registry
This file maps external papers and projects into the current computational choreography stack. It is research guidance, not an implementation claim.
Date: 2026-06-06
Current system invariant:
Mac4 -> long-take capture and Unity/live visuals
K11 -> durable rehearsal bundles, Pose Coach, AirDeck, Rekordbox safety gate
Mac5 -> offline reconstruction and heavy body analysis
Mac1 -> transfer/orchestration where trust between machines is needed
K11 remains the only live Rekordbox command gate. Mac4 Unity, Mac5
reconstruction, rented GPUs, generated video, and raw sensor streams must not
send DJ commands directly.
One-Line Fit
| Source | Best role | Runtime lane | Integration priority |
|---|---|---|---|
| Cosmos + Locate Anything DGX | Synthetic scene and object-grounding forge | Offline or rented GPU | Medium |
| ENTHEA | Browser/WebGL visual instrument for Mac4/OBS/NDI | Live visuals only | High |
| D4RT | 4D reconstruction and all-pixel tracking from video | Offline reconstruction | High research, medium implementation |
| MDA depth ambiguity paper | Boundary-safe depth confidence and flying-point cleanup | Capture/reconstruction QA | High |
| MusePose | Pose-to-avatar/video content generator | Offline content/render | Medium |
| Magenta RealTime 2 | Local Apple Silicon generative music instrument | Live audio, not DJ safety | High |
| DEMON | Controllable music diffusion from LUME control curves | CUDA/cloud audio render, audition before live | Medium-high |
| Kinesis | Physiological motion prior and fatigue/plausibility scorer | Offline training/evaluation | Long-term |
| MAMMA | Multi-person SMPL-X markerless mocap | Offline reconstruction | Highest strategic reconstruction target |
System Placement
synthetic rehearsal
Cosmos + Locate Anything
-> synthetic scene/label manifests
real capture
Mac4 cameras + K11 Pose Coach + MotionMix/iPhones
-> lume.rehearsal_bundle.v1
depth and scene truth
MDA-style depth boundary confidence
D4RT 4D scene reconstruction
MAMMA SMPL-X multi-person reconstruction
-> derived reconstruction artifacts
body/motor meaning
Kinesis physiological prior
-> plausibility, fatigue, motor-intent scores
review and promotion
K11 SAM3D/AirDeck review workbench
-> approved non-live artifacts
-> dry-run manifest
-> later live manifest only after proof
live output
Mac4 Unity / ENTHEA / TouchDesigner visuals
Magenta RealTime 2 accompaniment
DEMON generated/transformed audio candidates
K11 Rekordbox bridge
Concrete Tour Through Our System
Think of the system as four buses, not one giant app.
Bus A - Real-Time Body Truth
Current authority:
MotionMix / K11 / Mac4 sensors
-> LumeBodyTruth
-> Mac4 Unity / visuals
-> K11 AirDeck as extra safety context
Relevant local docs:
[home]/Desktop/MotionMix/LUME-BODY-TRUTH-CONTRACT.md
[home]/Desktop/computational-choreography/04-generative-output/lume-visuals.md
[home]/Desktop/computational-choreography/04-generative-output/motion-lexicon.md
Research that belongs here:
- MDA as depth-confidence logic.
- ENTHEA as a visual consumer.
- Magenta RealTime 2 as an audio/MIDI consumer.
- DEMON only as a later consumer of approved/template-derived control curves.
Rule:
This bus can react live. It cannot promote new DJ gestures by itself.
Bus B - Rehearsal Bundle Evidence
Current authority:
K11 C:\lume\dance-sessions\_processed\<bundle_id>
-> bundle.json
-> derived/sam3d/request.json
-> derived/<research_backend>/...
Relevant local tools:
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/package_k11_sam3d_first_capture_bundles.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/orchestrate_k11_to_mac5.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/mac5_k11_bundle_worker.py
Research that belongs here:
- SAM3D now.
- MAMMA next.
- D4RT next.
- MusePose as preview output.
- Cosmos + Locate Anything as synthetic fixture output.
- Kinesis as motion-quality scoring.
- DEMON request/control-curve manifests for generated audio candidates.
Rule:
This bus writes evidence and review artifacts. It never sends live controls.
Bus C - Review And Promotion
Current authority:
derived reconstruction artifacts
-> sam3d-review-index.json
-> sam3d-gesture-candidate-library.json
-> sam3d-approved-gesture-library.json
-> sam3d-dry-run-promotion-manifest.json
-> later live manifest only after proof
Relevant local tools:
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/build_k11_sam3d_review_index.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/verify_k11_sam3d_airdeck_artifacts.py
[home]/Desktop/lume-commerce/viz/lume-pcloud/tools/lume-mac5-reconstruction/run_k11_sam3d_approval_intake.py
Research that belongs here:
- MAMMA evidence thumbnails and SMPL-X quality scores.
- D4RT point-cloud/camera-track overlays.
- MDA depth QA badges.
- Kinesis plausibility/fatigue badges.
- MusePose generated preview sidecar.
- Cosmos synthetic fixture provenance warnings.
Rule:
Research output can raise or lower confidence. Human approval and K11 dry-run
gates still decide promotion.
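The rule above can be sketched as two separate functions: one where research backends adjust a confidence score, and one where promotion depends only on the human and dry-run gates. This is an illustrative sketch; `CandidateReview` and all of its field names are hypothetical and do not reflect the real review-index schema.

```python
from dataclasses import dataclass


@dataclass
class CandidateReview:
    # Hypothetical fields, chosen only to illustrate the rule.
    base_confidence: float        # confidence from the primary backend
    research_adjustments: list    # signed deltas contributed by research backends
    human_approved: bool = False
    dry_run_passed: bool = False


def adjusted_confidence(review: CandidateReview) -> float:
    """Research output may raise or lower confidence, clamped to [0, 1]."""
    total = review.base_confidence + sum(review.research_adjustments)
    return max(0.0, min(1.0, total))


def may_promote(review: CandidateReview) -> bool:
    """Promotion ignores the score entirely: only the two gates decide."""
    return review.human_approved and review.dry_run_passed


strong_but_unreviewed = CandidateReview(0.6, [0.5, 0.2])
weak_but_approved = CandidateReview(0.2, [-0.1], human_approved=True, dry_run_passed=True)
print(adjusted_confidence(strong_but_unreviewed), may_promote(strong_but_unreviewed))
print(may_promote(weak_but_approved))
```

Keeping the score out of `may_promote` is deliberate: a high-confidence research result can inform the reviewer, but it cannot substitute for approval.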
Bus D - Performance Output
Current authority:
K11 Rekordbox bridge
Mac4 Unity / TouchDesigner / OBS / NDI
Magenta RT2 / audio instruments
DEMON rendered audio candidates
ENTHEA browser visuals
Research that belongs here:
- ENTHEA for browser visual synthesis.
- Magenta RealTime 2 for body-steered accompaniment.
- DEMON for auditioned generated/transformed music, usually from CUDA/cloud.
- MusePose renders for pre-made visuals, not proof.
Rule:
Output surfaces can be expressive. They are not evidence unless they point back
to a real capture bundle.
Backend Attachment Plan
Every research backend should look like a sibling of `derived/sam3d`, not a
replacement for it.
_processed/<bundle_id>/
bundle.json
raw/
derived/
sam3d/
summary.json
sam3d_frames.jsonl
motion_windows.jsonl
template_candidates.json
mamma/
summary.json
smplx_params.npz
person_tracks.jsonl
contact_events.jsonl
review_frames/
d4rt/
summary.json
camera_poses.json
tracks_3d.jsonl
pointcloud_world.ply
depth_quality/
summary.json
boundary_confidence.png
flying_point_mask.png
kinesis/
summary.json
motor_plausibility.json
fatigue_score.json
musepose/
summary.json
generated_avatar_take.mp4
demon/
summary.json
request.json
control_curves.json
render_manifest.json
generated_audio_candidates/
cosmos_grounding/
summary.json
visual_ops_manifest.json
synthetic_scene.mp4
Minimum required fields for every new backend:
{
"schema": "lume.<backend>_summary.v1",
"bundle_id": "<id>",
"source_kind": "real_capture | synthetic | generated_preview | reconstructed",
"status": "complete | unavailable | failed_explicit | placeholder_explicit",
"live_control_eligible": false,
"requires_human_review": true,
"created_at": "ISO-8601",
"tool": "<backend name>",
"inputs": [],
"outputs": [],
"warnings": []
}
This lets the existing verifier pattern expand without trusting a model just
because a file exists.
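A minimal sketch of how a verifier might check those minimum fields before trusting a backend directory. The field names and allowed values are taken from the list above; the function name, the error strings, and the decision to treat `live_control_eligible: false` and `requires_human_review: true` as fixed values are assumptions for illustration.

```python
import json

REQUIRED_FIELDS = (
    "schema", "bundle_id", "source_kind", "status", "live_control_eligible",
    "requires_human_review", "created_at", "tool", "inputs", "outputs", "warnings",
)
SOURCE_KINDS = {"real_capture", "synthetic", "generated_preview", "reconstructed"}
STATUSES = {"complete", "unavailable", "failed_explicit", "placeholder_explicit"}


def check_backend_summary(summary: dict) -> list:
    """Return a list of problems; an empty list means the summary passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in summary]
    if summary.get("source_kind") not in SOURCE_KINDS:
        problems.append("source_kind is not one of the allowed values")
    if summary.get("status") not in STATUSES:
        problems.append("status is not one of the allowed values")
    # Assumption: research backends never mark themselves live-control eligible.
    if summary.get("live_control_eligible") is not False:
        problems.append("live_control_eligible must be false")
    if summary.get("requires_human_review") is not True:
        problems.append("requires_human_review must be true")
    return problems


example = json.loads("""
{
  "schema": "lume.mamma_summary.v1",
  "bundle_id": "example-bundle",
  "source_kind": "reconstructed",
  "status": "unavailable",
  "live_control_eligible": false,
  "requires_human_review": true,
  "created_at": "2026-06-06T00:00:00Z",
  "tool": "mamma",
  "inputs": [],
  "outputs": [],
  "warnings": ["mamma_backend_not_installed"]
}
""")
print(check_backend_summary(example))
```

The check looks at declared status and provenance, not at whether output files exist, which matches the intent of not trusting a model because a file is present.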
Research-By-Function Map
If the question is "Where is the person?"
Use:
- MAMMA for SMPL-X body, person identity, contact, occlusion.
- D4RT for 4D point tracks and camera/world geometry.
- K11 Pose Coach for local AirDeck zone framing.
Output:
derived/mamma/person_tracks.jsonl
derived/d4rt/tracks_3d.jsonl
derived/sam3d/motion_windows.jsonl
System use:
Show stronger evidence in the K11 review console. Do not command Rekordbox.
If the question is "Can I trust this depth/body edge?"
Use:
- MDA-style multi-hypothesis depth confidence.
- Existing depth/RGB/mocopi freshness in `LumeBodyTruth`.
Output:
derived/depth_quality/boundary_confidence.png
LumeBodyTruth.sources.depth.boundary_confidence
System use:
Lower visual intensity or suppress gesture candidates when depth ambiguity is
high.
If the question is "Can we make visuals right now?"
Use:
- ENTHEA for browser/WebGL visuals.
- Mac4 Unity / TouchDesigner for existing show visuals.
Output:
ENTHEA browser source -> OBS/NDI
Unity LUMM :9702 -> bar display
System use:
Make the bar look alive immediately. This is the fastest visible win.
If the question is "Can the body play music?"
Use:
- Magenta RealTime 2 small/base on Apple Silicon.
- BodyTruth-to-MIDI/OSC bridge.
Output:
LumeBodyTruth -> MIDI CC / notes -> MRT2
derived/mrt2/midi_automation.jsonl
System use:
Body motion steers accompaniment or texture. Rekordbox remains separate.
If the question is "Can the body steer heavier music diffusion?"
Use:
- DEMON from LUME templates/control curves.
Output:
derived/demon/request.json
derived/demon/control_curves.json
derived/demon/generated_audio_candidates/
System use:
Generate or transform audio candidates from reviewed motion labels and
time-varying control curves. DEMON belongs after bundle/template extraction,
not before BodyTruth. Its output is auditioned, compared, archived, or promoted
as content. It does not command Rekordbox.
If the question is "Can we generate avatar/video assets?"
Use:
- MusePose from a verified pose sequence.
Output:
derived/musepose/generated_avatar_take.mp4
System use:
Marketing clips, training previews, visual companion material. Not evidence.
If the question is "Can we rehearse edge cases before recording?"
Use:
- Cosmos + Locate Anything.
Output:
derived/cosmos_grounding/visual_ops_manifest.json
System use:
Generate synthetic UI fixtures and negative controls: occluded hand, wrong hand,
phone in hand, cup near deck, no person, two people crossing, false deck label.
Everything stays `source_kind=synthetic`.
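As a sketch of what those fixture entries might look like in a manifest, the snippet below builds one entry per negative control listed above. The scenario strings come from the text; the entry keys other than `source_kind` and `live_control_eligible` (`fixture_id`, `scenario`, `expected_action`) are hypothetical.

```python
NEGATIVE_CONTROLS = [
    "occluded hand",
    "wrong hand",
    "phone in hand",
    "cup near deck",
    "no person",
    "two people crossing",
    "false deck label",
]


def synthetic_fixture_entries(bundle_id: str) -> list:
    """Build one review-UI fixture entry per negative-control scenario."""
    return [
        {
            "fixture_id": f"{bundle_id}-neg-{index:02d}",  # hypothetical id format
            "scenario": scenario,
            "source_kind": "synthetic",       # never real_capture
            "live_control_eligible": False,
            "expected_action": "none",        # a negative control must produce no command
        }
        for index, scenario in enumerate(NEGATIVE_CONTROLS, start=1)
    ]


entries = synthetic_fixture_entries("example-bundle")
print(len(entries), entries[0]["fixture_id"])
```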
If the question is "Is this motion physically meaningful?"
Use:
- Kinesis after reconstruction.
Output:
derived/kinesis/motor_plausibility.json
derived/kinesis/fatigue_score.json
System use:
Score motion quality, fatigue, locomotion, and body-energy windows. Later, this
can help choose which gestures feel sustainable enough for performance.
What To Build First
The fastest path is not to install every model. Build the contract first, then
plug models in one by one.
Step 1 - Add Research Backend Slots
Patch the bundle/review docs, and eventually the verifier, to recognize these
optional directories:
derived/mamma
derived/d4rt
derived/depth_quality
derived/kinesis
derived/musepose
derived/cosmos_grounding
derived/mrt2
derived/enthea
derived/demon
This is mostly schema and UI plumbing. It should not require running the heavy
models yet.
Step 2 - Ship One Live Visible Win
Run ENTHEA as a Mac4 browser source in OBS/NDI and map one BodyTruth variable to
one visual parameter.
Suggested first mapping:
stillness -> ENTHEA calm/slow mode
burst -> drop/wormhole effect
velocity -> intensity
This gives immediate "the body affects the room" feedback without touching
Rekordbox.
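A minimal sketch of the suggested first mapping, written as a pure function so it can be tested without ENTHEA running. The input ranges (`stillness` and `velocity` in 0..1, `burst` as a boolean), the 0.7 calm threshold, and the output keys are all assumptions; ENTHEA's actual control names are not specified here.

```python
def _clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def bodytruth_to_visual(stillness: float, burst: bool, velocity: float) -> dict:
    """Map three body signals to hypothetical visual parameters."""
    calm = _clamp01(stillness) > 0.7 and not burst
    return {
        "mode": "calm" if calm else "active",   # stillness -> calm/slow mode
        "trigger_drop": bool(burst),            # burst -> drop/wormhole effect
        "intensity": _clamp01(velocity),        # velocity -> intensity
    }


print(bodytruth_to_visual(0.9, False, 0.1))
print(bodytruth_to_visual(0.2, True, 1.7))
```

The function returns visual parameters only; nothing in it can address Rekordbox.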
Step 3 - Ship One Audio Win
Install Magenta RealTime 2 small on the strongest available Apple Silicon Mac.
Use MIDI steering only.
Suggested first mapping:
present=false -> mute / idle drone
present_still -> sparse pad
active -> groove density up
burst -> accent / chaos up briefly
gesture_candidate -> filter/timbre change, not a DJ command
This proves body-to-music control locally before involving heavier CUDA music
diffusion.
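The suggested mapping can be sketched as a function that returns MIDI control-change values and leaves sending them to a separate bridge. The CC numbers and the specific values are placeholders; MRT2's real MIDI-learn assignments are not given in this document, and the "briefly" part of the burst accent (a decay envelope) is not modelled.

```python
# Placeholder CC numbers; real assignments would come from MIDI learn.
CC_LEVEL, CC_DENSITY, CC_CHAOS, CC_TIMBRE = 20, 21, 22, 23


def body_state_to_cc(present: bool, state: str, gesture_candidate: bool = False) -> dict:
    """Return {cc_number: value in 0..127}. state is 'still', 'active', or 'burst'."""
    if not present:
        # present=false -> mute / idle drone
        return {CC_LEVEL: 0, CC_DENSITY: 0, CC_CHAOS: 0, CC_TIMBRE: 64}
    density = {"still": 20, "active": 90, "burst": 110}[state]
    chaos = 127 if state == "burst" else 10
    # gesture_candidate only shifts timbre; it is never a DJ command.
    timbre = 100 if gesture_candidate else 64
    return {CC_LEVEL: 100, CC_DENSITY: density, CC_CHAOS: chaos, CC_TIMBRE: timbre}


print(body_state_to_cc(False, "still"))
print(body_state_to_cc(True, "burst", gesture_candidate=True))
```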
Step 4 - Add DEMON Control-Curve Renderer
Do not start by running DEMON. Start by producing the artifact it wants:
derived/templates/gesture_templates.json
-> derived/demon/request.json
-> derived/demon/control_curves.json
First fields:
time_s
label
intensity
source_preservation
denoise
prompt_blend
guidance
channel_gain_hint
This lets K11 preserve body-to-audio intent now, while the actual DEMON runtime
waits for a CUDA/TensorRT-capable host.
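A minimal sketch of a control-curve renderer that produces those fields without running DEMON. The field names come from the list above. The input shape (tuples of start time, label, intensity), the schema string `lume.demon_control_curves.v1`, and the way intensity is turned into `denoise` and `source_preservation` are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ControlPoint:
    time_s: float
    label: str
    intensity: float
    source_preservation: float
    denoise: float
    prompt_blend: float
    guidance: float
    channel_gain_hint: float


def curve_from_windows(windows) -> dict:
    """windows: iterable of (start_s, label, intensity) tuples (hypothetical input)."""
    points = []
    for start_s, label, intensity in sorted(windows):
        points.append(ControlPoint(
            time_s=start_s,
            label=label,
            intensity=intensity,
            # Assumed heuristic: stronger motion preserves less of the source.
            source_preservation=round(1.0 - intensity, 3),
            denoise=round(intensity, 3),
            prompt_blend=0.5,        # placeholder constants
            guidance=1.0,
            channel_gain_hint=1.0,
        ))
    return {
        "schema": "lume.demon_control_curves.v1",  # hypothetical schema name
        "points": [asdict(point) for point in points],
    }


doc = curve_from_windows([(2.0, "burst_high_energy", 0.8), (0.0, "wave_color", 0.25)])
print(json.dumps(doc, indent=2))
```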
2026-06-06 implementation checkpoint:
package_k11_sam3d_first_capture_bundles.py
-> derived/templates/gesture_templates.json
-> derived/enthea/bodytruth_control_map.json
-> derived/mrt2/control_map.json
-> derived/demon/request.json
-> derived/demon/control_curves.json
verify_k11_mac4_output_artifacts.py
smoke_test_mac4_output_contract.py
This is still read-only output plumbing. It creates the Mac4/DEMON contracts
without running ENTHEA, MRT2, DEMON, or Rekordbox.
Step 5 - Upgrade Offline Reconstruction
Prototype MAMMA as `derived/mamma` using one captured K11 bundle. If MAMMA code
or model access blocks us, make the worker write:
status=unavailable
warnings=["mamma_backend_not_installed"]
That keeps the contract honest while preserving the lane.
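A sketch of how a worker might write that explicit placeholder instead of failing silently or fabricating output. The summary fields follow the minimum-field list earlier in this file; the function name and the choice of `source_kind: reconstructed` for a MAMMA slot are assumptions.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_unavailable_summary(bundle_dir, bundle_id, backend, warning):
    """Write derived/<backend>/summary.json with an explicit unavailable status."""
    summary = {
        "schema": f"lume.{backend}_summary.v1",
        "bundle_id": bundle_id,
        "source_kind": "reconstructed",
        "status": "unavailable",
        "live_control_eligible": False,
        "requires_human_review": True,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "tool": backend,
        "inputs": [],
        "outputs": [],  # nothing was produced, so nothing is listed
        "warnings": [warning],
    }
    out_dir = Path(bundle_dir) / "derived" / backend
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "summary.json"
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return path


# Usage against a throwaway directory standing in for a real bundle.
with tempfile.TemporaryDirectory() as tmp:
    written = write_unavailable_summary(
        tmp, "example-bundle", "mamma", "mamma_backend_not_installed"
    )
    loaded = json.loads(written.read_text(encoding="utf-8"))
    print(loaded["status"], loaded["warnings"])
```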
Step 6 - Add Depth QA
Use MDA checkpoints, if practical, or a cheap approximate boundary-risk pass
first. The useful product feature is not "MDA is deployed"; it is:
gesture candidate suppressed because hand boundary depth is ambiguous
Step 7 - Add Synthetic Testing
Run Cosmos + Locate Anything on a rented GPU or DGX-style host to create
synthetic AirDeck edge-case scenes. Feed the manifests into the review UI as
fixtures, not training truth.
What This Unlocks
1. Better capture confidence: MDA/D4RT/MAMMA tell us when the body evidence is
strong, weak, occluded, or ambiguous.
2. Better training data: Cosmos creates synthetic negative controls; Kinesis
scores plausibility; MAMMA gives richer body/contact labels.
3. Better review: K11 operator console can compare SAM3D, MAMMA, D4RT, depth QA,
and generated preview side by side.
4. Better live show: ENTHEA and Magenta RT2 let the body steer visuals and music
immediately, while DEMON gives a heavier audition/render lane without
compromising command safety.
5. Better product story: the system becomes "capture real motion, reconstruct
meaning, review evidence, then perform with governed outputs."
Hard No Lines
- No external backend writes `send_keys=true`.
- No synthetic or generated video becomes training truth without explicit
provenance.
- No MusePose output is used as recognition evidence.
- No D4RT/MAMMA/MDA result bypasses K11 review.
- No Magenta RT2 or ENTHEA event triggers Rekordbox directly.
- No DEMON MCP, MIDI, or generated confidence triggers Rekordbox directly.
- No rented GPU is a live performance gate.
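The first line in this list lends itself to a mechanical check. Below is a hedged sketch that walks any parsed JSON artifact and reports where a `send_keys` key holds a truthy value. The key name comes from the rule above; the function name and the path notation are illustrative, and how the real verifier enforces this is not specified here.

```python
def find_send_keys_violations(node, path="$"):
    """Return JSON-style paths where a `send_keys` key holds a truthy value."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "send_keys" and value:
                hits.append(child)
            hits.extend(find_send_keys_violations(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_send_keys_violations(item, f"{path}[{index}]"))
    return hits


artifact = {
    "tool": "example_backend",
    "send_keys": False,
    "outputs": [{"name": "candidate", "send_keys": True}],
}
print(find_send_keys_violations(artifact))
```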
1. Cosmos + Locate Anything DGX
Source:
- https://github.com/joeynyc/cosmos-locateanything-dgx
Observed source facts:
- The repo generates a short video with `nvidia/Cosmos3-Nano`, then runs
`nvidia/LocateAnything-3B` over sampled frames for user-selected labels.
- It writes a `visual_ops_manifest.json` plus annotated/demo videos.
- It stages models so Cosmos and Locate Anything are not loaded together,
keeping peak memory lower for DGX Spark.
- Code is MIT, but NVIDIA model weights and model code use upstream terms.
Fit:
This is a synthetic rehearsal forge. It can generate fake-but-structured DJ
booth, dance floor, bar, camera, body, object, and occlusion scenes, then ground
visual labels with boxes. It should not be used as proof of real body tracking.
Its value is creating negative controls, camera-planning examples, and UI test
fixtures before real K11 recordings exist.
Bundle artifact target:
derived/cosmos_grounding/
visual_ops_manifest.json
generated_scene.mp4
grounded_frames.jsonl
label_review.md
Best first use:
Generate 20 synthetic AirDeck scenes with labels such as `left hand`,
`right hand`, `deck`, `mixer`, `person`, `phone`, `cup`, and `occluder`. Use
these to stress-test review UI, label vocabulary, and no-op controls. Keep every
item marked `source_kind=synthetic`.
Do not:
- Do not train live gesture promotion directly from generated videos.
- Do not run on Mac5 unless the model footprint is proven practical.
- Do not mix synthetic detections with real K11 evidence without provenance.
2. ENTHEA
Sources:
- https://elder-plinius.github.io/ENTHEA/
- https://github.com/elder-plinius/ENTHEA
Observed source facts:
- ENTHEA is a single-file WebGL2 plus Web Audio visual synthesizer.
- It exposes live music visualizer controls, mic/browser-tab audio, tempo
locking, drop detection, scene snapshots, MIDI learn, flicker, symmetry,
raymarching, trails, OKLab palette controls, and fullscreen capture.
- The GitHub README positions it for VJs and live visualists.
Fit:
This belongs in the Mac4 visual layer, either as a browser/OBS source or as a
shader vocabulary reference for Unity/TouchDesigner. It is immediately useful
because it already thinks like a performance instrument: audio-reactive, MIDI
learnable, scene-based, and capture-friendly.
Integration options:
1. Run ENTHEA as an isolated browser source on Mac4, then feed it to OBS/NDI.
2. Drive its controls from `LumeBodyTruth` through MIDI/OSC/WebSocket.
3. Borrow concepts for Unity DYK shaders: symmetry order, trails, flicker,
raymarch depth, drop effects, scene snapshots.
Bundle artifact target:
derived/enthea/
enthea_scene_snapshot.json
bodytruth_control_map.json
render_capture.mp4
Boundary:
ENTHEA can respond to music/body truth for visuals. It must not infer DJ intent
or dispatch Rekordbox actions. License must be checked before embedding code
directly into commercial or closed-source surfaces; safest first move is
isolated browser-source use.
3. D4RT
Source:
- https://d4rt-paper.github.io/
Observed source facts:
- D4RT is a CVPR 2026 Best Paper from Google DeepMind, UCL, and Oxford.
- The project page describes a feedforward model that jointly infers depth,
spatio-temporal correspondence, and camera parameters from a single video.
- It supports sparse 3D tracks, depth-projected reconstruction, and all-pixel
tracking in world coordinates.
Fit:
D4RT is a possible next-generation scene reconstruction layer above the current
SAM3DBody path. SAM3D gives body-centric frame evidence. D4RT would give scene
and camera truth over time: camera motion, pixel tracks, depth, and dynamic
scene reconstruction. That matters for dance footage because not every useful
signal is a skeleton. Hands, deck surfaces, cables, occluders, camera motion,
and stage geometry all affect trust.
Bundle artifact target:
derived/d4rt/
camera_poses.json
tracks_3d.jsonl
pointcloud_world.ply
dynamic_scene_summary.json
source_video_ref.json
Best first use:
Treat D4RT as a research watch and schema design target. When code or a hosted
runner becomes practical, test it on one K11 first-capture bundle and compare:
- Are deck and mixer surfaces stable?
- Are hands tracked through partial occlusion?
- Does camera motion corrupt AirDeck zone estimates?
- Can it generate better Unity replay cameras?
Do not:
- Do not replace SAM3D until a local or rented-GPU runner produces repeatable
artifacts for the existing `lume.rehearsal_bundle.v1` contract.
- Do not route D4RT output into live AirDeck commands.
4. MDA Depth Ambiguity
Source:
- https://arxiv.org/abs/2606.02552
Observed source facts:
- The paper, submitted 2026-06-01, targets flying-point artifacts near object
boundaries in depth estimation.
- The proposed MDA representation predicts multiple depth hypotheses and
probabilities per pixel instead of a single depth.
- The abstract says this improves boundary reconstruction, handles severe blur,
and extends to transparent objects and sky/finite-depth separation.
Fit:
This is not a standalone LUME module. It is a depth quality principle that
belongs inside `LumeBodyTruth`, Mac4 depth/matte cleanup, and K11 camera safety.
The system already worries about false body truth, shimmer, and false gestures.
Flying points near body boundaries are exactly the kind of artifact that can
make a visualizer ugly and a command gate unsafe.
Bundle artifact target:
derived/depth_quality/
boundary_confidence.png
depth_hypotheses.exr
flying_point_mask.png
depth_quality_report.json
Best first use:
Before implementing MDA itself, add a cheap boundary-risk score to depth-derived
body truth:
- mark hand/body boundaries as lower confidence;
- suppress gesture candidates when depth confidence is ambiguous;
- expose `sources.depth.boundary_confidence` in `LumeBodyTruth`;
- use the mask to clean Unity silhouettes and particle emission edges.
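A minimal sketch of such a cheap boundary-risk pass. This is not MDA: it only counts large depth jumps between adjacent pixels in a small patch around a hand or body edge. The patch format (nested lists of depths in metres), the function names, the 0.15 m jump threshold, and the 0.8 confidence cutoff are all illustrative assumptions.

```python
def boundary_risk(depth_patch, jump_m=0.15):
    """Fraction of adjacent pixel pairs whose depth differs by more than jump_m."""
    rows, cols = len(depth_patch), len(depth_patch[0])
    pairs = jumps = 0
    for r in range(rows):
        for c in range(cols):
            # Compare each pixel with its right and lower neighbour.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pairs += 1
                    if abs(depth_patch[r][c] - depth_patch[rr][cc]) > jump_m:
                        jumps += 1
    return jumps / pairs if pairs else 0.0


def boundary_confidence(depth_patch, jump_m=0.15):
    """Value a caller could expose as sources.depth.boundary_confidence."""
    return 1.0 - boundary_risk(depth_patch, jump_m)


def keep_gesture_candidate(depth_patch, min_confidence=0.8):
    """Suppress the candidate when the patch straddles an ambiguous depth edge."""
    return boundary_confidence(depth_patch) >= min_confidence


flat = [[1.0, 1.0, 1.0] for _ in range(3)]   # hand fully on one surface
edge = [[1.0, 1.0, 3.0] for _ in range(3)]   # patch straddles a depth discontinuity
print(boundary_confidence(flat), keep_gesture_candidate(flat))
print(boundary_confidence(edge), keep_gesture_candidate(edge))
```

In the `edge` patch, 3 of the 12 neighbour pairs cross the discontinuity, so confidence drops to 0.75 and the candidate is suppressed.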
Do not:
- Do not claim MDA is deployed unless a model or adapter exists locally.
- Do not let prettier depth override K11's command debounce and safety gate.
5. MusePose
Source:
- https://github.com/TMElyralab/MusePose
Observed source facts:
- MusePose is a diffusion-based, pose-guided virtual-human video generation
framework.
- The README says it can generate dance videos of a reference character under
a given pose sequence and includes a pose-align algorithm.
- It requires a CUDA-oriented Python environment with OpenCV, diffusers, mmcv,
mmdet, and mmpose.
- The README lists limitations around detail consistency and flicker/noise in
complex backgrounds.
Fit:
MusePose is a content and preview lane, not a tracking lane. Once K11/Mac5
produce clean pose sequences, MusePose can turn them into stylized avatar
renders for marketing, rehearsal review, album visuals, or "what this gesture
looks like" previews in the AirDeck training library.
Bundle artifact target:
derived/musepose/
pose_sequence_ref.json
reference_image_ref.json
render_manifest.json
generated_avatar_take.mp4
Best first use:
After the five K11 `left_hand_raise` takes are captured and reconstructed,
export a normalized pose sequence and run one offline MusePose render on a
rented GPU. Use it only as a visual companion to the real evidence, never as the
evidence.
Do not:
- Do not run this on Mac5 unless CUDA requirements are replaced.
- Do not feed MusePose video into live gesture proof.
- Do not let avatar consistency problems contaminate gesture labels.
6. Magenta RealTime 2
Source:
- https://magenta.withgoogle.com/mrt2
Observed source facts:
- Magenta RealTime 2 is a local live music model for Apple Silicon.
- It ships standalone apps and AU plugins.
- Features include MIDI steering, text-to-synth, audio cloning, prompt mixing,
sound design, and modulation/gesture control.
- The page says low-latency latent steering can use an LFO, MIDI controller, or
camera.
- The Base model requires M3 Pro or M2 Max or higher; the Small model runs on
any Apple Silicon MacBook.
Fit:
This is the cleanest live audio expansion. It should be a generative
accompaniment instrument driven by body truth, not a replacement for Rekordbox.
It can sit on Mac4 or another Apple Silicon host and respond to K11/MotionMix
gesture states by steering latent music, texture, and timbre.
Integration options:
1. AU plugin in a DAW on Mac4.
2. Standalone MRT2 app controlled through MIDI from MotionMix/K11.
3. BodyTruth-to-MIDI bridge mapping `stillness`, `burst`, `velocity`, and
`gesture_candidate` to MRT2 macro controls.
Bundle artifact target:
derived/mrt2/
control_map.json
midi_automation.jsonl
audio_take.wav
prompt_mix_snapshot.json
Best first use:
Build a read-only BodyTruth-to-MIDI bridge for MRT2:
- `stillness` -> texture density down;
- `burst` -> chaos/energy up;
- `velocity` -> modulation depth;
- `gesture_confirmed` -> musical accent, not Rekordbox key.
Do not:
- Do not let MRT2 control AirDeck state.
- Do not use generated accompaniment as proof that a gesture was recognized.
- Do not route MRT2 latency into the safety-critical DJ path.
7. DEMON
Sources:
- `[home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_DEMON_ARCHITECTURE_COMPARISON_2026-05-28.md`
- `[home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_REHEARSAL_BUNDLE_HUB_IMPLEMENTATION_2026-05-28.md`
Observed local facts:
- DEMON is not a pose system, sensor-fusion system, or DJ command system.
- DEMON is a controllable music diffusion runtime built around ACE-Step v1.5
and a StreamDiffusion-style audio ring buffer.
- It consumes source audio, prompts, references, LoRAs, and time-varying control
curves, then emits generated or transformed audio.
- The intended runtime is Python/CUDA/TensorRT. Current K11/Mac4/Mac5 hardware
should emit manifests now and run DEMON later on CUDA/cloud.
- Existing bundle work already treats `derived/demon/request.json` as
`schema: lume.demon_request.v1` with `status: research_manifest_only`.
Fit:
DEMON is the heavier cousin of the Magenta RT2 lane. Magenta RT2 is the local
Apple Silicon instrument for immediate live play. DEMON is the offline or
cloud-rendered music-diffusion lane for turning reviewed motion templates into
audio candidates. It should consume LUME's motion lexicon and template windows,
not raw cameras.
Correct integration:
K11 rehearsal bundle
-> derived/nko/motion_lexicon.json
-> derived/templates/gesture_templates.json
-> derived/demon/request.json
-> derived/demon/control_curves.json
-> DEMON runtime on CUDA/cloud
-> generated/transformed audio candidates
-> audition / compare / archive
Bundle artifact target:
derived/demon/
summary.json
request.json
control_curves.json
render_manifest.json
generated_audio_candidates/
Best first use:
Build the control-curve renderer before trying to run the model:
- `wave_color` -> prompt-blend or source-preservation curve;
- `burst_high_energy` -> denoise/guidance/transformation envelope;
- `weighted_slow_power_hold` -> source-preservation hold;
- `airdeck_platter_spin` -> audio LFO candidate, not a Rekordbox scratch.
Do not:
- Do not route camera, mocopi, or DEMON MCP output directly into Rekordbox.
- Do not let DEMON-generated confidence write back into `LumeBodyTruth`.
- Do not treat generated audio as proof that a gesture was recognized.
- Do not run DEMON on Mac4/Mac5 until the CUDA/runtime requirement changes.
8. Kinesis
Source:
- https://github.com/amathislab/Kinesis
Observed source facts:
- Kinesis is an ICRA 2026 motion-imitation framework for physiologically
plausible musculoskeletal motor control.
- The README says Kinesis 2.0 supports musculoskeletal embodiments up to 290
muscles, fatigue, and downstream tasks such as football penalty kicks.
- It is trained on 1.8 hours of locomotion data and supports locomotion,
text-to-motion control, and high-level control examples.
- Full-body with arms is listed as coming soon.
Fit:
Kinesis is a long-term motor prior. LUME needs to know not only whether a
motion happened, but also whether that movement is physically plausible,
fatiguing, expressive, repeatable, or likely to be noise. Kinesis can become an
offline evaluator that scores reconstructed motion windows before they become
training material.
Bundle artifact target:
derived/kinesis/
motor_plausibility.json
fatigue_score.json
imitation_error.jsonl
task_control_summary.json
Best first use:
Do not start with deck-hand gestures. Start with locomotion and body-energy
windows from Mac4 long takes:
- still recovery baseline;
- burst high energy;
- torso lean / weight shift;
- walking or circling.
Use Kinesis as a scoring layer for motion quality, not as an AirDeck classifier.
Do not:
- Do not claim it handles LUME full-body DJ gestures until the arm/full-body
path is actually available and verified.
- Do not run muscle simulation in live performance.
9. MAMMA
Source:
- https://arxiv.org/abs/2506.13040
Observed source facts:
- MAMMA is a markerless multi-person motion-capture pipeline.
- The abstract says it recovers SMPL-X parameters from multi-view video of
two-person interactions.
- It predicts dense 2D contact-aware surface landmarks conditioned on
segmentation masks, with person-specific correspondences under occlusion.
- The paper constructs a synthetic multi-view dataset with extreme poses, hand
motions, close interactions, SMPL-X ground truth, and dense 2D landmarks.
Fit:
MAMMA is the strongest direct fit for the current Mac5 reconstruction lane. The
existing SAM3DBody path is verified, but it is body/frame evidence. MAMMA points
toward full SMPL-X body parameters, multi-person handling, contact, occlusion,
hands, and close interaction. That is closer to real dance, club, and
performance scenes than single-person sparse skeleton output.
Bundle artifact target:
derived/mamma/
summary.json
smplx_params.npz
dense_landmarks_2d.jsonl
person_tracks.jsonl
contact_events.jsonl
reconstruction_review_frames/
Best first use:
Define MAMMA as a second reconstruction backend beside SAM3D:
derived/sam3d/...
derived/mamma/...
The worker contract should mirror the existing Mac5 SAM3D contract:
- never fabricate evidence;
- explicit placeholder status when unavailable;
- write review frames and logs;
- return artifacts to K11;
- map to AirDeck only through human review and dry-run gates.
Do not:
- Do not document Mac5 as running MAMMA until code/model/data access and a
repeatable local or rented-GPU run are proven.
- Do not collapse MAMMA into live control. It is an offline truth/backfill lane.
Recommended Build Order
Wave 1 - Immediate, low-risk integration
1. Add `external_research_fit` entries to the rehearsal bundle schema.
2. Add depth-boundary confidence fields to `LumeBodyTruth` as a cheap MDA-style
safety layer.
3. Run ENTHEA as an isolated Mac4 browser/OBS visual source and drive it from
audio only.
4. Prototype Magenta RT2 with a read-only BodyTruth-to-MIDI bridge.
5. Add a DEMON control-curve renderer that writes manifest-only artifacts.
Wave 2 - Offline artifact expansion
1. Add optional `derived/mamma`, `derived/d4rt`, `derived/musepose`,
`derived/kinesis`, `derived/cosmos_grounding`, `derived/mrt2`, and
`derived/demon` directories to the bundle contract.
2. Extend the K11 review workbench so each candidate can show multi-backend
evidence: SAM3D, MAMMA, D4RT, depth QA, and visual preview.
3. Add provenance flags:
{
"source_kind": "real_capture | synthetic | generated_preview | reconstructed",
"live_control_eligible": false,
"requires_human_review": true
}
Wave 3 - Heavy backend trials
1. Run MAMMA, if available, on one K11 first-capture bundle.
2. Run D4RT, if available, on one Mac4 long take.
3. Run MusePose on one approved reconstructed pose sequence.
4. Run Cosmos + Locate Anything on synthetic AirDeck negative-control scenes.
5. Run Kinesis scoring on locomotion/body-energy windows.
6. Run DEMON on a CUDA/cloud host using one reviewed bundle request.
Decision Rules
- Real capture beats synthetic generation.
- K11 safety gates beat every model.
- Offline reconstruction can enrich review but cannot command.
- Generated avatar/video output is content, not evidence.
- Depth ambiguity should lower confidence, not invent certainty.
- Multi-person and occlusion-aware reconstruction is strategic because club
footage will rarely be clean single-person lab footage.
- Apple Silicon local audio generation is useful because it can run beside the
performance, but it must stay outside the Rekordbox command path.
- CUDA/cloud music diffusion is useful for rendered candidates and offline show
assets, but it must not become a live performance gate.
Primary Sources
- Cosmos + Locate Anything DGX: https://github.com/joeynyc/cosmos-locateanything-dgx
- ENTHEA live app: https://elder-plinius.github.io/ENTHEA/
- ENTHEA source: https://github.com/elder-plinius/ENTHEA
- D4RT project: https://d4rt-paper.github.io/
- MDA depth ambiguity: https://arxiv.org/abs/2606.02552
- MusePose: https://github.com/TMElyralab/MusePose
- Magenta RealTime 2: https://magenta.withgoogle.com/mrt2
- DEMON local comparison: [home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_DEMON_ARCHITECTURE_COMPARISON_2026-05-28.md
- DEMON bundle handoff: [home]/Desktop/lume-commerce/viz/lume-pcloud/Docs/LUME_REHEARSAL_BUNDLE_HUB_IMPLEMENTATION_2026-05-28.md
- Kinesis: https://github.com/amathislab/Kinesis
- MAMMA: https://arxiv.org/abs/2506.13040
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
computational-choreography/09-reference/external-research.md