Grand Diomande Research · Full HTML Reader

HF Paper Batch -> LUME Evaluation - 2026-05-24

Embodied Trajectory Systems · experiment · experiment writeup · candidate · score 24 · .md

User supplied:

- `https://huggingface.co/papers/2605.22809`
- `https://huggingface.co/papers/2605.22717`
- `https://huggingface.co/papers/2605.17991`
- `https://huggingface.co/papers/2605.18714`

Local Staging

Cloned code/project repos:

- `Desktop/MotionMix/research/external/audio-ai/live-music-diffusion-models`
  Git: `ab74346` (`2026-05-22 fix readme`)
- `Desktop/MotionMix/research/external/audio-ai/stable-audio-3`
  Git: `fa5ee84` (`2026-05-21 Merge pull request #36 from Stability-AI/apg-float64-mps`)
- `Desktop/MotionMix/research/external/audio-ai/sgt-project-page`
  Git: `dfb0941` (`2026-05-19 add links`)

No heavy installs or model-weight downloads were run. Stable Audio 3 weights
are gated on Hugging Face and require accepting Stability/Gemma terms. LMDM
needs a pretrained audio checkpoint before useful inference.

2605.22809 - Sensor2Sensor

Title: `Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving`

Core idea:

- Convert unstructured monocular dashcam video into a structured multi-sensor
suite using 4D Gaussian Splatting plus diffusion.
- The important abstraction for LUME is not autonomous driving; it is
cross-embodiment sensor conversion.

LUME relevance:

- Treat K11 Bolt, MotionMix iPhones, Insta360, and future room cameras as
different embodiments of the same body-state event.
- Use a canonical latent/evidence schema first, then learn conversion/fusion
between sources later.
- This supports the current design: one accepted live-control source, many
record-only evidence sources.
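
A minimal sketch of what a canonical evidence record could look like, assuming a shared session clock and free-form per-source features. The class names, field names, and source ids (`k11_bolt`, `iphone_1`) are illustrative assumptions, not an existing LUME schema. The only rule it encodes is the one stated above: one accepted live-control source, any number of record-only sources.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceRole(Enum):
    LIVE_CONTROL = "live_control"  # allowed to drive gestures
    RECORD_ONLY = "record_only"    # logged as evidence, never drives output


@dataclass
class EvidenceSample:
    """One observation of the shared body-state event from one source."""
    source_id: str      # hypothetical ids, e.g. "k11_bolt", "iphone_1"
    role: SourceRole
    t_seconds: float    # timestamp on an assumed shared session clock
    features: dict = field(default_factory=dict)  # source-specific evidence


def split_by_role(samples):
    """Return (live, recorded) and reject more than one live-control source."""
    live = [s for s in samples if s.role is SourceRole.LIVE_CONTROL]
    recorded = [s for s in samples if s.role is SourceRole.RECORD_ONLY]
    live_sources = {s.source_id for s in live}
    if len(live_sources) > 1:
        raise ValueError(f"more than one live-control source: {sorted(live_sources)}")
    return live, recorded
```

Keeping `features` as an open dictionary defers the question of a shared latent space: sources can be recorded now and aligned later.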

Action:

- Do not chase a Sensor2Sensor implementation today.
- Add future experiment: train source-to-source reconstruction from recorded
sessions, e.g. phone-side torso evidence -> expected Bolt hand/body geometry,
as an offline confidence/fusion model.
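
The source-to-source experiment can be prototyped long before any learned model exists. The sketch below is a deliberately simple stand-in, not the Sensor2Sensor method: it assumes each source has been reduced to one scalar feature per time step (a real version would use multi-dimensional geometry), fits a least-squares line from the phone feature to the Bolt feature on recorded pairs, and turns the reconstruction error into an agreement score. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b on paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("source feature is constant; cannot fit")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    b = mean_y - a * mean_x
    return a, b


def residual_scale(xs, ys, a, b):
    """Root-mean-square error of the fit on the training pairs."""
    squared = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


def agreement(x_phone, y_bolt, a, b, scale):
    """Map reconstruction error to a score in [0, 1]; 1.0 is exact agreement."""
    error = abs(y_bolt - (a * x_phone + b))
    return math.exp(-error / max(scale, 1e-9))
```

A low agreement score on a new frame suggests the two sources disagree about the body state, which is the signal an offline confidence/fusion model would consume.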

2605.22717 - Live Music Diffusion Models

Title: `Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators`

Local code:

- `Desktop/MotionMix/research/external/audio-ai/live-music-diffusion-models`

Core idea:

- Streaming autoregressive music diffusion.
- Generates audio block by block over a sliding context window.
- `generate_diffusion_cond_blockar` supports block-wise AR generation and a KV
cache for faster streaming.
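
The block-by-block pattern can be illustrated without the model. The toy below does not call `generate_diffusion_cond_blockar` and makes no claim about its signature; `fake_denoise_block` is a hypothetical stand-in for a diffusion sampler. It only shows the control flow: each new block is conditioned on a bounded window of the most recent samples.

```python
from collections import deque


def fake_denoise_block(context, block_len):
    """Stand-in for a diffusion sampler (hypothetical): continues a ramp."""
    start = context[-1] + 1 if context else 0
    return [start + i for i in range(block_len)]


def stream_blocks(num_blocks, block_len, context_len, sample_block=fake_denoise_block):
    """Yield blocks one at a time, conditioning each on a bounded context."""
    context = deque(maxlen=context_len)  # oldest samples fall off automatically
    for _ in range(num_blocks):
        block = sample_block(list(context), block_len)
        context.extend(block)
        yield block


for block in stream_blocks(num_blocks=3, block_len=4, context_len=6):
    print(block)
```

The sketch rebuilds the context list for every block. A KV cache avoids the equivalent recomputation inside the network by reusing attention keys and values from earlier blocks; that part is not modeled here.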

LUME relevance:

- This is the strongest match for Mo's real-time DJ direction.
- It is not a drop-in replacement for Rekordbox today; it is the lane for
gesture-conditioned accompaniment and live loop continuation.

Action:

- Keep Rekordbox/loopMIDI as the stable live surface.
- Build an offline-first spike:
  - use K11 motion DB windows as prompts/control curves;
  - generate short continuation/accompaniment blocks;
  - save WAV stems for Rekordbox or LUME stem playback.
- Avoid live inference in the bar loop until checkpoint access and latency are
proven.
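
One way to make "latency is proven" concrete is a real-time factor check against the bar length. The sketch below is an assumption about how that test could be framed, not a measured result: `generate` is any callable that produces a known duration of audio, and the 0.5 headroom threshold is an arbitrary illustrative choice.

```python
import time


def bar_seconds(bpm, beats_per_bar=4):
    """Duration of one bar in seconds at the given tempo."""
    return beats_per_bar * 60.0 / bpm


def real_time_factor(generate, audio_seconds):
    """Wall-clock generation time divided by the audio duration produced."""
    start = time.perf_counter()
    generate()
    return (time.perf_counter() - start) / audio_seconds


def fits_live_budget(rtf, headroom=0.5):
    """True if generation uses at most `headroom` of the available real time."""
    return rtf <= headroom
```

For example, a bar at 128 BPM lasts 1.875 s, so a model that needs 3 s to generate one bar has a real-time factor of 1.6 and stays in the offline lane.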

2605.17991 - Stable Audio 3

Title: `Stable Audio 3`

Local code:

- `Desktop/MotionMix/research/external/audio-ai/stable-audio-3`

Core idea:

- Open platform for generated audio/music.
- Small Music/SFX models are CPU-capable; Medium is CUDA/GPU-oriented.
- Supports text-to-audio, audio-to-audio editing, inpainting, continuation, and
LoRA fine-tuning.

Model access caveat:

- HF model pages require accepting model access conditions.
- Model license is `stable-audio-community`; repo code has MIT license, but
weights and model use are governed separately.

LUME relevance:

- Best immediate route for pre-generating gesture-reactive loops, fills, and
transition material from motion labels.
- Small Music is the likely local/Mac first target. Medium is for a GPU box.

Action:

- Do not attempt to download the gated weights in automation.
- Once access is accepted, create a small batch generator:
  - inputs: motion label segment, BPM, energy/arms/spread curves;
  - output: normalized WAV loops/fills;
  - destination: `C:\lume\stems\generated\...` or MotionMix artifact folder.
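
The plumbing around the model can be built and tested before access is granted. The sketch below assumes a hypothetical `LoopJob` record and uses a sine tone as a placeholder where Stable Audio 3 output would go; it does not call any Stable Audio API. What it does show is the bar-length arithmetic, peak normalization, and 16-bit mono WAV output using only the standard library. The energy curve is collapsed to a single number for brevity.

```python
import math
import struct
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LoopJob:
    motion_label: str  # hypothetical label, e.g. "arms_wide"
    bpm: float
    bars: int
    energy: float      # 0..1 summary of the energy curve (simplification)


def placeholder_audio(job, sample_rate=44100):
    """Stand-in for the model: a sine tone lasting `bars` bars of 4/4 at `bpm`."""
    n = int(sample_rate * job.bars * 4 * 60.0 / job.bpm)
    freq = 110.0 + 330.0 * job.energy
    return [0.3 * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def write_wav(path, samples, sample_rate=44100):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return path


def render_job(job, out_dir, sample_rate=44100):
    """Render one job to `<label>_<bpm>bpm_<bars>bar.wav` under `out_dir`."""
    name = f"{job.motion_label}_{int(job.bpm)}bpm_{job.bars}bar.wav"
    audio = peak_normalize(placeholder_audio(job, sample_rate))
    return write_wav(Path(out_dir) / name, audio, sample_rate)
```

Swapping `placeholder_audio` for a real generator call later leaves the naming, normalization, and file layout unchanged.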

2605.18714 - Semantic Generative Tuning

Title: `Semantic Generative Tuning`

Local project page:

- `Desktop/MotionMix/research/external/audio-ai/sgt-project-page`

Core idea:

- Uses segmentation as a generative proxy for unified multimodal models.
- High-level semantic structure beats low-level pixel reconstruction as an
alignment target.

LUME relevance:

- Directly supports using segmentation/body masks as coaching and training
targets.
- For K11, "what part of Mo is visible?" is more useful than only raw pixels or
pose points.

Action:

- Shipped the first perception lane:
  - person segmentation overlay in the Pose Coach;
  - pose-derived torso and hand ROI segmentation evidence;
  - segmentation quality features recorded to `body_motion.sqlite3`;
  - report tool at `C:\temp\lume_segmentation_report.py`.
- Next step: use segmentation confidence to decide whether hand/nod gestures are
trusted.
- This is a better next AI-coach primitive than calling a large model every
frame.
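
The proposed next step, gating gestures on segmentation confidence, could be as small as the sketch below. The gesture names, ROI keys, and thresholds are assumptions for illustration, not values from the shipped Pose Coach; the segmentation confidences are assumed to arrive as a per-ROI dictionary in the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class GestureCandidate:
    name: str              # hypothetical names, e.g. "hand_open", "nod"
    roi: str               # region the gesture depends on: "hand" or "torso"
    detector_score: float  # 0..1 from the pose-based detector


def trusted_gestures(candidates, seg_confidence, min_seg=0.6, min_score=0.5):
    """Keep gestures whose ROI is visibly segmented and whose detector agrees."""
    kept = []
    for c in candidates:
        # A missing ROI counts as not visible.
        visible = seg_confidence.get(c.roi, 0.0) >= min_seg
        if visible and c.detector_score >= min_score:
            kept.append(c.name)
    return kept
```

Because this is a pair of threshold comparisons per gesture, it stays deterministic and cheap enough to run on every frame, unlike a large-model call.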

Architecture Decision

Keep three lanes:

1. Live control lane:
   - K11 Bolt Pose Coach + MediaPipe body/hands.
   - Deterministic gestures into Rekordbox (`Z` play/pause).
   - No diffusion model inside the live control loop yet.

2. Evidence/training lane:
   - MotionMix iPhone SAN + camera pose data.
   - K11 body/hand pose DB.
   - Future segmentation confidence and source-to-source reconstruction.

3. Generative/offline lane:
   - Stable Audio 3 for generating loops/fills.
   - LMDM for streaming/accompaniment research once checkpoints are available.
   - Generated audio is rendered to WAV/stems first, then brought into Rekordbox
     or LUME stem playback.

This preserves the working hand gesture system while opening the path toward AI
audio and learned multi-sensor body understanding.

Promotion Decision

Before promoting this writeup, attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

MotionMix/research/hf-paper-batch-lume-eval-2026-05-24.md
