Grand Diomande Research

OpenCode - Training Pairing Lane

Prime The Model

MotionMix now has enough runtime infrastructure that the next deep branch can start without blocking today’s AI photoshoot work. This branch is the offline training/data lane. The live system already has capture, live direction, motion-derived state, and audio-prep artifacts. What it does not yet have is a reproducible paired dataset builder that aligns real audio features with real pose/session data. Your job is to start that branch cleanly. You are not here to touch the live runtime, Live Director UI, device controls, Convex session logging, or transport. You are here to prepare the data substrate for later model training by building the pairing engine and dataset builder on top of real local artifacts.

Shared Awareness

Other agents are working in parallel.

Codex / Orchestrator

Owns live runtime convergence, deployment truth, and integration validation.

Do not touch:

  • `multicam-server` live behavior
  • Live Director UI
  • phone deployment
  • runtime device control

Claude Agent 1

Owns Ghost-mode UI only.

Do not work in that lane.

Claude Agent 2

Owns Convex production-session/event writes for the live photoshoot.

Do not modify Convex production logging.

Your Mission

Build Phase 1 of the offline training pipeline:

  • audio/pose pairing engine
  • dataset creation path
  • reproducible manifest builder

This should output a dataset that later model-training work can consume. Stop before invasive model training unless the dataset build is complete and clearly working.

Use This Existing Context

Docs:

  • `[home]/Desktop/MotionMix/HANDOFF-AUDIO-PREP.md`
  • `[home]/Desktop/MotionMix/CANONICAL-STATE-CONTRACT.md`
  • `[home]/Desktop/MotionMix/ARCHITECTURE.md`
  • `[home]/Desktop/MotionMix/AI-PHOTOSHOOT-MOTION-MUSIC-PLAN.md`

Likely code/data surfaces:

  • `[home]/Desktop/MotionMix/training`
  • `[home]/Desktop/MotionMix/sessions`
  • `[home]/Desktop/MotionMix/audio`
  • `[home]/Desktop/MotionMixApp/MotionMixApp/MotionMixApp.swift`
  • `[home]/Desktop/MotionMixApp/MotionMixApp/Services/DiffusionService.swift`

If existing prep outputs live elsewhere on disk, find them and document the discovered path explicitly.

Deliverables

1. Pairing Engine

Load:

  • audio feature artifacts
  • pose/session logs
  • timestamps / bar or beat references if present

Produce:

  • aligned windows that map pose dynamics to audio feature windows

Be explicit about all alignment assumptions.
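
Below is a minimal sketch of window-level pairing. It assumes both streams have already been placed on one clock. `pair_windows`, `PairedWindow`, the 2 s window / 1 s hop, and the 22050 Hz / 512-sample frame settings are all hypothetical choices and do not come from the MotionMix code. Each alignment assumption is written into the docstring so it stays attached to the code.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical values: the real sample rate and hop length must be read from
# whatever settings the audio-prep step used when it wrote the NPZ features.
SAMPLE_RATE = 22050
HOP_LENGTH = 512


@dataclass(frozen=True)
class PairedWindow:
    session_id: str
    start_s: float       # window start on the shared clock (seconds)
    end_s: float         # window end, exclusive
    audio_frames: range  # feature-frame indices covering [start_s, end_s)
    pose_indices: list   # indices of pose snapshots inside [start_s, end_s)


def pair_windows(session_id, pose_times, audio_offset_s, n_audio_frames,
                 window_s=2.0, hop_s=1.0, min_poses=1):
    """Cut fixed-length windows over the span covered by both streams.

    Alignment assumptions (stated, not verified):
      * pose_times are sorted seconds on the same clock as the audio;
      * audio feature frame 0 starts at audio_offset_s on that clock;
      * frames are evenly spaced at SAMPLE_RATE / HOP_LENGTH per second.
    """
    if not pose_times:
        return []
    fps = SAMPLE_RATE / HOP_LENGTH
    start = max(pose_times[0], audio_offset_s)
    stop = min(pose_times[-1], audio_offset_s + n_audio_frames / fps)
    windows = []
    k = 0
    # Compute each start as start + k * hop_s to avoid float drift.
    while start + k * hop_s + window_s <= stop:
        t0 = start + k * hop_s
        t1 = t0 + window_s
        i0, i1 = bisect_left(pose_times, t0), bisect_left(pose_times, t1)
        if i1 - i0 >= min_poses:
            frames = range(int((t0 - audio_offset_s) * fps),
                           int((t1 - audio_offset_s) * fps))
            windows.append(PairedWindow(session_id, t0, t1, frames,
                                        list(range(i0, i1))))
        k += 1
    return windows
```

Windows with fewer than `min_poses` snapshots are dropped instead of padded, so sparse stretches of a session never produce empty pose targets. At 1 Hz pose capture, a 2 s window holds about two snapshots.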

2. Dataset Builder

Create a reproducible builder that can:

  • scan the local dataset sources
  • build a manifest
  • emit trainable paired samples
  • split train/validation if enough data exists
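
One way to make the manifest reproducible is to sort every discovered path and record a content hash for each one. Two runs over the same disk state then produce byte-identical JSON. The sketch below assumes a hypothetical layout: feature files are `*.npz` anywhere under the audio root, and each session is a directory containing `metadata.json`. `build_manifest` and `write_manifest` are illustrative names, not existing MotionMix functions.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Content hash, so a rerun can prove it saw byte-identical inputs."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(audio_root: Path, session_root: Path) -> dict:
    """Scan both roots and return a deterministic manifest dict.

    Hypothetical layout: audio features are *.npz anywhere under audio_root;
    each session is a directory under session_root holding metadata.json.
    """
    audio = [
        {"path": str(p.relative_to(audio_root)), "sha256": sha256_of(p)}
        for p in sorted(audio_root.rglob("*.npz"))
    ]
    sessions = [
        {"session_id": d.name, "metadata_sha256": sha256_of(d / "metadata.json")}
        for d in sorted(session_root.iterdir())
        if d.is_dir() and (d / "metadata.json").is_file()
    ]
    return {"version": 1, "audio_features": audio, "sessions": sessions}


def write_manifest(manifest: dict, out_path: Path) -> None:
    # sort_keys plus a fixed indent keeps the JSON byte-stable across reruns
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```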

3. Documentation

Write a runbook/README that explains:

  • input locations
  • expected file formats
  • commands to run
  • output structure
  • known limitations

File Boundaries

You may add/edit files under:

  • `[home]/Desktop/MotionMix/training`
  • `[home]/Desktop/MotionMix/output`
  • `[home]/Desktop/MotionMix` for docs only

Avoid editing:

  • `[home]/Desktop/MotionMix/multicam-server/src/main.rs`
  • `[home]/Desktop/MotionMixLiveDirector`
  • `[home]/Desktop/Comp-Core/apps/convex-memory/convex`
  • `[home]/Desktop/MotionMixApp/MotionMixApp/Services/CameraService.swift`

Do not change live app/runtime behavior.

Non-Goals

  • Do not redesign live transport.
  • Do not implement Ghost UI.
  • Do not change Convex production logging.
  • Do not claim end-to-end model training is done unless you actually completed it.
  • Do not silently invent synthetic data if real local data is missing.

Preferred Output Shape

Ideal outcome:

  • a dataset builder script/module
  • a manifest output
  • a sample paired-output preview
  • a small README/runbook

If you cannot get enough real data, the correct output is:

  • a builder that works on discovered real files
  • a precise report of what inputs exist and what is still missing

Suggested Verification

Use real local data only.

Minimum evidence:

  • the dataset builder runs
  • it finds real sessions and real audio features
  • it emits at least sample aligned windows or a manifest proving alignment is working

If you add Python tooling, include the exact command used to run it and any environment assumptions.

Report Format

When done, include:

  • exact input directories used
  • exact output directories written
  • a summary of sample counts
  • one concrete example of a paired sample
  • what assumptions were necessary for timestamp alignment

RTD Verification

Structure

List files added/changed and their role in the pairing pipeline.

Compilation

Report lint/run/build status for the new training utilities.

Integration

Prove the builder consumed real MotionMix audio/pose data rather than placeholders.
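
Below is a sketch of one such proof. It re-hashes every file the manifest names and confirms that each file still exists, is non-empty, and matches its recorded hash. It assumes a hypothetical manifest shape, with `audio_features` entries of `{path, sha256}` relative to the audio root and `sessions` entries of `{session_id, metadata_sha256}`. Adapt the keys to whatever the real builder writes.

```python
import hashlib
from pathlib import Path


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_provenance(manifest: dict, audio_root: Path, session_root: Path) -> list:
    """Return a list of problems; an empty list means every manifest entry
    still matches a real, non-empty file on disk byte for byte."""
    problems = []
    if not manifest.get("audio_features"):
        problems.append("manifest lists no audio features")
    if not manifest.get("sessions"):
        problems.append("manifest lists no sessions")
    for entry in manifest.get("audio_features", []):
        p = audio_root / entry["path"]
        if not p.is_file():
            problems.append(f"missing audio feature: {p}")
        elif p.stat().st_size == 0:
            problems.append(f"empty audio feature: {p}")
        elif _sha256(p) != entry["sha256"]:
            problems.append(f"hash mismatch: {p}")
    for entry in manifest.get("sessions", []):
        meta = session_root / entry["session_id"] / "metadata.json"
        if not meta.is_file():
            problems.append(f"missing session metadata: {meta}")
        elif _sha256(meta) != entry["metadata_sha256"]:
            problems.append(f"hash mismatch: {meta}")
    return problems
```

An empty list is the evidence to report. Any entry in it points at a missing, empty, or changed input.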

Content

Explain what the produced dataset contains.

User Journey

Describe how a future model-training engineer would use the artifacts you produced.

Deployment

State where the outputs were written and how to rerun the pipeline from scratch.

---

Implementation Status (2026-04-08)

✅ Completed

1. Audio Prep Pipeline — `[home-path]`
- Downloaded 4 playlists from SoundCloud/YouTube
- Extracted librosa features (mel spec, BPM, beat frames, etc.)
- Built manifests for playlist-01 and playlist-02

2. LiveDirector Training Capture — 3 sessions recorded
- `Your Mix 3-2026-04-08T02-02-29Z` — 132 pose snapshots
- `playlist-01-2026-04-08T13-31-32Z` — 292 pose snapshots
- `playlist-01-2026-04-08T18-28-54Z` — 1 pose snapshot

3. Discovery Script — `discover_inventory.py` ✅ RUNS
- Scans HD1 for all playlists and sessions
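- Found 176 NPZ feature files, 425 pose snapshots, and 3.87 hours of audio (equal to the 95.7 + 136.3 min of playlist-01 and playlist-02, the two playlists with manifests)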
- Generated `inventory.json` with full metadata
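
The audio-prep features include beat frames, and the Pairing Engine deliverable asks for bar or beat references when they are present. A frame index becomes seconds as `frame * hop_length / sr`, which is the conversion `librosa.frames_to_time` performs by default. Grouping beats into bars needs a meter, so the sketch below assumes a constant 4/4 and a known downbeat. Both helpers and the 22050 Hz / 512-sample settings are hypothetical and must be checked against the real prep parameters.

```python
SAMPLE_RATE = 22050  # hypothetical; must match the audio-prep settings
HOP_LENGTH = 512     # hypothetical; must match the audio-prep settings


def beat_frames_to_times(beat_frames, sr=SAMPLE_RATE, hop_length=HOP_LENGTH):
    # Same arithmetic as librosa.frames_to_time with its defaults: frame * hop / sr
    return [f * hop_length / sr for f in beat_frames]


def group_into_bars(beat_times, beats_per_bar=4, downbeat_index=0):
    """Return (bar_start, bar_end) pairs in seconds.

    Assumes a constant meter and that beat_times[downbeat_index] is a downbeat.
    A bar ends at the next downbeat, so an incomplete trailing bar is dropped.
    """
    beats = beat_times[downbeat_index:]
    return [
        (beats[i], beats[i + beats_per_bar])
        for i in range(0, len(beats) - beats_per_bar, beats_per_bar)
    ]
```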

🛠️ Scripts Created

| File | Purpose | Status |
|---|---|---|
| `discover_inventory.py` | Scan HD1 for audio + pose data | ✅ COMPLETE & TESTED |
| `pair_audio_pose.py` | Align features with dynamics | 🟡 Ready to implement |
| `inventory.json` | What exists on disk | GENERATED (176 features, 425 poses) |
| `pairs_manifest.json` | Aligned training windows | ⏳ Pending pairing run |

🚀 Next Command: Run Pairing Engine

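```bash
# Create the pairing script.
# NOTE: the heredoc body below is only a placeholder comment; paste the full
# script from OPENCODE-TRAINING-PAIRING.md in its place before running.
cat > [home-path] << 'SCRIPT'
# (Script from OPENCODE-TRAINING-PAIRING.md)
SCRIPT

# Run pairing
python3 [home-path]
```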

📊 Actual Data Discovered

Audio Features (176 NPZ files):
- playlist-01: 42 features (95.7 min)
- playlist-02: 56 features (136.3 min)
- playlist-03: 44 features (no manifest)
- playlist-04: 34 features (no manifest)

Pose Sessions (425 snapshots):
- Your Mix 3 (132 poses) — Earliest session
- playlist-01 session 1 (292 poses) — Main session
- playlist-01 session 2 (1 pose) — Test session

Pairing Challenge: Need to match which audio tracks were playing during each session (use session metadata + timestamps).
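
If session metadata yields a playlist start time and the manifest yields per-track durations, the lookup reduces to a search over cumulative durations. Below is a minimal sketch. It assumes tracks play back-to-back in manifest order with no gaps or crossfades, and that both times share one clock (the clock-sync assumption below). `track_at` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def track_at(t_session_s, playlist_start_s, track_durations_s):
    """Return (track_index, offset_into_track_s), or None if t is outside the playlist.

    Assumes gapless, in-order playback starting at playlist_start_s, and that
    t_session_s and playlist_start_s come from the same clock.
    """
    elapsed = t_session_s - playlist_start_s
    if elapsed < 0:
        return None
    ends = list(accumulate(track_durations_s))  # cumulative end time of each track
    i = bisect_right(ends, elapsed)             # first track whose end is after elapsed
    if i >= len(ends):
        return None
    track_start = ends[i - 1] if i > 0 else 0.0
    return i, elapsed - track_start
```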

⚠️ Known Assumptions

1. Session metadata has track info — Need to verify `metadata.json` contains track title/index
2. 1Hz pose capture — Will interpolate to 43fps for audio alignment
3. Clock sync — Assuming system clock consistent between audio start and pose capture
4. 4 cameras — Session metadata should list active devices
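
The 43 fps target is consistent with librosa's default feature settings (22050 Hz sample rate and a 512-sample hop, so 22050 / 512 ≈ 43.07 frames per second). That derivation is an assumption and should be confirmed against the prep code. Below is a dependency-free sketch of the interpolation step for one scalar pose channel, such as a joint coordinate or a motion-energy value. Frames outside the sampled range are held at the nearest sample, as `numpy.interp` does. `resample_pose` is a hypothetical name.

```python
from bisect import bisect_right

SAMPLE_RATE = 22050                    # assumed librosa default
HOP_LENGTH = 512                       # assumed librosa default
FRAME_RATE = SAMPLE_RATE / HOP_LENGTH  # 43.06640625 frames per second


def resample_pose(pose_times, pose_values, n_frames, frame_rate=FRAME_RATE):
    """Linearly interpolate a scalar pose signal onto audio frame times.

    Frame k sits at k / frame_rate seconds on the pose clock. Frames before the
    first or after the last pose sample are clamped to that sample's value.
    """
    out = []
    for k in range(n_frames):
        t = k / frame_rate
        j = bisect_right(pose_times, t)
        if j == 0:
            out.append(pose_values[0])
        elif j == len(pose_times):
            out.append(pose_values[-1])
        else:
            t0, t1 = pose_times[j - 1], pose_times[j]
            v0, v1 = pose_values[j - 1], pose_values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

Interpolation only smooths between snapshots. A 1 Hz capture cannot represent motion faster than 0.5 Hz (its Nyquist limit), so the 43 fps pose track carries no more information than the 1 Hz original.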

📋 Next Steps (Phase 2)

Once pairing completes:

1. PyTorch Dataset class
2. DataLoader with augmentation
3. Train/val/test split (80/10/10 by session)
4. Model architecture (FlowGenerator1Step)
5. Training loop on Mac5 GPU
6. CoreML export
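
Step 3 can be sketched as a session-level split. Each session goes entirely into one split, so windows from the same session never appear in both training and evaluation. `split_by_session` is hypothetical. It raises an error instead of returning an empty split.

```python
import random


def split_by_session(session_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole sessions to train/val/test so no session leaks across splits."""
    ids = sorted(set(session_ids))  # sort first so the seed alone fixes the order
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    splits = {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
    empty = [name for name, part in splits.items() if not part]
    if empty:
        raise ValueError(f"{n} sessions is not enough for non-empty splits: {empty} would be empty")
    return splits
```

With the three sessions found so far (one of which holds a single pose snapshot), this raises `ValueError`. That matches the brief's instruction to split only if enough data exists.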

---

Current Status: 🟢 Discovery complete — 176 features + 425 poses found. Ready to build pairing engine.

Promotion Decision

Keep in the searchable backlog until it intersects a live paper or system.

Source Anchor

MotionMix/agent-handoffs/OPENCODE-TRAINING-PAIRING.md
