OpenCode - Training Pairing Lane
Prime The Model
MotionMix now has enough runtime infrastructure that the next deep branch can start without blocking today’s AI photoshoot work. This branch is the offline training/data lane. The live system already has capture, live direction, motion-derived state, and audio-prep artifacts. What it does not yet have is a reproducible paired dataset builder that aligns real audio features with real pose/session data. Your job is to start that branch cleanly. You are not here to touch the live runtime, Live Director UI, device controls, Convex session logging, or transport. You are here to prepare the data substrate for later model training by building the pairing engine and dataset builder on top of real local artifacts.
Shared Awareness
Other agents are working in parallel.
Codex / Orchestrator
Owns live runtime convergence, deployment truth, and integration validation.
Do not touch:
- `multicam-server` live behavior
- Live Director UI
- phone deployment
- runtime device control
Claude Agent 1
Owns Ghost-mode UI only.
Do not work in that lane.
Claude Agent 2
Owns Convex production-session/event writes for the live photoshoot.
Do not modify Convex production logging.
Your Mission
Build Phase 1 of the offline training pipeline:
- audio/pose pairing engine
- dataset creation path
- reproducible manifest builder
This should output a dataset that later model-training work can consume. Stop before invasive model training unless the dataset build is complete and clearly working.
Use This Existing Context
Docs:
- `[home]/Desktop/MotionMix/HANDOFF-AUDIO-PREP.md`
- `[home]/Desktop/MotionMix/CANONICAL-STATE-CONTRACT.md`
- `[home]/Desktop/MotionMix/ARCHITECTURE.md`
- `[home]/Desktop/MotionMix/AI-PHOTOSHOOT-MOTION-MUSIC-PLAN.md`
Likely code/data surfaces:
- `[home]/Desktop/MotionMix/training`
- `[home]/Desktop/MotionMix/sessions`
- `[home]/Desktop/MotionMix/audio`
- `[home]/Desktop/MotionMixApp/MotionMixApp/MotionMixApp.swift`
- `[home]/Desktop/MotionMixApp/MotionMixApp/Services/DiffusionService.swift`
If existing prep outputs live elsewhere on disk, find them and document the discovered path explicitly.
Deliverables
1. Pairing Engine
Load:
- audio feature artifacts
- pose/session logs
- timestamps / bar or beat references if present
Produce:
- aligned windows that map pose dynamics to audio feature windows
Be explicit about all alignment assumptions.
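One way the windowing step could look is sketched below. It assumes that pose timestamps have already been converted to seconds relative to the start of the audio, and that audio features are framed at a fixed rate. The names `AlignedWindow`, `build_windows`, `window_s`, `hop_s`, and `min_poses` are hypothetical and do not come from the existing MotionMix code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlignedWindow:
    start_s: float            # window start, seconds from audio start
    end_s: float              # window end (exclusive)
    audio_frames: range       # indices into the audio feature matrix
    pose_indices: List[int]   # pose snapshots whose timestamps fall in the window


def build_windows(
    pose_times_s: Sequence[float],
    n_audio_frames: int,
    frame_rate: float,
    window_s: float = 4.0,    # assumed window length, not a project setting
    hop_s: float = 2.0,       # assumed hop between windows
    min_poses: int = 2,       # assumed minimum poses needed to describe dynamics
) -> List[AlignedWindow]:
    """Slide a fixed-length window over the audio and collect the poses inside it."""
    audio_duration_s = n_audio_frames / frame_rate
    windows = []
    k = 0
    while k * hop_s + window_s <= audio_duration_s:
        start = k * hop_s
        end = start + window_s
        first = int(round(start * frame_rate))
        last = min(int(round(end * frame_rate)), n_audio_frames)
        poses = [i for i, t in enumerate(pose_times_s) if start <= t < end]
        # Windows with too few poses are dropped rather than padded with invented data.
        if len(poses) >= min_poses:
            windows.append(AlignedWindow(start, end, range(first, last), poses))
        k += 1
    return windows
```

Dropping sparse windows, instead of filling them, keeps the output consistent with the rule against synthetic data. Each dropped window is worth counting in the final report.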
2. Dataset Builder
Create a reproducible builder that can:
- scan the local dataset sources
- build a manifest
- emit trainable paired samples
- split train/validation if enough data exists
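The reproducibility requirement can be illustrated with a minimal sketch. It assumes each paired sample is a dict with at least `session_id` and `start_s` keys. That schema, the function names, and the `min_sessions_for_split` threshold are hypothetical. The split is assigned per session from a hash of the session ID, so rerunning the builder gives the same assignment and windows from one session never land on both sides.

```python
import hashlib
import json


def assign_split(session_id: str, val_fraction: float = 0.2) -> str:
    """Map a session ID to 'train' or 'val' deterministically via SHA-256."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "val" if bucket < val_fraction else "train"


def build_manifest(samples, val_fraction=0.2, min_sessions_for_split=5):
    """Build a manifest dict; skip splitting when there are too few sessions."""
    sessions = sorted({s["session_id"] for s in samples})
    do_split = len(sessions) >= min_sessions_for_split
    entries = []
    # Sort so the manifest does not depend on filesystem scan order.
    for s in sorted(samples, key=lambda s: (s["session_id"], s["start_s"])):
        split = assign_split(s["session_id"], val_fraction) if do_split else "train"
        entries.append({**s, "split": split})
    return {
        "version": 1,
        "split_applied": do_split,
        "n_sessions": len(sessions),
        "n_samples": len(entries),
        "samples": entries,
    }


def manifest_to_json(manifest) -> str:
    """Serialize with sorted keys so identical inputs give byte-identical output."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

With only a handful of sessions, the sketch records `split_applied: false` instead of producing a validation set that holds a single session.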
3. Documentation
Write a runbook/README that explains:
- input locations
- expected file formats
- commands to run
- output structure
- known limitations
File Boundaries
You may add/edit files under:
- `[home]/Desktop/MotionMix/training`
- `[home]/Desktop/MotionMix/output`
- `[home]/Desktop/MotionMix` for docs only
Avoid editing:
- `[home]/Desktop/MotionMix/multicam-server/src/main.rs`
- `[home]/Desktop/MotionMixLiveDirector`
- `[home]/Desktop/Comp-Core/apps/convex-memory/convex`
- `[home]/Desktop/MotionMixApp/MotionMixApp/Services/CameraService.swift`
Do not change live app/runtime behavior.
Non-Goals
- Do not redesign live transport.
- Do not implement Ghost UI.
- Do not change Convex production logging.
- Do not claim end-to-end model training is done unless you actually completed it.
- Do not silently invent synthetic data if real local data is missing.
Preferred Output Shape
Ideal outcome:
- a dataset builder script/module
- a manifest output
- a sample paired-output preview
- a small README/runbook
If you cannot get enough real data, the correct output is:
- a builder that works on discovered real files
- a precise report of what inputs exist and what is still missing
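A report of this kind could be generated instead of written by hand. The sketch below assumes a simple in-memory summary of the discovered inputs. The dict layout, the function name `summarize_inputs`, and the "fewer than 2 pose snapshots" threshold are illustrative assumptions, not the schema of any existing inventory file.

```python
def summarize_inputs(playlists, sessions):
    """List what was found and what is missing, without filling any gaps.

    playlists: {name: {"n_features": int, "has_manifest": bool}}  (assumed layout)
    sessions:  {name: number_of_pose_snapshots}                   (assumed layout)
    """
    found, missing = [], []
    for name, info in sorted(playlists.items()):
        found.append(f"{name}: {info['n_features']} feature files")
        if not info.get("has_manifest", False):
            missing.append(f"{name}: no manifest")
    for name, n_poses in sorted(sessions.items()):
        found.append(f"session {name}: {n_poses} pose snapshots")
        # Assumed threshold: a single snapshot cannot describe pose dynamics.
        if n_poses < 2:
            missing.append(f"session {name}: fewer than 2 pose snapshots")
    return {"found": found, "missing": missing}
```

Keeping "found" and "missing" as separate lists makes it harder for a gap to go unreported.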
Suggested Verification
Use real local data only.
Minimum evidence:
- the dataset builder runs
- it finds real sessions and real audio features
- it emits at least sample aligned windows or a manifest proving alignment is working
If you add Python tooling, include the exact command used to run it and any environment assumptions.
Report Format
When done, include:
- exact input directories used
- exact output directories written
- a summary of sample counts
- one concrete example of a paired sample
- what assumptions were necessary for timestamp alignment
RTD Verification
Structure
List files added/changed and their role in the pairing pipeline.
Compilation
Report lint/run/build status for the new training utilities.
Integration
Prove the builder consumed real MotionMix audio/pose data rather than placeholders.
Content
Explain what the produced dataset contains.
User Journey
Describe how a future model-training engineer would use the artifacts you produced.
Deployment
State where the outputs were written and how to rerun the pipeline from scratch.
---
Implementation Status (2026-04-08)
✅ Completed
1. Audio Prep Pipeline — `[home-path]`
- Downloaded 4 playlists from SoundCloud/YouTube
- Extracted librosa features (mel spec, BPM, beat frames, etc.)
- Built manifests for playlist-01 and playlist-02
2. LiveDirector Training Capture — 3 sessions recorded
- `Your Mix 3-2026-04-08T02-02-29Z` — 132 pose snapshots
- `playlist-01-2026-04-08T13-31-32Z` — 292 pose snapshots
- `playlist-01-2026-04-08T18-28-54Z` — 1 pose snapshot
3. Discovery Script — `discover_inventory.py` ✅ RUNS
- Scans HD1 for all playlists and sessions
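- Found 176 NPZ files, 425 pose snapshots, and 3.87 hours of audio (this equals the 95.7 + 136.3 min reported for playlist-01 and playlist-02, so playlist-03 and playlist-04 appear to be excluded from the duration total)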
- Generated `inventory.json` with full metadata
🛠️ Scripts Created
| File | Purpose | Status |
|---|---|---|
| `discover_inventory.py` | Scan HD1 for audio + pose data | ✅ COMPLETE & TESTED |
| `pair_audio_pose.py` | Align features with dynamics | 🟡 Ready to implement |
| `inventory.json` | What exists on disk | ✅ GENERATED (176 features, 425 poses) |
| `pairs_manifest.json` | Aligned training windows | ⏳ Pending pairing run |
🚀 Next Command: Run Pairing Engine
# Create pairing script
cat > [home-path] << 'SCRIPT'
# (Script from OPENCODE-TRAINING-PAIRING.md)
SCRIPT
# Run pairing
python3 [home-path]
📊 Actual Data Discovered
Audio Features (176 NPZ files):
- playlist-01: 42 features (95.7 min)
- playlist-02: 56 features (136.3 min)
- playlist-03: 44 features (no manifest)
- playlist-04: 34 features (no manifest)
Pose Sessions (425 snapshots):
- Your Mix 3 (132 poses) — Earliest session
- playlist-01 session 1 (292 poses) — Main session
- playlist-01 session 2 (1 pose) — Test session
Pairing Challenge: Need to match which audio tracks were playing during each session (use session metadata + timestamps).
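If the session metadata turns out to lack explicit per-track start times, one fallback is to reconstruct them from track durations. The sketch below makes a strong assumption that must be stated in the report if it is used: tracks played back to back, in manifest order, starting at the session start, with no pauses or skips. The function name and argument layout are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def assign_poses_to_tracks(pose_times_s, track_durations_s):
    """Map each pose time (seconds since session start) to (track_index, offset_s).

    ASSUMPTION: gapless, in-order playback from t = 0. Poses outside the
    reconstructed timeline map to None instead of being forced onto a track.
    """
    # Start offset of each track on the session timeline.
    starts = [0.0] + list(accumulate(track_durations_s))[:-1]
    total_s = sum(track_durations_s)
    assigned = []
    for t in pose_times_s:
        if t < 0 or t >= total_s:
            assigned.append(None)
            continue
        idx = bisect_right(starts, t) - 1
        assigned.append((idx, t - starts[idx]))
    return assigned
```

The per-track offset is what the pairing engine needs in order to index into that track's feature frames. Poses returned as `None` should be counted and reported, not discarded silently.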
⚠️ Known Assumptions
1. Session metadata has track info — Need to verify `metadata.json` contains track title/index
2. 1Hz pose capture — Will interpolate to 43fps for audio alignment
3. Clock sync — Assuming system clock consistent between audio start and pose capture
4. 4 cameras — Session metadata should list active devices
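Assumption 2 can be made concrete with a short sketch. The 43 fps figure is close to 22050 / 512 ≈ 43.07, the frame rate that results from librosa's default sample rate and hop length. Whether the prep pipeline actually used those defaults is an assumption to verify against the feature files. The helper names below are hypothetical, and the interpolation is written with the standard library only; `numpy.interp` behaves the same way for increasing sample points.

```python
from bisect import bisect_right


def frame_times(n_frames, sr=22050, hop_length=512):
    """Start time in seconds of each audio feature frame (assumed librosa defaults)."""
    return [i * hop_length / sr for i in range(n_frames)]


def interp_linear(x_new, x, y):
    """Piecewise-linear interpolation of samples (x, y) at the points x_new.

    x must be strictly increasing. Values outside [x[0], x[-1]] are clamped
    to the end values, so nothing is extrapolated beyond the captured poses.
    """
    out = []
    for xn in x_new:
        if xn <= x[0]:
            out.append(y[0])
        elif xn >= x[-1]:
            out.append(y[-1])
        else:
            j = bisect_right(x, xn) - 1          # x[j] <= xn < x[j + 1]
            w = (xn - x[j]) / (x[j + 1] - x[j])
            out.append(y[j] + w * (y[j + 1] - y[j]))
    return out


# Example: one pose channel captured at 1 Hz, resampled onto the audio frame grid.
pose_t = [0.0, 1.0, 2.0]
pose_v = [0.0, 10.0, 20.0]
grid = frame_times(87)                           # about 2 s of audio frames
pose_on_grid = interp_linear(grid, pose_t, pose_v)
```

Upsampling 1 Hz pose data by roughly 43× adds no new motion information. The interpolated values only place the existing snapshots on the audio timeline, which is worth listing under known limitations.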
📋 Next Steps (Phase 2)
Once pairing completes:
1. PyTorch Dataset class
2. DataLoader with augmentation
3. Train/val/test split (80/10/10 by session)
4. Model architecture (FlowGenerator1Step)
5. Training loop on Mac5 GPU
6. CoreML export
---
Current Status: 🟢 Discovery complete — 176 features + 425 poses found. Ready to build pairing engine.
Source Anchor
MotionMix/agent-handoffs/OPENCODE-TRAINING-PAIRING.md