Grand Diomande Research · Full HTML Reader

Stage 5: RAIL — Execution Plan

## Phase Sequence (Phases 0-6, ~20 weeks with overlap)

---

### Phase 0: Foundation (Week 1)
Priority: P0 | Machine: Mac1 (controller)
Parallel tracks: None (setup)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 0.1 | Create `model/dit.py` with MotionDiT architecture (Tiny: 4 blocks/128dim, Full: 8 blocks/256dim) | Mac1 | 4h |
| 0.2 | Create `model/flow_matching.py` with OT-CFM (training + Euler/midpoint sampling + CFG) | Mac1 | 4h |
| 0.3 | Create `training/flow_losses.py` (flow matching loss + existing structure regularizers) | Mac1 | 2h |
| 0.4 | Modify `config.py` to add FlowMatchingConfig and DiTConfig blocks | Mac1 | 1h |
| 0.5 | Modify `training/trainer.py` to support flow matching training loop | Mac1 | 2h |
| 0.6 | Modify `inference/sampler.py` to add ODE solvers alongside DDIM | Mac1 | 2h |
| 0.7 | Create `scripts/train_flow.py` entry point | Mac1 | 1h |
| 0.8 | Unit tests for DiT forward pass and flow matching loss | Mac1 | 2h |

Review checkpoint (end of Week 1): DiT builds, flow matching loss computes, sampling produces shaped output.
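The flow matching pieces named in steps 0.2 and 0.3 can be sketched in plain Python. This is a minimal illustration, not the contents of `model/flow_matching.py`: it assumes the straight-line probability path commonly used with conditional flow matching, it pairs noise and data samples directly and so omits the minibatch optimal-transport coupling that OT-CFM adds, and every function name here is hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the straight path: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at (x_t, t)."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


def midpoint_sample(velocity_fn, x0, steps):
    """Same integration with the explicit midpoint rule (two evaluations per step)."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_start = velocity_fn(x, t)
        x_mid = [a + 0.5 * dt * b for a, b in zip(x, v_start)]
        v_mid = velocity_fn(x_mid, t + 0.5 * dt)
        x = [a + dt * b for a, b in zip(x, v_mid)]
    return x


def cfg_velocity(v_cond, v_uncond, scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


# A field that already equals the target velocity is integrated exactly in one step.
x0, x1 = [0.0, 1.0], [2.0, -1.0]
ideal_field = lambda x, t: target_velocity(x0, x1)
print(euler_sample(ideal_field, x0, steps=1))  # [2.0, -1.0]
```

The one-step result shows why straight paths are attractive for the 1-step and 4-step comparisons in Phase 1: when the learned field is close to constant along each path, few solver steps are needed.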

---

### Phase 1: Flow Matching Training (Weeks 2-4)
Priority: P0 | Machine: Cloud GPU (training), Mac4/Mac5 (prototyping)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 1.1 | Prepare training data: validate existing phrase bundles, compute statistics | Mac1 | 2h |
| 1.2 | Train MotionDiT-Full (8 blocks) on phrase data with flow matching: initial run (10 epochs) | Cloud GPU | 8h |
| 1.3 | WEEK 2 CHECKPOINT: Evaluate convergence. If loss not decreasing → pivot to DDIM consistency distillation | Mac1 | 1h |
| 1.4 | Full training run (100 epochs) with WandB logging | Cloud GPU | 24h |
| 1.5 | Compare quality: Flow 1-step vs Flow 4-step vs Flow 10-step vs DDIM-50 baseline | Mac1 | 4h |
| 1.6 | Run SanityChecker + MusalityScorer on flow matching outputs vs DDIM outputs | Mac1 | 2h |
| 1.7 | Select best step count for quality/speed tradeoff | Mac1 | 1h |

Gate: Flow matching 4-step must match or beat DDIM-50 on SanityChecker pass rate AND MusalityScorer mean score.
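The Week 2 checkpoint and the Phase 1 gate are both simple decision rules, and one way to make them explicit is sketched below. The `EvalResult` container, the function names, and the `window` and `min_rel_drop` defaults are illustrative assumptions; the plan does not define how "not decreasing" is measured.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    sanity_pass_rate: float  # fraction of samples that pass SanityChecker
    musality_mean: float     # mean MusalityScorer score


def passes_phase1_gate(flow_4step: EvalResult, ddim_50: EvalResult) -> bool:
    """Flow 4-step must match or beat DDIM-50 on both metrics."""
    return (flow_4step.sanity_pass_rate >= ddim_50.sanity_pass_rate
            and flow_4step.musality_mean >= ddim_50.musality_mean)


def loss_is_decreasing(epoch_losses, window=3, min_rel_drop=0.01):
    """Compare the mean of the last `window` epochs with the first `window` epochs.

    The window size and minimum relative drop are assumed values for illustration.
    """
    if len(epoch_losses) < 2 * window:
        raise ValueError("need at least 2 * window epochs to judge convergence")
    early = sum(epoch_losses[:window]) / window
    late = sum(epoch_losses[-window:]) / window
    return late < early * (1.0 - min_rel_drop)


# Hypothetical numbers, only to show the call shape.
initial_run = [1.00, 0.90, 0.82, 0.76, 0.71, 0.68, 0.66, 0.65, 0.64, 0.64]
if not loss_is_decreasing(initial_run):
    print("pivot to DDIM consistency distillation")
```

Requiring both metrics in `passes_phase1_gate` mirrors the "AND" in the gate: a faster sampler that improves one score at the expense of the other does not pass.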

---

### Phase 2: On-Device Proof (Weeks 4-7)
Priority: P1 | Machine: Mac1 (iOS), Mac4/Mac5 (MLX)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 2.1 | Train MotionDiT-Tiny (4 blocks, 128dim) with consistency distillation from Full teacher | Cloud GPU | 12h |
| 2.2 | Create `export/coreml.py`: PyTorch → ONNX → CoreML conversion pipeline | Mac1 | 4h |
| 2.3 | Convert Tiny model to CoreML with fixed T=64 input shapes | Mac1 | 2h |
| 2.4 | Apply 6-bit weight palettization | Mac1 | 1h |
| 2.5 | Profile with Instruments on iPhone 15 Pro: measure ANE/GPU split, latency | Mac1 | 4h |
| 2.6 | Create `export/mlx_model.py`: native MLX implementation of MotionDiT-Tiny | Mac4 | 4h |
| 2.7 | Benchmark MLX on Mac4 (M2) and Mac5 (M4) | Mac4/Mac5 | 2h |
| 2.8 | Create Swift wrapper: `OnDeviceMotionGen.swift` for CreativeDirector | Mac1 | 4h |
| 2.9 | Integration test: on-device generation → SanityChecker → CompCoreBridge | Mac1 | 4h |

Gate: <100ms on iPhone 15 Pro. SanityChecker pass rate ≥90%.
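Step 2.4 relies on weight palettization, which replaces each weight with an index into a small lookup table; at 6 bits the table has 2^6 = 64 entries. The sketch below shows the idea with a one-dimensional k-means over a flat list of weights. It is not the Core ML tooling itself, and the function names, quantile initialization, and iteration count are assumptions chosen for illustration.

```python
import random


def nearest(palette, w):
    """Index of the palette entry closest to weight w."""
    return min(range(len(palette)), key=lambda j: abs(palette[j] - w))


def palettize(weights, bits=6, iters=10):
    """Cluster weights into 2**bits shared values; return (palette, indices)."""
    k = 2 ** bits
    ordered = sorted(weights)
    n = len(ordered)
    # Deterministic start: one centroid at the middle of each of k quantile bins.
    palette = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for w in weights:
            j = nearest(palette, w)
            sums[j] += w
            counts[j] += 1
        # Move each centroid to the mean of its members; keep empty ones in place.
        palette = [sums[j] / counts[j] if counts[j] else palette[j] for j in range(k)]
    indices = [nearest(palette, w) for w in weights]
    return palette, indices


def dequantize(palette, indices):
    """Rebuild approximate weights from the lookup table."""
    return [palette[i] for i in indices]


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(500)]
palette, indices = palettize(weights)
restored = dequantize(palette, indices)
mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
print(len(palette), mse)
```

Each stored index needs only 6 bits, so the storage saving comes from the index width, while accuracy depends on how well 64 shared values cover the weight distribution. That is why the gate re-checks SanityChecker pass rate after conversion.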

---

### Phase 3: Sensor Pipeline (Weeks 5-9, overlaps with Phase 2)
Priority: P2 | Machine: Mac1 (iOS)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 3.1 | Create `capture/lifting.py`: 2D→3D pose lifting network (200K params) | Mac1 | 4h |
| 3.2 | Collect paired ARKit 3D / Vision 2D data: 30 min of recordings with rear camera | Mac1 (physical) | 2h |
| 3.3 | Train lifting network on paired data | Mac4 | 4h |
| 3.4 | Create `capture/fusion.py`: Python bridge to cc-collection EKF | Mac1 | 4h |
| 3.5 | Adapt cc-collection Rust EKF for multi-device state (25D + watch IMU + head tracking) | Mac1 | 8h |
| 3.6 | Create `capture/cleanup.py`: temporal smoothing, jerk limiting, ground contact snap | Mac1 | 3h |
| 3.7 | Create `capture/beat_sync.py`: DTW alignment to audio beat grid | Mac1 | 3h |
| 3.8 | iOS integration: camera + watch pipeline in CreativeDirector | Mac1 | 8h |
| 3.9 | End-to-end test: record motion → fuse → cleanup → output 25D | Mac1 (physical) | 4h |

Gate: <30ms sensor-to-25D latency. Smooth output (jerk < threshold).
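The smoothness half of this gate needs a concrete jerk measurement. A minimal sketch, assuming motion is stored as a list of frames with one value per channel and sampled at a fixed interval `dt`, estimates jerk with the third backward finite difference. The plan does not state the threshold, so it is left as a required argument; the function names are hypothetical.

```python
def max_abs_jerk(frames, dt):
    """Largest absolute third derivative over all channels and frames.

    frames: list of frames, each a list of channel values (for example 25 per frame)
    dt: seconds between consecutive frames
    """
    if len(frames) < 4:
        raise ValueError("need at least 4 frames for a third difference")
    worst = 0.0
    for t in range(3, len(frames)):
        for c in range(len(frames[0])):
            third_diff = (frames[t][c] - 3.0 * frames[t - 1][c]
                          + 3.0 * frames[t - 2][c] - frames[t - 3][c])
            worst = max(worst, abs(third_diff) / dt ** 3)
    return worst


def passes_jerk_gate(frames, dt, threshold):
    """True when the clip stays strictly under the chosen jerk threshold."""
    return max_abs_jerk(frames, dt) < threshold


# A cubic trajectory x(t) = t**3 has constant jerk 6, which the estimate recovers.
dt = 0.5
cubic = [[(i * dt) ** 3] for i in range(6)]
print(max_abs_jerk(cubic, dt))  # 6.0
```

Because the third difference amplifies sensor noise by a factor of 1/dt^3, this check is best applied after the temporal smoothing in step 3.6 rather than to raw fused output.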

---

### Phase 4: Text Conditioning (Weeks 7-12, overlaps with Phase 3)
Priority: P3 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 4.1 | Modify `model/conditioning.py`: add CLIP text encoder (frozen ViT-L/14) | Mac1 | 3h |
| 4.2 | Implement multi-modal cross-attention in DiT blocks | Mac1 | 4h |
| 4.3 | Implement multi-modal CFG (independent text/audio dropout) | Mac1 | 2h |
| 4.4 | Create `bridge/smpl_to_25d.py`: SMPL 263D → 25D retargeting | Mac1 | 6h |
| 4.5 | Download HumanML3D dataset, retarget to 25D format | Mac1 | 4h |
| 4.6 | Auto-caption existing motion data with LLM (batch script) | Mac1 | 4h |
| 4.7 | Train with text+audio conditioning on combined dataset | Cloud GPU | 24h |
| 4.8 | Evaluate: R-Precision on HumanML3D (via 25D→SMPL→evaluator) | Mac1 | 4h |
| 4.9 | Create `bridge/twentyfive_to_smpl.py`: 25D → SMPL for benchmarking | Mac1 | 4h |
| 4.10 | iOS integration: text prompt field in CreativeDirector for choreography preview | Mac1 | 4h |

Gate: R-Precision > 0.40 on HumanML3D. Text-only generation produces recognizable motions.
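Step 4.3 combines two ideas: dropping each condition independently during training, and mixing the resulting predictions at sampling time. The sketch below uses one common formulation, in which each modality contributes its own guidance term relative to the unconditional prediction. The plan does not say which formulation it uses, so the combination rule, the function names, and the 10% dropout defaults are assumptions.

```python
import random


def sample_condition_mask(rng, p_drop_text=0.1, p_drop_audio=0.1):
    """Independently decide, per training example, which conditions to keep."""
    keep_text = rng.random() >= p_drop_text
    keep_audio = rng.random() >= p_drop_audio
    return keep_text, keep_audio


def multimodal_cfg(v_uncond, v_text, v_audio, w_text, w_audio):
    """Add a separately weighted guidance term for each modality.

    v_uncond: prediction with both conditions dropped
    v_text:   prediction with only text kept
    v_audio:  prediction with only audio kept
    """
    return [u + w_text * (t - u) + w_audio * (a - u)
            for u, t, a in zip(v_uncond, v_text, v_audio)]


rng = random.Random(0)
print(sample_condition_mask(rng))
print(multimodal_cfg([0.0, 0.0], [1.0, 0.0], [0.0, 2.0], w_text=2.0, w_audio=1.0))
```

Independent dropout is what makes text-only generation possible at inference time, which the gate explicitly tests: the model has seen examples with audio removed but text present.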

---

### Phase 5: Unified Intelligence (Weeks 10-16)
Priority: P4 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 5.1 | Add task token embedding (6 tasks) to DiT | Mac1 | 2h |
| 5.2 | Add mask conditioning (concatenate mask with input, expand stem) | Mac1 | 3h |
| 5.3 | Generate synthetic training data for EDIT (perturbation pairs) | Mac1 | 4h |
| 5.4 | Generate synthetic training data for INBETWEEN (keyframe subsampling) | Mac1 | 3h |
| 5.5 | Generate synthetic training data for STYLE (pair same-content different-style) | Mac1 | 4h |
| 5.6 | Generate synthetic training data for PREDICT (past/future splits) | Mac1 | 2h |
| 5.7 | Progressive multi-task training: GEN only → GEN+PREDICT → all tasks | Cloud GPU | 48h |
| 5.8 | Train style encoder (contrastive on labeled styles) | Cloud GPU | 12h |
| 5.9 | Implement linear probes for cc-anticipation signal extraction | Mac1 | 4h |
| 5.10 | Evaluate each task against dedicated baseline | Mac1 | 8h |
| 5.11 | Distill unified Full → updated Tiny (add PREDICT task to on-device) | Cloud GPU | 8h |

Gate: Each task matches or beats dedicated baseline. Style interpolation produces smooth transitions.
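Steps 5.2, 5.4, and 5.6 share one mechanism: a per-frame mask says which frames are observed, and the mask is concatenated to the input as an extra channel. A minimal sketch of how INBETWEEN and PREDICT examples could be derived from an existing clip follows. The mask convention (1 = observed), the zero-fill of hidden frames, and all function names are assumptions for illustration.

```python
def inbetween_mask(num_frames, keyframe_stride):
    """1 marks an observed keyframe, 0 a frame the model must fill in."""
    mask = [1 if t % keyframe_stride == 0 else 0 for t in range(num_frames)]
    mask[-1] = 1  # keep the last frame so the clip is anchored at both ends
    return mask


def predict_mask(num_frames, num_past):
    """Observe the first num_past frames; the rest is the future to predict."""
    return [1 if t < num_past else 0 for t in range(num_frames)]


def apply_mask(motion, mask):
    """Zero hidden frames and append the mask bit as one extra input channel."""
    return [[v * m for v in frame] + [float(m)] for frame, m in zip(motion, mask)]


# A 64-frame, 25-channel clip becomes a 64 x 26 conditioned input.
clip = [[float(t)] * 25 for t in range(64)]
conditioned = apply_mask(clip, inbetween_mask(64, keyframe_stride=16))
print(len(conditioned), len(conditioned[0]))  # 64 26
```

The extra channel is why the stem has to be expanded in step 5.2: under this convention the first layer sees 26 values per frame instead of 25, and the original clip serves as the training target for both tasks.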

---

### Phase 6: Physics + RLHF (Weeks 14-20)
Priority: P5 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 6.1 | Create `training/physics.py`: momentum, joint limits, contact, gravity losses | Mac1 | 4h |
| 6.2 | Add physics losses to flow matching training, tune weights | Cloud GPU | 12h |
| 6.3 | Create `validation/critic.py`: MotionCritic architecture | Mac1 | 4h |
| 6.4 | Bootstrap critic from MusalityScorer + EchelonAdapter scores | Mac1 | 4h |
| 6.5 | Build simple A/B preference UI (web, Streamlit) | Mac1 | 4h |
| 6.6 | Collect ~500 human preference comparisons | Manual | 10h |
| 6.7 | Fine-tune critic on human preferences (Bradley-Terry) | Mac4 | 4h |
| 6.8 | Create `training/dpo.py`: DPO alignment trainer | Mac1 | 4h |
| 6.9 | DPO fine-tuning with critic as reward model | Cloud GPU | 24h |
| 6.10 | Create `validation/corrector.py`: ProximalCorrector | Mac1 | 4h |
| 6.11 | Create `validation/biomechanics.py`: joint limits, self-penetration, CoM | Mac1 | 4h |
| 6.12 | Physics metrics suite (foot slide, penetration, momentum conservation) | Mac1 | 3h |
| 6.13 | Full regression testing on all tasks | Mac1 | 4h |

Gate: Zero foot sliding. Zero joint limit violations. Critic R^2 > 0.8 on held-out preferences.
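Step 6.7 fits the critic to pairwise comparisons with the Bradley-Terry model, under which the probability that clip A is preferred over clip B is the sigmoid of the score difference. The loss for one comparison is the negative log of that probability. The sketch below covers only this loss on scalar scores; the critic network that produces the scores is out of scope, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_probability(score_a, score_b):
    """Bradley-Terry probability that A is preferred over B."""
    return math.exp(log_sigmoid(score_a - score_b))


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood of one human comparison."""
    return -log_sigmoid(score_preferred - score_rejected)


def mean_preference_loss(pairs):
    """Average loss over (preferred_score, rejected_score) pairs."""
    return sum(bradley_terry_loss(p, r) for p, r in pairs) / len(pairs)


# Equal scores mean the critic is undecided: probability 0.5, loss log(2).
print(preference_probability(1.0, 1.0), bradley_terry_loss(1.0, 1.0))
```

Only score differences enter the loss, so the critic's absolute scale is not pinned down by the ~500 comparisons alone; bootstrapping from MusalityScorer and EchelonAdapter scores in step 6.4 gives it a starting scale.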

---

## Machine Assignments Summary

| Machine | Phases | Role |
|---------|--------|------|
| Mac1 | All | Controller, iOS builds, code authoring |
| Mac4 | 2, 3, 6 | MLX experiments, local training, critic fine-tuning |
| Mac5 | 2 | MLX benchmarking (M4 chip) |
| Cloud GPU | 1, 2, 4, 5, 6 | Production training runs (A100) |

## Critical Path

Phase 0 (1w) → Phase 1 (3w) → Phase 2 (3w) → Phase 4 (5w) → Phase 5 (6w) → Phase 6 (6w)
                                   ↑
                          Phase 3 (4w, parallel)

Total serial: 24 weeks | With parallelization: ~20 weeks
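Both totals can be checked from numbers already in the plan. The serial figure is the sum of the durations on the critical-path chain, and the parallelized figure is the last calendar week in the phase headings. The dictionary and function names below are just one way to lay out that arithmetic.

```python
# Durations in weeks, copied from the critical-path chain.
critical_path_weeks = {
    "Phase 0": 1, "Phase 1": 3, "Phase 2": 3,
    "Phase 4": 5, "Phase 5": 6, "Phase 6": 6,
}

# (first_week, last_week) copied from the phase headings.
phase_week_ranges = {
    "Phase 0": (1, 1), "Phase 1": (2, 4), "Phase 2": (4, 7), "Phase 3": (5, 9),
    "Phase 4": (7, 12), "Phase 5": (10, 16), "Phase 6": (14, 20),
}


def serial_total(durations):
    """Weeks needed if the critical-path phases ran strictly back to back."""
    return sum(durations.values())


def calendar_end(ranges):
    """Last scheduled week once phases are allowed to overlap."""
    return max(last for _, last in ranges.values())


def overlapping_pairs(ranges):
    """Pairs of phases that share at least one calendar week."""
    names = list(ranges)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if ranges[a][0] <= ranges[b][1] and ranges[b][0] <= ranges[a][1]]


print(serial_total(critical_path_weeks))  # 24
print(calendar_end(phase_week_ranges))    # 20
print(overlapping_pairs(phase_week_ranges))
```

Phase 3 runs in parallel and is not part of the chain, so it adds nothing to the 24-week serial total. The four weeks saved come from the overlaps between consecutive phases in the week ranges.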

## Review Checkpoints
- Week 2: Flow matching convergence check (CRITICAL — pivot point)
- Week 7: On-device latency validation
- Week 9: Sensor pipeline end-to-end demo
- Week 12: Text-to-motion quality evaluation
- Week 16: Unified model multi-task evaluation
- Week 20: Physics + RLHF final quality gate

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

omega-output/cc-motion-gen-20260321/05-execution-plan.md
