Stage 5: RAIL — Execution Plan
## Phase Sequence (Phase 0 foundation plus 6 phases, ~20 weeks with overlap)
---
### Phase 0: Foundation (Week 1)
Priority: P0 | Machine: Mac1 (controller)
Parallel tracks: None (setup)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 0.1 | Create `model/dit.py` with MotionDiT architecture (Tiny: 4 blocks/128dim, Full: 8 blocks/256dim) | Mac1 | 4h |
| 0.2 | Create `model/flow_matching.py` with OT-CFM (training + Euler/midpoint sampling + CFG) | Mac1 | 4h |
| 0.3 | Create `training/flow_losses.py` (flow matching loss + existing structure regularizers) | Mac1 | 2h |
| 0.4 | Modify `config.py` to add FlowMatchingConfig and DiTConfig blocks | Mac1 | 1h |
| 0.5 | Modify `training/trainer.py` to support flow matching training loop | Mac1 | 2h |
| 0.6 | Modify `inference/sampler.py` to add ODE solvers alongside DDIM | Mac1 | 2h |
| 0.7 | Create `scripts/train_flow.py` entry point | Mac1 | 1h |
| 0.8 | Unit tests for DiT forward pass and flow matching loss | Mac1 | 2h |
Review checkpoint (end of Week 1): DiT builds, flow matching loss computes, and sampling produces correctly shaped output.
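Steps 0.2 and 0.3 rest on the flow matching training target. The sketch below is a minimal NumPy version under a few assumptions. It uses the straight-line probability path with independent noise/data pairing. The OT variant named in step 0.2 would additionally re-pair noise and data within each minibatch through an optimal-transport plan, and that part is omitted here. The (B, T, D) layout with T = 64 (from step 2.3) and D = 25 is also an assumption.

```python
import numpy as np

def cfm_training_pair(x1, rng):
    """Build conditional flow matching inputs for a batch of motion clips.

    x1: real clips, shape (B, T, D). Noise x0 is drawn independently, so this
    is plain (non-OT) CFM on the straight path x_t = (1 - t) x0 + t x1.
    """
    x0 = rng.standard_normal(x1.shape)                    # t = 0 endpoint: noise
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1, 1))   # one time per clip
    x_t = (1.0 - t) * x0 + t * x1                         # point on the path
    u_t = x1 - x0                                         # target velocity dx_t/dt
    return x_t, t, u_t

def cfm_loss(v_pred, u_t):
    """Flow matching loss: mean squared error to the target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))

rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 64, 25))   # stand-in for 8 clips of 64 frames x 25D
x_t, t, u_t = cfm_training_pair(x1, rng)
print(cfm_loss(u_t, u_t))               # a perfect velocity predictor scores 0.0
```

In training, `v_pred` would come from MotionDiT evaluated at `(x_t, t)` plus conditioning. The existing structure regularizers from step 0.3 would then be added on top of this loss.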
---
### Phase 1: Flow Matching Training (Weeks 2-4)
Priority: P0 | Machine: Cloud GPU (training), Mac4/Mac5 (prototyping)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 1.1 | Prepare training data: validate existing phrase bundles, compute statistics | Mac1 | 2h |
| 1.2 | Train MotionDiT-Full (8 blocks) on phrase data with flow matching — initial run (10 epochs) | Cloud GPU | 8h |
| 1.3 | WEEK 2 CHECKPOINT: Evaluate convergence. If the loss is not decreasing → pivot to DDIM consistency distillation | Mac1 | 1h |
| 1.4 | Full training run (100 epochs) with WandB logging | Cloud GPU | 24h |
| 1.5 | Compare quality: Flow 1-step vs Flow 4-step vs Flow 10-step vs DDIM-50 baseline | Mac1 | 4h |
| 1.6 | Run SanityChecker + MusalityScorer on flow matching outputs vs DDIM outputs | Mac1 | 2h |
| 1.7 | Select best step count for quality/speed tradeoff | Mac1 | 1h |
Gate: Flow matching 4-step must match or beat DDIM-50 on SanityChecker pass rate AND MusalityScorer mean score.
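Steps 1.5 and 1.7 compare step counts. In practice this means running a fixed-step ODE integrator from noise (t = 0) to motion (t = 1). The sketch below shows the Euler and midpoint solvers from step 0.2, with single-condition classifier-free guidance. `model(x, t, cond)` is a hypothetical velocity-network signature, and `cond=None` stands for the dropped-condition branch.

The toy field points straight at its goal, so every step count lands exactly on it. A trained network is only approximately straight, which is why the 1-, 4-, and 10-step outputs still need the quality comparison.

```python
import numpy as np

def guided_velocity(model, x, t, cond, scale):
    """Classifier-free guidance: v_uncond + scale * (v_cond - v_uncond)."""
    v_uncond = model(x, t, None)
    v_cond = model(x, t, cond)
    return v_uncond + scale * (v_cond - v_uncond)

def sample(model, x0, cond, steps, scale=1.0, method="euler"):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (motion)."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = guided_velocity(model, x, t, cond, scale)
        if method == "euler":
            x = x + dt * v
        elif method == "midpoint":
            x_mid = x + 0.5 * dt * v                 # half step to the midpoint
            v_mid = guided_velocity(model, x_mid, t + 0.5 * dt, cond, scale)
            x = x + dt * v_mid                       # full step with midpoint slope
        else:
            raise ValueError(f"unknown method: {method}")
    return x

def toy_model(x, t, cond):
    """Hypothetical stand-in: straight-line field toward cond (or toward 0)."""
    goal = np.zeros_like(x) if cond is None else cond
    return (goal - x) / (1.0 - t)

rng = np.random.default_rng(1)
x0 = rng.standard_normal((2, 64, 25))
cond = rng.standard_normal((2, 64, 25))
out = sample(toy_model, x0, cond, steps=4)
```

Each midpoint step costs two guided evaluations (four forward passes with CFG), while each Euler step costs one. Solvers are therefore best compared at an equal number of network evaluations rather than an equal step count.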
---
### Phase 2: On-Device Proof (Weeks 4-7)
Priority: P1 | Machine: Mac1 (iOS), Mac4/Mac5 (MLX)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 2.1 | Train MotionDiT-Tiny (4 blocks, 128dim) with consistency distillation from Full teacher | Cloud GPU | 12h |
| 2.2 | Create `export/coreml.py`: PyTorch → ONNX → CoreML conversion pipeline | Mac1 | 4h |
| 2.3 | Convert Tiny model to CoreML with fixed T=64 input shapes | Mac1 | 2h |
| 2.4 | Apply 6-bit weight palettization | Mac1 | 1h |
| 2.5 | Profile with Instruments on iPhone 15 Pro — measure ANE/GPU split, latency | Mac1 | 4h |
| 2.6 | Create `export/mlx_model.py`: native MLX implementation of MotionDiT-Tiny | Mac4 | 4h |
| 2.7 | Benchmark MLX on Mac4 (M2) and Mac5 (M4) | Mac4/Mac5 | 2h |
| 2.8 | Create Swift wrapper: `OnDeviceMotionGen.swift` for CreativeDirector | Mac1 | 4h |
| 2.9 | Integration test: on-device generation → SanityChecker → CompCoreBridge | Mac1 | 4h |
Gate: <100ms latency on iPhone 15 Pro. SanityChecker pass rate ≥90%.
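Step 2.4 shrinks weights by storing, per layer, a 64-entry lookup table (2^6 entries) plus a 6-bit index per weight. Core ML Tools ships its own palettization utilities. The NumPy sketch below only illustrates the mechanics, using plain 1-D k-means, and works through the storage arithmetic on a hypothetical 128×128 fp16 layer (the Tiny model's width). It is not the Core ML Tools algorithm.

```python
import numpy as np

def palettize(weights, nbits=6, iters=20):
    """Cluster weights into 2**nbits shared values (a lookup table, LUT).

    Returns (lut, indices) such that lut[indices] approximates weights.
    Conceptual sketch: 1-D k-means with quantile initialisation.
    """
    flat = weights.ravel().astype(np.float64)
    k = 2 ** nbits
    lut = np.quantile(flat, np.linspace(0.0, 1.0, k))     # spread initial centroids
    for _ in range(iters):
        idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)  # nearest centroid
        for j in range(k):
            members = flat[idx == j]
            if members.size:                               # leave empty clusters as-is
                lut[j] = members.mean()
    idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
    return lut.astype(np.float16), idx.reshape(weights.shape)

def palettized_bytes(n_weights, nbits=6):
    """Packed index storage plus an fp16 LUT with 2**nbits entries."""
    return (n_weights * nbits + 7) // 8 + (2 ** nbits) * 2

rng = np.random.default_rng(2)
w = rng.standard_normal((128, 128)).astype(np.float16)   # hypothetical Tiny layer
lut, idx = palettize(w)
w_hat = lut[idx]                                          # dequantized weights
print(w.nbytes, palettized_bytes(w.size))                 # 32768 12416
```

For this layer, 6-bit palettization cuts storage from 32 KiB to about 12 KiB, a reduction of roughly 2.6×. The profiling in step 2.5 is still needed to see how the compressed model behaves on the ANE and GPU.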
---
### Phase 3: Sensor Pipeline (Weeks 5-9, overlaps with Phase 2)
Priority: P2 | Machine: Mac1 (iOS)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 3.1 | Create `capture/lifting.py`: 2D→3D pose lifting network (200K params) | Mac1 | 4h |
| 3.2 | Collect paired ARKit 3D / Vision 2D data: 30 min of rear-camera recordings | Mac1 (physical) | 2h |
| 3.3 | Train lifting network on paired data | Mac4 | 4h |
| 3.4 | Create `capture/fusion.py`: Python bridge to cc-collection EKF | Mac1 | 4h |
| 3.5 | Adapt cc-collection Rust EKF for multi-device state (25D + watch IMU + head tracking) | Mac1 | 8h |
| 3.6 | Create `capture/cleanup.py`: temporal smoothing, jerk limiting, ground contact snap | Mac1 | 3h |
| 3.7 | Create `capture/beat_sync.py`: DTW alignment to audio beat grid | Mac1 | 3h |
| 3.8 | iOS integration: camera + watch pipeline in CreativeDirector | Mac1 | 8h |
| 3.9 | End-to-end test: record motion → fuse → cleanup → output 25D | Mac1 (physical) | 4h |
Gate: <30ms sensor-to-25D latency. Smooth output (jerk < threshold).
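The gate's smoothness criterion and the jerk limiting in step 3.6 both need a concrete jerk measure. The sketch below assumes 25D frames sampled at a fixed rate (30 fps here, an assumption). It uses a causal exponential moving average as a stand-in smoother. The plan fixes neither the filter nor the jerk threshold, so both remain open.

```python
import numpy as np

def jerk(x, fps):
    """Third time derivative of a trajectory by finite differences.

    x: (T, D) array of 25D pose frames sampled at `fps` frames per second.
    Returns a (T - 3, D) array in pose units per second cubed.
    """
    dt = 1.0 / fps
    return np.diff(x, n=3, axis=0) / dt ** 3

def max_jerk(x, fps):
    """Largest absolute jerk over all frames and channels."""
    return float(np.abs(jerk(x, fps)).max())

def ema_smooth(x, alpha):
    """Causal exponential moving average; smaller alpha smooths more."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
    return y

fps = 30
t = np.arange(90) / fps
cubic = np.stack([t ** 3] * 25, axis=1)    # (90, 25) trajectory with constant jerk 6
print(np.allclose(jerk(cubic, fps), 6.0))  # True
```

A causal smoother adds lag, and that lag would count against the latency budget. At 30 fps, an EMA with alpha = 0.3 delays slow motion by about (1 - alpha) / alpha ≈ 2.3 frames, roughly 78 ms. That alone would exceed the 30 ms gate, so the smoother's lag has to be budgeted explicitly.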
---
### Phase 4: Text Conditioning (Weeks 7-12, overlaps with Phase 3)
Priority: P3 | Machine: Mac1 (code), Cloud GPU (training)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 4.1 | Modify `model/conditioning.py`: add CLIP text encoder (frozen ViT-L/14) | Mac1 | 3h |
| 4.2 | Implement multi-modal cross-attention in DiT blocks | Mac1 | 4h |
| 4.3 | Implement multi-modal CFG (independent text/audio dropout) | Mac1 | 2h |
| 4.4 | Create `bridge/smpl_to_25d.py`: HumanML3D 263D (SMPL-skeleton) features → 25D retargeting | Mac1 | 6h |
| 4.5 | Download HumanML3D dataset, retarget to 25D format | Mac1 | 4h |
| 4.6 | Auto-caption existing motion data with LLM (batch script) | Mac1 | 4h |
| 4.7 | Train with text+audio conditioning on combined dataset | Cloud GPU | 24h |
| 4.8 | Evaluate: R-Precision on HumanML3D (via 25D→SMPL→evaluator; requires step 4.9) | Mac1 | 4h |
| 4.9 | Create `bridge/twentyfive_to_smpl.py`: 25D → SMPL for benchmarking | Mac1 | 4h |
| 4.10 | iOS integration: text prompt field in CreativeDirector for choreography preview | Mac1 | 4h |
Gate: R-Precision > 0.40 on HumanML3D. Text-only generation produces recognizable motions.
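Step 4.3 drops text and audio independently during training, so that one network learns every conditioning combination. The sketch below shows the training-time masks and one possible way to combine guided velocities at inference. It uses nested guidance: audio first, then text on top of audio, in the style of the compositional guidance used by instruction-based image editors. The dropout rates, the guidance scales, and the nesting order are all assumptions rather than choices stated in the plan.

```python
import numpy as np

def condition_keep_masks(batch_size, p_drop_text, p_drop_audio, rng):
    """Training: drop text and audio independently per sample (True = keep)."""
    keep_text = rng.uniform(size=batch_size) >= p_drop_text
    keep_audio = rng.uniform(size=batch_size) >= p_drop_audio
    return keep_text, keep_audio

def multimodal_guidance(v_none, v_audio, v_audio_text, s_audio, s_text):
    """Inference: combine three passes of the same velocity network.

    v_none       = v(x, t | no audio, no text)
    v_audio      = v(x, t | audio only)
    v_audio_text = v(x, t | audio and text)
    """
    return (v_none
            + s_audio * (v_audio - v_none)
            + s_text * (v_audio_text - v_audio))

rng = np.random.default_rng(4)
keep_text, keep_audio = condition_keep_masks(100_000, 0.1, 0.2, rng)
both_dropped = np.mean(~keep_text & ~keep_audio)   # close to 0.1 * 0.2 = 0.02
```

Setting both scales to 1 recovers the plain fully conditioned prediction. Because the dropout is independent, the network also sees text-only inputs during training, and that is what the gate's text-only generation relies on.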
---
### Phase 5: Unified Intelligence (Weeks 10-16)
Priority: P4 | Machine: Mac1 (code), Cloud GPU (training)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 5.1 | Add task token embedding (6 tasks) to DiT | Mac1 | 2h |
| 5.2 | Add mask conditioning (concatenate mask with input, expand stem) | Mac1 | 3h |
| 5.3 | Generate synthetic training data for EDIT (perturbation pairs) | Mac1 | 4h |
| 5.4 | Generate synthetic training data for INBETWEEN (keyframe subsampling) | Mac1 | 3h |
| 5.5 | Generate synthetic training data for STYLE (pair same-content different-style) | Mac1 | 4h |
| 5.6 | Generate synthetic training data for PREDICT (past/future splits) | Mac1 | 2h |
| 5.7 | Progressive multi-task training: GEN only → GEN+PREDICT → all tasks | Cloud GPU | 48h |
| 5.8 | Train style encoder (contrastive on labeled styles) | Cloud GPU | 12h |
| 5.9 | Implement linear probes for cc-anticipation signal extraction | Mac1 | 4h |
| 5.10 | Evaluate each task against dedicated baseline | Mac1 | 8h |
| 5.11 | Distill unified Full → updated Tiny (adds the PREDICT task to the on-device model) | Cloud GPU | 8h |
Gate: Each task matches or beats dedicated baseline. Style interpolation produces smooth transitions.
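Steps 5.2, 5.4, and 5.6 turn INBETWEEN and PREDICT into masked generation. The sketch below builds the two masks and shows one possible input layout. The noisy sample, the known frames (zeroed where unknown), and the mask itself are concatenated along the feature axis, which is why the input stem must widen. The layout, the keyframe spacing, and the frame counts are illustrative assumptions; the plan only specifies concatenating the mask with the input.

```python
import numpy as np

def keyframe_mask(T, every):
    """INBETWEEN: every `every`-th frame plus the final frame are known."""
    m = np.zeros(T, dtype=bool)
    m[::every] = True
    m[-1] = True
    return m

def predict_mask(T, n_past):
    """PREDICT: the first `n_past` frames are observed, the rest generated."""
    m = np.zeros(T, dtype=bool)
    m[:n_past] = True
    return m

def masked_input(x_t, x_known, mask):
    """Concatenate noisy input, known frames, and mask along features.

    x_t, x_known: (T, D); mask: (T,). Returns (T, 2 * D + 1), so the stem's
    first layer grows from D to 2 * D + 1 input channels.
    """
    m = mask[:, None].astype(x_t.dtype)
    return np.concatenate([x_t, x_known * m, m], axis=-1)

T, D = 64, 25
rng = np.random.default_rng(5)
clip = rng.standard_normal((T, D))     # ground-truth clip
x_t = rng.standard_normal((T, D))      # noisy sample at some flow time t
mask = keyframe_mask(T, every=8)
stem_in = masked_input(x_t, clip, mask)
print(stem_in.shape, int(mask.sum()))  # (64, 51) 9
```

Plain GEN fits the same layout with an all-False mask, so a single stem can serve every task. The task token from step 5.1 then tells the model which regime it is in.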
---
### Phase 6: Physics + RLHF (Weeks 14-20)
Priority: P5 | Machine: Mac1 (code), Cloud GPU (training)
| Step | Task | Machine | Est. |
|---|---|---|---|
| 6.1 | Create `training/physics.py`: momentum, joint limits, contact, gravity losses | Mac1 | 4h |
| 6.2 | Add physics losses to flow matching training, tune weights | Cloud GPU | 12h |
| 6.3 | Create `validation/critic.py`: MotionCritic architecture | Mac1 | 4h |
| 6.4 | Bootstrap critic from MusalityScorer + EchelonAdapter scores | Mac1 | 4h |
| 6.5 | Build simple A/B preference UI (web, Streamlit) | Mac1 | 4h |
| 6.6 | Collect ~500 human preference comparisons | Manual | 10h |
| 6.7 | Fine-tune critic on human preferences (Bradley-Terry) | Mac4 | 4h |
| 6.8 | Create `training/dpo.py`: DPO alignment trainer | Mac1 | 4h |
| 6.9 | DPO fine-tuning on preference pairs ranked by the critic (critic as reward model) | Cloud GPU | 24h |
| 6.10 | Create `validation/corrector.py`: ProximalCorrector | Mac1 | 4h |
| 6.11 | Create `validation/biomechanics.py`: joint limits, self-penetration, CoM | Mac1 | 4h |
| 6.12 | Physics metrics suite (foot slide, penetration, momentum conservation) | Mac1 | 3h |
| 6.13 | Full regression testing on all tasks | Mac1 | 4h |
Gate: Zero foot sliding. Zero joint limit violations. Critic R^2 > 0.8 on held-out preferences.
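Step 6.7 fits the critic with a Bradley-Terry loss. Step 6.9 needs a DPO objective, but a flow model does not expose the log-likelihoods that DPO compares. One option, assumed here and not specified by the plan, is a Diffusion-DPO-style surrogate. It replaces log-likelihood ratios with differences in per-sample denoising errors, which here are flow matching errors. The sketch below implements both losses; `beta` absorbs any timestep weighting.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)

def bradley_terry_loss(r_preferred, r_rejected):
    """Critic loss: -log P(preferred beats rejected) under Bradley-Terry."""
    return float(-np.mean(log_sigmoid(r_preferred - r_rejected)))

def flow_dpo_loss(err_pol_w, err_ref_w, err_pol_l, err_ref_l, beta):
    """Diffusion-DPO-style surrogate on per-sample flow matching errors.

    err_pol_* / err_ref_*: ||v - u||^2 for the model being tuned and the frozen
    reference, on the preferred (w) and rejected (l) clip of each pair, at the
    same noise and time. The loss falls when the tuned model improves on the
    preferred clip more than on the rejected one.
    """
    margin = (err_pol_w - err_ref_w) - (err_pol_l - err_ref_l)
    return float(-np.mean(log_sigmoid(-beta * margin)))

same = np.ones(4)
print(round(flow_dpo_loss(same, same, same, same, beta=1.0), 4))  # 0.6931, i.e. log 2
```

When the tuned model equals the reference, the loss sits at log 2, which makes it a convenient sanity check at the start of fine-tuning. In this setup, the preferred/rejected labels for each pair would come from the critic.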
---
## Machine Assignments Summary
| Machine | Phases | Role |
|---|---|---|
| Mac1 | All | Controller, iOS builds, code authoring |
| Mac4 | 2, 3, 6 | MLX experiments, local training, critic fine-tuning |
| Mac5 | 2 | MLX benchmarking (M4 chip) |
| Cloud GPU | 1, 2, 4, 5, 6 | Production training runs (A100) |
## Critical Path
Phase 0 (1w) → Phase 1 (3w) → Phase 2 (3w) → Phase 4 (5w) → Phase 5 (6w) → Phase 6 (6w)
Phase 3 (4w) runs in parallel, overlapping Phases 2 and 4, and is not on the serial chain.
Total serial: 24 weeks | With parallelization: ~20 weeks
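Both totals can be rechecked against the durations on the serial chain and the week ranges in the phase headers. The sketch below recomputes them from those two sources.

```python
# Durations on the serial chain (weeks) and calendar windows from the phase headers.
SERIAL_WEEKS = {"Phase 0": 1, "Phase 1": 3, "Phase 2": 3,
                "Phase 4": 5, "Phase 5": 6, "Phase 6": 6}
WINDOWS = {"Phase 0": (1, 1), "Phase 1": (2, 4), "Phase 2": (4, 7),
           "Phase 3": (5, 9), "Phase 4": (7, 12), "Phase 5": (10, 16),
           "Phase 6": (14, 20)}

def overlapping_pairs(windows):
    """Pairs of phases whose inclusive week ranges share at least one week."""
    names = list(windows)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (a_start, a_end), (b_start, b_end) = windows[a], windows[b]
            if a_start <= b_end and b_start <= a_end:
                pairs.append((a, b))
    return pairs

total_serial = sum(SERIAL_WEEKS.values())               # serial chain length
calendar_end = max(end for _, end in WINDOWS.values())  # last scheduled week
print(total_serial, calendar_end)                       # 24 20
```

The overlapping pairs, Phases 1/2, 2/3, 2/4, 3/4, 4/5, and 5/6, are where the roughly four-week saving comes from.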
## Review Checkpoints
- Week 2: Flow matching convergence check (CRITICAL — pivot point)
- Week 7: On-device latency validation
- Week 9: Sensor pipeline end-to-end demo
- Week 12: Text-to-motion quality evaluation
- Week 16: Unified model multi-task evaluation
- Week 20: Physics + RLHF final quality gate
## Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
## Source Anchor
omega-output/cc-motion-gen-20260321/05-execution-plan.md