Grand Diomande Research · Full HTML Reader

Stage 5: RAIL — Execution Plan

## Phase Sequence (Phases 0-6, ~20 weeks with overlap)

---

### Phase 0: Foundation (Week 1)
Priority: P0 | Machine: Mac1 (controller)
Parallel tracks: None (setup)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 0.1 | Create `model/dit.py` with MotionDiT architecture (Tiny: 4 blocks/128dim, Full: 8 blocks/256dim) | Mac1 | 4h |
| 0.2 | Create `model/flow_matching.py` with OT-CFM (training + Euler/midpoint sampling + CFG) | Mac1 | 4h |
| 0.3 | Create `training/flow_losses.py` (flow matching loss + existing structure regularizers) | Mac1 | 2h |
| 0.4 | Modify `config.py` to add FlowMatchingConfig and DiTConfig blocks | Mac1 | 1h |
| 0.5 | Modify `training/trainer.py` to support flow matching training loop | Mac1 | 2h |
| 0.6 | Modify `inference/sampler.py` to add ODE solvers alongside DDIM | Mac1 | 2h |
| 0.7 | Create `scripts/train_flow.py` entry point | Mac1 | 1h |
| 0.8 | Unit tests for DiT forward pass and flow matching loss | Mac1 | 2h |

Review checkpoint (end of Week 1): DiT builds, flow matching loss computes, sampling produces shaped output.
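As a rough illustration of what steps 0.2 and 0.3 need to produce, the sketch below shows the flow matching training target and a fixed-step Euler sampler in plain Python. It assumes the straight-line conditional path from noise to data and omits minibatch optimal-transport coupling, CFG, and the DiT itself. The function names (`cfm_training_pair`, `flow_matching_loss`, `euler_sample`) are hypothetical and are not taken from the repository.

```python
import random

def cfm_training_pair(x0, x1, t):
    """Return (x_t, target_velocity) on the straight line from noise x0 to data x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]  # constant velocity along the line
    return x_t, target

def flow_matching_loss(predicted, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy check on a single 25D frame (assumption: real inputs are T x 25 sequences).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(25)]
data = [rng.uniform(-1.0, 1.0) for _ in range(25)]
x_t, target = cfm_training_pair(noise, data, t=0.3)

# Stand-in for a perfectly trained model: always returns the true velocity.
oracle = lambda x, t: target
sample = euler_sample(oracle, noise, num_steps=4)
```

With the oracle velocity, four Euler steps land on the data point, which is the kind of shape-and-sanity check the end-of-week review asks for.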

---

### Phase 1: Flow Matching Training (Weeks 2-4)
Priority: P0 | Machine: Cloud GPU (training), Mac4/Mac5 (prototyping)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 1.1 | Prepare training data: validate existing phrase bundles, compute statistics | Mac1 | 2h |
| 1.2 | Train MotionDiT-Full (8 blocks) on phrase data with flow matching: initial run (10 epochs) | Cloud GPU | 8h |
| 1.3 | WEEK 2 CHECKPOINT: Evaluate convergence. If loss not decreasing → pivot to DDIM consistency distillation | Mac1 | 1h |
| 1.4 | Full training run (100 epochs) with WandB logging | Cloud GPU | 24h |
| 1.5 | Compare quality: Flow 1-step vs Flow 4-step vs Flow 10-step vs DDIM-50 baseline | Mac1 | 4h |
| 1.6 | Run SanityChecker + MusalityScorer on flow matching outputs vs DDIM outputs | Mac1 | 2h |
| 1.7 | Select best step count for quality/speed tradeoff | Mac1 | 1h |

Gate: Flow matching 4-step must match or beat DDIM-50 on SanityChecker pass rate AND MusalityScorer mean score.
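The gate above is a conjunction of two comparisons, which could be encoded as a small check like the one below. This is a sketch only: `EvalResult`, `passes_phase1_gate`, and `select_step_count` are hypothetical names, the "fewest passing steps" rule for step 1.7 is an assumption, and the numbers are made-up placeholders rather than measured results.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    sanity_pass_rate: float  # fraction of samples passing SanityChecker, in [0, 1]
    musality_mean: float     # mean MusalityScorer score

def passes_phase1_gate(flow, ddim):
    """True only if flow matches or beats DDIM-50 on BOTH metrics."""
    return (flow.sanity_pass_rate >= ddim.sanity_pass_rate
            and flow.musality_mean >= ddim.musality_mean)

def select_step_count(flow_results, ddim):
    """Assumed rule for step 1.7: fewest sampling steps that still pass the gate."""
    passing = [steps for steps, result in sorted(flow_results.items())
               if passes_phase1_gate(result, ddim)]
    return passing[0] if passing else None

# Placeholder values for illustration only (not experimental results).
ddim_50 = EvalResult(sanity_pass_rate=0.90, musality_mean=0.70)
flow_results = {
    1: EvalResult(0.80, 0.60),
    4: EvalResult(0.92, 0.71),
    10: EvalResult(0.93, 0.72),
}
gate_ok = passes_phase1_gate(flow_results[4], ddim_50)
chosen_steps = select_step_count(flow_results, ddim_50)
```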

---

### Phase 2: On-Device Proof (Weeks 4-7)
Priority: P1 | Machine: Mac1 (iOS), Mac4/Mac5 (MLX)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 2.1 | Train MotionDiT-Tiny (4 blocks, 128dim) with consistency distillation from Full teacher | Cloud GPU | 12h |
| 2.2 | Create `export/coreml.py`: PyTorch → ONNX → CoreML conversion pipeline | Mac1 | 4h |
| 2.3 | Convert Tiny model to CoreML with fixed T=64 input shapes | Mac1 | 2h |
| 2.4 | Apply 6-bit weight palettization | Mac1 | 1h |
| 2.5 | Profile with Instruments on iPhone 15 Pro: measure ANE/GPU split, latency | Mac1 | 4h |
| 2.6 | Create `export/mlx_model.py`: native MLX implementation of MotionDiT-Tiny | Mac4 | 4h |
| 2.7 | Benchmark MLX on Mac4 (M2) and Mac5 (M4) | Mac4/Mac5 | 2h |
| 2.8 | Create Swift wrapper: `OnDeviceMotionGen.swift` for CreativeDirector | Mac1 | 4h |
| 2.9 | Integration test: on-device generation → SanityChecker → CompCoreBridge | Mac1 | 4h |

Gate: <100ms on iPhone 15 Pro. SanityChecker pass rate ≥90%.
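To make step 2.4 concrete: 6-bit palettization replaces each weight with an index into a lookup table of at most 2^6 = 64 shared values. The sketch below illustrates the idea with a plain 1-D k-means in stdlib Python. It is a conceptual stand-in, not the CoreML tooling the plan would actually use, and `palettize` is a hypothetical name.

```python
import random

def palettize(weights, bits=6, iterations=10):
    """Cluster weights into at most 2**bits shared values (1-D k-means)."""
    k = min(2 ** bits, len(set(weights)))
    ordered = sorted(weights)
    # Initialise centroids at evenly spaced quantiles of the sorted weights.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    def nearest(w):
        return min(range(len(centroids)), key=lambda j: abs(w - centroids[j]))

    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for w in weights:
            buckets[nearest(w)].append(w)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids[:] = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]

    indices = [nearest(w) for w in weights]  # one 6-bit index per weight
    return centroids, indices

# Toy weights; a real layer would have far more entries.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(1000)]
palette, indices = palettize(weights, bits=6)
restored = [palette[i] for i in indices]
```

Storage drops from one float per weight to one 6-bit index per weight plus the 64-entry palette, which is the trade the latency gate is meant to validate against quality.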

---

### Phase 3: Sensor Pipeline (Weeks 5-9, overlaps with Phase 2)
Priority: P2 | Machine: Mac1 (iOS)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 3.1 | Create `capture/lifting.py`: 2D→3D pose lifting network (200K params) | Mac1 | 4h |
| 3.2 | Collect paired ARKit 3D / Vision 2D data: 30 min of recordings with rear camera | Mac1 (physical) | 2h |
| 3.3 | Train lifting network on paired data | Mac4 | 4h |
| 3.4 | Create `capture/fusion.py`: Python bridge to cc-collection EKF | Mac1 | 4h |
| 3.5 | Adapt cc-collection Rust EKF for multi-device state (25D + watch IMU + head tracking) | Mac1 | 8h |
| 3.6 | Create `capture/cleanup.py`: temporal smoothing, jerk limiting, ground contact snap | Mac1 | 3h |
| 3.7 | Create `capture/beat_sync.py`: DTW alignment to audio beat grid | Mac1 | 3h |
| 3.8 | iOS integration: camera + watch pipeline in CreativeDirector | Mac1 | 8h |
| 3.9 | End-to-end test: record motion → fuse → cleanup → output 25D | Mac1 (physical) | 4h |

Gate: <30ms sensor-to-25D latency. Smooth output (jerk < threshold).
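The smoothness half of this gate needs a jerk measurement. One simple option, sketched below, is the third finite difference of a single channel divided by dt^3, paired with a moving-average smoother of the kind step 3.6 describes. The function names, the 30 fps rate, and the window size are assumptions, and the plan does not state the jerk threshold, so `jerk_limit` is left as a caller-supplied parameter.

```python
import math
import random

def max_abs_jerk(positions, fps):
    """Largest |third finite difference| / dt^3 over one channel of a trajectory."""
    dt = 1.0 / fps
    jerks = [
        (positions[i + 3] - 3 * positions[i + 2] + 3 * positions[i + 1] - positions[i]) / dt ** 3
        for i in range(len(positions) - 3)
    ]
    return max((abs(j) for j in jerks), default=0.0)

def moving_average(positions, window=5):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(positions)):
        lo, hi = max(0, i - half), min(len(positions), i + half + 1)
        out.append(sum(positions[lo:hi]) / (hi - lo))
    return out

def is_smooth(positions, fps, jerk_limit):
    """Gate check for one channel; jerk_limit is not specified by the plan."""
    return max_abs_jerk(positions, fps) < jerk_limit

# Synthetic noisy channel at an assumed 30 fps (not captured data).
rng = random.Random(0)
noisy = [0.5 * math.sin(0.1 * i) + rng.gauss(0.0, 0.05) for i in range(120)]
smoothed = moving_average(noisy, window=5)
raw_jerk = max_abs_jerk(noisy, fps=30)
smoothed_jerk = max_abs_jerk(smoothed, fps=30)
```

A moving average adds latency proportional to half the window, so the window size trades directly against the <30ms budget in the same gate.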

---

### Phase 4: Text Conditioning (Weeks 7-12, overlaps with Phase 3)
Priority: P3 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 4.1 | Modify `model/conditioning.py`: add CLIP text encoder (frozen ViT-L/14) | Mac1 | 3h |
| 4.2 | Implement multi-modal cross-attention in DiT blocks | Mac1 | 4h |
| 4.3 | Implement multi-modal CFG (independent text/audio dropout) | Mac1 | 2h |
| 4.4 | Create `bridge/smpl_to_25d.py`: SMPL 263D → 25D retargeting | Mac1 | 6h |
| 4.5 | Download HumanML3D dataset, retarget to 25D format | Mac1 | 4h |
| 4.6 | Auto-caption existing motion data with LLM (batch script) | Mac1 | 4h |
| 4.7 | Train with text+audio conditioning on combined dataset | Cloud GPU | 24h |
| 4.8 | Evaluate: R-Precision on HumanML3D (via 25D→SMPL→evaluator; requires the bridge from step 4.9) | Mac1 | 4h |
| 4.9 | Create `bridge/twentyfive_to_smpl.py`: 25D → SMPL for benchmarking | Mac1 | 4h |
| 4.10 | iOS integration: text prompt field in CreativeDirector for choreography preview | Mac1 | 4h |

Gate: R-Precision > 0.40 on HumanML3D. Text-only generation produces recognizable motions.
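Step 4.3 pairs independent condition dropout at training time with a two-weight guidance rule at sampling time. The sketch below shows one common formulation, in which each modality contributes its own guided offset from the unconditional prediction. The plan does not specify the exact rule, so this particular form, the dropout probabilities, and the guidance weights are all assumptions. `toy_model` stands in for the DiT.

```python
import random

def drop_conditions(text_emb, audio_emb, p_text=0.1, p_audio=0.1, rng=random):
    """Independently replace each condition with None (the null condition)."""
    text = None if rng.random() < p_text else text_emb
    audio = None if rng.random() < p_audio else audio_emb
    return text, audio

def guided_velocity(model, x, t, text, audio, w_text=2.0, w_audio=2.0):
    """Combine unconditional, text-only, and audio-only predictions."""
    v_uncond = model(x, t, None, None)
    v_text = model(x, t, text, None)
    v_audio = model(x, t, None, audio)
    return [u + w_text * (a - u) + w_audio * (b - u)
            for u, a, b in zip(v_uncond, v_text, v_audio)]

def toy_model(x, t, text, audio):
    """Hypothetical stand-in: output depends only on which conditions are present."""
    value = (1.0 if text is not None else 0.0) + (2.0 if audio is not None else 0.0)
    return [value for _ in x]

x = [0.0] * 25
unguided = guided_velocity(toy_model, x, 0.5, "spin", "beat", w_text=0.0, w_audio=0.0)
text_only = guided_velocity(toy_model, x, 0.5, "spin", "beat", w_text=1.0, w_audio=0.0)
both = guided_velocity(toy_model, x, 0.5, "spin", "beat", w_text=2.0, w_audio=0.5)
```

This form costs three forward passes per sampling step, which matters for the on-device latency budget if text conditioning is later distilled into the Tiny model.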

---

### Phase 5: Unified Intelligence (Weeks 10-16)
Priority: P4 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 5.1 | Add task token embedding (6 tasks) to DiT | Mac1 | 2h |
| 5.2 | Add mask conditioning (concatenate mask with input, expand stem) | Mac1 | 3h |
| 5.3 | Generate synthetic training data for EDIT (perturbation pairs) | Mac1 | 4h |
| 5.4 | Generate synthetic training data for INBETWEEN (keyframe subsampling) | Mac1 | 3h |
| 5.5 | Generate synthetic training data for STYLE (pair same-content different-style) | Mac1 | 4h |
| 5.6 | Generate synthetic training data for PREDICT (past/future splits) | Mac1 | 2h |
| 5.7 | Progressive multi-task training: GEN only → GEN+PREDICT → all tasks | Cloud GPU | 48h |
| 5.8 | Train style encoder (contrastive on labeled styles) | Cloud GPU | 12h |
| 5.9 | Implement linear probes for cc-anticipation signal extraction | Mac1 | 4h |
| 5.10 | Evaluate each task against dedicated baseline | Mac1 | 8h |
| 5.11 | Distill unified Full → updated Tiny (add PREDICT task to on-device) | Cloud GPU | 8h |

Gate: Each task matches or beats dedicated baseline. Style interpolation produces smooth transitions.
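Steps 5.2 and 5.4 fit together: keyframe subsampling yields a binary mask, and the mask is concatenated to the input as an extra channel, so a 25D frame becomes 26D at the stem. A minimal sketch follows, assuming a fixed keyframe stride and zeroed-out non-keyframes. Both are illustrative choices, and `make_inbetween_example` is a hypothetical name.

```python
import random

def make_inbetween_example(motion, keyframe_stride=8):
    """Build one INBETWEEN training example from a full motion clip.

    motion: list of T frames, each a list of D floats.
    Returns (masked_input, mask, target) where masked_input has D + 1 channels.
    """
    num_frames = len(motion)
    # Keep every stride-th frame plus the final frame as keyframes.
    mask = [1.0 if (i % keyframe_stride == 0 or i == num_frames - 1) else 0.0
            for i in range(num_frames)]
    # Zero out non-keyframes and append the mask value as an extra channel.
    masked_input = [[v * m for v in frame] + [m] for frame, m in zip(motion, mask)]
    return masked_input, mask, motion

# Synthetic clip with T=64 frames of 25D, matching the fixed shape used in Phase 2.
rng = random.Random(0)
clip = [[rng.uniform(-1.0, 1.0) for _ in range(25)] for _ in range(64)]
masked_input, mask, target = make_inbetween_example(clip, keyframe_stride=8)
```

The same mask channel could serve PREDICT (mask = 1 on past frames only) and EDIT (mask = 0 on the region to regenerate), which is presumably why the plan expands the stem once for all tasks.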

---

### Phase 6: Physics + RLHF (Weeks 14-20)
Priority: P5 | Machine: Mac1 (code), Cloud GPU (training)

| Step | Task | Machine | Est. |
|------|------|---------|------|
| 6.1 | Create `training/physics.py`: momentum, joint limits, contact, gravity losses | Mac1 | 4h |
| 6.2 | Add physics losses to flow matching training, tune weights | Cloud GPU | 12h |
| 6.3 | Create `validation/critic.py`: MotionCritic architecture | Mac1 | 4h |
| 6.4 | Bootstrap critic from MusalityScorer + EchelonAdapter scores | Mac1 | 4h |
| 6.5 | Build simple A/B preference UI (web, Streamlit) | Mac1 | 4h |
| 6.6 | Collect ~500 human preference comparisons | Manual | 10h |
| 6.7 | Fine-tune critic on human preferences (Bradley-Terry) | Mac4 | 4h |
| 6.8 | Create `training/dpo.py`: DPO alignment trainer | Mac1 | 4h |
| 6.9 | DPO fine-tuning with critic as reward model | Cloud GPU | 24h |
| 6.10 | Create `validation/corrector.py`: ProximalCorrector | Mac1 | 4h |
| 6.11 | Create `validation/biomechanics.py`: joint limits, self-penetration, CoM | Mac1 | 4h |
| 6.12 | Physics metrics suite (foot slide, penetration, momentum conservation) | Mac1 | 3h |
| 6.13 | Full regression testing on all tasks | Mac1 | 4h |

Gate: Zero foot sliding. Zero joint limit violations. Critic R^2 > 0.8 on held-out preferences.
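Steps 6.7 and 6.8 both reduce to a log-sigmoid of a score difference. The sketch below writes out the per-pair Bradley-Terry loss for the critic and the standard per-pair DPO loss in scalar form. The function names and `beta=0.1` are assumptions. How the critic selects the preferred sample in each DPO pair (step 6.9) is not specified in the plan and is not modeled here.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred clip wins under Bradley-Terry."""
    return -log_sigmoid(score_preferred - score_rejected)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss.

    logp_w / logp_l: policy log-probabilities of the preferred / rejected sample.
    ref_logp_*: the same quantities under the frozen reference model.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)

# Equal critic scores give the chance-level loss log(2).
tie_loss = bradley_terry_loss(0.0, 0.0)
# A policy identical to the reference also starts at log(2).
initial_dpo = dpo_loss(-12.0, -15.0, -12.0, -15.0)
```

For a flow matching model the log-probabilities are not available in closed form, so a real `training/dpo.py` would need a surrogate for them; that choice is outside this sketch.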

---

## Machine Assignments Summary

| Machine | Phases | Role |
|---------|--------|------|
| Mac1 | All | Controller, iOS builds, code authoring |
| Mac4 | 2, 3, 6 | MLX experiments, local training, critic fine-tuning |
| Mac5 | 2 | MLX benchmarking (M4 chip) |
| Cloud GPU | 1, 2, 4, 5, 6 | Production training runs (A100) |

## Critical Path

Phase 0 (1w) → Phase 1 (3w) → Phase 2 (3w) → Phase 4 (5w) → Phase 5 (6w) → Phase 6 (6w)
                                   ↑
                          Phase 3 (4w, parallel)

Total serial: 24 weeks | With parallelization: ~20 weeks
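Both totals can be re-derived from figures already in this plan: the serial total is the sum of the critical-path durations, and the parallelized total is the last week of the latest phase window. The short check below uses only those numbers; the variable names are illustrative.

```python
# Durations (weeks) along the critical path, as listed in the diagram.
critical_path_weeks = {
    "Phase 0": 1, "Phase 1": 3, "Phase 2": 3,
    "Phase 4": 5, "Phase 5": 6, "Phase 6": 6,
}

# (start_week, end_week) windows from the phase headings.
phase_windows = {
    "Phase 0": (1, 1), "Phase 1": (2, 4), "Phase 2": (4, 7), "Phase 3": (5, 9),
    "Phase 4": (7, 12), "Phase 5": (10, 16), "Phase 6": (14, 20),
}

serial_total = sum(critical_path_weeks.values())               # 24
calendar_end = max(end for _, end in phase_windows.values())   # 20
weeks_saved_by_overlap = serial_total - calendar_end           # 4
```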

## Review Checkpoints
- Week 2: Flow matching convergence check (CRITICAL — pivot point)
- Week 7: On-device latency validation
- Week 9: Sensor pipeline end-to-end demo
- Week 12: Text-to-motion quality evaluation
- Week 16: Unified model multi-task evaluation
- Week 20: Physics + RLHF final quality gate

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

omega-output/cc-motion-gen-20260321/05-execution-plan.md
