architecture | technical paper candidate | score 28

Path A: Flow Matching Architecture Upgrade

Layers (8 blocks):
├─ AdaLN-Zero (adaptive layer norm from timestep embedding)
├─ Multi-Head Self-Attention (8 heads, dim=256) over temporal axis
├─ Cross-Attention to audio context c (8 heads)
├─ FiLM modulation from audio (preserved from current system)
└─ MLP (256 → 1024 → 256, GELU)
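As a rough sanity check on the block layout above, the per-block parameter budget can be tallied by hand. The sketch below is a back-of-envelope estimate, not the project's actual model: it assumes biased linear layers, an audio context width of 256, a six-way AdaLN-Zero modulation (shift, scale, and gate for two sub-layers, as in the original DiT design), and a FiLM layer that emits one scale and one shift per channel. None of these details are stated in the source, and the helper names are hypothetical.

```python
def linear_params(d_in, d_out, bias=True):
    """Weights (plus optional bias) of one dense layer."""
    return d_in * d_out + (d_out if bias else 0)


def block_params(dim=256, mlp_hidden=1024, audio_dim=256, adaln_chunks=6):
    """Estimate learnable parameters in one block, per sub-layer.

    Assumptions (not from the source): audio_dim, adaln_chunks, biased
    projections, and norms without learnable affine parameters.
    The head count does not change the total for standard multi-head attention.
    """
    return {
        # Q, K, V and output projections over the temporal axis
        "self_attn": 4 * linear_params(dim, dim),
        # Q from motion tokens, K and V from audio context, output projection
        "cross_attn": 2 * linear_params(dim, dim)
        + 2 * linear_params(audio_dim, dim),
        # timestep embedding -> shift/scale/gate vectors
        "adaln_zero": linear_params(dim, adaln_chunks * dim),
        # audio feature -> per-channel scale and shift
        "film": linear_params(audio_dim, 2 * dim),
        # 256 -> 1024 -> 256
        "mlp": linear_params(dim, mlp_hidden) + linear_params(mlp_hidden, dim),
    }


per_block = block_params()
total_blocks = 8 * sum(per_block.values())
print(per_block)
print(f"8 blocks: {total_blocks / 1e6:.2f}M parameters")
```

Under these assumptions the eight blocks come to roughly 12.6M parameters. Input and output projections, the timestep embedding, and any audio encoder are not counted here, so the figure is only indicative of the order of magnitude.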


Extracted abstract or opening context

## Core Thesis

Replace CC-MotionGen's DDPM/DDIM diffusion backbone with Optimal Transport Conditional Flow Matching (OT-CFM), achieving 10-100x speedup while preserving the dual-stage validation advantage.

### Motion DiT (Diffusion Transformer for Motion)

Replace U-Net 1D with a transformer-based architecture.

**Parameter count**: ~15M (vs current U-Net ~20M). Lighter due to no skip connections.

Key advantage: straight-line interpolation paths → fewer steps needed for quality.

### Speed Projections

| Steps | Quality (est. FID) | Latency (GPU) | vs Current |
|-------|-------------------|---------------|------------|
| 1 | ~0.3-0.5 | ~40ms | **50-75x faster** |
| 4 | ~0.1-0.2 | ~160ms | **12-18x faster** |
| 10 | ~0.05-0.1 | ~400ms | **5-7x faster** |
| Current (DDIM 50) | baseline | ~2500ms | 1x |
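To make the straight-line claim concrete, here is a minimal sketch of the conditional flow matching path and a fixed-step Euler sampler. It is illustrative only: it uses the zero-noise form of the path, leaves out the minibatch optimal-transport pairing of noise and data samples that OT-CFM adds, and stands in an oracle velocity for the trained network. All function names are hypothetical.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Point on the straight line from noise x0 to data x1, and its velocity.

    x_t = (1 - t) * x0 + t * x1, with constant target velocity x1 - x0.
    A network would be regressed onto target_v given (x_t, t, condition).
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]
    return x_t, target_v


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
x0 = [random.gauss(0.0, 1.0) for _ in range(4)]  # noise sample
x1 = [0.5, -1.0, 2.0, 0.0]                       # toy "motion" target


def oracle_velocity(x, t):
    # Stand-in for a trained model: the exact velocity for this one pair.
    return [b - a for a, b in zip(x0, x1)]


one_step = euler_sample(oracle_velocity, x0, num_steps=1)
ten_steps = euler_sample(oracle_velocity, x0, num_steps=10)
print(one_step)
print(ten_steps)
```

Because the velocity along a single straight path is constant, one Euler step lands on the target just as ten steps do. A learned field averaged over a whole dataset is only approximately straight, which is consistent with the table above projecting better quality at 4 and 10 steps than at 1.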

Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.