# Echelon Diffusion System - Production Architecture
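This is the **production-grade audio diffusion system** for Computational Choreography's Echelon engine. It transforms embodied motion, captured by iPhone motion sensors, into generative music through a four-stage pipeline: LIM-RPS (cc-mcs), the Echelon conditioning encoder, the diffusion model, and a VQ-VAE decoder with vocoder.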
## Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                      ECHELON AUDIO GENERATION PIPELINE                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Motion Sensors (iPhones)                                                   │
│          │                                                                  │
│          ▼                                                                  │
│  ┌───────────────┐     ┌───────────────┐     ┌───────────────┐              │
│  │    LIM-RPS    │────▶│    Echelon    │────▶│   Diffusion   │              │
│  │   (cc-mcs)    │     │ Conditioning  │     │     Model     │              │
│  │               │     │    Encoder    │     │               │              │
│  └───────────────┘     └───────────────┘     └───────────────┘              │
│                                                      │                      │
│                                                      ▼                      │
│                                      ┌───────────────────────────────┐      │
│                                      │   VQ-VAE Decoder + Vocoder    │      │
│                                      │   (Audio Tokens → Waveform)   │      │
│                                      └───────────────────────────────┘      │
│                                                      │                      │
│                                                      ▼                      │
│                                               🎵 Audio Output               │
└─────────────────────────────────────────────────────────────────────────────┘

---
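The data flow in the diagram can be read as a composition of four stages. The sketch below is a minimal illustration under stated assumptions: `PipelineStages`, `generate`, and the stage field names are hypothetical and are not taken from the `cc-ml/diffusion` package, and the toy lambdas only stand in for the real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass(frozen=True)
class PipelineStages:
    """Hypothetical stage callables mirroring the four boxes in the diagram."""

    latent_from_motion: Callable[[Sequence[float]], Vector]  # LIM-RPS (cc-mcs)
    encode_conditioning: Callable[[Vector], Vector]  # Echelon conditioning encoder
    sample_tokens: Callable[[Vector], List[int]]  # diffusion model + sampler
    decode_audio: Callable[[List[int]], Vector]  # VQ-VAE decoder + vocoder


def generate(stages: PipelineStages, motion_frame: Sequence[float]) -> Vector:
    """Run one motion frame through the stages in diagram order."""
    latent = stages.latent_from_motion(motion_frame)
    conditioning = stages.encode_conditioning(latent)
    tokens = stages.sample_tokens(conditioning)
    return stages.decode_audio(tokens)


# Toy stand-ins so the sketch runs end to end; they do no real modelling.
toy_stages = PipelineStages(
    latent_from_motion=lambda frame: [sum(frame) / len(frame)] * 25,
    encode_conditioning=lambda latent: latent[:4],
    sample_tokens=lambda cond: [int(abs(c) * 10) % 2048 for c in cond],
    decode_audio=lambda tokens: [t / 2048 for t in tokens],
)
waveform = generate(toy_stages, [1.0, 2.0, 3.0])
```

Keeping each stage behind a plain callable makes it possible to swap a stage (for example U-Net for DiT) without touching the others; whether the actual package is organised this way is an assumption.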
## Directory Structure
cc-ml/diffusion/
├── __init__.py # Package exports
├── ARCHITECTURE.md # This file
├── requirements.txt # Dependencies
│
├── configs/ # YAML configurations
│ ├── diffusion.yaml # Diffusion model config
│ ├── vqvae.yaml # VQ-VAE config
│ ├── echelon.yaml # Echelon integration config
│ └── training.yaml # Training hyperparameters
│
├── models/ # Neural network architectures
│ ├── __init__.py
│ │
│ ├── diffusion/ # Diffusion models
│ │ ├── __init__.py
│ │ ├── unet.py # U-Net backbone
│ │ ├── dit.py # Diffusion Transformer (NEW)
│ │ ├── conditioning.py # Conditioning mechanisms
│ │ ├── noise_scheduler.py # Noise schedules (DDPM, cosine)
│ │ ├── samplers.py # DDPM, DDIM, DPM++ samplers
│ │ └── flow_matching.py # Conditional Flow Matching (NEW)
│ │
│ ├── vqvae/ # Audio tokenizer
│ │ ├── __init__.py
│ │ ├── encoder.py # Audio → Latent
│ │ ├── decoder.py # Latent → Audio
│ │ ├── codebook.py # Vector quantization
│ │ ├── vqvae.py # Complete model
│ │ ├── vocoder.py # HiFi-GAN vocoder
│ │ ├── dac.py # Descript Audio Codec (NEW)
│ │ └── losses.py # Training losses
│ │
│ └── common/ # Shared components (NEW)
│   ├── __init__.py
│   ├── attention.py # Multi-head attention variants
│   ├── normalization.py # LayerNorm, RMSNorm, AdaLN
│   ├── activations.py # SiLU, GELU, Swish
│   ├── embeddings.py # Positional, sinusoidal, rotary
│   └── blocks.py # Residual, transformer blocks
│
├── data/ # Data processing
│ ├── __init__.py
│ ├── audio_loader.py # Audio I/O
│ ├── beat_tracker.py # Beat detection
│ ├── segmenter.py # Phrase segmentation
│ ├── feature_extractor.py # Mel, MFCC, chroma
│ ├── phrase_database.py # SQLite + FAISS
│ ├── phrase_embedder.py # Phrase encoder
│ ├── motion_dataset.py # Motion-audio pairs (NEW)
│ └── augmentations.py # Data augmentation (NEW)
│
├── integration/ # Echelon bridge
│ ├── __init__.py
│ ├── echelon_conditioning.py # Latent → Conditioning (DONE)
│ ├── motion_phrase_mapper.py # Motion → Phrase
│ ├── phrase_retriever.py # FAISS retrieval
│ ├── motion_diffusion_bridge.py # Full pipeline
│ ├── realtime_engine.py # Real-time streaming (NEW)
│ └── metrics.py # Evaluation metrics
│
├── training/ # Training infrastructure (NEW)
│ ├── __init__.py
│ ├── trainer.py # Base trainer
│ ├── diffusion_trainer.py # Diffusion training loop
│ ├── vqvae_trainer.py # VQ-VAE training loop
│ ├── losses.py # Training losses
│ ├── optimizers.py # AdamW, Lion, etc.
│ ├── schedulers.py # LR schedules
│ └── callbacks.py # Logging, checkpointing
│
├── inference/ # Production inference (NEW)
│ ├── __init__.py
│ ├── pipeline.py # End-to-end pipeline
│ ├── streaming.py # Real-time streaming
│ ├── batched.py # Batch generation
│ └── export.py # ONNX/TorchScript export
│
├── scripts/ # CLI tools
│ ├── train_vqvae.py
│ ├── train_diffusion.py
│ ├── generate_audio.py
│ ├── build_phrase_database.py
│ └── evaluate.py
│
└── tests/ # Unit tests (NEW)
  ├── test_models.py
  ├── test_data.py
  ├── test_integration.py
  └── test_inference.py

---
## Model Specifications
### 1. VQ-VAE (Audio Tokenizer)
| Parameter | Value |
|---|---|
| Sample Rate | 44,100 Hz |
| Codebook Size | 2,048 |
| Embedding Dim | 64 |
| Compression | 64× (689 tokens/sec) |
| Latent Rate | ~689 Hz |
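The latent rate follows directly from the sample rate and the compression factor, and the codebook size fixes how many bits one token can carry. The sketch below works through that arithmetic and shows a nearest-neighbour codebook lookup on a toy codebook. The function names are hypothetical, and a single codebook with squared Euclidean distance is an assumption about how `codebook.py` quantizes.

```python
import math
from typing import List, Sequence

SAMPLE_RATE_HZ = 44_100
COMPRESSION = 64
CODEBOOK_SIZE = 2_048
EMBEDDING_DIM = 64


def latent_rate_hz(sample_rate: int = SAMPLE_RATE_HZ, compression: int = COMPRESSION) -> float:
    """Tokens per second produced by the encoder."""
    return sample_rate / compression


def tokens_for(duration_s: float) -> int:
    """Whole tokens needed to cover a clip of the given duration."""
    return math.floor(duration_s * latent_rate_hz())


def bits_per_token(codebook_size: int = CODEBOOK_SIZE) -> float:
    """Information capacity of one token for a single codebook."""
    return math.log2(codebook_size)


def quantize(vector: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Toy 2-D codebook for illustration; the real one would hold 2,048 vectors of 64 dims.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
token = quantize([0.9, 1.2], toy_codebook)
```

With the table's values, `latent_rate_hz()` is 689.0625, which is the "~689 Hz" figure, and a 10-second clip needs 6,890 tokens.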
### 2. Diffusion Model
| Parameter | Value |
|---|---|
| Architecture | U-Net / DiT |
| Base Channels | 256 |
| Channel Mults | [1, 2, 4, 8] |
| Attention Resolutions | [8, 4] |
| Conditioning Dim | 256 |
| Diffusion Timesteps (training) | 1,000 |
| Sampling Steps (inference) | 50 (DDIM) |
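The 1,000-step training schedule and the 50-step DDIM sampling in the table relate as follows: the model is trained on a fine noise schedule, and the sampler visits only an evenly spaced subset of it. The sketch below uses scalar samples, a cosine schedule, and deterministic DDIM (eta = 0). Which schedule `diffusion.yaml` actually selects, the offset `s = 0.008`, the beta clip, and all function names are assumptions for illustration.

```python
import math
from typing import List

TRAIN_TIMESTEPS = 1_000  # diffusion timesteps used in training (table value)
SAMPLING_STEPS = 50      # DDIM steps used at inference (table value)


def cosine_alpha_bars(num_steps: int, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """Cumulative signal fractions alpha_bar[0..num_steps] for a cosine schedule.

    alpha_bar[0] is 1.0 (clean data); values shrink towards 0 as t grows.
    """

    def f(t: int) -> float:
        return math.cos(((t / num_steps + s) / (1 + s)) * math.pi / 2) ** 2

    alpha_bars = [1.0]
    for t in range(1, num_steps + 1):
        beta = min(1.0 - f(t) / f(t - 1), max_beta)  # clip to avoid a singular last step
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars


def ddim_timesteps(num_train: int, num_sample: int) -> List[int]:
    """Evenly spaced timesteps from num_train downwards (one common spacing)."""
    stride = num_train // num_sample
    return list(range(num_train, 0, -stride))


def ddim_step(x_t: float, eps_pred: float, ab_t: float, ab_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) on a scalar sample."""
    x0_pred = (x_t - math.sqrt(1.0 - ab_t) * eps_pred) / math.sqrt(ab_t)
    return math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps_pred


alpha_bars = cosine_alpha_bars(TRAIN_TIMESTEPS)
timesteps = ddim_timesteps(TRAIN_TIMESTEPS, SAMPLING_STEPS)  # 1000, 980, ..., 20
```

Each sampler iteration would call the network once at timestep `t` and move to `t - 20`, so 50 network evaluations replace the 1,000 a full ancestral DDPM pass would need.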
### 3. Echelon Conditioning
| Parameter | Value |
|---|---|
| Input Dim | 25 (LatentState) |
| Window Size | 46 (30 past + 1 current + 15 future) |
| Trajectory Dim | 64 |
| Dynamics Dim | 64 |
| Transition Dim | 16 |
| Device Dim | 32 |
| Output Dim | 256 |
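The window in the table spans 30 past states, the current one, and 15 future ones, giving a 46 × 25 input per conditioning vector. The sketch below shows one way such a window could be cut from a sequence of latent states. `build_window` is a hypothetical helper, and clamping indices at the sequence edges is an assumption; the source does not say how edges or not-yet-available future states are handled. The four sub-embedding widths listed (64 + 64 + 16 + 32) sum to 176, so the 256-dim output presumably involves a further projection or additional features; that step is not specified here and is left out of the sketch.

```python
from typing import List, Sequence

INPUT_DIM = 25
PAST, FUTURE = 30, 15
WINDOW_SIZE = PAST + 1 + FUTURE  # 46


def build_window(
    states: Sequence[Sequence[float]],
    center: int,
    past: int = PAST,
    future: int = FUTURE,
) -> List[List[float]]:
    """Return the states in [center - past, center + future], clamped at the edges."""
    if not states:
        raise ValueError("states must not be empty")
    last = len(states) - 1
    window = []
    for i in range(center - past, center + future + 1):
        clamped = min(max(i, 0), last)  # repeat the first/last state past the edges
        window.append(list(states[clamped]))
    return window


# Toy sequence: state i is a 25-dim vector filled with the value i.
toy_states = [[float(i)] * INPUT_DIM for i in range(100)]
window = build_window(toy_states, center=50)
```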
---
## Key Features
### ✅ Production-Ready
- Stateless neural modules
- External state management
- Deterministic inference
- ONNX exportable
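One way to read "stateless neural modules" with "external state management" is that each module is a pure function of its inputs and the caller owns any rolling state. The sketch below illustrates that pattern only; `StreamState` and `step` are hypothetical names, and the exponential smoothing update stands in for a real network.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StreamState:
    """Caller-owned rolling state; frozen so a step cannot mutate it in place."""

    frames_seen: int = 0
    smoothed: float = 0.0


def step(state: StreamState, x: float, alpha: float = 0.25) -> Tuple[StreamState, float]:
    """Pure update: the same (state, x) always yields the same (new_state, output)."""
    smoothed = (1.0 - alpha) * state.smoothed + alpha * x
    return StreamState(state.frames_seen + 1, smoothed), smoothed


# The caller threads the state through successive calls.
state = StreamState()
outputs = []
for sample in [1.0, 1.0, 0.0]:
    state, y = step(state, sample)
    outputs.append(y)
```

Because nothing is hidden inside the module, replaying the same inputs from the same starting state reproduces the same outputs, which is what deterministic inference and clean export both rely on.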
### ✅ Real-Time Capable
- ~50 ms latency target
- Streaming inference
- Pre-computed conditioning
- Optimized samplers
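To get a feel for what the latency target means in samples and tokens, the arithmetic below combines it with the sample rate, compression factor, and sampler step count from the Model Specifications tables. Treating the target as the duration of one audio chunk, and dividing it evenly across sampler steps, are assumptions for illustration; the source does not say what the 50 ms budget covers.

```python
import math

SAMPLE_RATE_HZ = 44_100  # VQ-VAE table
COMPRESSION = 64         # VQ-VAE table
LATENCY_TARGET_MS = 50   # latency target from the feature list
SAMPLING_STEPS = 50      # DDIM steps from the diffusion table

# Audio samples that fit in one latency window.
samples_per_chunk = SAMPLE_RATE_HZ * LATENCY_TARGET_MS // 1000

# Latent tokens needed to cover that many samples (rounded up).
tokens_per_chunk = math.ceil(samples_per_chunk / COMPRESSION)

# Time available per sampler step if every step had to fit inside the window.
per_step_budget_ms = LATENCY_TARGET_MS / SAMPLING_STEPS
```

Under these assumptions a chunk is 2,205 samples or 35 tokens, and each denoising step would have about 1 ms.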
### ✅ Echelon-Native
- Transition field encoding
- Dual-sensor support
- Section-aware generation
- Embodied conditioning
### ✅ Observable
- Embedding statistics
- NaN detection
- Loss tracking
- Generation metrics
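Embedding statistics and NaN detection can be as simple as summarising each vector and failing fast on non-finite values. The sketch below shows one such check on a plain list of floats. `embedding_stats` and `assert_finite` are hypothetical helpers, not the functions in `metrics.py`, and the choice of statistics is an assumption.

```python
import math
from typing import Dict, Sequence


def embedding_stats(values: Sequence[float]) -> Dict[str, float]:
    """Summarise one embedding vector and count its non-finite entries."""
    finite = [v for v in values if math.isfinite(v)]
    n = len(finite)
    mean = sum(finite) / n if n else math.nan
    var = sum((v - mean) ** 2 for v in finite) / n if n else math.nan
    return {
        "count": float(len(values)),
        "nan_count": float(sum(math.isnan(v) for v in values)),
        "inf_count": float(sum(math.isinf(v) for v in values)),
        "mean": mean,
        "std": math.sqrt(var) if n else math.nan,  # population standard deviation
        "l2_norm": math.sqrt(sum(v * v for v in finite)),
    }


def assert_finite(values: Sequence[float], name: str = "embedding") -> None:
    """Raise ValueError if the vector contains NaN or infinite entries."""
    stats = embedding_stats(values)
    bad = int(stats["nan_count"] + stats["inf_count"])
    if bad:
        raise ValueError(f"{name}: {bad} non-finite value(s) out of {len(values)}")


stats = embedding_stats([3.0, 4.0, math.nan])
```

Calling `assert_finite` on the conditioning vector before sampling would stop a corrupted input from propagating into generated audio.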
---
## Training Data Requirements
| Dataset | Size | Purpose |
|---|---|---|
| House Music | 100+ hours | Base style |
| Electronic | 50+ hours | Variation |
| Motion-Audio Pairs | 20+ hours | Conditioning |
---
## Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | Dec 2024 | Initial production release |
| 0.9.0 | Nov 2024 | Echelon conditioning encoder |
| 0.8.0 | Oct 2024 | VQ-VAE and base diffusion |
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
Comp-Core/core/ml/cc-ml/diffusion/ARCHITECTURE.md
Detected Structure
Method · Evaluation · Code Anchors · Architecture