Computational Choreography — Architecture Design V1
Divergent Rail: 4 Pillars Built in Parallel
P1[C:visual_experience | A:sensor_pipeline | B:sound_design | C:training_arch] -> GATE

All 4 pillars are designed simultaneously and converge into a unified system.
---
Pillar 1: The Visual Experience
The Brand
The CC content identity is: body on top, math on bottom, no face.
┌─────────────────────────┐
│ MotionMix 9:16 │ ← italic serif watermark
│ │
│ ┌───────────────┐ │
│ │ │ │
│ │ YOUR CHEST │ │
│ │ (cropped) │ │
│ │ │ │
│ │ HOUSE │ │ ← genre + BPM overlay
│ │ 134 BPM │ │
│ └───────────────┘ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ ← thin divider
│ ┌───────────────┐ │
│ │ │ │
│ │ ● ORB ● │ │ ← reactive to movement
│ │ /|||||||||\ │ │ spikes = energy
│ │ │ │ color = genre
│ └───────────────┘ │
│ │
│ ● REC 02:34 │ ← recording indicator
└─────────────────────────┘

### Split Screen Ratios
- Single camera: 55
- Multi-camera with AutoDirector: 60
- Flex mode: Body fills 70
### Camera Crop
- Top 25
- Bottom 15
- Result: chest, shoulders, arms, belt line visible
- Applied via SwiftUI `.mask()` on iOS, CSS `clip-path` on web
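
The crop can be thought of as a rectangle computed from the frame size. The following is a minimal sketch, not the app's implementation: the function name `crop_rect` is hypothetical, and it assumes the bare values 25 and 15 above are percentages of frame height, which the source does not state.

```python
from __future__ import annotations


def crop_rect(width: int, height: int, top_pct: float, bottom_pct: float) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the region left after trimming a top and a bottom band.

    top_pct and bottom_pct are percentages of the frame height (an assumption;
    the source lists only the numbers 25 and 15).
    """
    if top_pct < 0 or bottom_pct < 0 or top_pct + bottom_pct >= 100:
        raise ValueError("percentages must be non-negative and sum to less than 100")
    y = round(height * top_pct / 100)
    bottom = round(height * (100 - bottom_pct) / 100)
    return (0, y, width, bottom - y)


# Illustrative 1080x1920 portrait frame with the assumed 25% / 15% trim.
print(crop_rect(1080, 1920, 25, 15))
```

Under the same percentage assumption, the web-side equivalent would be something like `clip-path: inset(25% 0 15% 0)`.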
### Orb Behavior
The orb is NOT just a visualizer. It's a biofeedback mirror:
| Body State | Orb Response |
|---|---|
| Still | Dim core, no spikes, breathing pulse |
| Subtle movement | 4-6 small spikes, warm purple |
| Active groove | Full radial spikes, pink/red gradient, 60Hz pulse |
| Peak energy | Spikes explode past outer ring, white flash core |
| Section transition | Color shift (purple→red→orange based on section) |
| Flex detected | Single sharp spike in flex direction (L or R) |
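
The first four rows of the table form an energy ladder, which could be sketched as a threshold lookup. This is an illustration only: the normalized energy input and the threshold values are hypothetical, and the event-driven rows (section transition, flex) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrbResponse:
    state: str
    description: str


# Rows copied from the Orb Behavior table, ordered from lowest to highest energy.
ORB_TABLE = [
    OrbResponse("Still", "Dim core, no spikes, breathing pulse"),
    OrbResponse("Subtle movement", "4-6 small spikes, warm purple"),
    OrbResponse("Active groove", "Full radial spikes, pink/red gradient, 60Hz pulse"),
    OrbResponse("Peak energy", "Spikes explode past outer ring, white flash core"),
]

# Hypothetical upper bounds on normalized energy for the first three states.
THRESHOLDS = (0.05, 0.35, 0.80)


def orb_response(energy: float) -> OrbResponse:
    """Map a movement energy value, clamped to [0, 1], to an orb state."""
    energy = min(max(energy, 0.0), 1.0)
    for bound, response in zip(THRESHOLDS, ORB_TABLE):
        if energy < bound:
            return response
    return ORB_TABLE[-1]


for e in (0.0, 0.2, 0.5, 0.95):
    print(e, orb_response(e).state)
```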
### Multi-Camera Visual
When AutoDirector is active with 3+ cameras:
┌─────────────────────────┐
│ MotionMix 3 📷 9:16│ ← camera count badge
│ │
│ ┌─────────────────┐ │
│ │ FRONT / SIDE / │ │ ← AutoDirector switches
│ │ OVERHEAD │ │ based on Echelon state
│ │ (auto-cut) │ │
│ │ │ │
│ │ TECHNO │ │
│ │ 132 BPM │ │
│ └─────────────────┘ │
│ ┌──┐ ┌──┐ ┌──┐ │ ← thumbnail strip of all angles
│ │F │ │S │ │O │ │ highlighted = active
│ └──┘ └──┘ └──┘ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ┌───────────────┐ │
│ │ ● ORB ● │ │
│ └───────────────┘ │
└─────────────────────────┘

### Overlays During Recording
- Flex mode: "L PEC" / "R PEC" / combo counter (fades after 800ms)
- Gesture detected: Gesture name badge (top, fades after 1.5s)
- Section change: Section name pulses once (GROOVE → BUILD → CLIMAX)
- Beat position: 16-dot indicator (subtle, bottom of orb area)
- Recording: Red dot + timer (top-right)
All overlays are part of the ReplayKit capture — what you see is what gets recorded.
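
The fade timings in the list above amount to a per-overlay lifetime. A small sketch of that bookkeeping follows; the overlay keys and the `is_visible` helper are hypothetical names, and the section-change pulse is omitted because the source gives no duration for it.

```python
from __future__ import annotations

# Lifetimes in milliseconds taken from the overlay list.
# None means the overlay stays up for the whole recording.
OVERLAY_LIFETIME_MS: dict[str, int | None] = {
    "flex": 800,            # "L PEC" / "R PEC" / combo counter
    "gesture": 1500,        # gesture name badge
    "beat_position": None,  # 16-dot indicator
    "recording": None,      # red dot + timer
}


def is_visible(kind: str, shown_at_ms: int, now_ms: int) -> bool:
    """Return True while an overlay of the given kind should still be drawn."""
    if now_ms < shown_at_ms:
        return False
    lifetime = OVERLAY_LIFETIME_MS[kind]
    return lifetime is None or now_ms - shown_at_ms < lifetime


print(is_visible("flex", shown_at_ms=1000, now_ms=1500))
print(is_visible("flex", shown_at_ms=1000, now_ms=1800))
```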
---
Pillar 2: The Sensor Pipeline
Device Inventory
| Device | Sensors | Data Rate | Protocol | Role |
|---|---|---|---|---|
| iPhone 1 (main) | Camera + CoreMotion + Vision | 30fps pose, 50Hz IMU | Native | Director + front camera |
| iPhone 2 | Camera + CoreMotion | 30fps pose, 50Hz IMU | MultipeerConnectivity | Side angle |
| iPhone 3 | Camera + CoreMotion | 30fps pose, 50Hz IMU | MultipeerConnectivity | Other side / overhead |
| iPad 1 | Camera | 30fps | MultipeerConnectivity | Wide angle + display monitor |
| iPad 2 | Camera | 30fps | MultipeerConnectivity | Alternate angle + display |
| MacBook | Webcam | 10fps JPEG | WebSocket to :9405 | Static wide angle |
| Apple Watch | Accel, gyro, quaternion | 50Hz | WCSession → iPhone | Wrist IMU |
| Mocopi (6 sensors) | 9-DOF IMU × 6 | 50Hz | UDP :12351 | 27-bone 3D skeleton |
Unified Data Flow
┌─────────────┐
iPhone 2 ──MCPeer──► │ │
iPhone 3 ──MCPeer──► │ iPhone 1 │──► Capture Server :9405
iPad 1 ──MCPeer──► │ (Director) │ │
iPad 2 ──MCPeer──► │ │ │
└─────┬───────┘ │
│ │
Watch ──WCSession──► iPhone 1 │
Mocopi ──UDP :12351──► mocopi-bridge.py ──►│
MacBook ──WebSocket──► webcam-relay ───────►│
│
┌────────▼────────┐
│ Capture Server │
│ :9405 │
│ │
│ Records ALL as │
│ PhraseData │
└────────┬─────────┘
│
┌────────▼────────┐
│ Session Output │
└─────────────────┘

Session Output Format
Each session produces:
sessions/2026-03-23_morning/
├── capture.json # Session metadata, device list, angles
├── echelon_latent.npy # (T, 16) Echelon brain latent trace
├── echelon_sections.npy # (T,) Section states 0-6
├── motion_25d.npy # (T, 25) Encoded 25D motion
├── clip_embeddings.npy # (T/4, 512) CLIP embeddings every 0.5s
├── clip_clusters.npy # (T/4,) Cluster labels
├── pose_front.npy # (T, 14, 2) Vision BodyPose from front camera
├── pose_side.npy # (T, 14, 2) Vision BodyPose from side camera (if available)
├── mocopi_skeleton.npy # (T, 27, 7) Mocopi quaternion+position per bone (if available)
├── watch_imu.npy # (T, 6) Watch accel+gyro (if available)
├── audio.npz # mel, chroma, mfcc, rms, centroid, onset
├── video_front.mov # Cropped vertical video from front camera
├── video_composite.mov # AutoDirector composite (multi-angle edit)
└── metadata.json # tempo, duration, participant, genre, devices used

Sensor Fusion Hierarchy
Not all devices are always present. The system gracefully degrades:
| Level | Devices | Quality |
|---|---|---|
| Full | 3 iPhone + 2 iPad + Watch + Mocopi + MacBook | Research-grade capture |
| Standard | 1 iPhone + Watch + Mocopi | Great for daily practice |
| Minimal | 1 iPhone only | Still works — camera + CoreMotion |
| Web only | MacBook webcam | Pixel-diff + chest flex model |
Each level produces valid training data. More sensors = richer data, but the pipeline never breaks from missing devices.
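
One way to picture the degradation is a lookup from the set of connected devices to a capture level, plus the "(if available)" files from the session layout that should then appear. This is a sketch under assumptions: the device identifiers are hypothetical, and mapping `pose_side.npy` to iPhone 2 follows the "Side angle" role in the device inventory rather than anything the source states outright.

```python
from __future__ import annotations

# Levels from the Sensor Fusion Hierarchy table, richest first.
LEVELS = [
    ("Full", {"iphone_1", "iphone_2", "iphone_3", "ipad_1", "ipad_2",
              "watch", "mocopi", "macbook"}),
    ("Standard", {"iphone_1", "watch", "mocopi"}),
    ("Minimal", {"iphone_1"}),
    ("Web only", {"macbook"}),
]

# Session files marked "(if available)" and the device assumed to produce each.
OPTIONAL_FILES = {
    "pose_side.npy": "iphone_2",
    "mocopi_skeleton.npy": "mocopi",
    "watch_imu.npy": "watch",
}


def capture_level(devices: set[str]) -> str:
    """Return the richest level whose required devices are all present."""
    for name, required in LEVELS:
        if required <= devices:
            return name
    raise ValueError("no usable capture device present")


def optional_files(devices: set[str]) -> list[str]:
    """List the optional session files the present devices can produce."""
    return [f for f, dev in OPTIONAL_FILES.items() if dev in devices]


present = {"iphone_1", "watch"}
print(capture_level(present), optional_files(present))
```

A partial setup such as one iPhone plus the Watch falls back to the Minimal level but still records `watch_imu.npy`, which matches the idea that missing devices reduce richness without breaking the pipeline.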
---
Pillar 3: Sound Design
Movement Cluster → Sound Mapping
Clustering the CLIP embeddings of your 9-minute video found 3 natural clusters (1,072 frames in total). Each cluster gets a distinct sonic identity:
| Cluster | Movement Pattern | Sound Character | Strudel Layer |
|---|---|---|---|
| 0 (506 frames) | Active groove, sustained dance | Full beat: kick + hat + bass + pad, genre-specific patterns | All 4 layers active, chord progression cycling |
| 1 (332 frames) | Transitions, arm-focused | Harmonic + textural: pad swells, filter sweeps, delay feedback | Pad + bass dominant, percussion sparse |
| 2 (234 frames) | Warm-up, subtle, building | Minimal: isolated kick or hat, long reverb tails, sub-bass only | Single layer, effects-heavy |
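
The frame counts in the table translate directly into how much of the session each cluster occupies. The arithmetic below is a sketch that assumes the 2 fps frame extraction rate from the embedding pipeline also applies to this video.

```python
from __future__ import annotations

# Frame counts per cluster, from the table above.
CLUSTER_FRAMES = {0: 506, 1: 332, 2: 234}
FPS = 2.0  # assumed: the embedding pipeline's frame extraction rate


def cluster_stats(frames: dict[int, int], fps: float) -> dict[int, dict[str, float]]:
    """Return each cluster's share of all frames and its duration in seconds."""
    total = sum(frames.values())
    return {c: {"share": n / total, "seconds": n / fps} for c, n in frames.items()}


stats = cluster_stats(CLUSTER_FRAMES, FPS)
for cluster, s in stats.items():
    print(f"cluster {cluster}: {s['share']:.1%} of frames, {s['seconds']:.0f} s")
```

At 2 fps the 1,072 frames cover 536 seconds, about 8.9 minutes, which is consistent with the 9-minute video.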
Layer Architecture
Layer 1: PERCUSSION (existing)
kick, clap, hat, bass patterns
Driven by: energy state (base/build/drop/chill)
24 patterns across 6 genres
Layer 2: HARMONIC (built)
Chord progressions (3 per genre = 18 total)
Bass follows chord root
Driven by: CLIP cluster + tension scalar
Layer 3: TEXTURAL (to build)
Granular clouds, noise sweeps, spectral freezes
Driven by: movement QUALITY (smooth=warm grain, sharp=digital glitch)
Uses Tone.js GrainPlayer or custom noise generators
Layer 4: STRUCTURAL (to build)
Pattern transformations applied at phrase boundaries
reverse, halftime, doubletime, swing, striate
Driven by: movement TRANSITIONS (cluster change = pattern transform)

Movement → Sound Transformation Map
| Body Action | Strudel Transformation | Musical Effect |
|---|---|---|
| Chest flex (both) | `triggerKick()` + filter spike | Deep boom with momentary brightness |
| Chest flex (left) | Left-panned sub hit | Stereo-localized bass |
| Chest flex (right) | Right-panned snap | Stereo-localized percussion |
| Arm wave | `reversePattern()` | Current phrase plays backward |
| Punch/jab | `doubleTimePattern()` | Rhythm doubles, energy spikes |
| Smooth flow | `halftimePattern()` + swing | Groove opens up, breathes |
| Spin | `randomizeHats(0.5)` | Percussion becomes stochastic |
| Freeze/hold | All layers → -60dB over 500ms | Breakdown, silence |
| Resume movement | Layers restore from energy | Music rebuilds with body |
| Cluster transition (0→1) | Chord progression change | Harmonic shift |
| Cluster transition (1→2) | Layer count reduces | Texture simplifies |
| Cluster transition (2→0) | All layers activate | Full beat drops |
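
The pattern-level rows of this map can be read as a dispatch table from detected action to transform. The sketch below is a Python stand-in, not the Strudel code: the step-list representation, the action keys, and the three transform functions are hypothetical approximations of `reversePattern()`, `doubleTimePattern()`, and `halftimePattern()`, and swing is not modeled.

```python
from __future__ import annotations

from typing import Callable


def reverse_pattern(p: list[str]) -> list[str]:
    """Play the phrase backward."""
    return p[::-1]


def double_time_pattern(p: list[str]) -> list[str]:
    """Squeeze the phrase into half the grid and play it twice (lossy on a fixed grid)."""
    return p[::2] * 2


def halftime_pattern(p: list[str]) -> list[str]:
    """Stretch the first half of the phrase across the whole grid, padding with rests."""
    return [s for step in p[: len(p) // 2] for s in (step, "-")]


# Body action -> transform, following the table rows for arm wave, punch/jab, smooth flow.
ACTION_TRANSFORMS: dict[str, Callable[[list[str]], list[str]]] = {
    "arm_wave": reverse_pattern,
    "punch": double_time_pattern,
    "smooth_flow": halftime_pattern,
}


def apply_action(action: str, pattern: list[str]) -> list[str]:
    """Apply the transform for an action; unknown actions leave the pattern unchanged."""
    transform = ACTION_TRANSFORMS.get(action)
    return transform(pattern) if transform else list(pattern)


kick = list("x---x---x---x---")  # 16-step grid, "x" = hit, "-" = rest
for action in ("arm_wave", "punch", "smooth_flow"):
    print(action, "".join(apply_action(action, kick)))
```

Keeping the grid length fixed makes each transform a drop-in replacement at a phrase boundary, at the cost of losing off-grid detail when doubling.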
Musical Form (Conductor Logic)
The Echelon conductor's 7 sections map to musical form:
Section 0 (Entry) → Cluster 2 sound: minimal, building
Section 1 (MicroInit) → Cluster 2→1 transition: layers adding
Section 2 (Stable) → Cluster 0 sound: full groove
Section 3 (Divergence) → Pattern transforms activate: variety
Section 4 (Transitional)→ Cluster 1 sound: harmonic focus, tension
Section 5 (Reformation) → Cluster 0 peak: all layers, maximum energy
Section 6 (Resolution) → Cluster 2 return: wind down, minimal

This creates a natural song arc from your body's movement arc. The music has structure because YOUR movement has structure.
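
The section-to-form mapping can be held as a lookup table and applied to a per-frame section trace such as the one stored in `echelon_sections.npy`. This is a sketch with hypothetical names (`SectionForm`, `song_arc`); sections 1 and 3 carry no single cluster because the source describes them as a transition and as pattern transforms respectively.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class SectionForm(NamedTuple):
    name: str
    cluster: Optional[int]  # None where the source names no single cluster
    role: str


SECTION_FORM = {
    0: SectionForm("Entry", 2, "minimal, building"),
    1: SectionForm("MicroInit", None, "cluster 2→1 transition: layers adding"),
    2: SectionForm("Stable", 0, "full groove"),
    3: SectionForm("Divergence", None, "pattern transforms activate: variety"),
    4: SectionForm("Transitional", 1, "harmonic focus, tension"),
    5: SectionForm("Reformation", 0, "peak: all layers, maximum energy"),
    6: SectionForm("Resolution", 2, "wind down, minimal"),
}


def song_arc(sections: list[int]) -> list[str]:
    """Collapse a per-frame section trace into the ordered list of section names."""
    arc: list[str] = []
    for s in sections:
        name = SECTION_FORM[s].name
        if not arc or arc[-1] != name:
            arc.append(name)
    return arc


print(song_arc([0, 0, 1, 2, 2, 2, 3, 4, 5, 5, 6]))
```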
---
Pillar 4: Training Architecture
Embedding Pipeline
Daily Session Video (9:16 vertical, 5-30 min)
│
├─► Extract frames (2fps, chest-cropped 224×224)
│
├─► CLIP ViT-B/32 encode → 512-dim vectors
│
├─► KMeans cluster → movement vocabulary labels
│
├─► Pair with audio (Encodec tokenize → 128-dim continuous)
│
└─► Training samples:
{
clip_embedding: (512,), # What the body looks like
motion_25d: (T_window, 25), # Echelon-encoded trajectory
audio_tokens: (128, L), # Encodec continuous audio
cluster_label: int, # Which movement vocabulary
anticipation_scalars: (7,), # Geometric scalars (when available)
}

Model Training Flow
Week 1-2: Collect 15+ sessions
│
├─► CLIP embeddings accumulated
│ Clusters refine (3 → 5 → 8 as vocabulary grows)
│
├─► CC-MotionGen V2 trains on (audio → motion)
│ MotionDiT-Tiny (1.8M params) on Mac4+5
│ "Given this house track, predict Mohamed's body trajectory"
│
└─► Chest flex model retrains with more angles + contexts
Week 3-4: 30+ sessions accumulated
│
├─► EchelonDiT trains on (motion → audio)
│ 43M params, Encodec codec
│ "Given Mohamed's movement, generate matching audio"
│
├─► Movement vocabulary classifiers trained per cluster
│ One MobileNetV2 per movement type (auto-discovered from CLIP)
│
└─► Rehearsal Engine upgrades:
Linear extrapolation → CC-MotionGen prediction
"Where will Mohamed be in 2 seconds?"
Month 2+: 60+ sessions
│
├─► EchelonDiT replaces Strudel as primary audio engine
│ Neural audio that sounds like YOUR music taste
│
├─► CLIP embeddings as EchelonDiT conditioning
│ Instead of 25D motion: 512D CLIP vectors
│ Richer representation of body state
│
└─► Rehearsal Engine with trained MotionGen
Music arrives BEFORE your body
The system anticipates, you respond
THE LOOP CLOSES

The Feedback Loop (Why Daily Practice Matters)
Day N: You move → System records → Models train overnight
│
Day N+1: Models predict better → Music responds faster
│ │
└─► You respond to better music → New movement patterns emerge
│
Day N+2: Models learn your RESPONSE patterns ◄──────┘
The system learns how you respond
to being musically led
│
Day N+3: Rehearsal Engine predicts your response
to its own predictions
│
└─► EMERGENT CHOREOGRAPHY
Neither you nor the system "leads"
The movement-music relationship
evolves as its own entity

Data Volume Estimates
| Timeframe | Sessions | Frames | CLIP Embeddings | Audio Hours | Training Viability |
|---|---|---|---|---|---|
| Day 1 | 1 | ~2K | ~1K × 512 | 0.25 | Chest flex only |
| Week 1 | 7 | ~14K | ~7K × 512 | 1.75 | CC-MotionGen-Tiny trainable |
| Week 2 | 14 | ~28K | ~14K × 512 | 3.5 | CLIP clusters stable (5-8 movements) |
| Month 1 | 30 | ~60K | ~30K × 512 | 7.5 | EchelonDiT-Small trainable |
| Month 2 | 60 | ~120K | ~60K × 512 | 15 | Full system operational |
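
The table's audio and frame columns follow from a fixed session length, which can be checked with a few lines of arithmetic. The sketch below takes 0.25 audio hours per session from the table and 2 fps from the embedding pipeline; storing embeddings as float32 is an assumption used only to estimate disk size.

```python
SESSION_HOURS = 0.25   # from the table: 1 session = 0.25 audio hours
FRAME_RATE_FPS = 2     # frame extraction rate in the embedding pipeline
EMBED_DIM = 512        # CLIP ViT-B/32 embedding width
BYTES_PER_VALUE = 4    # assumed float32 storage


def audio_hours(sessions: int) -> float:
    """Total recorded audio for a number of sessions."""
    return sessions * SESSION_HOURS


def frames(sessions: int) -> int:
    """Frames extracted at FRAME_RATE_FPS across all sessions."""
    return int(sessions * SESSION_HOURS * 3600 * FRAME_RATE_FPS)


def embedding_megabytes(n_embeddings: int) -> float:
    """Approximate disk size of n embeddings of EMBED_DIM values each."""
    return n_embeddings * EMBED_DIM * BYTES_PER_VALUE / 1e6


for sessions in (1, 7, 14, 30, 60):
    print(sessions, audio_hours(sessions), frames(sessions))
print(embedding_megabytes(60_000), "MB for 60K embeddings")
```

A 15-minute session yields 1,800 frames at 2 fps, which the table rounds to ~2K, so the table's frame counts run roughly 10% above these figures.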
---
Convergence: The Complete System
YOUR BODY
│
├── Mocopi (27 bones) ──UDP──┐
├── Watch (wrist IMU) ──WCSession──┤
├── iPhones (camera×3) ─────┤
├── iPads (camera×2) ───────┤ ┌──────────────────┐
└── MacBook (webcam) ───────┼──► │ CAPTURE SERVER │
│ │ :9405 │
│ │ │
│ │ Records: │
│ │ - 25D motion │
│ │ - CLIP embeddings │
│ │ - Echelon latent │
│ │ - Audio │
│ │ - Multi-angle vid │
│ └────────┬──────────┘
│ │
┌─────────────────┤ │
▼ │ ▼
┌──────────────┐ │ ┌──────────────────┐
│ ECHELON BRAIN│ │ │ TRAINING PIPELINE │
│ (Rust, <5ms) │ │ │ │
│ 16D latent │ │ │ CC-MotionGen │
│ 7 sections │ │ │ EchelonDiT │
│ DELL equilib │ │ │ Movement classif. │
└──────┬───────┘ │ │ CLIP clustering │
│ │ └────────┬──────────┘
│ OSC @ 60Hz │ │
┌─────┼──────┐ │ ┌────────▼──────────┐
▼ ▼ ▼ │ │ REHEARSAL ENGINE │
┌─────┐┌─────┐┌──────┐ │ │ │
│Strud││Orb ││Touch │ │ │ MotionGen predicts │
│el ││Viz ││Dsgner│ │ │ EchelonDiT pre-gen │
│Audio││ ││(live)│ │ │ Music LEADS body │
└─────┘└─────┘└──────┘ │ └───────────────────┘
│ │ │ │
▼ ▼ ▼ │
┌──────────────────────┐ │
│ RECORDING (ReplayKit)│ │
│ Split-screen 9:16 │ │
│ Body + Orb + Overlays│ │
│ Auto-cut multi-cam │ │
└──────────┬────────────┘ │
│ │
▼ │
┌──────────────────────┐ │
│ CONTENT PIPELINE │ │
│ │ │
│ Auto-clip reels │ │
│ 3/day to @granddiomande│ │
│ Build→Peak arcs │ │
│ Mystery→Reveal series│ │
└───────────────────────┘ │

---
Implementation Priority
| What | When | Effort |
|---|---|---|
| Single-phone session with capture | TODAY | Ready |
| Strudel Layer 3 (textural) | Day 2 | Small |
| Strudel Layer 4 (structural transforms) | Day 2 | Small |
| Multi-camera mesh test (2 devices) | Day 3 | Medium |
| CLIP embedding pipeline automated | Day 4 | Small |
| Full 6-device setup | Week 1 | Medium |
| CC-MotionGen training begins | Week 2 | Time |
| EchelonDiT training begins | Week 3 | Time |
| TouchDesigner visualization | Week 2 | Medium |
| Rehearsal Engine with trained models | Month 2 | Large |
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
omega-output/cc-body-instrument-20260323/cc-architecture-v1.md