MotionMix — Technical Architecture
> Full system architecture: hardware sensors → Rust engine → neural synthesis → multi-machine rendering
> Last updated: 2026-04-16
---
System Overview
MotionMix is a real-time motion-to-music synthesis platform. A performer's body movement is captured via wearable sensors and iPhone cameras, processed through a Rust engine (Echelon) into a 128-dimensional canonical vector, fed through a 5-layer neural network (SAN), and output as audio parameters, camera decisions, visual effects, and 3D body rendering across a fleet of devices.
    Sony Mocopi (27 bones) ──┐
    iPhone IMU (60Hz)      ──┤
    iPhone Camera (Vision) ──┤──→ Echelon (Rust) ──→ SAN (5-layer) ──→ Audio + Camera + Visuals
    Apple Watch (wrist)    ──┘        │                                      │
                                      │                                      ├──→ LiveDirector (Mac)
                                      ├──→ Avatar Pipeline ──→ Metal mesh    ├──→ TouchDesigner (Mac5)
                                      └──→ Accountability ──→ rep/sleep      └──→ Unity VFX (Mac4)
---
Layer 0: Hardware Sensors
### Sony Mocopi
- 27 inertial measurement units worn on body joints
- Streams skeleton data at 30Hz via UDP/OSC
- Each bone: world-space position [x,y,z] + orientation quaternion [qw,qx,qy,qz]
- OSC formats: `/mcp/sklt/joint/N` (per-bone), `/mocopi/skel` (flat 189 floats), `/mcp/BonePos/{name}` (position-only)
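As a rough illustration of the flat `/mocopi/skel` payload, the sketch below splits 189 floats into 27 bone records. It assumes each bone is packed as `[x, y, z, qw, qx, qy, qz]`; the `Bone` type and `split_flat_skeleton` name are hypothetical and not taken from the codebase.

```python
from typing import List, NamedTuple, Sequence, Tuple

BONE_COUNT = 27       # bones per skeleton frame (from the section above)
FLOATS_PER_BONE = 7   # [x, y, z] position + [qw, qx, qy, qz] orientation


class Bone(NamedTuple):
    position: Tuple[float, ...]  # (x, y, z)
    rotation: Tuple[float, ...]  # (qw, qx, qy, qz), assumed packing order


def split_flat_skeleton(values: Sequence[float]) -> List[Bone]:
    """Split a flat 189-float payload into 27 Bone records."""
    expected = BONE_COUNT * FLOATS_PER_BONE
    if len(values) != expected:
        raise ValueError(f"expected {expected} floats, got {len(values)}")
    bones = []
    for i in range(BONE_COUNT):
        chunk = values[i * FLOATS_PER_BONE:(i + 1) * FLOATS_PER_BONE]
        bones.append(Bone(tuple(chunk[:3]), tuple(chunk[3:])))
    return bones
```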
### iPhone CoreMotion
- Accelerometer, gyroscope, gravity vector, attitude quaternion
- 60Hz sampling via CMMotionManager
- Feeds the LIM-RPS latent space dynamics in Rust
### iPhone Camera + Vision
- AVCaptureSession rear camera
- Apple Vision VNDetectHumanBodyPoseRequest extracts 14 body joints
- Each joint: normalized [x, y, confidence]
- Derived metrics: body energy, bouncing (hip Y oscillation), core motion, leg motion
### Apple Watch
- WatchConnectivity session for wrist energy + jerk
- Supplements IMU data when available
---
Layer 1: Sensor Ingestion (Swift)
All ingestion services run in the MotionMixApp iOS application.
### MocopiReceiver.swift
- `@MainActor final class MocopiReceiver: ObservableObject`
- NWListener on UDP `:9500` using Apple Network framework
- Pure-Swift OSC parser (no third-party dependencies)
- Auto-detects quaternion ordering (qw-first vs qw-last) by magnitude comparison
- 27-bone accumulator with auto-flush at 23+ bones or 33ms time boundary
- Output: `echelonBridge.mocopiExtractor.update(bones:)` on MainActor
- All parsing methods are `nonisolated` for background queue safety
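One possible reading of the accumulator's flush rule, written in Python for brevity rather than Swift. It assumes a frame is flushed once 23 or more distinct bones have arrived, or once 33 ms have passed since the first bone of the frame; `BoneAccumulator` and its parameters are illustrative names only.

```python
from typing import Dict, Optional, Tuple

BoneData = Tuple[float, ...]  # 7 floats per bone


class BoneAccumulator:
    """Collects per-bone messages and flushes a frame when it is complete enough."""

    def __init__(self, min_bones: int = 23, max_age_s: float = 0.033) -> None:
        self.min_bones = min_bones
        self.max_age_s = max_age_s
        self._bones: Dict[int, BoneData] = {}
        self._started_at: Optional[float] = None

    def add(self, index: int, bone: BoneData, now: float) -> Optional[Dict[int, BoneData]]:
        """Store one bone; return the flushed frame, or None while still collecting."""
        if self._started_at is None:
            self._started_at = now  # first bone of a new frame starts the clock
        self._bones[index] = bone
        too_old = (now - self._started_at) >= self.max_age_s
        if len(self._bones) >= self.min_bones or too_old:
            frame, self._bones, self._started_at = self._bones, {}, None
            return frame
        return None
```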
### SensorService.swift
- CMMotionManager at 60Hz
- Buffers device motion frames
- Watch session via WCSession
- Bonjour mesh relay discovery for multi-device sync
- Output: `echelonBridge.updateSensor()` + `onLatentUpdate` callback
### CameraService.swift
- AVCaptureSession with configurable position (front/back)
- Distributes CGImage frames to: PoseService, LiveStreamServer, RecordingService, FaceAnalyzer
- Gemini analysis frames every ~3 seconds
### PoseService.swift
- VNDetectHumanBodyPoseRequest on every camera frame
- Extracts 14 joints with confidence thresholds
- Computes 6 pose features: meanX, meanY, stdX, stdY, rangeX, rangeY
- These features overwrite 128D vector positions [63:69]
- Output: `onPoseUpdated` callback with MotionMixPoseFrame
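The six pose features can be sketched as simple statistics over the confident joints. The confidence threshold of 0.3, the use of population standard deviation, and the all-zero fallback are assumptions made for this example, not values from PoseService.swift.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # normalized (x, y, confidence)


def pose_features(joints: Sequence[Joint], min_confidence: float = 0.3) -> List[float]:
    """Return [meanX, meanY, stdX, stdY, rangeX, rangeY] over confident joints."""
    xs = [x for x, _, c in joints if c >= min_confidence]
    ys = [y for _, y, c in joints if c >= min_confidence]
    if not xs:
        return [0.0] * 6  # no usable joints: assumed fallback
    return [
        fmean(xs), fmean(ys),
        pstdev(xs), pstdev(ys),
        max(xs) - min(xs), max(ys) - min(ys),
    ]
```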
---
Layer 2: Rust Engine — Echelon
Source: `Desktop/Comp-Core/core/audio-media/cc-echelon/`
Binary: `Desktop/MotionMixApp/Frameworks/libechelon_ios.a` (22MB arm64)
Header: `Desktop/MotionMixApp/Frameworks/include/echelon.h`
2a: EchelonCore (cc-brain)
The core motion intelligence engine.
LIM-RPS Latent Space
- 16-dimensional Riemannian manifold
- Geodesic dynamics with velocity, curvature, jerk
- State: z[16] (position), velocity[16], plus 7 scalar features (norm, speed, curvature, grounding, verticality, rotation, coherence)
Lexicon
- 8 expressive scalars derived from latent dynamics:
- tension, divergence, transition_intensity, dissolution
- reformation, resolution, energy, expressivity
Section State Machine
- 7 states: Entry → MicroInitiation → StableSection → Divergence → Transitional → Reformation → Resolution
- Transitions driven by latent space dynamics (speed, curvature, divergence thresholds)
Motion Pipeline
- Anticipation Kernel: commitment, uncertainty, transition_pressure, recovery_margin, phase_stiffness, novelty, stability
- Gesture Classifier: class_id, confidence, commitment (from pose joint trajectories)
- Bar Boundary Detector: tempo-aware beat tracking for musical structure
2b: SAN Pipeline (Somatic Adaptive Network)
5-layer neural network replacing 200+ hardcoded thresholds with learned mappings.
135K parameters. Runs at 30Hz.
    128D input ──→ [L1: FAN] ──→ [L2: FuseMoE] ──→ [L3: NHA] ──→ [L4: TTT] ──→ [L5: FiLM Heads]
                       │             │                 │             │             │
                   Normalizer    6 experts         7-mode ODE    Hebbian       5 output
                   (input→128D)  top-2 gate        RK4 integr.   fast weights  heads
                                 128D→40D/expert   learned phase bar-boundary  (44D total)
| Layer | Name | Params | Function |
|---|---|---|---|
| L1 | FAN (Feature-Aligned Normalizer) | ~400 | Running mean/var normalization for 128D input |
| L2 | FuseMoE (Fused Mixture of Experts) | ~51K | 6 experts (128D→40D each), top-2 gating, load balancing |
| L3 | NHA (Neural Harmonic Attractor) | ~15K | 7-mode ODE with RK4 integration, learned phase coupling |
| L4 | TTT (Test-Time Training) | ~6.8K | Hebbian fast weights, adapts at bar boundaries within a session |
| L5 | FiLM Heads | ~62K | 5 output projections with feature-wise linear modulation |
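To make the top-2 gating in L2 concrete, here is a minimal pure-Python sketch. It assumes the gate renormalizes with a softmax over only the two selected experts, which is one common convention and not confirmed by the source; load balancing is omitted.

```python
import math
from typing import List, Sequence, Tuple


def top2_gate(logits: Sequence[float]) -> List[Tuple[int, float]]:
    """Pick the two highest-scoring experts and softmax-normalize their weights."""
    if len(logits) < 2:
        raise ValueError("need at least two experts")
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    peak = logits[ranked[0]]
    exps = [math.exp(logits[i] - peak) for i in ranked]  # shift for numerical stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def mix_experts(logits: Sequence[float], expert_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the two selected experts' output vectors."""
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for idx, weight in top2_gate(logits):
        for d in range(dim):
            mixed[d] += weight * expert_outputs[idx][d]
    return mixed
```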
Output Heads (44D total):
- Audio (20D): 4 instruments (kick, hihat, bass, pad) x 5 params each
- Camera (9D): 7 angle scores + cut timing + transition type
- Pattern (2D): intensity + variation
- Phrase (4D): forming + growing + stable + dissolving
- Gesture (9D): 8 class probabilities + confidence
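A consumer might slice the 44D output into named heads as follows. The head widths come from the list above; the packing order inside the real buffer is assumed to follow the listed order, and `split_heads` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# (name, width) in the order the heads are listed above.
# The actual order inside the 44D buffer is an assumption.
HEAD_LAYOUT = [("audio", 20), ("camera", 9), ("pattern", 2), ("phrase", 4), ("gesture", 9)]


def split_heads(output: Sequence[float]) -> Dict[str, List[float]]:
    """Slice a flat SAN output vector into per-head lists."""
    total = sum(width for _, width in HEAD_LAYOUT)
    if len(output) != total:
        raise ValueError(f"expected {total} values, got {len(output)}")
    heads: Dict[str, List[float]] = {}
    offset = 0
    for name, width in HEAD_LAYOUT:
        heads[name] = list(output[offset:offset + width])
        offset += width
    return heads
```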
2c: 128D Canonical Vector Layout
The central data structure connecting all subsystems. Assembled from multiple sources in `getDynamics128()`:
    Index      Source  Content
    ──────────────────────────────────────────────────────────────
    [0:16]     Rust    z vector (16D LIM-RPS latent position)
    [16:32]    Rust    z padding (zeros)
    [32:48]    Rust    velocity (16D LIM-RPS latent velocity)
    [48:63]    Rust    velocity padding (zeros)
    [63:69]    Swift   OVERRIDE pose features [meanX, meanY, stdX, stdY, rangeX, rangeY]
                       (Mocopi 6D if fresh, else Vision 6D)
    [69:75]    Rust    temporal scalars [internal_tempo, phase, periodicity,
                       grounding, verticality, rotation, coherence]
    [75]       Swift   modality mask (camera=1, pocket=2, mocopi=4, watch=8, /15)
    [76:100]   Swift   Mocopi 24D features (joint vel, limb ratios, symmetry)
    [100:102]  Swift   Pocket IMU (pitch, roll)
    [102:104]  Swift   Watch (HR normalized, wrist energy)
    [104:128]  —       reserved (future: Femto Bolt, LiDAR, etc.)
Critical invariant: Swift overwrites [63:69] after Rust writes [0:76]. The Rust values at [64:68] (norm, speed, curvature, curvature_rate, jerk) are clobbered. Training data must match this runtime layout exactly.
Training note: V5 weights were trained on 104D only (dims [104:128] = zeros). The SAN Rust config already declares `input_dim: 128`. V6 retrain will activate the full 128D with Mocopi/Watch/IMU features.
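The write order matters because of the override, so a small sketch of the assembly may help. It is written in Python for illustration (the real `getDynamics128()` lives in Swift), treats the index ranges as half-open slices, and reads the modality mask as bit flags summed and divided by 15; both readings are assumptions.

```python
from typing import List, Sequence

CAMERA, POCKET, MOCOPI, WATCH = 1, 2, 4, 8  # bit flags from the layout above


def modality_mask(camera: bool, pocket: bool, mocopi: bool, watch: bool) -> float:
    """Encode active modalities as bit flags, scaled to [0, 1] by dividing by 15."""
    bits = CAMERA * camera + POCKET * pocket + MOCOPI * mocopi + WATCH * watch
    return bits / 15.0


def assemble_128(rust_76: Sequence[float], pose6: Sequence[float], mask: float,
                 mocopi24: Sequence[float], imu2: Sequence[float],
                 watch2: Sequence[float]) -> List[float]:
    """Build the 128D vector in runtime order: Rust first, then Swift overrides."""
    for values, width in ((rust_76, 76), (pose6, 6), (mocopi24, 24), (imu2, 2), (watch2, 2)):
        if len(values) != width:
            raise ValueError(f"expected {width} values, got {len(values)}")
    vec = [0.0] * 128
    vec[0:76] = rust_76       # Rust block is written first
    vec[63:69] = pose6        # Swift pose features clobber the Rust values here
    vec[75] = mask
    vec[76:100] = mocopi24
    vec[100:102] = imu2
    vec[102:104] = watch2
    return vec                # [104:128] stays zero (reserved)
```

Any offline pipeline that builds training inputs would need to apply the same overwrite order to stay aligned with the runtime vector.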
2d: Avatar Pipeline (cc-brain/src/avatar/)
Ported from Meta's AI4AnimationPy (Facebook Research, Paul & Sebastian Starke). 3,115 lines of Rust.
| File | Lines | Purpose |
|---|---|---|
| skeleton.rs | 766 | Quaternion [w,x,y,z] math, 4x4 matrix transforms, forward kinematics chain |
| bvh.rs | 449 | BVH animation file parser, ZXY Euler convention, frame-by-frame playback |
| skinning.rs | 453 | Linear Blend Skinning + Dual Quaternion skinning, 4 bones/vertex max |
| glb.rs | 1,026 | Full glTF/GLB binary parser, accessor resolution, mesh primitives, skeleton extraction |
| mod.rs | 421 | AvatarPipeline struct + 9 FFI symbols |
Data flow:
    Mocopi 27 bones [x,y,z,qw,qx,qy,qz] per bone
        ↓ avatar_update_bones()
    Forward Kinematics (parent→child chain)
        ↓
    World-space bone positions (27 x [x,y,z])
        ↓ Linear Blend Skinning
    Deformed mesh vertices (N x [x,y,z])
        ↓ avatar_get_deformed_positions()
    Metal vertex buffer → GPU rendering
44 tests pass. Supports both live Mocopi input and BVH animation replay.
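The forward-kinematics step can be illustrated with a generic parent-to-child chain over `[w, x, y, z]` quaternions. This is a textbook formulation in Python, not a port of `skeleton.rs`; the function names and the parents-before-children ordering are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)
Vec3 = Tuple[float, float, float]


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return (rx, ry, rz)


def forward_kinematics(parents: Sequence[Optional[int]], local_rots: Sequence[Quat],
                       local_offsets: Sequence[Vec3]) -> List[Tuple[Quat, Vec3]]:
    """Return world (rotation, position) per bone; parents must precede children."""
    world: List[Tuple[Quat, Vec3]] = []
    for i, parent in enumerate(parents):
        if parent is None:
            world.append((local_rots[i], local_offsets[i]))  # root bone
            continue
        p_rot, p_pos = world[parent]
        offset = rotate(p_rot, local_offsets[i])
        pos = (p_pos[0] + offset[0], p_pos[1] + offset[1], p_pos[2] + offset[2])
        world.append((quat_mul(p_rot, local_rots[i]), pos))
    return world
```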
2e: Accountability Engine (cc-brain/src/accountability/)
Exercise detection and sleep/wake classification. 6 files, ~900 lines, 27 tests, 9 FFI symbols.
| File | Purpose |
|---|---|
| joint_angle.rs | `joint_angle_3d()` / `joint_angle_2d()` for elbow, knee, torso angles from bone positions |
| rep_counter.rs | EMA-filtered peak-valley rep detector (alpha=0.3, hysteresis=3 frames, min_separation=15 frames) |
| exercise.rs | PushUpDetector (elbow+torso FSM), SquatDetector (knee FSM), ExerciseClassifier with bout detection |
| sleep.rs | SleepWakeDetector: 30s EMA on hip height ratio + body energy, 5-min threshold |
| types.rs | ExerciseType, SleepState, RepEvent, AccountabilityEventFFI enums/structs |
| mod.rs | AccountabilityEngine + 9 FFI functions |
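For reference, a joint angle from three bone positions is the angle between the two segments meeting at the middle joint. The Python sketch below mirrors the name `joint_angle_3d()` for readability only; the degenerate-case fallback of 0 degrees is an assumption.

```python
import math
from typing import Sequence


def joint_angle_3d(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle in degrees at joint b, between segments b->a and b->c."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        return 0.0  # coincident points: assumed fallback
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For an elbow, `a`, `b`, `c` would be the shoulder, elbow, and wrist positions.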
Exercise Detection State Machines:
    Push-up: IDLE → DOWN (elbow<120, torso<30, 3 frames) → UP (elbow>155, 3 frames) → counted
    Squat: STANDING → DESCENDING (knee<150) → BOTTOM (knee<110) → ASCENDING (knee>120) → counted (knee>155)
Sleep States: AwakeActive (0), AwakeStill (1), Resting (2), Sleeping (3)
- Classification: height ratio vs standing + body energy threshold
- Sleeping requires sustained low activity for 5 minutes (9000 frames at 30Hz)
Bout Detection: Counts active-exercise frames (not rep events). Threshold: 90 frames (3 seconds at 30Hz).
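The push-up state machine can be sketched as a two-phase counter with a 3-frame hold on each transition. The thresholds are taken from the diagram above and assumed to be degrees; the class name and the collapse of UP into an immediate return to IDLE are simplifications for this example.

```python
class PushUpCounter:
    """IDLE -> DOWN -> UP -> counted, with a 3-frame hold on each transition."""

    HOLD_FRAMES = 3

    def __init__(self) -> None:
        self.state = "IDLE"
        self.reps = 0
        self._streak = 0  # consecutive frames satisfying the current condition

    def update(self, elbow_deg: float, torso_deg: float) -> int:
        """Feed one frame of joint angles; return the running rep count."""
        if self.state == "IDLE":
            met = elbow_deg < 120 and torso_deg < 30
        else:  # DOWN: waiting for the arms to extend again
            met = elbow_deg > 155
        self._streak = self._streak + 1 if met else 0
        if self._streak >= self.HOLD_FRAMES:
            self._streak = 0
            if self.state == "IDLE":
                self.state = "DOWN"
            else:
                self.reps += 1       # UP reached: count the rep
                self.state = "IDLE"
        return self.reps
```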
---
Layer 3: Swift Bridge — EchelonBridge.swift
`@MainActor` class running at 60Hz via CADisplayLink.
Key invariant: EchelonBridge owns SANService as a non-optional `let` property. Never use @StateObject for SANService.
### 60Hz Loop
    Display Link fires (60Hz)
        ↓
    echelon_update_sensor()     — feed buffered IMU frames to Rust
        ↓
    echelon_step(dt)            — advance Rust engine one timestep
        ↓
    echelon_get_latent()        — read latent state
    echelon_get_lexicon()       — read expressive scalars
    echelon_get_anticipation()  — read anticipation kernel
    echelon_get_gesture()       — read gesture classifier
        ↓
    getDynamics128()            — assemble 128D canonical vector:
        [0:76]    from Rust echelon_get_dynamics_128()
        [63:69]   overwrite with pose features (Mocopi or Vision)
        [75]      modality mask
        [76:100]  Mocopi 24D from MocopiFeatureExtractor
        [100:104] Pocket IMU + Watch features
        [104:128] reserved (zeros)
        ↓
    san_step()                  — feed 128D to SAN pipeline (30Hz, every other frame)
        ↓
    san_get_output()            — read SAN output (44D)
        ↓
    Distribute to consumers: AudioEngine, AutoDirector, StrudelEngine
### MocopiFeatureExtractor
- Receives 27 bones from MocopiReceiver
- Extracts 24D features: joint velocities, limb ratios, symmetry metrics
- Written into 128D vector at positions [76:100]
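The 60Hz loop in this layer steps the engine on every display-link tick and the SAN on every other tick, which yields 30Hz. A minimal Python sketch of that decimation, with hypothetical callback names standing in for the FFI calls:

```python
from typing import Callable


class BridgeLoop:
    """Steps the engine on every tick and the SAN on every other tick."""

    def __init__(self, engine_step: Callable[[float], None],
                 san_step: Callable[[], None]) -> None:
        self._engine_step = engine_step
        self._san_step = san_step
        self._tick = 0

    def on_display_link(self, dt: float) -> None:
        self._engine_step(dt)          # runs at the display rate (60Hz)
        if self._tick % 2 == 0:
            self._san_step()           # runs on even ticks only (30Hz)
        self._tick += 1
```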
---
Layer 4: Output Consumers (Swift)
### AudioEngine
- AVAudioEngine with source node render callback
- SAN audio head drives 4 instruments: kick, hihat, bass, pad
- Each instrument: 5 parameters (energy, onset probability, spectral centroid, modulation, presence)
- StrudelWebEngine: WebKit-based pattern sequencer, receives SAN pattern head (intensity + variation)
- Mix factor crossfade: 0.0 = pure heuristic thresholds, 1.0 = pure SAN learned output
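The mix factor can be read as a per-parameter linear blend between the heuristic and SAN values. The sketch below assumes a linear crossfade with clamping to [0, 1]; the actual curve used by AudioEngine is not stated in this document.

```python
from typing import List, Sequence


def crossfade(heuristic: Sequence[float], san: Sequence[float], mix: float) -> List[float]:
    """Blend two parameter vectors: mix=0.0 is pure heuristic, mix=1.0 is pure SAN."""
    if len(heuristic) != len(san):
        raise ValueError("parameter vectors must have the same length")
    m = min(1.0, max(0.0, mix))  # clamp out-of-range mix values
    return [(1.0 - m) * h + m * s for h, s in zip(heuristic, san)]
```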
### AutoDirector
- SAN camera head: 7 camera angle scores for multi-cam switching
- Cut timing + transition type (hard cut, crossfade, etc.)
- Connected to DirectorHubClient (WebSocket) for centralized orchestration
### LiveDirector (macOS — MotionMixLiveDirector)
- Receives pose telemetry from all iOS devices via persistent WebSocket
- MJPEG preview streams from each camera node
- Centralized camera switching decisions
- SwiftUI for Mac, built from `Desktop/MotionMixLiveDirector/`
### Visual Pipeline
- MetalRenderer: particle system driven by latent state (orb, spine, horizon parameters from UIState)
- ChestFlexDetector: maps pectoral flex to visual + audio triggers
- Flex direction feeds Metal shader for particle offset
---
Layer 5: 3D Rendering Pipeline (Cross-Machine)
### TouchDesigner (Mac5 — [ip])
    mocopi_to_td.py (Mac1, HTTP :9407)
        ↓ converts to OSC
        ↓ fan-out to FANOUT_TARGETS
    TouchDesigner :9501 — skeleton (27 bones pos + rot)
    TouchDesigner :9500 — performance signals (energy, tension, brightness, density)
        ↓
    Render: geometry → camera → lights → render → bloom → level → output
        ↓
    1920x1080 real-time 3D body visualization
Scripts: `Desktop/cc-touchdesigner/`
- `cc_network_builder.py` — builds /cc container in TD (aurora + bloom pipeline)
- `mocopi_to_td.py` — HTTP→OSC bridge with fan-out relay
- `osc_bridge.py` — performance signal poller (20Hz)
### Unity (Mac4 — [ip])
- Unity 6 LTS, project: DepthReactiveVisuals
- Receives texture/data from Mac5 via Thunderbolt 5 direct link (sub-millisecond latency)
- VFX Graph production rendering for audience-facing display
- The Adobe suite (Illustrator/Photoshop/AE/Premiere) runs alongside for generative art
### Fan-Out Relay
`mocopi_to_td.py` on Mac1 acts as a multi-destination relay:
    FANOUT_TARGETS = [
        ("[ip]", 9501),  # Mac5 TouchDesigner
        # ("192.168.1.X", 9500),  # iPhone MocopiReceiver (add device IP)
    ]
Every OSC message is duplicated to all targets. Enables simultaneous iPhone SAN processing + Mac5 3D rendering from a single Mocopi stream.
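A relay of this kind can be sketched with a single UDP socket that sends each datagram to every target. The `fan_out` helper below is hypothetical and not taken from `mocopi_to_td.py`; skipping a target on `OSError` is an assumed policy so that one unreachable device does not stall the others.

```python
import socket
from typing import Iterable, Tuple

Target = Tuple[str, int]  # (host, port)


def fan_out(sock: socket.socket, payload: bytes, targets: Iterable[Target]) -> int:
    """Send one datagram to every target; return how many sends succeeded."""
    sent = 0
    for host, port in targets:
        try:
            sock.sendto(payload, (host, port))
            sent += 1
        except OSError:
            continue  # one unreachable target should not block the rest
    return sent
```

Usage would look like `fan_out(sock, osc_packet, FANOUT_TARGETS)` with `sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`.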
---
Layer 6: Training Pipeline (Offline)
### Capture
- SANTrajectoryLogger: JSONL capture at 5Hz on device
- Records: 128D input vector + SAN output + track metadata
- Stored in `Documents/san-training/*.jsonl` on device
- NSLock-protected I/O for thread safety
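A rate-limited, lock-protected JSONL writer along these lines might look like the following Python sketch. The record keys (`input`, `output`, `track`), the `sink` callable, and the class name are assumptions for illustration; the on-device logger is Swift and its schema is not shown in this document.

```python
import json
import threading
import time
from typing import Callable, Optional, Sequence


class TrajectoryLogger:
    """Emits rate-limited JSON lines; a lock serializes concurrent callers."""

    def __init__(self, sink: Callable[[str], object], rate_hz: float = 5.0) -> None:
        self._sink = sink                      # e.g. an open file's write method
        self._min_interval = 1.0 / rate_hz
        self._last_write: Optional[float] = None
        self._lock = threading.Lock()

    def log(self, vec128: Sequence[float], san_out: Sequence[float],
            track: dict, now: Optional[float] = None) -> bool:
        """Write one record if the rate limit allows; return True when written."""
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._last_write is not None and now - self._last_write < self._min_interval:
                return False                   # too soon since the last record
            self._last_write = now
            record = {"input": list(vec128), "output": list(san_out), "track": track}
            self._sink(json.dumps(record) + "\n")
            return True
```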
### Transfer + Alignment
    devicectl copy ← device Documents/san-training/
        ↓ /Volumes/HD1/training-phrases/device_captures/
    build_v5_pairs.py
        ↓ align 128D captures with audio features from playlist NPZ
        ↓ per-frame: rms_energy, onset_strength, chroma[12], mfcc[20], spectral_centroid
    Aligned training pairs (input: 128D, target: 44D)
### Training
- `train_san_v5.py`: MLX framework, AdamW optimizer, early stopping
- Current: V5 weights, 5,408 real training pairs, val loss 0.028
- Output: `san_v5_weights.bin` + `san_v5_manifest.json`
### Deployment
    Copy weights to MotionMixApp/Resources/
        ↓
    Rebuild libechelon_ios.a (if Rust code changed)
        ↓
    xcodebuild -workspace ... ENABLE_DEBUG_DYLIB=NO
        ↓
    xcrun devicectl device install app --device {ID} {APP}
    xcrun devicectl device process launch --device {ID} com.openclaw.MotionMixApp
---
FFI Surface
All C-ABI symbols declared in `echelon.h`, linked via `libechelon_ios.a`:
| Module | Symbols | Purpose |
|---|---|---|
| EchelonCore | 18 | Lifecycle, sensor input, processing, latent/lexicon/pose output |
| SAN | 13 | Pipeline lifecycle, step, output, training data, weight loading, benchmark |
| ClaimBridge | 5 | N'Ko inscription detection from 128D latent vector |
| Avatar | 9 | Skeleton FK, GLB mesh, bone update, deformed vertices, triangle indices |
| Accountability | 9 | Exercise detection, sleep/wake, rep counting, event polling |
| Total | 54 | |
---
Device Fleet
| Device | ID | Role |
|---|---|---|
| iPhone 16 Plus | 880B4058 | Primary (full mode — all sensors + audio + SAN) |
| iPhone 16 Pro Max | 84109044 | Secondary (full mode) |
| iPhone 14 Pro Max | 45896348 | Camera node (pose streaming only, no SAN) |
| iPad A16 (Mohamed's) | 1DE6FABC | ShootView gallery |
| iPad A16 | 1938B9B3 | ShootView gallery |
| Mac1 | local | Build host, relay, orchestration |
| Mac4 | [ip] | Unity VFX + Adobe generative art |
| Mac5 | [ip] | TouchDesigner 3D rendering, ML compute |
---
Build Commands
    # Rust engine tests
    cd Desktop/Comp-Core/core/audio-media/cc-echelon
    cargo test -p cc-brain --lib
    # Rust engine build (iOS arm64)
    cargo build -p echelon-ios --target aarch64-apple-ios --release
    cp target/aarch64-apple-ios/release/libechelon_ios.a Desktop/MotionMixApp/Frameworks/
    # iOS app
    cd Desktop/MotionMixApp
    xcodebuild build -workspace MotionMixApp.xcworkspace -scheme MotionMixApp \
        -destination 'generic/platform=iOS' ENABLE_DEBUG_DYLIB=NO
    # LiveDirector (macOS)
    cd Desktop/MotionMixLiveDirector
    xcodebuild build -project MotionMixLiveDirector.xcodeproj \
        -scheme MotionMixLiveDirector -destination 'platform=macOS'
    # Deploy to device
    xcrun devicectl device install app --device {DEVICE_ID} {APP_PATH}
    xcrun devicectl device process launch --device {DEVICE_ID} com.openclaw.MotionMixApp
---
Key Invariants
1. EchelonBridge owns SANService — `let san = SANService()` as non-optional. Never @StateObject.
2. 128D [63:69] overwrite — Swift pose features (Mocopi or Vision) clobber Rust values at these indices. Training data must match.
3. Rust/Python naming swap — MoE experts: Python `down` = Rust `up` (first projection). Weight loading cross-maps.
4. @MainActor for all @Published — No DispatchQueue.main.async wrappers. Direct synchronous updates.
5. No Mirror(reflecting:) — Banned in 30Hz hot path. Use direct tuple indexing for FFI structs.
6. Camera-node mode — Camera nodes skip SAN, audio, training capture. Early return in wireServices().
7. nonisolated parsing — MocopiReceiver parsing chain is nonisolated. Only flush() crosses to MainActor via Task.
8. Install + launch — devicectl install does NOT auto-launch. Always follow with devicectl process launch.
Source Anchor
MotionMixApp/ARCHITECTURE.md