Grand Diomande Research · Full HTML Reader

MotionMix Unified Director — Agent Handoff

// After: adaptive — performer phase OR metronome fallback poseBarCounter += 1 let echelonBarFired = echelonBridge.shouldFireBar() let fixedBarFired = poseBarCounter % 30 == 0 guard (echelonBarFired || fixedBarFired), let hub else { return } ```

Agents That Account for Themselves technical note experiment writeup candidate score 32 .md

Full Public Reader

# MotionMix Unified Director — Agent Handoff
> Generated 2026-04-04 | Cross-session synthesis of Echelon work + Multicam Unified Director evolution

---

What You Just Built (Previous Session — Evaluated)

The previous session completed 8 layers of foundational Echelon work. Here is what now exists and its exact state:

### Layer 1 — FFI Thread Safety ✅
- All brain-side FFI calls serialized onto `@MainActor` in `EchelonBridge.swift`
- Sensor frames queued via `NSLock` into `pendingSensorFrames`, drained in `step()` on MainActor
- Audio handle (`MotionSynth`) stays on audio render thread — safe to access concurrently
- False comment removed

### Layer 2 — Canonical 104D State Contract ✅
`getDynamics104()` now returns the full vector:

[0:16]   z          — LIM-RPS 16D equilibrium state
[16:32]  velocity   — 16D latent velocity
[32:39]  7 scalars  — curvature, grounding, verticality, rotation, coherence, norm, speed
[39:51]  activations  (filled Swift-side from MotionModelOutput)
[51:63]  expressions  (filled Swift-side)
[63:69]  pose features ← STILL ZEROS (see open issue below)
[69:74]  temporal     — tempo_estimate, tempo_confidence, phase_position (NEW)
[74:104] reserved zeros

### Layer 3 — Backend Selector ✅
`echelon_create_with_backend(config, backend, weights_path)` now available:
- backend=0 → SimpleLatentUpdater (default)
- backend=1 → LearnedLatentUpdater
- backend=2 → DellLatentUpdater

### Layer 4 — Temporal Intelligence ✅ This is the critical one
- `detect_periodicity()` was already in `utils.rs` — wired it properly
- Added `previous_phase: f32` + `bar_trigger_pending: bool` to `EchelonCore`
- Phase crossing (>0.8 → <0.2 when periodicity > 0.3) sets `bar_trigger_pending = true`
- `echelon_get_bar_trigger()` atomically reads and clears the flag
- Swift: `shouldFireBar()` → delegates to FFI, falls back to `false`
- MotionMixApp.swift: Fixed the 30-frame fixed metronome:

swift

  // Before: fixed 30-frame metronome
  guard poseBarCounter

  // After: adaptive — performer phase OR metronome fallback
  poseBarCounter += 1
  let echelonBarFired = echelonBridge.shouldFireBar()
  let fixedBarFired = poseBarCounter
  guard (echelonBarFired || fixedBarFired), let hub else { return }

### Layers 5-6 — Flow Model + CoreML ✅
- `FlowGenerator1Step.mlpackage` (38MB, 1-step) + `ConditioningEncoder.mlpackage` (1.1MB) in Xcode target
- `DiffusionService.swift` updated: single forward pass (noise → cond → logits → argmax → TokenGrid)
- `buildDynamicsVector()` calls `echelonBridge?.getDynamics104()` with heuristic 7D fallback

### Layer 7 — Docs ✅
- `ECHELON-CANONICAL.md` (238 lines) — definitive spec
- `ECHELON-VISION.md` (635 lines) — intellectual history
- `chatgpt-echelon-history.md` (187K lines) — full conversation database

---

Two Open Issues from Previous Session

### Issue 1 — Pose features still zeros [BLOCKING for Performance scoring]
What it is: `dynamics[63:69]` = (meanX, meanY, stdX, stdY, rangeX, rangeY) from Vision landmarks. These six values are computed in `MotionMixApp.swift` from Vision body pose results, but never routed to `DiffusionService.buildDynamicsVector()`.

Why it matters now: The MotionMix Unified Director's Performance mode scoring needs body position in frame (where is the performer? center vs edge?) to decide which camera angle is most relevant. These come from dynamics[63:69].

Fix needed in `DiffusionService.swift`:

swift

// In buildDynamicsVector(), after calling getDynamics104():
// Route Vision landmarks into the pose feature slots [63:69]
// MotionMixApp.swift already computes these — pass them as a parameter
// or wire echelonBridge so Vision fills them directly

Concrete fix: In `MotionMixApp.swift`, wherever Vision body pose results are processed (look for the `VNHumanBodyPoseRequest` handler), extract the 6 pose features and pass them to a new method `echelonBridge.setPoseFeatures(meanX:meanY:stdX:stdY:rangeX:rangeY:)`. That method writes them into the pending dynamics vector before the next `getDynamics104()` call.

### Issue 2 — echelonBridge not wired to DiffusionService [BLOCKING for canonical path]
What it is: `DiffusionService.buildDynamicsVector()` has the code to call `echelonBridge?.getDynamics104()`, but the bridge reference is `nil` at runtime because the wiring line was never added to the app startup sequence.

Fix needed: In `MotionMixApp.swift` (or wherever app-level wiring happens — look for where `DiffusionService` and `EchelonBridge` are both accessible):

swift

diffusionService.echelonBridge = echelonBridge

This single line activates the canonical 104D path. Without it, every diffusion call falls back to the heuristic 7D vector.

---

What Needs to Be Built Next — Unified Director

### The Core Problem
There are currently two conflicting cut decision systems:
1. iOS `AutoDirector.swift` — runs at 60Hz, Echelon-driven, makes cut decisions locally
2. Rust `director_loop` in `multicam-server/src/main.rs` — runs every 250ms, Murch scoring, makes cut decisions server-side

Neither knows what the other is doing. They fire independently. Cuts are uncoordinated.

Additionally: `AutoDirector` only works when Echelon motion signals are active. When Mohamed sits and speaks to camera (Creator mode), `latent.speed ≈ 0`, all cameras score equally, no cuts happen.

The Fix — Server-Sovereign Unified Director

Architecture in one sentence: The multicam-server becomes the sole cut authority. The iOS `AutoDirector` becomes display-only (score visualization + failover). All cut commands flow server → DirectorHubClient WS → MultiCamView.

Two modes in the same server loop:
- Performance mode: Echelon-driven + beat-quantized. Cuts gated to bar boundaries from `shouldFireBar()`.
- Creator mode: Face proximity + speech pause gating. Cuts land between sentences, not mid-word.
- Auto-detection: if Echelon motion < 0.15 for >5s → Creator mode. If > 0.3 → Performance mode. User can force either.

---

Implementation Plan — 7 Waves, 37 Tasks

Wave 0: Foundation (Days 1-2)

0.1 [human] Benchmark WS round-trip latency DirectorHubClient ↔ multicam-server on same WiFi.
- Send 100 ping-pong via WS, measure p50/p95/p99
- Kill criterion: if p95 > 200ms → switch to MultipeerConnectivity distribution, skip server WS for cuts

0.2 [human] Benchmark Vision FaceLandmarks + BodyPose concurrent on iPhone 14 Pro.
- Must stay combined <30ms per frame at 10Hz

0.3 [agent] Create `Desktop/MotionMixApp/MotionMixApp/Services/FaceAnalyzer.swift` (~180 lines):

swift

@MainActor
final class FaceAnalyzer: ObservableObject {
    @Published var gazeScore: Float = 0       // 1 = looking directly at camera
    @Published var expression: Float = 0       // 0-1 expression intensity
    @Published var headPoseQuality: Float = 0  // 1 = fully frontal
    @Published var faceDetected: Bool = false

    private let visionQueue = DispatchQueue(label: "com.motionmix.face", qos: .userInitiated)
    var isPerformanceMode: Bool = false  // true → 2Hz, false → 10Hz

    nonisolated func processFrame(_ cgImage: CGImage) {
        // VNDetectFaceLandmarksRequest + gaze/expression/headPose computation
        // Gaze: inner eye vector relative to face normal → 1.0 = camera-forward
        // Expression: mouth openness + eyebrow raise normalized to 0-1
        // HeadPose: face rect aspect ratio vs frontal template
    }
}

0.4 [agent] Create `Desktop/MotionMixApp/MotionMixApp/Services/SpeechEnergyDetector.swift` (~80 lines):

swift

final class SpeechEnergyDetector {
    private(set) var speechEnergy: Float = 0
    private(set) var inSpeechPause: Bool = true
    var pauseThreshold: Float = 0.05

    nonisolated func feedBuffer(_ buffer: AVAudioPCMBuffer, sampleRate: Float) {
        // Band-pass 300-3000Hz (speech range), compute RMS
        // EMA smooth, track 300ms pause window
    }

    func calibrate(from buffers: [AVAudioPCMBuffer]) {
        // Auto-set threshold from first 3s: mean + 1.5*std * 0.3
    }
}

0.5 [agent] Add `FaceSignal` struct and "face" WS message handler to `multicam-server/src/main.rs`:

rust

struct FaceSignal {
    gaze_score: f32,
    expression: f32,
    head_pose_quality: f32,
    speech_energy: f32,
    in_speech_pause: bool,
    last_update_ms: u64,
}
// Add: Arc<Mutex<HashMap<String, FaceSignal>>> to AppState
// Add: handle "face" type in device WS handler

---

Wave 1: Creator Mode Scoring (Days 3-4)

1.1 [agent] Add `director_score_creator()` to `main.rs`:

rust

fn director_score_creator(
    face: &FaceSignal,
    is_active: bool,
    elapsed_since_cut_ms: u64,
) -> (f32, &'static str) {
    if is_active { return (0.0, "active"); }
    let face_quality = face.gaze_score * 0.4 + face.expression * 0.3 + face.head_pose_quality * 0.3;
    let staleness = 1.0 - (-(elapsed_since_cut_ms as f32) / 10000.0).exp();
    let audio_score = face.speech_energy;
    let raw = face_quality * 0.50 + staleness * 0.30 + audio_score * 0.20;
    let reason = if face_quality > 0.6 { "face" } else if staleness > 0.5 { "staleness" } else { "audio" };
    (raw.clamp(0.0, 1.0), reason)
}

1.2 [agent] Add mode detection + branching to `director_loop` in `main.rs`:

rust

#[derive(Clone, Debug, PartialEq)]
enum DirectorScoringMode { Creator, Performance }

fn detect_director_mode(pose: &PoseSignal, echelon: &Option<EchelonState>) -> DirectorScoringMode {
    let motion_active = echelon.as_ref().map(|e| e.speed > 0.15).unwrap_or(false);
    if motion_active { DirectorScoringMode::Performance } else { DirectorScoringMode::Creator }
}

1.3 [agent] Wire `FaceAnalyzer` output to `DirectorHubClient.swift`:

swift

// New method in DirectorHubClient.swift:
func updateFaceSignal(gaze: Float, expression: Float, headPose: Float,
                      speechEnergy: Float, inSpeechPause: Bool) {
    // Debounced to 5Hz
    send(["type": "face", "id": deviceId,
          "gaze": gaze, "expression": expression, "head_pose": headPose,
          "speech_energy": speechEnergy, "in_speech_pause": inSpeechPause])
}
// Wire: FaceAnalyzer.gazeScore → DirectorHubClient.updateFaceSignal() in CameraService callback

1.4 [agent] Add speech pause gating to Creator cuts in `director_loop`:
- Cut only executes when `face_signal.in_speech_pause == true` AND pending cut exists
- Fallback: if pending cut > 4s, execute regardless

1.5 [agent] Add `cut_execute` WS message handler to `DirectorHubClient.swift`:

swift

case "cut_execute":
    let camera = json["camera"] as? String ?? ""
    let transition = json["transition"] as? String ?? "hardCut"
    let reason = json["reason"] as? String ?? ""
    await MainActor.run {
        self.autoDirector?.serverCut(cameraID: camera, transition: transition, reason: reason)
    }

Add `serverCut(cameraID:transition:reason:)` to `AutoDirector.swift` — sets `activeCameraID`, fires `onCut`, logs the cut.

1.6 [human] Test: 3 cameras, seated person talking, 2 min session → must produce ≥ 8 cuts.

---

Wave 2: Performance Mode + Echelon Integration (Days 4-6)

2.1 [agent] Forward Echelon state from iOS to server at 4Hz in `EchelonBridge.swift`:

swift

// In the 60Hz display link loop, add:
echelonForwardCounter += 1
if echelonForwardCounter % 15 == 0 {  // 60Hz / 15 = 4Hz
    let lat = getLatentState()
    let lex = getLexicon()
    directorHubClient?.updateEchelonState(
        section: getCurrentSection(),
        tension: lex.tension,
        transitionIntensity: lex.transition_intensity,
        speed: lat.speed,
        rotation: lat.rotation
    )
}

2.2 [agent] Forward `bar_fire` events immediately on `shouldFireBar()`:

swift

// In the same display link loop:
if shouldFireBar() {
    let temporal = getTemporalState()
    directorHubClient?.sendBarFire(tempo: temporal.tempo, phase: temporal.phase, confidence: temporal.confidence)
}

Add `sendBarFire()` + `updateEchelonState()` to `DirectorHubClient.swift`.

2.3 [agent] Add `EchelonState` storage + handler to `main.rs`:

rust

struct EchelonState {
    section: u8,
    tension: f32,
    transition_intensity: f32,
    speed: f32,
    rotation: f32,
    last_update_ms: u64,
}
// WS handler for "echelon" and "bar_fire" message types

2.4 [agent] Implement `BeatQuantizer` in `main.rs` (~120 lines):

rust

struct BeatQuantizer {
    pending_cut: Option<PendingBeatCut>,
    last_bar_fire_ms: u64,
    last_speech_pause_ms: u64,
    mode: DirectorScoringMode,
}
impl BeatQuantizer {
    fn on_bar_fire(&mut self, tempo: f32) { self.last_bar_fire_ms = now_ms(); }
    fn on_speech_pause(&mut self) { self.last_speech_pause_ms = now_ms(); }
    fn propose(&mut self, camera: String, score: f32, reason: String) {
        // Replace pending if score is higher
    }
    fn tick(&mut self) -> Option<PendingBeatCut> {
        match self.mode {
            Performance => { if now_ms() - self.last_bar_fire_ms < 100 || age > 2000 { return take } }
            Creator     => { if now_ms() - self.last_speech_pause_ms < 200 || age > 4000 { return take } }
        }
        None
    }
}

Integrate into `director_loop`: scoring → `quantizer.propose()` → `quantizer.tick()` → broadcast `cut_execute`.

2.5 [agent] Deprecate `AutoDirector` cut execution in `AutoDirector.swift`:
- `updateFromEchelon()` still computes `cameraScores` for display
- Remove `executeCut()` call from `updateFromEchelon()` — scores only
- Add `serverCut()` method (see Wave 1.5)
- Retain `manualSelect()` for user tap overrides

2.6 [agent] Broadcast `echelon_state` to TD bridge via `/td/ws` in `main.rs`:
- When Echelon state updates, forward to TD WebSocket channel
- TD receives: `{type: "echelon_state", section: "build", tension: 0.72, speed: 0.8}`
- This means TD visual transitions are driven by the same Echelon events as camera cuts — synchronized at the source

2.7 [human] Test Performance mode: Mocopi + 3 cameras, 3 min dance → ≥80

---

Wave 3: Unified Session Controller (Days 6-8)

3.1 [agent] Create `Desktop/MotionMixApp/MotionMixApp/Services/UnifiedSessionController.swift` (~250 lines):

swift

@MainActor
final class UnifiedSessionController: ObservableObject {
    enum Phase: String { case idle, prep, countdown, rolling, stopping, review }
    enum ModePreference: String, CaseIterable { case auto, creator, performance }

    @Published var phase: Phase = .idle
    @Published var sessionId: String = ""
    @Published var sessionElapsed: TimeInterval = 0
    @Published var activeMode: String = "auto"
    @Published var cutLog: [CutLogEntry] = []
    @Published var modePreference: ModePreference = .auto

    // Dependencies (injected at app wiring time)
    weak var directorHub: DirectorHubClient?
    weak var recordingService: RecordingService?
    weak var echelonBridge: EchelonBridge?
    weak var peerDiscovery: PeerDiscovery?

    func startSession() async { /* prep → countdown → rolling */ }
    func stopSession() async { /* stopping → review + exports */ }
    func logCut(_ entry: CutLogEntry) { cutLog.append(entry) }
    func exportCSV() -> URL? { /* write to MotionMixRushes/ */ }
    func exportEDL() -> URL? { /* CMX 3600 format */ }
}

struct CutLogEntry: Codable, Identifiable {
    let id: UUID
    let sessionRelativeMs: Int
    let fromCamera: String?
    let toCamera: String
    let transition: String
    let reason: String
    let score: Float
    let mode: String       // "creator" | "performance"
    let section: String?   // Echelon section if Performance mode
}

3.2-3.6: Wire session controller → server endpoints, countdown broadcast, PerformView UI.

---

Wave 4: Post-Production Export (Days 8-10)

4.1 [agent] `GET /session/:id/csv` in `main.rs` — returns:

session_id,timestamp_ms,rel_time_s,from_camera,to_camera,transition,reason,score,mode,section

4.2 [agent] `GET /session/:id/edl` in `main.rs` — returns CMX 3600 EDL:

TITLE: MotionMix Session abc123
FCM: NON-DROP FRAME

001  iPhone-A V  C        00:00:00:00 00:00:04:15 00:00:00:00 00:00:04:15
002  iPhone-B V  D 015    00:00:04:15 00:00:10:00 00:00:04:15 00:00:10:00
* TRANSITION: crossfade (section-change, score=0.78)

4.3-4.6: Download buttons in review phase, clip registry, DirectorOverlayView, test DaVinci import.

---

Waves 5-6: TD Sync, Polish, Hardening (Days 10-14)

5.1 Add session_start/stop to `motionmix_ws.py` — TD starts visual recording in sync
5.2 Echelon state → TD CHOP channels via bridge — visual transitions synchronized with cuts
5.3 Mode indicator badge in `MultiCamView.swift` top bar: CREATOR / PERFORMANCE / AUTO
5.4 Server failover in `AutoDirector.swift`: if no `cut_execute` from server in 5s, re-enable local cuts
5.5 Cut log ticker in `MultiCamView.swift`: scrolling last 3 cuts with reasons
6.1-6.5: Server reconnect resilience, variety director fallback, thermal throttling

---

The Integration Point Between Sessions

The previous Echelon session and this Unified Director work share one critical seam: `shouldFireBar()`.

In the previous session: `shouldFireBar()` was wired to detect the performer's actual rhythmic phase via Echelon's `detect_periodicity()`. The performer's body phase crossing (> 0.8 → < 0.2) fires the bar trigger.

In this session: `shouldFireBar()` feeds directly into `DirectorHubClient.sendBarFire()` → BeatQuantizer in the server → camera cuts land at the exact moment the performer's body crosses the phase boundary.

This is the "performer = clock" principle made real for video: the performer's movement phase now determines not just when music generates, but when the camera cuts.

The TD visual transitions receive the same Echelon state events → camera cuts, music changes, and visual transitions all quantize to the same body-rhythm clock.

---

File Touch Map

### New Files
| File | Lines | Wave |
|------|-------|------|
| `Services/FaceAnalyzer.swift` | ~180 | 0.3 |
| `Services/SpeechEnergyDetector.swift` | ~80 | 0.4 |
| `Services/UnifiedSessionController.swift` | ~250 | 3.1 |
| `Views/DirectorOverlayView.swift` | ~100 | 4.6 |

### Modified Swift Files
| File | What Changes |
|------|-------------|
| `Services/EchelonBridge.swift` | Add `directorHubClient` property, `echelonForwardCounter`, 4Hz state send, immediate `bar_fire` send |
| `Services/DirectorHubClient.swift` | Add `updateFaceSignal()`, `updateEchelonState()`, `sendBarFire()`, handle `cut_execute` WS message |
| `MultiCam/AutoDirector.swift` | Remove `executeCut()` from `updateFromEchelon()`, add `serverCut()`, retain `manualSelect()` |
| `Views/MultiCamView.swift` | Mode badge, cut log ticker, session controls |
| `Views/PerformView.swift` | Session phase UI, UnifiedSessionController integration |
| `MotionMixApp.swift` | Wire: `diffusionService.echelonBridge = echelonBridge`, pose features → dynamics[63:69] |

### Modified Rust
| File | What Changes |
|------|-------------|
| `multicam-server/src/main.rs` | Add: FaceSignal, EchelonState, BeatQuantizer, SessionManager, director_score_creator(), detect_director_mode(), "face"/"echelon"/"bar_fire"/"cut_execute" WS messages, /session/* routes (~450 new lines) |

### Modified Python
| File | What Changes |
|------|-------------|
| `motionmix_ws.py` (TD bridge) | Handle session_start/stop + echelon_state → CHOP channel writes |

---

Immediate Next Steps (in order)

1. Fix the two open issues first (30 min):
- Add `diffusionService.echelonBridge = echelonBridge` to app wiring in `MotionMixApp.swift`
- Route Vision pose landmarks into dynamics[63:69] via new `EchelonBridge.setPoseFeatures()`

2. Wave 0.3 + 0.4 (agent): Create `FaceAnalyzer.swift` + `SpeechEnergyDetector.swift`

3. Wave 0.5 (agent): Add `FaceSignal` + handler to multicam-server

4. Wave 1 (agent, parallel): Creator scoring in server + wire face signals from iOS

5. Wave 2.1-2.4 (agent): Echelon state forwarding + BeatQuantizer

6. Human test (Wave 1.6): 3 cameras, sitting, speaking — verify ≥ 8 cuts in 2 min

---

## Tagline
MotionMix Unified Director: One brain, every angle, every moment.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

MotionMixApp/UNIFIED-DIRECTOR-HANDOFF.md

Detected Structure

Method · Evaluation · Code Anchors · Architecture