Comparative Analysis: Two Computational Choreography Documentation Sets
Set A — `Desktop/computational-choreography/` (45 files)
Implementation-grounded, production system documentation.
Set B — `Desktop/MotionMix/research/computational-choreography-nko-2026-05-27/` (38 files)
Research-architectural, K11 AirDeck DJ control + movement language.
---
What Each Set Is
Set A is the full production stack written from the inside out, by reading real source files.
It documents EchelonBridge, SAN, DELL, LIM-RPS, the 128D canonical vector, training data, KARL reward,
the distributed camera mesh, and NKo synthesis as a future arc. The primary output is music/audio:
the body drives sound through a learned latent space.
Set B is a research design pack written from the outside in — describing what the system should become.
It documents the K11 AirDeck gesture control pipeline, BodyTruth as a sensor contract, camera-first as a
design requirement, the movement lexicon as a teachable dictionary, and NKo as the notation and memory
layer. The primary output is DJ command dispatch: the body controls Rekordbox.
Neither set is wrong. They describe the same body routed to different destinations.
---
Where They Converge
1. Computational Choreography as the Core Discipline
Both arrive at the same definition, in different language.
Set A: The body produces a learned latent state (z*) through fixed-point convergence. That state is a
compositional representation, not a button press.
Set B: "A choreography phrase is: body_state + spatial_zone + movement_shape + timing + intent +
safety_policy + evidence." The phrase carries meaning because it includes all its context.
Both reject "gesture detection = left hand up -> press Z." Both insist on richer structure. One
formalizes it mathematically (z*, DEQ); the other formalizes it as a named structured record.
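Set B's phrase definition reads naturally as a typed record. The sketch below is a minimal Python illustration that assumes each field is a simple string or number; the class name, field types, `timing_ms`, and the example values are hypothetical and not taken from either set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChoreographyPhrase:
    """Hypothetical record mirroring Set B's phrase definition:
    body_state + spatial_zone + movement_shape + timing + intent
    + safety_policy + evidence."""
    body_state: str       # e.g. "standing", as reported by the sensor contract
    spatial_zone: str     # named zone in front of the camera
    movement_shape: str   # e.g. "raise", "sweep", "scratch"
    timing_ms: int        # minimum hold or duration in milliseconds (assumed unit)
    intent: str           # what the performer means by the phrase
    safety_policy: str    # e.g. "requires_proof", "baseline" (assumed values)
    evidence: tuple[str, ...] = field(default_factory=tuple)  # capture IDs backing the phrase

    def has_evidence(self) -> bool:
        # A phrase with no recorded captures carries no proof.
        return len(self.evidence) > 0


phrase = ChoreographyPhrase(
    body_state="standing",
    spatial_zone="left_deck",
    movement_shape="raise",
    timing_ms=300,
    intent="play",
    safety_policy="baseline",
    evidence=("session_001",),
)
print(phrase.has_evidence())  # True
```

Two phrases with the same `movement_shape` but different zones or safety policies are distinct records, which is the richer structure both sets insist on.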
2. Named Movement Vocabulary
Both define a lexicon of gestures with names, body requirements, and status tracking.
Set A (`04-generative-output/motion-lexicon.md`): 20 gestures (Arm Sweep, Torso Twist, Jump Peak,
Contraction, etc.), 8 emotion archetypes (Tension, Release, Float, Ground, etc.), each with SAN output
routing implications.
Set B (`04-gesture-library/movement-lexicon.md`): `left_hand_raise_play`, `airdeck_platter_scratch_left`,
`airdeck_safe_stop`, etc. — each with phrase name, family, body parts, zone, motion type, command, and
promotion status.
The vocabularies are different (musical emotion vs. DJ command) but the architecture is the same: named
phrases, structured requirements, trackable status.
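The shared lexicon architecture (named phrase, structured requirements, trackable status) maps onto a list of entries plus a status filter. A minimal sketch using the field names listed for Set B; the three status values and the `live_phrases` helper are hypothetical. The phrase names come from Set B, but the families, zones, body parts, and command labels below are illustrative guesses.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical promotion states; Set B tracks status, the exact values are assumed.
    CANDIDATE = "candidate"
    PROVEN = "proven"
    LIVE = "live"


@dataclass(frozen=True)
class LexiconEntry:
    phrase: str
    family: str
    body_parts: tuple[str, ...]
    zone: str
    motion_type: str
    command: str
    status: Status


LEXICON = [
    # Phrase names from Set B; all other field values are illustrative.
    LexiconEntry("left_hand_raise_play", "hand_raise", ("left_wrist",), "left_deck",
                 "raise", "play_pause", Status.LIVE),
    LexiconEntry("airdeck_platter_scratch_left", "platter", ("left_wrist", "left_hand"),
                 "left_platter", "oscillate", "scratch_nudge", Status.CANDIDATE),
    LexiconEntry("airdeck_safe_stop", "safety", ("left_wrist", "right_wrist"), "center",
                 "hold", "stop", Status.PROVEN),
]


def live_phrases(lexicon: list[LexiconEntry]) -> list[str]:
    """Return only the phrases whose status allows live dispatch."""
    return [e.phrase for e in lexicon if e.status is Status.LIVE]


print(live_phrases(LEXICON))  # ['left_hand_raise_play']
```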
3. NKo as Notation and Memory, Not Live Control
Both sets arrive at the same NKo position despite approaching from different angles.
Set A (`07-nko-synthesis/overview.md`, `maoe-routing.md`): NKo CTC routing (MAOE) is a V6+ future
direction. The live control loop today uses FuseMoE, not MAOE-NKo.
Set B (`07-nko-computational-language/body-inscriptions.md`): "Do not force NKO into the live control
loop yet. The first job is to make the motion language stable. NKO should enter as the durable notation
layer after the gesture schema is reliable."
Agreement: NKo provides naming, inscription, and cultural memory for movement phrases. It enters the
architecture after the gesture vocabulary is stable — not before.
4. Failed Captures Are Useful Data
Set A (`05-training-and-learning/data-capture.md`): Training captures include misfires, still periods,
and every session frame without filtering.
Set B (`03-perception/quality-gates.md`): "Failed captures should not promote commands, but they should
be kept. They teach the coach: out of frame, terminal covering screen, wrong side of body, hand hidden."
Both treat negative examples as a first-class training resource.
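One way a capture gate could separate "may promote" from "keep for coach training" without deleting anything is sketched below. The `Capture` shape and the `gate` helper are assumptions, not a documented schema; the failure reasons are Set B's examples rewritten as snake_case tokens.

```python
from dataclasses import dataclass, field

# Set B's example failure reasons, rewritten as tokens.
FAILURE_REASONS = {"out_of_frame", "terminal_covering_screen", "wrong_side_of_body", "hand_hidden"}


@dataclass
class Capture:
    capture_id: str
    failures: set[str] = field(default_factory=set)  # empty set means the capture passed


def gate(captures: list[Capture]) -> tuple[list[str], dict[str, set[str]]]:
    """Split captures into promotable IDs and retained failures; nothing is dropped."""
    promotable: list[str] = []
    coach_examples: dict[str, set[str]] = {}
    for cap in captures:
        unknown = cap.failures - FAILURE_REASONS
        if unknown:
            raise ValueError(f"{cap.capture_id}: unknown failure reasons {unknown}")
        if cap.failures:
            coach_examples[cap.capture_id] = cap.failures  # kept as a negative example
        else:
            promotable.append(cap.capture_id)
    return promotable, coach_examples


ok, failed = gate([Capture("c1"), Capture("c2", {"hand_hidden"})])
print(ok, failed)  # ['c1'] {'c2': {'hand_hidden'}}
```

Failed captures never reach the promotable list, yet every capture ends up in exactly one of the two outputs.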
5. Sensor Redundancy and Graceful Degradation
Set A: The 128D vector has a `modality_mask` bit at position [75]. Mocopi features at [76:100] decay
exponentially with staleness (200ms threshold, `exp(-staleness/0.5)`). The system doesn't break without Mocopi.
Set B: BodyTruth has an explicit `degraded_camera_only` mode. "Fuse for confidence. Gate for safety.
Do not make one optional sensor a hard dependency."
Both encode the same principle. Set B names it more explicitly.
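Set A's degradation rule can be sketched numerically. This is a hypothesis about how the pieces combine, not the EchelonBridge code: it assumes staleness is measured in seconds, that `exp(-staleness/0.5)` scales the whole Mocopi slice [76:100], and that the `modality_mask` bit at [75] is cleared once staleness passes the 200ms threshold. The function name is hypothetical.

```python
import math

VECTOR_DIM = 128
MASK_INDEX = 75                 # modality_mask bit position (from Set A)
MOCOPI_SLICE = slice(76, 100)   # Mocopi feature slots (from Set A)
STALE_THRESHOLD_S = 0.2         # 200ms threshold (from Set A)
DECAY_TAU_S = 0.5               # time constant in exp(-staleness/0.5)


def apply_mocopi_staleness(vec: list[float], staleness_s: float) -> list[float]:
    """Return a copy of the 128D vector with Mocopi features decayed by staleness.

    Assumption: the mask bit is 1.0 while Mocopi is fresh and 0.0 once
    staleness exceeds the threshold; features decay continuously either way.
    """
    if len(vec) != VECTOR_DIM:
        raise ValueError(f"expected {VECTOR_DIM} values, got {len(vec)}")
    out = list(vec)
    weight = math.exp(-staleness_s / DECAY_TAU_S)
    out[MOCOPI_SLICE] = [x * weight for x in vec[MOCOPI_SLICE]]
    out[MASK_INDEX] = 1.0 if staleness_s <= STALE_THRESHOLD_S else 0.0
    return out


v = [1.0] * VECTOR_DIM
fresh = apply_mocopi_staleness(v, 0.0)
stale = apply_mocopi_staleness(v, 0.5)
print(fresh[80], fresh[MASK_INDEX])            # 1.0 1.0
print(round(stale[80], 3), stale[MASK_INDEX])  # 0.368 0.0
```

Slots outside the Mocopi slice (camera and IMU features) pass through unchanged, which is what "the system doesn't break without Mocopi" requires.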
6. Session-Specific Adaptation
Set A: TTT (test-time training) adapts SAN weights within the first ~30 seconds of a session. The SAN
learns "this dancer's specific body-music grammar."
Set B: The Coach Training Loop + capture gate + promotion manifest. Each recording session produces
evidence that shapes what the system will recognize.
Different mechanisms (online weight updates vs. offline promotion pipeline), same intent: the system
becomes more accurate about the specific performer over time.
7. Distributed Ownership, No Central Mac Required
Set A: Distributed camera nodes. Each iPhone is self-contained, StageView discovers nodes via Bonjour,
and no multicam-server is required.
Set B: K11 owns live command safety; Mac4 owns visuals; MotionMix owns archive. "Mac4 can provide rich
body data but it should not own Rekordbox."
Both distribute responsibility by machine role and reject single-point-of-failure centralization.
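The ownership split can be expressed as a role table with a single dispatch check. A hypothetical sketch: the machine names come from Set B, while the output labels and the `may_send` helper are invented for illustration. It encodes the invariant that only K11 sends Rekordbox commands.

```python
# Hypothetical ownership table: each output is owned by exactly one machine role.
OWNERS = {
    "rekordbox_command": "K11",   # live command safety
    "visuals": "Mac4",            # rich body data and visuals
    "archive": "MotionMix",       # session archive
}


def may_send(machine: str, output: str) -> bool:
    """True only if `machine` is the registered owner of `output`."""
    return OWNERS.get(output) == machine


assert may_send("K11", "rekordbox_command")
assert not may_send("Mac4", "rekordbox_command")  # Mac4 provides data but does not own Rekordbox
```

Keeping exactly one owner per output is what removes the single point of failure without creating two machines that can both fire a command.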
---
Where They Diverge
These are complementary differences, not conflicts.
1. Output Domain
Set A: body → latent space → music synthesis. The SAN produces audio parameters: tempo, pitch, timbre,
reverb, filter. Sound is the primary output.
Set B: body → gesture recognition → DJ command. The command gate produces Rekordbox keystrokes:
play/pause, sync, next track, scratch nudge. Live DJ control is the primary output.
These are different products sharing the same underlying movement language.
2. Primary Sensor
Set A: IMU (iPhone accelerometer/gyro in pocket) is the primary sensor. Camera (Vision, 19 joints at
30Hz) is the secondary path. Mocopi is a quality upgrade for V6.
Set B: K11 camera (Orbbec Bolt RGB, MediaPipe BlazePose) is the primary sensor. Everything else —
Mocopi, Watch, iPhone — is an optional confidence boost.
This reflects deployment context, not disagreement. Set A's performer has an iPhone on their body.
Set B's performer stands in front of a mounted camera. Neither sensor hierarchy is universally correct —
the correct one depends on which rig is deployed.
3. Temporal Resolution
Set A: 60Hz, driven by CADisplayLink. Every audio frame correlates with exactly one display frame.
Body-music synchronization requires sub-17ms response.
Set B: No explicit frame rate commitment. "Pose freshness within the live threshold." DJ control
tolerance is more generous — a 100ms gesture recognition latency is acceptable for a hand raise.
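The timing figures fit together: a 60Hz tick has a period of 1/60 s ≈ 16.7ms (hence "sub-17ms"), running SAN on every other tick yields the 30Hz inference rate described under "Where Machine Learning Lives", and a 100ms budget spans six 60Hz frame periods. The short Python sketch below checks that arithmetic; it is not the CADisplayLink code, and the helper names are hypothetical.

```python
TICK_HZ = 60             # Set A's CADisplayLink rate
SAN_EVERY_N_TICKS = 2    # SAN inference on every other tick


def frame_period_ms(hz: float) -> float:
    return 1000.0 / hz


def count_san_runs(ticks: int) -> int:
    """Count how many ticks would run SAN inference under the every-other-tick rule."""
    return sum(1 for t in range(ticks) if t % SAN_EVERY_N_TICKS == 0)


period = frame_period_ms(TICK_HZ)
print(round(period, 2))          # 16.67
print(count_san_runs(TICK_HZ))   # 30 SAN runs per second of ticks
print(round(100.0 / period, 1))  # 6.0 frame periods inside a 100ms budget
```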
4. Latent Representation Depth
Set A: 128D canonical vector. DELL fixed-point iteration converges to z*. The latent space is a
continuous, learned, mathematically grounded representation. Every output passes through this bottleneck.
Set B: BodyTruth is a confidence/state object (present, posture, seated, sources, mode). It's a
structured status report, not a learned embedding. The AirDeck pipeline is rule-based at inference time.
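BodyTruth as described (present, posture, seated, sources, mode) is a plain status record. A minimal Python sketch follows: `degraded_camera_only` is Set B's mode name, `active` and `absent` come from the mode list proposed for the unified contract, and the derivation rule (camera as the baseline, optional sensors only upgrading the mode) is an assumption. `derive_body_truth` is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    DEGRADED_CAMERA_ONLY = "degraded_camera_only"
    ABSENT = "absent"


@dataclass(frozen=True)
class BodyTruth:
    present: bool
    posture: str              # e.g. "standing"; vocabulary assumed
    seated: bool
    sources: frozenset[str]   # fresh sensors, e.g. {"camera", "mocopi"}
    mode: Mode


def derive_body_truth(sources: set[str], posture: str = "standing") -> BodyTruth:
    """Assumed rule: the camera is the baseline; optional sensors only upgrade the mode."""
    if "camera" not in sources:
        return BodyTruth(False, "unknown", False, frozenset(sources), Mode.ABSENT)
    mode = Mode.ACTIVE if len(sources) > 1 else Mode.DEGRADED_CAMERA_ONLY
    return BodyTruth(True, posture, posture == "seated", frozenset(sources), mode)


print(derive_body_truth({"camera"}).mode)            # Mode.DEGRADED_CAMERA_ONLY
print(derive_body_truth({"camera", "mocopi"}).mode)  # Mode.ACTIVE
```

None of these fields is a learned vector, which is the contrast with Set A's z*.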
5. Where Machine Learning Lives
Set A: SAN inference runs live at 30Hz (every other EchelonBridge tick). ML is always in the loop,
shaping every music parameter in real time.
Set B: ML lives in the training/promotion pipeline, not in live control. The live control path is
threshold + zone + timing rules. ML trains the coach; the coach authorizes the rules.
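Set B's live path ("threshold + zone + timing rules") can be illustrated with a hand-raise check over timestamped pose samples. Every concrete value is an assumption: normalized image coordinates with y increasing downward, a 0.05 raise margin, a hypothetical left zone expressed as an x-range, and a 300ms hold. No learned model runs inside this function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseSample:
    t_ms: int          # timestamp in milliseconds
    wrist_x: float     # normalized [0, 1]
    wrist_y: float     # normalized [0, 1], y grows downward
    shoulder_y: float


RAISE_MARGIN = 0.05    # wrist must be this far above the shoulder (assumed)
ZONE_X = (0.0, 0.5)    # hypothetical "left" zone as an x-range
HOLD_MS = 300          # hypothetical minimum hold time


def hand_raise_fires(samples: list[PoseSample]) -> bool:
    """Fire only if threshold and zone hold continuously for HOLD_MS."""
    start = None
    for s in samples:
        raised = s.shoulder_y - s.wrist_y >= RAISE_MARGIN
        in_zone = ZONE_X[0] <= s.wrist_x <= ZONE_X[1]
        if raised and in_zone:
            if start is None:
                start = s.t_ms
            if s.t_ms - start >= HOLD_MS:
                return True
        else:
            start = None  # any break in the rule resets the timer
    return False


held = [PoseSample(t, 0.3, 0.2, 0.5) for t in range(0, 400, 33)]
brief = [PoseSample(t, 0.3, 0.2, 0.5) for t in range(0, 200, 33)]
print(hand_raise_fires(held), hand_raise_fires(brief))  # True False
```

In Set B's framing, ML would decide which thresholds and zones get promoted; the function that runs live stays this simple.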
6. Safety Architecture Formality
Set A: No explicit promotion gate. Gestures drive music parameters continuously. There's no concept
of "this gesture must prove itself before it fires."
Set B: 7-stage promotion pipeline (define → self-play → capture → gate → proof matrix → manifest → load).
Hard invariant: "A newly detected gesture cannot become live without proof." Baseline exception for
the two proven hand raises.
This difference exists because the stakes differ. A wrong SAN output changes the music — recoverable.
A wrong Rekordbox command stops a live set — not recoverable mid-performance.
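The seven stages and the hard invariant can be sketched as an ordered enum plus one eligibility check. The stage names are Set B's; the `advance` and `is_live_eligible` helpers, and the reading of "proof" as "reached the load stage after the proof matrix", are assumptions. Only one of the two baseline hand raises is named in this document, so the baseline set holds just that one.

```python
from enum import IntEnum


class Stage(IntEnum):
    DEFINE = 1
    SELF_PLAY = 2
    CAPTURE = 3
    GATE = 4
    PROOF_MATRIX = 5
    MANIFEST = 6
    LOAD = 7


# Set B exempts two proven hand raises; only one is named in this document.
BASELINE = {"left_hand_raise_play"}


def advance(current: Stage, target: Stage) -> Stage:
    """Stages may only be passed in order, one at a time."""
    if target != current + 1:
        raise ValueError(f"cannot jump from {current.name} to {target.name}")
    return target


def is_live_eligible(phrase: str, stage: Stage) -> bool:
    """Hard invariant: no live command without proof, except the baseline phrases."""
    return phrase in BASELINE or stage == Stage.LOAD


s = Stage.DEFINE
for nxt in list(Stage)[1:]:
    s = advance(s, nxt)
print(s.name, is_live_eligible("airdeck_safe_stop", s))              # LOAD True
print(is_live_eligible("airdeck_platter_scratch_left", Stage.GATE))  # False
```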
7. Virtual Performer / Self-Play
Set B has an explicit self-play system: a virtual performer generates synthetic body and hand
trajectories through the same AirDeck detector used by live camera data. It can run without the user
standing in frame.
Set A has no equivalent. All training data comes from real captured sessions via SANTrajectoryLogger.
Synthetic 128D vectors are not generated.
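Self-play needs two parts: a generator of synthetic trajectories and the same detector the camera path uses. A minimal sketch under assumptions: the virtual performer emits a wrist height above the shoulder that rises linearly with seeded Gaussian jitter, and `detector` is a stand-in threshold rule, not Set B's AirDeck detector. Both function names are hypothetical.

```python
import random


def synthetic_raise(n_frames: int, peak: float, noise: float, seed: int) -> list[float]:
    """Hypothetical virtual performer: wrist height above shoulder, rising linearly to `peak`."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [peak * i / (n_frames - 1) + rng.gauss(0.0, noise) for i in range(n_frames)]


def detector(heights: list[float], threshold: float = 0.05, hold_frames: int = 5) -> bool:
    """Stand-in rule detector: height above threshold for `hold_frames` consecutive frames."""
    run = 0
    for h in heights:
        run = run + 1 if h >= threshold else 0
        if run >= hold_frames:
            return True
    return False


# Sweep synthetic performers without anyone standing in frame.
results = {peak: detector(synthetic_raise(30, peak, 0.005, seed=7)) for peak in (0.02, 0.2)}
print(results)  # {0.02: False, 0.2: True}
```

Sweeping `peak` shows the appeal of the approach: the boundary of what the detector accepts can be mapped without a recording session.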
8. Recording Pipeline Formality
Set B: One session folder per recording. Manifest JSON, raw AVI, overlay AVI, display AVI, keyframes,
pose JSONL, labels JSONL, gesture events, command candidates, capture gate report, promotion check.
Everything is traceable.
Set A: SANTrajectoryLogger writes one JSONL file of (input_128d, output_target) pairs. No video
archive, no session manifest, no capture gate report. The training data is minimal by design —
just enough for pair-based SAN training.
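Set B's per-session folder can be checked mechanically. In the sketch below the artifact keys paraphrase the list above, but the key names, the `artifacts` field, and the `missing_artifacts` helper are hypothetical; the real manifest schema may differ.

```python
import json

# Artifact keys paraphrasing Set B's session folder contents (names assumed).
REQUIRED = [
    "raw_avi", "overlay_avi", "display_avi", "keyframes",
    "pose_jsonl", "labels_jsonl", "gesture_events", "command_candidates",
    "capture_gate_report", "promotion_check",
]


def missing_artifacts(manifest_json: str) -> list[str]:
    """Return required artifacts that the manifest does not list, in a stable order."""
    manifest = json.loads(manifest_json)
    present = set(manifest.get("artifacts", {}))
    return [key for key in REQUIRED if key not in present]


example = json.dumps({"session": "s001", "artifacts": {"raw_avi": "raw.avi", "pose_jsonl": "pose.jsonl"}})
print(len(missing_artifacts(example)))  # 8
```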
---
Where They Contradict
These are genuine conflicts that require resolution.
1. Sensor Dependency Framing (the most important contradiction)
Set A documents Mocopi as a quality upgrade that improves the 128D representation. The architecture
shows [76:100] Mocopi slots in the canonical vector. The V6 roadmap says "get Mocopi signal into V6
training as a real contributor." The overall framing is: with Mocopi you get more, without it you get
less.
Set B makes a harder claim: "Mocopi must not be treated as required. If hand raise works with the
camera, then camera-first AirDeck is valid. The full gesture library should be built around camera
pose and then enhanced with optional sensors."
The difference is subtle but architecturally critical. Set A's framing allows a system where Mocopi
absence causes silent performance degradation with no explicit fallback signal. Set B's framing
requires that camera-only be a first-class tested baseline, not a fallback.
The resolution: Set B is correct as a design discipline. The camera-first principle should be
explicitly documented in Set A's sensor fusion section. The V6 roadmap should not treat Mocopi
as a required upgrade; it should treat it as an optional confidence booster, with the camera
baseline remaining fully functional without it.
2. The Term "Movement Phrase" Means Opposite Things
Set A: A movement phrase is an output of the SAN — the system classifies the current body state and
selects a phrase label (e.g., "Arm Wave") that routes SAN outputs. The phrase is what the machine
produces from movement.
Set B: A movement phrase is an input gesture description (e.g., `left_hand_raise_play`) — the
specification of what the body should do to trigger a command. The phrase is what the human performs.
These are using the same term for genuinely different things: input command specification vs.
output classification label.
The resolution: rename one of them. Set A's "motion phrase" → "motion regime" or "motion archetype"
(it's more like a named cluster of SAN output routing). Set B's "choreography phrase" keeps its name
as the canonical input command unit. The glossary for the combined system needs to be explicit.
3. NKo Implementation Layer
Set A (`07-nko-synthesis/maoe-routing.md`): MAOE-NKo is a proposed CTC routing architecture
inside the live inference pipeline — a parallel to FuseMoE with anticipatory routing, lookahead
window, and orthogonality constraints. It would run at inference time.
Set B (`07-nko-computational-language/body-inscriptions.md`): NKo is the inscription and memory
layer. "Do not force NKO into the live control loop yet." It's positioned explicitly outside
the live inference path.
These are not fully contradictory — Set A's MAOE is explicitly V6+, and Set B is about the
current AirDeck design. But a reader studying both would get two different pictures of NKo's role:
future live inference mechanism (Set A) vs. permanent notation layer that should stay outside
the live loop (Set B).
The resolution: clarify the architectural split. NKo has two roles that don't conflict:
(a) notation and inscription layer — always present, records movement phrases as durable claims
(this is Set B's layer, and it's what should be built first); (b) anticipatory routing layer —
MAOE-NKo CTC, future V6+ direction, only for the music/movement latent space, not for command
dispatch (this is Set A's layer, and it's an extension on top of (a) after the notation layer
is established).
---
Synthesis: How to Combine the Literature
The Shared Stack Model
The two sets describe different vertical slices of the same body-as-computer system:
```
body movement
      |
      v
[BodyTruth contract]  <- Set B contributes this explicitly
      |
      +---> [AirDeck gesture recognition]  <- Set B
      |          |
      |          v
      |     [command gate]
      |          |
      |          v
      |     [Rekordbox / DJ output]
      |
      +---> [128D canonical vector / DELL]  <- Set A
      |          |
      |          v
      |     [SAN pipeline]
      |          |
      |          v
      |     [music / audio output]
      |
      +---> [LUME visuals]  <- Set A (with optional Mocopi)
      |
      +---> [body inscriptions / KARL cards]  <- both sets, different naming
```

These paths are not in competition. They share the sensor layer and the movement language;
they diverge at the output domain.
Five Specific Actions for the Combined Literature
1. Write a unified sensor contract. BodyTruth (Set B) should replace the implicit modality_mask
logic in Set A. The contract is: one object that answers "who is present, how confident, what sources,
what mode (active/degraded/absent)." Both music generation and DJ command dispatch consume it.
This becomes `03-perception/body-truth.md` in the combined structure.
2. Merge the movement lexicons. Create one table with columns:
phrase name / family / body parts / zone / output domain / SAN regime / DJ command / risk / status.
Set A's 20 music gestures and Set B's AirDeck phrases share rows in this table.
Some phrases will appear in both output columns (a hand raise that triggers play/pause AND routes
SAN output toward a particular music regime). That's a feature, not a conflict.
3. Adopt Set B's promotion discipline for SAN gestures. Set A currently has no gate before a
gesture affects music output. For performance use, apply Set B's capture-gate / self-play / promotion
pipeline to Set A's SAN gesture modes. The KARL trajectory card (Set A) becomes the evidence record
for promotion; the body inscription claim (Set B) becomes the durable status marker.
4. Apply the camera-first rule to Set A's sensor documentation. Add an explicit section to
`02-body-as-input/sensor-fusion.md`: "Camera (Vision, 19 joints) is the required baseline
for gesture detection. IMU is the primary body state sensor. Mocopi enhances confidence but is
never required for gesture eligibility." Retire any framing that implies Mocopi is a prerequisite.
5. Build the self-play system for Set A. Set B's virtual performer concept should be implemented
for the music side: a synthetic 128D vector generator that can step through pose sequences without
real sensors. This enables SAN regime testing, training data augmentation, and coverage analysis
(are all 8 emotion archetypes reachable from plausible body states?).
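The merged table from action 2 can be sketched as one row type in which either output column may be empty. In this hypothetical sketch the output domain is derived from the two output columns instead of being stored, so it cannot disagree with them. The example row's SAN regime ("Release", one of Set A's archetypes) and its other values are illustrative, not a documented mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MergedRow:
    phrase: str
    family: str
    body_parts: tuple[str, ...]
    zone: str
    san_regime: Optional[str]   # None if the phrase has no music-side effect
    dj_command: Optional[str]   # None if the phrase never dispatches to Rekordbox
    risk: str
    status: str

    @property
    def output_domains(self) -> set[str]:
        # Derived rather than stored, so it cannot disagree with the two output columns.
        domains = set()
        if self.san_regime is not None:
            domains.add("music")
        if self.dj_command is not None:
            domains.add("dj")
        return domains


row = MergedRow("left_hand_raise_play", "hand_raise", ("left_wrist",), "left_deck",
                "Release", "play_pause", "low", "live")
print(sorted(row.output_domains))  # ['dj', 'music']
```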
Recommended Combined Structure
For a single integrated documentation set, merge into this structure:
```
00-index.md
01-foundation/
    computational-choreography.md   <- merged thesis statement
    why-this-matters.md
    glossary.md                     <- unified terminology (resolve "phrase" conflict)
02-body-sensing/
    body-truth.md                   <- Set B's BodyTruth as the shared contract
    camera-first.md                 <- Set B's principle, applied to both pipelines
    imu-and-motion.md               <- Set A
    vision-pose.md                  <- Set A
    mocopi-bones.md                 <- Set A (with camera-first correction)
    femto-bolt-depth.md             <- Set A
    sensor-fusion.md                <- merged (Set A + Set B fusion rule)
    quality-gates.md                <- Set B
03-movement-language/
    lexicon.md                      <- merged: music archetypes + AirDeck phrases
    128d-canonical-vector.md        <- Set A
    dell-architecture.md            <- Set A
    lim-rps-theory.md               <- Set A
    deck-zones.md                   <- Set B
    event-taxonomy.md               <- Set B
04-output-paths/
    music-pipeline.md               <- Set A (SAN + audio engine)
    airdeck-dj-control.md           <- Set B (command gate + Rekordbox)
    lume-visuals.md                 <- Set A
    photography.md                  <- Set A
05-training-and-memory/
    data-capture.md                 <- Set A (SAN pairs)
    annotation-schema.md            <- Set B
    session-archive-structure.md    <- Set B
    dataset-splits.md               <- Set B
    promotion-pipeline.md           <- Set B (adopted for both pipelines)
    virtual-performer.md            <- Set B (to be built for SAN too)
    body-inscriptions.md            <- Set B = KARL trajectory cards under NKo names
    karl-reward.md                  <- Set A
06-distributed-mesh/                <- Set A (largely unchanged)
07-nko-synthesis/
    overview.md
    movement-notation.md            <- Set B layer: inscription and naming (build first)
    claim-types.md                  <- Set B
    maoe-routing.md                 <- Set A layer: V6+ CTC routing (build after notation)
    embodied-to-digital.md          <- Set A
    convergence-vision.md           <- Set A
08-machine-roles/                   <- Set B (K11/Mac4/MotionMix/iPhone)
    safety-boundaries.md            <- Set B (hard invariant: only K11 sends Rekordbox)
09-roadmap/
    next-30-days.md                 <- Set B (concrete and actionable)
    v6-roadmap.md                   <- Set A
    open-questions.md               <- Set B
10-reference/                       <- Set A (hardware, ports, build commands, critical files)
```
---
Summary Table
| Dimension | Set A | Set B | Resolution |
|---|---|---|---|
| Output | Music/audio | DJ command dispatch | Both valid, shared sensor layer |
| Primary sensor | IMU (pocket iPhone) | Camera (K11 Bolt) | Context-dependent; camera-first rule applies to both |
| ML at inference | Live (SAN, 30Hz) | No; rules-based live, ML in training | Both correct for their domain |
| Movement phrase | Output regime label | Input command spec | Rename Set A's term to "motion regime" |
| Latent representation | 128D z* (DELL) | BodyTruth state object | Both; BodyTruth wraps z* derivation |
| Safety gate | None | 7-stage promotion pipeline | Adopt Set B's pipeline for Set A too |
| Self-play | Not present | Virtual performer | Build for Set A |
| NKo role | Future live routing (MAOE-NKo, V6+) | Notation layer (build now) | Two layers, not conflicting; notation first |
| Recording | JSONL pairs only | Full session archive | Adopt Set B's archive structure |
| Sensor dependency | Mocopi as upgrade | Camera-first, Mocopi optional | Apply Set B's camera-first discipline to Set A |
The two literatures are most valuable together. Set A gives the production depth and mathematical
foundation. Set B gives the design discipline, safety architecture, and near-term AirDeck roadmap.
Combined, they describe the full body-as-computer stack from raw sensors through learned representation
through command dispatch through cultural inscription.
Source Anchor
computational-choreography-comparison.md