Grand Diomande Research · Full HTML Reader

Computational Choreography: A Dense Manuscript for the LUME Stack

Draft: 2026-05-27
System: LUME pCloud, Unity DYK, MotionMix, Sony mocopi, Femto/Mega body, Web Fluid Lab, K11 AirDeck
Form: essay manuscript

Abstract

Computational choreography is the name for the layer of LUME that interprets a
performer's body as a live compositional instrument. It is not a synonym for
motion capture, gesture recognition, depth rendering, or visual reactivity,
although it depends on all of them. It is the discipline of deciding what the
machine believes about the body, how that belief changes over time, how movement
becomes intention, and how intention becomes a bounded visual or musical event.
The current LUME stack already contains the physical and computational materials
for such a system: Femto/Mega depth gives the real visible body, Sony mocopi
gives a motion backbone, MotionMix gives a distributed body-state and camera
mesh, Unity DYK gives the live pCloud performance host, the web fluid lab gives
a fast experimental surface, rehearsal capture gives the training archive, and
K11 AirDeck gives a separate safety-gated DJ control path. The task now is to
bind these parts into a choreographic stack whose behavior is stable enough for
performance, expressive enough to approach the Duncan reference direction, and
introspective enough to learn Mo's movement vocabulary over time.

1. The Problem Is Not Reactivity

The central problem in LUME is no longer simply how to make graphics react to a
body. That is the easy version of the problem. A reactive visual system can
sample a depth mask, measure frame difference, increase a glow, and throw
particles when motion crosses a threshold. That produces activity, but it does
not produce choreography. It creates the sensation that something is listening,
but not necessarily that it understands what is being said. The visual may pulse
when the performer is still, explode when the sensor jitters, ignore a subtle
torso shift, or bury the real body under generic effects. These failures are not
usually failures of aesthetic ambition. They are failures of interpretation.

Computational choreography begins from a different premise. The body is not just
a source of signal; it is the author of phrase, weight, direction, hesitation,
pressure, and recovery. A wave is not merely left-right velocity. A weighted
hold is not merely low movement. Recovery is not absence. A burst is not
violence. The system has to preserve those distinctions if it is going to feel
like an extension of a performer rather than a decorative response to sensor
noise. The work, therefore, is to create a middle layer between raw perception
and visual output. This middle layer receives evidence from cameras, sensors,
and rehearsal annotations, forms a belief about the body, extracts movement
primitives, recognizes or estimates gesture templates, constrains their motion
through time, and only then drives visual synthesis.

This is why LUME needs both a practical engineering stack and a choreographic
theory. The engineering stack keeps devices alive, routes packets, renders
frames, records sessions, and prevents unsafe commands. The choreographic theory
decides how to compose with the body. Without the engineering, nothing runs.
Without the theory, everything runs but nothing quite means anything.

2. The Existing Stack Is Already a Choreographic Machine

The current stack should be understood as a distributed instrument rather than a
single application. Mac4 is the visual and sensor-first machine. It owns Unity
pCloud, the live DYK rendering context, the Femto/Mega performer surface, the
local mocopi feed into Unity, the operator monitor, and the web fluid lab. It is
where the visible performer is shaped. It is also where the rehearsal layer
should feel immediate, because this is the machine on which Mo sees whether the
body is actually present, whether the camera sees the body, whether mocopi is
fresh, whether the web lab is reacting, and whether the current visual grammar
is worth keeping.

MotionMix is the shared body-state and device mesh. Its role is not to replace
Unity as the live visual host. Its role is to collect and expose distributed
evidence: iPhones, body truth, device snapshots, linked recording state, and
future fused body signals. MotionMix gives the stack a wider body context. It
can know that three iPhones are present, that a body-state endpoint is stale, or
that a remote recording command was issued. It is the place where rehearsal
becomes multi-angle and multi-device. For the first live DYK path, however,
Unity should not block on MotionMix. A visual performance cannot depend on an
HTTP telemetry path staying perfect. MotionMix is the bus and archive; Unity is
the stage.
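
The non-blocking relationship described above can be sketched in a few lines.
This is a minimal illustration in Python, not the Unity implementation: the
class name LatestSnapshot, the max_age_s parameter, and the example payload are
all hypothetical. A background poller would call publish whenever telemetry
arrives, and the render loop would call read once per frame, which never waits
on the network.

```python
import threading
import time
from typing import Any, Optional


class LatestSnapshot:
    """Holds the most recent telemetry value; readers never block on I/O."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[Any] = None
        self._stamp: Optional[float] = None

    def publish(self, value: Any, now: Optional[float] = None) -> None:
        # Called by a background poller whenever a telemetry response arrives.
        with self._lock:
            self._value = value
            self._stamp = time.monotonic() if now is None else now

    def read(self, max_age_s: float, now: Optional[float] = None) -> Optional[Any]:
        # Called by the render loop. Returns None when nothing fresh exists,
        # so the stage keeps drawing from its local sources.
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._stamp is None or now - self._stamp > max_age_s:
                return None
            return self._value
```

The design choice is that a stale or missing snapshot is an ordinary return
value rather than an error, so the live frame path has nothing to wait for.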

K11 is the DJ control and safety machine. Its purpose is intentionally separate
from visual exploration. K11 runs Rekordbox, AirDeck gestures, keyboard/MIDI
command safety, and BodyTruth gating. The most important principle here is that
neither Unity nor mocopi nor the browser should directly command Rekordbox.
BodyTruth can veto unsafe gesture conditions. It can say that the performer is
absent, stale, or low confidence. It can allow the existing local K11 gesture
logic to continue when conditions are healthy. But it should not itself be a
source of DJ commands. This separation lets visual choreography become strange
and expressive without turning every movement experiment into a risk for the
music system.

The iPhones and review cameras occupy a different role. They are not currently
the primary visual body, and they do not need to be perfect skeleton sources to
be valuable. They are evidence machines. They give rehearsal context, alternate
views, positioning feedback, and later training material. A body seen from one
angle may be ambiguous. A phrase labeled in one stream may be clear when reviewed
from another. The iPhones extend the stack from live rendering into memory.

3. Femto/Mega Is the Visible Body

The Femto/Mega body is the core visual material. It gives LUME its real
performer surface: the cyan figure, the pCloud texture, the matte, the depth
grain, the silhouette edge, the body bounds, the center, and the sense that the
visual is being produced by an actual person standing in the room. This matters
more than it might seem. A mocopi skeleton can provide motion, but it cannot
provide the body as seen by the camera. It cannot produce the particular outline,
depth breakup, occlusion, and surface irregularity that make the performer feel
embodied. The Femto/Mega feed is not merely a control input. It is the canvas.

This also explains why fake dancers, stick bodies, skeleton avatars, or generic
stand-ins feel wrong in the DYK direction. They may be useful debugging tools,
but they are not the final aesthetic. The reference direction requires the real
body to be treated as a force field: a luminous body whose edge bends the wall,
whose movement leaves wakes, whose stillness calms the environment, and whose
gestures produce recognizable changes in color, pressure, ribbon, and density.
If the real body disappears under visual effects, the system loses its anchor.
If the body is replaced by mocopi geometry, the system loses its witness.

The Femto/Mega body also has limitations, and those limitations have to be
handled choreographically rather than denied. Depth can fall away when Mo steps
too far back. Arms can thin out. Edges can shimmer. Digital framing can make the
body blocky. Background motion can create false activity. The correct response
is not to abandon the depth body, but to interpret it through Body Truth and
temporal constraints. A real body source can be imperfect and still be the visual
authority, as long as the stack knows when it is fresh, when it is stale, when it
is noisy, and when it should fade rather than hallucinate a performer.

4. Mocopi Is the Motion Backbone, Not the Visible Performer

Sony mocopi solves a different problem. It gives the system continuity of motion
when the camera is uncertain. It can track head, torso, hips, arms, orientation,
lean, and whole-body timing. It can continue to report motion when the depth
body is partially lost. It is especially useful for stillness, upper-body
motion, side energy, torso lean, rotational dynamics, and distinguishing
left-side from right-side movement. It therefore belongs in the choreography
layer as a motion brain.

The distinction is subtle but crucial: mocopi should influence how the real body
affects the environment, but it should not replace the real body in the
environment. If the camera sees the performer, the visible performer should come
from the camera. If mocopi says the left arm is moving, the visual system should
increase the left-side force, shift the color field, or bias the fluid current
near the body. But it should not draw a mocopi avatar and pretend that this is
the dancer. The most promising aesthetic is a fusion of roles: Femto/Mega as
presence and surface, mocopi as continuity and phrase.

When mocopi is off or charging, the system should degrade gracefully into
body_only. That is not a failure mode; it is a valid operating mode. The body
mask can still drive soft fluid, contour, trails, and calm response. When
Femto/Mega body is lost but mocopi remains fresh, the system can preserve
low-intensity motion fields, but it should not invent a visible body. This is
the ethical and aesthetic line of the stack: motion may persist; the performer
should not be fabricated.

5. Body Truth Is the First Choreographic Authority

Body Truth is the system's current belief about the performer. It is not merely
a boolean saying "person present." It is a confidence-weighted state that tells
the rest of the stack what sources are fresh, what sources are stale, whether
the body is present, whether mocopi is fresh, what the body center and bounds
are, and whether the stack should trust motion enough to cause visual or musical
events. Body Truth is the first layer at which sensor data becomes interpretation.

The current web lab already uses a v1 state model: fused, mocopi_only, body_only,
stale_body, and absent. These names are simple, but they encode an important
choreography principle. A system should know not only what it sees, but how it
sees. fused means the visible body and motion backbone agree. body_only means
the performer is visible but skeleton motion is stale or off. mocopi_only means
motion is fresh but the visible body is missing. stale_body means the system has
recent memory but should be cautious. absent means the system should not pretend.

This state model is also what prevents false reactivity. If the depth mask sees
shimmer but Body Truth says confidence is low, the system can damp sparks. If
mocopi is fresh but the mask is missing, it can continue gentle motion but avoid
rendering a fake body. If both are fresh, it can authorize richer gestures. Body
Truth therefore becomes the gate through which sensor evidence becomes
choreographic permission.
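
A minimal sketch of how the five v1 state names could be derived from source
freshness follows. The state names come from the text above; the thresholds,
the BodyTruth dataclass, the mocopi_only confidence value, and the precedence of
mocopi_only over stale_body are assumptions made for illustration, not the web
lab's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical windows in seconds; the manuscript does not specify values.
BODY_FRESH_S = 0.25
MOCOPI_FRESH_S = 0.25
BODY_MEMORY_S = 2.0
MOCOPI_ONLY_CONFIDENCE = 0.5  # assumed fixed confidence when no mask exists


@dataclass
class BodyTruth:
    state: str         # fused | mocopi_only | body_only | stale_body | absent
    confidence: float  # 0.0 .. 1.0


def classify(now: float, last_body: Optional[float],
             last_mocopi: Optional[float], mask_confidence: float) -> BodyTruth:
    """Map the age of each source to one of the five v1 state names."""
    body_age = float("inf") if last_body is None else now - last_body
    mocopi_age = float("inf") if last_mocopi is None else now - last_mocopi
    body_fresh = body_age <= BODY_FRESH_S
    mocopi_fresh = mocopi_age <= MOCOPI_FRESH_S

    if body_fresh and mocopi_fresh:
        return BodyTruth("fused", mask_confidence)
    if body_fresh:
        return BodyTruth("body_only", mask_confidence)
    if mocopi_fresh:
        return BodyTruth("mocopi_only", MOCOPI_ONLY_CONFIDENCE)
    if body_age <= BODY_MEMORY_S:
        # Recent memory only: fade confidence linearly toward zero.
        fade = 1.0 - (body_age - BODY_FRESH_S) / (BODY_MEMORY_S - BODY_FRESH_S)
        return BodyTruth("stale_body", mask_confidence * fade)
    return BodyTruth("absent", 0.0)


def may_render_visible_body(truth: BodyTruth) -> bool:
    # Only states with a fresh camera body may show a performer.
    return truth.state in ("fused", "body_only")
```

Downstream code can then scale sparks by confidence and consult
may_render_visible_body before drawing, which keeps the rule against
fabricating a performer in one place.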

6. Movement Primitives Are Below Gestures

The system should not jump directly from raw sensor data to named gestures.
Between them lies a set of movement primitives. These are measurable qualities
that can be combined into many gestures: presence, stillness, lateral sweep,
upper reach, torso lean, expansion, burst, and recovery. They are deliberately
not too specific. A lateral sweep may contribute to a wave, but it may also
contribute to a wall bend. Stillness may contribute to a weighted hold, recovery,
or contour focus. Burst may contribute to ribbons, fluid impulse, or a temporary
increase in visual density.

Good primitives are cheap, continuous, and inspectable. They should be recorded
into rehearsal data. They should be visible in the HUD or summary. They should
make sense to Mo when reviewing a session. If a labeled wave moment does not
show lateral sweep or side alternation, the primitive layer is wrong. If a
weighted hold shows high velocity, either the capture is noisy or the movement
was not actually a hold. This is why primitives are the right level for early
debugging. They let the system be corrected before the visual layer is tuned.

The primitive layer is also the first place where body and mocopi can be fused
without becoming confused. If mocopi is fresh, left and right arm motion can
come from mocopi. If mocopi is stale, side activity can be estimated from body
mask motion and center drift. If both are present, the system can combine them.
The output remains the same primitive vocabulary, even when the input modality
changes. That consistency is what makes the later visual mapping stable.
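
The fallback order in this paragraph can be written as a small function. This is
a sketch under stated assumptions: the inputs are taken to be left and right
activity values already normalized to 0..1, and the function name, the
side_bias field, and the 0.7 blend weight are hypothetical.

```python
from typing import Dict, Optional, Tuple

Sides = Tuple[float, float]  # (left, right) activity, each assumed in 0..1


def fuse_side_activity(mocopi: Optional[Sides], mask: Optional[Sides],
                       mocopi_weight: float = 0.7) -> Dict[str, float]:
    """Return the same primitive fields whichever sources are available.

    Pass None for a source that is stale or off.
    """
    if mocopi is not None and mask is not None:
        w = mocopi_weight
        left = w * mocopi[0] + (1.0 - w) * mask[0]
        right = w * mocopi[1] + (1.0 - w) * mask[1]
    elif mocopi is not None:
        left, right = mocopi
    elif mask is not None:
        left, right = mask  # estimated from mask motion and center drift
    else:
        left, right = 0.0, 0.0
    return {"left": left, "right": right, "side_bias": right - left}
```

Whatever the input modality, the output keys are identical, which is the
property the later visual mapping relies on.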

7. Gesture Templates Are Hypotheses, Not Final Truth

Gesture templates sit above primitives. They are not yet machine-learned
classes. They are hypotheses about Mo's movement vocabulary. The current useful
templates are wave, weighted hold, torso lean, both-arm burst, recovery, and
false-reactive. Each one is a soft value, not a hard label. A moment can be 0.7
wave, 0.2 lean, and 0.1 recovery. This matters because dance is rarely a clean
classification problem. A phrase can contain multiple intentions.

The wave template currently means lateral sweep or side alternation. Its first
visual mapping is color phase shift and controlled ribbon response. A weighted
hold means present, stable, low-speed pressure. Its mapping is deep glow, slow
density, and pressure field. A torso lean means body center shift or torso drift,
mapped into wall bend, smear, or lateral fluid current. Both-arm burst means
strong bilateral energy or expansion, mapped into controlled ribbons and fluid
impulse rather than violent sparks. Recovery means the transition from activity
back into stillness, mapped into cooling and calm. False-reactive means activity
without enough confidence, mapped into spark damping.
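
As a hedged illustration, the six templates can be expressed as soft scores over
the primitive vocabulary. The primitive key names, the multiplicative use of
confidence, and the specific formulas are assumptions chosen to mirror the
descriptions above; they are not the current template code.

```python
from typing import Dict


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def template_scores(p: Dict[str, float], confidence: float) -> Dict[str, float]:
    """Turn primitives (each assumed in 0..1) into soft template values."""
    activity = max(p["lateral_sweep"], p["burst"], p["expansion"])
    raw = {
        "wave": p["lateral_sweep"] * confidence,
        "weighted_hold": p["presence"] * p["stillness"] * confidence,
        "torso_lean": p["torso_lean"] * confidence,
        "both_arm_burst": max(p["burst"], p["expansion"]) * confidence,
        "recovery": p["recovery"] * confidence,
        # Activity that arrives without confidence is scored as false-reactive.
        "false_reactive": activity * (1.0 - confidence),
    }
    return {name: clamp01(value) for name, value in raw.items()}
```

The scores are deliberately not normalized to sum to one, so a moment can carry
several intentions at once; normalizing would be a separate design decision.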

These templates should be treated as the first draft of a movement language.
They are meant to be tested against rehearsal sessions. During a full weighted
dance session, Mo can mark moments as wave, hold, burst, lean, recovery, or bad.
After the session, the analyzer can compare labeled windows against the template
values. If repeated wave labels do not score as wave, the wave template needs
work. If the system fires wave during false-reactive moments, it needs better
gating. If weighted holds emerge with a consistent feature pattern the template
misses, the template should be updated before any neural network is trained.
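
The comparison between labels and template values might look like the following
sketch. The data shapes (a list of timestamped score dictionaries and a list of
timestamped label names), the half_window parameter, and the definition of a
"hit" as the labeled template having the highest mean score are all assumptions
for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

Frame = Tuple[float, Dict[str, float]]  # (time in seconds, template scores)
Label = Tuple[float, str]               # (time in seconds, label name)


def label_agreement(frames: List[Frame], labels: List[Label],
                    half_window: float = 1.0) -> Dict[str, Dict[str, int]]:
    """Count, per label name, how often the matching template scored highest."""
    report: Dict[str, Dict[str, int]] = {}
    for t_label, name in labels:
        window = [scores for t, scores in frames if abs(t - t_label) <= half_window]
        if not window:
            continue  # no motion data near this label
        avg = {key: mean(s[key] for s in window) for key in window[0]}
        top = max(avg, key=avg.get)
        entry = report.setdefault(name, {"count": 0, "hits": 0})
        entry["count"] += 1
        entry["hits"] += int(top == name)
    return report
```

A label whose hits stay well below its count across sessions is a candidate for
template revision or tighter gating, which is the review step the paragraph
describes.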

8. Bounded Response Is the Constraint That Makes It Feel Choreographed

LIM-RPS remains useful research vocabulary; its letters name the five parts
described below (Lipschitz, implicit map, recursive, polymodal, synthesis). The
implementation-grounded rule is simpler: LUME's visual decisions must be
bounded, memoryful, multimodal, and source-aware. This is the difference between
a sensor spike and a choreographic response.

The Lipschitz part means outputs cannot change arbitrarily fast. Every
primitive, template, and effect has an attack and release rate. If the wave
target jumps from 0.0 to 1.0, the actual value moves toward it at a controlled
rate. If burst falls, it decays rather than disappearing in one frame. This
prevents jitter from becoming visual violence and gives the visuals the musical
quality of attack, sustain, and release.
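
The attack and release behavior can be sketched as a slew limiter. The function
name and the per-second rate parameters are hypothetical; the point is only that
the output can change by at most rate times dt in any frame, which is the
Lipschitz bound in discrete form.

```python
def slew(current: float, target: float, dt: float,
         attack_per_s: float, release_per_s: float) -> float:
    """Move current toward target, limited to rate * dt per call."""
    delta = target - current
    limit = (attack_per_s if delta > 0 else release_per_s) * dt
    if abs(delta) <= limit:
        return target  # close enough to land exactly on the target
    return current + limit if delta > 0 else current - limit


if __name__ == "__main__":
    # A one-frame sensor spike: the target jumps to 1.0 and falls straight back.
    value = 0.0
    for target in [0.0, 1.0, 0.0, 0.0, 0.0]:
        value = slew(value, target, dt=1 / 60, attack_per_s=4.0, release_per_s=1.0)
        print(round(value, 4))
```

With these example rates a single-frame spike can raise the output by only
4.0 / 60 before release takes over, so jitter never reaches full intensity while
a sustained gesture still arrives within a quarter of a second.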

The implicit map means the system does not need one rigid table that maps a
single gesture to a single effect. Instead, it evaluates a continuous field:
Body Truth, primitives, templates, memory, and confidence become effect weights.
Color shift, pressure glow, wall bend, ribbon impulse, calm field, contour
focus, fluid impulse, and spark damping emerge from that map. The result is more
like a mixer than a switchboard.
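
One way to picture the mixer is a gain matrix in which every template
contributes to several effects at once. The sketch below follows the
template-to-effect pairings described in section 7, but the gain values, the
names MIX and effect_weights, and the choice to exempt spark damping from
confidence scaling are assumptions.

```python
from typing import Dict

# Hypothetical gains: rows are effects, columns are template contributions.
MIX: Dict[str, Dict[str, float]] = {
    "color_shift":    {"wave": 1.0, "torso_lean": 0.2},
    "pressure_glow":  {"weighted_hold": 1.0, "recovery": 0.3},
    "wall_bend":      {"torso_lean": 1.0, "wave": 0.3},
    "ribbon_impulse": {"both_arm_burst": 1.0, "wave": 0.4},
    "calm_field":     {"recovery": 1.0, "weighted_hold": 0.4},
    "spark_damping":  {"false_reactive": 1.0},
}


def effect_weights(templates: Dict[str, float], confidence: float) -> Dict[str, float]:
    """Blend soft template values into effect weights, each clamped to 0..1."""
    out: Dict[str, float] = {}
    for effect, gains in MIX.items():
        drive = sum(gain * templates.get(name, 0.0) for name, gain in gains.items())
        if effect != "spark_damping":
            drive *= confidence  # damping must stay active when confidence is low
        out[effect] = max(0.0, min(1.0, drive))
    return out
```

Because the inputs are continuous, a moment that is mostly wave with some lean
produces a blend of color shift and wall bend rather than a switch between them.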

The recursive part means the present frame depends on previous state. A hold is
defined by duration. A wave is partly defined by side alternation over time.
Recovery is defined by a drop from previous activity. False reactivity is often
recognized by inconsistency across time and sources. Without recursive memory,
the system sees poses but not phrases.
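
A small stateful sketch shows why memory is required for holds and recovery. The
class name PhraseMemory, its thresholds, and the exponential smoothing factor
are hypothetical; the inputs are assumed to be stillness and activity primitives
in 0..1.

```python
from typing import Dict


class PhraseMemory:
    """Carries state across frames so holds and recovery can be measured."""

    def __init__(self, hold_seconds: float = 1.5, still_threshold: float = 0.8,
                 smoothing: float = 0.9) -> None:
        self.hold_seconds = hold_seconds
        self.still_threshold = still_threshold
        self.smoothing = smoothing
        self.still_time = 0.0    # how long the body has stayed still
        self.activity_avg = 0.0  # smoothed recent activity

    def update(self, stillness: float, activity: float, dt: float) -> Dict[str, float]:
        # A hold grows with duration and resets as soon as stillness breaks.
        if stillness >= self.still_threshold:
            self.still_time += dt
        else:
            self.still_time = 0.0
        hold = min(1.0, self.still_time / self.hold_seconds)

        # Recovery is the drop of current activity below its recent average.
        recovery = max(0.0, self.activity_avg - activity)
        self.activity_avg = (self.smoothing * self.activity_avg
                             + (1.0 - self.smoothing) * activity)
        return {"hold": hold, "recovery": recovery}
```

Neither output can be computed from a single frame: a still pose and the first
instant of a hold look identical until duration is taken into account.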

The polymodal part means the stack accepts many kinds of evidence. Femto/Mega
body, mocopi skeleton, pCloud motion, iPhones, MotionMix telemetry, audio, and
labels can all contribute. No one source is treated as eternal authority. The
authority changes with freshness and confidence.

The synthesis part is the output: visuals, trails, fluid, color, and, in the K11
branch, safety gating. LIM-RPS can describe that ethic in research writing, but
implementation docs must still name the actual source modules and rate-limit
logic.

9. Unity DYK Is the Stage

Unity DYK remains the serious performance host. It owns the real pCloud body,
the render pipeline, the camera calibration context, the operator monitor, and
the stable live output. It is the place where the actual performer should be
visible. It is also the place where the system can record synchronized body
state, mocopi state, shader globals, labels, and session summaries.

Unity's role should not be diluted by every experiment. The camera preview
problems already showed that putting too many monitoring surfaces inside Unity
can destabilize the editor and slow the machine. The correct pattern is to keep
Unity's Game View clean and performance-oriented, while operator surfaces and
camera grids sit beside it or outside it. The live visual output should be the
art, not the control room.

The DYK direction specifically requires the body to behave like a field. The
reference is not simply a cyan body on a red wall. It is a body whose movement
authors the surrounding environment. Hands and arms should have local influence.
Fast movement should throw controlled particles or ribbons. Stillness should
return the scene to a calm breathing field. Trails should be persistent enough
to show movement history but not so heavy that the body disappears. The body
should be readable before the effects become maximal.

10. The Web Fluid Lab Is the Sketchbook

The web lab is valuable because it allows faster visual iteration. It can adapt
Three.js ideas, fluid fields, refraction, body-mask emitters, mode comparison,
browser HUDs, and shader experiments more quickly than Unity. It is a place to
discover visual grammar. The strongest results can then either remain as a
parallel local output or be ported back into Unity once proven.

The web lab must follow the same aesthetic rule as Unity: do not fake the
performer. The body in the browser should come from the real Femto/Mega mask or
pCloud-derived feed. Mocopi can drive motion truth, anchors, stillness, and
gesture values, but it should not become a rendered avatar. The browser can
refract, smear, contour, glow, and fluidize the real body. It should not invent
a dancer when the body is missing.

Mode D, the Duncan body oracle, is currently the flagship direction. Its value
is that it tries to make the real body feel dimensional, strange, and embodied
without leaning on fake skeletons. Recent mask-enhancement work belongs to this
same philosophy. The Femto mask is smoothed temporally, pinholes are filled,
isolated sparkles are suppressed, and contours are preserved. The goal is not to
make the body generic. The goal is to make the real body more legible as a live
visual material.
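
The mask treatment described here can be illustrated on a tiny grid. This is a
pure-Python sketch of the idea only, assuming the mask is available as nested
lists; the function names, the smoothing factor, and the neighbor rules (fill an
off pixel with eight on neighbors, drop an on pixel with none) are assumptions,
and a real implementation would run in a shader or array library.

```python
from typing import List

Grid = List[List[float]]


def temporal_smooth(prev: Grid, frame: Grid, alpha: float = 0.6) -> Grid:
    """Blend the new mask frame with the previous smoothed frame."""
    return [[alpha * f + (1.0 - alpha) * p for f, p in zip(frame_row, prev_row)]
            for frame_row, prev_row in zip(frame, prev)]


def binarize(soft: Grid, threshold: float = 0.5) -> List[List[int]]:
    return [[1 if v >= threshold else 0 for v in row] for row in soft]


def clean_binary(mask: List[List[int]]) -> List[List[int]]:
    """Fill single-pixel holes and remove single-pixel sparkles."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            neighbors = sum(mask[yy][xx]
                            for yy in range(max(0, y - 1), min(h, y + 2))
                            for xx in range(max(0, x - 1), min(w, x + 2))) - mask[y][x]
            if mask[y][x] == 0 and neighbors == 8:
                out[y][x] = 1  # pinhole fully surrounded by body
            elif mask[y][x] == 1 and neighbors == 0:
                out[y][x] = 0  # isolated sparkle with no support
    return out
```

Only fully isolated pixels are changed, so silhouette edges pass through
untouched, which is one simple way to keep contours intact.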

11. Rehearsal Capture Is the Bridge to Mo's Vocabulary

The stack should learn Mo's movement vocabulary from rehearsal, not from generic
assumptions. The planned full weighted session is exactly the right next step.
A long take with weights will produce real phrases: slow pressure, fatigue, holds,
arm sweeps, torso shifts, recovery, and moments where the system misreads
movement. These are more valuable than isolated demo clips because the system
needs to learn transitions and repeated motifs, not just poses.

The recording should not split the session into clips while it is happening.
Instead, it should write continuous data and timestamped markers. Mo should be
able to press a label key during or after a moment: keep, wave, burst, lean,
weighted hold, recovery, or false-reactive. The analyzer can later build windows
around those labels. This preserves the integrity of the dance while still
giving the system landmarks.

The important data streams are mocopi skeleton snapshots, DYK shader globals,
Femto/Mega body summary, pCloud/mask state, mode state, gesture triggers,
MotionMix telemetry when available, and review-camera evidence. motion.jsonl is
the temporal spine. labels.jsonl is the semantic annotation layer. session.json
is the context. Summaries and future template files are the feedback loop. The
rehearsal archive is how computational choreography stops being a hand-tuned
fantasy and becomes Mo-specific.
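
The relationship between the continuous stream and the label markers can be
sketched as follows. The file names motion.jsonl and labels.jsonl come from the
text; the field names "t" and "label", the function names, and the window
lengths are assumptions, since the record schema is not given here.

```python
import json
from typing import Any, Dict, List


def load_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines content: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def windows_around_labels(motion: List[Dict[str, Any]],
                          labels: List[Dict[str, Any]],
                          before: float = 2.0,
                          after: float = 1.0) -> List[Dict[str, Any]]:
    """Cut analysis windows out of the continuous take, one per label marker."""
    windows = []
    for lab in labels:
        start, end = lab["t"] - before, lab["t"] + after
        frames = [m for m in motion if start <= m["t"] <= end]
        windows.append({"label": lab["label"], "start": start,
                        "end": end, "frames": frames})
    return windows
```

The window reaches further back than forward because a label key is pressed
during or after the moment it marks. The take itself is never split; windows are
views computed afterward.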

12. Training Is Later, Not Never

A neural network may eventually be useful, but it is not the immediate answer.
Training too early would teach the system the current setup's noise: depth
dropout, camera framing instability, inconsistent mocopi availability, and
unclean labels. A model trained before the movement vocabulary is understood
would become an expensive way to preserve confusion.

The right sequence is template logic first, rehearsal capture second, human
review third, template refinement fourth, lightweight classifiers fifth, and
neural networks only after repeated labeled sessions demonstrate that rules are
insufficient. The future model should be trained to recognize Mo's personal
movement phrases, not generic wave or dance labels. It should operate on curated
windows with positive and negative examples. It should include false-reactive
labels. It should remain subordinate to Body Truth confidence and
bounded-response constraints.

The best first model, if needed, is probably not a large vision model. It is
more likely a small temporal classifier over primitives, mocopi trajectories,
body mask dynamics, and label windows. The visual body can remain real and
shader-driven while the learned layer improves gesture phrase recognition. The
neural network should sharpen the choreography layer, not replace it.

13. MotionMix Turns Performance Into Evidence

MotionMix matters because choreography needs memory and evidence. A single live
view can be beautiful, but a rehearsal system needs to review what happened. The
iPhones, linked recording, body-truth endpoints, and device snapshots make the
stack more than a renderer. They make it a system that can ask what the body did,
what the sensors saw, what the visuals believed, and whether those things agreed.

The operator monitor should therefore be thought of as a rehearsal cockpit, not
just a debugging panel. It should tell Mo whether MotionMix sees phones, whether
body truth is fresh, whether linked recording is armed, whether the local Unity
take is recording, and whether the current setup has enough coverage. But it
should not overload Unity with camera previews if doing so makes the visual
system lag. The better pattern is to separate monitoring surfaces from the live
art path while keeping their data synchronized.

In the long run, MotionMix can become the canonical bus for body-state history.
In the short run, it is already useful as telemetry, device coordination, and
review infrastructure. The distinction matters: MotionMix can help record the
truth without becoming a hard dependency for every visual frame.

14. K11 Proves the Safety Principle

The K11 AirDeck branch is a useful proof of a broader principle: choreography
can influence permission without becoming command. BodyTruthGate on K11 does
not generate Rekordbox commands. It only prevents local gesture commands when
shared body truth is stale, absent, or low confidence. The command source
remains the K11 bridge and its own gesture logic.
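
The veto-only contract can be made concrete with a short sketch. This is not the
BodyTruthGate code: the function name, thresholds, and example command string
are hypothetical. What it demonstrates is the shape of the boundary, in which
the gate either returns the locally generated command unchanged or returns
nothing.

```python
from typing import Optional

VETO_STATES = {"stale_body", "absent"}  # states named in the text as unsafe
MIN_CONFIDENCE = 0.6                    # hypothetical threshold
MAX_AGE_S = 0.5                         # hypothetical staleness limit


def gate(local_command: Optional[str], state: str,
         confidence: float, age_s: float) -> Optional[str]:
    """Pass a local K11 command through, or veto it. Never creates one."""
    if local_command is None:
        return None  # nothing to pass; body truth cannot originate commands
    if age_s > MAX_AGE_S:
        return None  # shared body truth is stale
    if state in VETO_STATES:
        return None
    if confidence < MIN_CONFIDENCE:
        return None
    return local_command
```

Because the only non-None return value is the argument itself, no combination of
body-truth inputs can produce a command that the local gesture logic did not
already issue.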

This separation is essential as the visual system becomes more expressive. A
web lab wave, a mocopi arm motion, or a DYK burst can be visually dramatic
without accidentally firing a DJ command. Safety is an architectural boundary,
not a late-stage patch. If the system eventually maps dance phrases to musical
events, those events still need to pass through a K11-style gate. Expressive
freedom is only sustainable when command authority is controlled.

15. The Aesthetic Standard

The Duncan-level target is not achieved by adding more particles. It is achieved
when the system makes the body feel like it has consequence. A still body should
quiet the field. A slow weighted hold should create pressure and density. A wave
should alter color and current. A torso lean should bend space. A burst should
be controlled and directional, not a violent spray. Recovery should be a visible
cooling of the system. The performer should be readable throughout.

The body should not look like a blob, but it also should not become a clean
cartoon cutout. It should preserve edge detail, depth grain, internal contour,
and the instability of real capture. That instability is part of the material,
as long as it is choreographically constrained. The goal is not clinical
segmentation. The goal is an embodied field of light that still clearly belongs
to the person in front of the camera.

This is where the web lab and Unity can help each other. The browser can explore
stranger fluids, refraction, mode grammars, and mask treatments. Unity can keep
the real pCloud body and projection path stable. The final aesthetic may be a
Unity port of the best browser ideas, a browser output fed by Unity, or a hybrid
where Unity remains the body authority and the browser remains a secondary visual
instrument. The decision should be made by rehearsal evidence and live performance
stability, not by preference for one tool.

16. What the Stack Is Becoming

The simplest description of the stack is:

body → truth → primitives → templates → bounded outputs → rehearsal → refinement

The deeper description is that LUME is becoming a recursive performance system.
It senses the body, interprets movement, produces visuals, records the result,
lets Mo label what mattered, analyzes the session, and then changes how it
responds next time. This loop is the actual computational choreography. It is
not any single shader, model, camera, or machine.

The stack should therefore be judged by whether the loop improves. Does the
system better distinguish stillness from false motion? Does a wave become more
recognizable? Does a weighted hold produce a signature pressure field? Does
recovery calm the screen? Does the body remain visible? Does MotionMix preserve
enough evidence for review? Does K11 remain safe? Does the system become more
Mo-specific after every rehearsal?

If those answers improve, the project is moving in the right direction.

17. Immediate Research Program

The immediate program is straightforward. First, stabilize the real body
aesthetic in mode D and Unity DYK. The body must remain readable, dimensional,
and real. Second, run a full weighted rehearsal session with continuous capture
and labels. Third, review the session and identify repeated movement phrases.
Fourth, adjust gesture templates before training any model. Fifth, promote the
strongest phrase-to-effect mappings into production visual modes. Sixth, only
after enough repeated examples exist, consider a lightweight temporal model for
personal gesture recognition.

This path avoids the trap of chasing general intelligence before the system has
learned the local body. It also avoids the trap of overbuilding visuals before
the motion truth is stable. It treats rehearsal as the source of knowledge. That
is the right posture for a system whose subject is not a generic user, but Mo's
specific movement, weight, rhythm, and taste.

18. Closing

Computational choreography is the layer that lets LUME become more than a
projection, more than a sensor demo, and more than a DJ controller. It is the
practice of giving the machine a body-aware memory and a bounded expressive
grammar. The Femto/Mega body gives the visible performer. Mocopi gives the
motion spine. MotionMix gives distributed evidence. Unity gives the stage. The
web lab gives the sketchbook. The rehearsal recorder gives memory. K11 gives a
safety boundary. Bounded response gives the temporal discipline that keeps all
of it from becoming noise; LIM-RPS is one research vocabulary for that
discipline, not the universal implementation name.

The work now is to rehearse with the system until Mo's movement vocabulary is
visible in the data and unmistakable in the visuals. When that happens, the
visuals will not merely react to the body. They will begin to answer it.

Source Anchor

LUME-CC/01-foundation/mac4-manuscript.md
