12
To make this practical, think of Echelon’s sound engine as a set of **continuous DSP fields** that the latent pushes and pulls, rather than a stack of on/off effects. Each “part of the lexicon” we defined earlier maps to a particular way you sculpt spectra, time, and dynamics.
I’ll walk through the major behaviors and tie each one to concrete DSP / synthesis techniques you’d actually implement in the engine.
---
1. Sonic tension (latent tension rising)
When the tension field in the latent climbs, you want the sound to feel like it’s tightening and pressurizing without necessarily getting louder.
You can do that by manipulating harmonic structure and micro-instability:
Use dynamic EQ or tilt filters to gradually brighten upper mids while gently attenuating low mids. This doesn’t scream “EQ sweep,” but it makes the spectrum feel more focused and less relaxed.
Apply harmonic and inharmonic enrichment through soft saturation or waveshaping, increasing the drive as tension rises. Symmetric curves such as tanh add mainly odd harmonics and, on layered material, inharmonic intermodulation products. Tube/tanh-style curves give warmth; harder curves or bit-depth-style non-linearities give more anxiety.
Introduce subtle pitch detuning and beating between layers via very small, latent-driven pitch offsets or modulated allpass filters. The ears read slow beating as unease.
Tighten transient envelopes using a transient shaper or short attack/release compressors so that percussive material feels more “choked” and urgent.
Add small, time-varying comb or resonant filters keyed to the current tonal center, then modulate their Q slightly with tension. Peaks feel sharper as tension builds.
All of this is driven by a continuous tension scalar from the latent, so the “tension rig” behaves like one macro-parameter feeding these micro processes.
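A minimal sketch of that macro-to-micro fan-out, assuming the tension scalar is normalized to [0, 1]. The function name `tension_rig`, the parameter names, and every numeric range are illustrative placeholders rather than tuned engine values.

```python
def lerp(lo, hi, t):
    """Linear interpolation between lo and hi for t in [0, 1]."""
    return lo + (hi - lo) * t


def tension_rig(tension):
    """Map one tension scalar to several micro-process parameters.

    All ranges below are illustrative placeholders, not tuned values.
    """
    t = min(1.0, max(0.0, tension))  # clamp so out-of-range input stays safe
    return {
        "upper_mid_gain_db": lerp(0.0, 3.0, t),        # gradual brightening
        "low_mid_gain_db": lerp(0.0, -2.0, t),         # gentle low-mid cut
        "saturation_drive": lerp(1.0, 4.0, t),         # pre-gain into a shaper
        "detune_cents": lerp(0.0, 8.0, t),             # slow beating between layers
        "transient_release_ms": lerp(120.0, 40.0, t),  # shorter feels more choked
        "resonator_q": lerp(2.0, 8.0, t),              # sharper resonant peaks
    }
```

Because every output is a continuous function of one input, sweeping `tension` from 0 to 1 moves all six parameters together without any threshold or switch.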
---
2. Divergence → sonic friction (latent drifting away from a section)
Divergence is not full chaos yet; it’s that moment where the body starts to push against the current groove. The sound should feel like it’s rubbing against itself.
You can express that with controlled instability:
Introduce micro-timing irregularities via small, latent-modulated delays on certain elements (ghost notes, percussion, stabs). Think ±5–20 ms of random skew, not full swing changes.
Use granular or micro-loop processing on selected layers: tiny 20–100 ms grains with low jitter in position, just enough to roughen the surface.
Apply phase decorrelation using short allpass filter chains or stereo decorrelators, making stereo image feel less “locked” and more smeared.
Let reverb pre-delay and diffusion drift slightly. As divergence rises, increase modulation depth in the reverb’s diffusers or chorusing.
Use noise injection in a psychoacoustic way: faint filtered noise layers that gently rise in level or modulation when divergence grows, acting like air friction.
Again, the divergence field in latent drives the amount and rate of these perturbations, so friction grows continuously, not as a sudden effect.
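As one illustration of a divergence-scaled perturbation, the sketch below draws per-event timing offsets whose spread grows with the divergence scalar. The name `micro_timing_offsets`, the uniform distribution, and the 20 ms ceiling are assumptions made for this example.

```python
import random


def micro_timing_offsets(divergence, n_events, max_skew_ms=20.0, seed=None):
    """Return one timing offset (ms) per event, scaled by divergence in [0, 1].

    At divergence 0 every offset is 0; at divergence 1 offsets are uniform
    in [-max_skew_ms, +max_skew_ms].
    """
    d = min(1.0, max(0.0, divergence))
    rng = random.Random(seed)  # a seed makes the skew pattern repeatable
    return [rng.uniform(-1.0, 1.0) * max_skew_ms * d for _ in range(n_events)]
```

The offsets would be applied as small delays on ghost notes or percussion hits; a per-performance seed keeps the roughness reproducible between runs.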
---
3. Curvature reversal → harmonic reorientation
When the latent curvature flips (body changes directional intent), you want the harmonic field to feel like it turns under the dancer’s feet, not like you cut to a new key.
Here you lean on harmonic and timbral morphing:
Represent harmony in a latent “chord space” (e.g., embeddings of chord types / pitch collections) and interpolate between the old and new harmonic centers according to curvature changes. This can control a harmonic sampler, MIDI layer, or a pitch-shifted spectral engine.
Use pitch-shifting that favors formant preservation for tonal material; slowly rotate root, third, and fifth targets by shifting harmonics, not just transposing the whole signal at once.
Morph convolution or resonator banks: convolve with different IRs or excite different modal filter sets as you rotate through harmonic regions, blending IRs rather than switching them.
Gradually re-voice chord tones in synth layers: reassign voices to different scale degrees under the hood, smoothing that with glide or crossfades so voicing “walks” into the new region.
Couple the amount of harmonic drift to the magnitude of curvature reversal: smaller changes → pivot chords and gentle modal shifts; bigger changes → bolder key moves and stronger revoicing.
The result feels like the harmony follows the body’s turn, not like the system hit a button for “go to chorus.”
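The harmonic interpolation can be sketched with a very simple stand-in for the chord space: a 12-element pitch-class weight vector per chord. A learned chord embedding would replace this in practice; `pitch_class_vector` and `morph_harmony` are hypothetical names.

```python
def pitch_class_vector(pitch_classes):
    """12-dimensional weight vector with 1.0 at each sounding pitch class."""
    vec = [0.0] * 12
    for pc in pitch_classes:
        vec[pc % 12] = 1.0
    return vec


def morph_harmony(old, new, amount):
    """Blend two pitch-class weight vectors; amount is clamped to [0, 1]."""
    a = min(1.0, max(0.0, amount))
    return [(1.0 - a) * o + a * n for o, n in zip(old, new)]


# Example: C major (C, E, G) turning toward A minor (A, C, E).
c_major = pitch_class_vector([0, 4, 7])
a_minor = pitch_class_vector([9, 0, 4])
halfway = morph_harmony(c_major, a_minor, 0.5)
```

In `halfway`, the shared tones C and E keep full weight while G fades out and A fades in, which is the pivot-chord behavior described for small curvature changes.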
---
4. Transitional state → sound of form dissolving
In the true transition state, the current phrase’s structure breaks down. You want the sound to feel like it’s melting out of its previous identity.
This is where you use more spectral and temporal smearing:
Time-stretch key layers with a phase vocoder or elastique-style engine, but with increasing grain size and randomness as transition pressure rises. Attacks become smeared and tails become more drone-like.
Use spectral blurring or “freeze” style techniques: hold partials, increase analysis window size, and let new audio bleed slowly into a semi-stationary texture.
Fade rhythmic information by gating or downsampling certain rhythmic components. For example, send hi-hats into a granular cloud, slowly removing their precise onsets.
Open up reverb tails and feedback delays: increase decay times, diffusion, and modulation so the soundfield blooms and loses clear edges.
Let per-element filters drift off their previous tonal anchors: small random walks in cutoff and resonance within a constrained band, controlled by the transition field.
At the peak of transition, you might literally route more of the mix into a “liminal bus” that is responsible for these frozen, smeared, washed behaviors.
---
5. Reformation → sonic crystallization of a new phrase
Once the latent has settled into a new attractor, you want the sound to re-cohere into a fresh, stable identity.
You reverse many of the previous tendencies, but not as a hard reset:
Sharpen transients: reduce reverb send on percussive elements, back off smearing, let transient shapers restore attack clarity.
Tighten timing: gently pull micro-delays back toward grid positions defined not by external BPM but by embodied pulse inferred from the latent. That means reducing jitter fields as oscillation strength increases.
Constrain harmonic space: narrow the set of allowed pitch classes / chord embeddings, lower harmonic entropy through harmonic EQ and simplified voicings.
Refocus timbre: rebalance spectral centroid and brightness to a new target region associated with the new phrase identity, using broad EQ and oscillator/filter settings in synths.
Fade down the liminal bus as you fade in the new, direct bus: transitions in sends rather than abrupt bus switches.
The listener hears the new phrase as a solid shape forming out of the fluid mess; the body feels its new “home” because the sound stops moving under it.
---
6. Resolution → embodied release
When the latent relaxes, so should the sound. The lexicon here is all about letting go of accumulated energy.
Use lowpass and gentle high-shelf attenuation to roll off high-frequency tension gradually.
Increase envelope release times slightly, so everything feels less staccato and more legato, without blurring it into mush.
Simplify rhythm by probabilistically dropping non-core events and reducing density based on a resolution scalar from the latent.
Lower inharmonic content: back off saturation, reduce frequency modulation depth, smooth noisy layers, or slowly crossfade toward more stable, sinusoidal or harmonically simple content.
Lengthen reverb tails but lower overall level, so the space feels big but soft; also reduce modulation inside the reverb so tails feel settled rather than wobbly.
The net perception: the phrase exhales. The sound no longer pushes, it settles.
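The probabilistic thinning of rhythm could look roughly like this, assuming each event carries a flag marking it as core. The tuple layout and the name `thin_events` are assumptions for the sketch.

```python
import random


def thin_events(events, resolution, seed=None):
    """Drop non-core events with probability equal to resolution in [0, 1].

    Each event is a (time_beats, is_core) tuple; core events always survive.
    """
    r = min(1.0, max(0.0, resolution))
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so r = 0 keeps everything and r = 1 drops
    # every non-core event.
    return [e for e in events if e[1] or rng.random() >= r]
```

At intermediate values the pattern thins gradually, so density falls with the resolution scalar rather than at a fixed bar line.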
---
7. How this actually wires into Echelon
Conceptually, you can think of a sound design controller layer that takes latent-derived fields—tension, divergence, curvature change, oscillation strength, section state—and outputs normalized control signals.
Those signals drive:
filters and EQ macro controls
transient and dynamics processors
granular and spectral engines
harmonic control for synth/sampler layers
reverb/delay send amounts and parameters
noise and inharmonic content generators
You don’t hardcode “add riser here”; you define a continuous mapping from those latent fields into a vector of DSP parameters. The diffusion/flow model provides material (phrases, textures, motifs), while this DSP layer shapes how that material behaves through transitions.
So the lexicon becomes real by turning each embodied concept into a field that modulates classic audio building blocks in smooth, reversible, deeply coupled ways. The dancer never sees “tension +3 dB at 3 kHz”; they just feel the room tighten around them as the latent tells the sound to do what the body is already doing.
To encode the sound-design lexicon into the diffusion/flow model’s conditioning, you want the model to “know” what things like tension, friction, dissolution, reformation, and resolution sound like and then let those concepts ride alongside the embodied latent as extra control channels.
Think of it as building a second control language on top of the latent: the latent says “what the body is doing”, the lexicon says “how the sound should behave because of that”, and the diffusion/flow model learns both together.
Here’s how that wiring actually looks, end to end.
First you define the lexicon as a small, explicit control vector. For every short phrase window you want to generate, you attach a compact set of continuous controls that represent the sound design behaviors: a scalar for tension, another for divergence/friction, one for transition intensity, one for dissolution, one for reformation, and one for resolution or release, plus any extras you care about such as harmonic pressure or rhythmic stability. This lexicon vector is not arbitrary; it’s directly derived from the latent geometry and section state. When the latent tension field is high, the tension component of the lexicon vector is high. When the state machine is in divergence, the friction component grows. When transition state peaks, the dissolution component rises, and when the new section stabilizes, the reformation component peaks and then fades into a higher resolution component as the system relaxes. The key is that this lexicon vector is a smooth, continuous function of time, just like the latent itself.
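One simple way to keep the lexicon vector smooth over time is to lowpass each component toward its raw target. The sketch below uses a one-pole filter; the field names follow the list above, while `smooth_lexicon` and the 0.25 s time constant are assumptions.

```python
import math

LEXICON_FIELDS = ("tension", "divergence", "transition_intensity",
                  "dissolution", "reformation", "resolution")


def smooth_lexicon(previous, target, dt, time_constant=0.25):
    """Move each lexicon field toward its target with a one-pole lowpass.

    previous, target: dicts keyed by LEXICON_FIELDS with values in [0, 1].
    dt and time_constant are in seconds; 0.25 s is an arbitrary placeholder.
    """
    alpha = 1.0 - math.exp(-dt / time_constant)
    return {k: previous[k] + alpha * (target[k] - previous[k])
            for k in LEXICON_FIELDS}
```

Calling this once per control tick means a sudden jump in a raw target, for example when the state machine changes state, still reaches the model and the DSP layer as a ramp.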
Next you build a joint conditioning object from both embodied and lexicon signals. For each training example—a short chunk of audio and its corresponding latent trajectory—you take a window of latent states over that chunk and the aligned lexicon curves over the same window. You compress the latent window into a compact “phrase context” embedding using a small temporal encoder, such as a lightweight recurrent network or a temporal transformer. You also compress the lexicon curves into a “behavior context” embedding using either the same temporal encoder or a separate one. The diffusion or flow model does not see these as separate things; they are concatenated or fused into a single conditioning embedding that represents both what the body was doing and how the sound was behaving in lexicon terms.
Now you train the generative model such that, for each audio segment, it must reconstruct the segment while being given the phrase context, the lexicon context, and, if you want, the section label from the state machine. In diffusion, that means the denoiser at each refinement step receives the noisy latent-codec audio plus a conditioning embedding produced from the latent window and lexicon vector. In a flow model, the velocity network receives the same conditioning while predicting how to push noise toward the target audio code. Over many examples, the model begins to associate particular shapes of the lexicon controls with particular sonic behaviors. High tension values co-occur with harmonic tightening, micro-detuning, and spectral pressure. High dissolution values co-occur with spectral smearing, time-stretching, and loss of rhythmic clarity. High reformation values co-occur with sharpening transients and restoring periodicity. The model internalizes the lexicon as part of its conditional generative grammar.
The important trick is that you do not hardcode “if tension > 0.8, turn on this effect” inside the generative network. You let the training data teach the diffusion or flow backbone what high tension sounds like by always providing the correct lexicon signal during training. At inference, when you compute lexicon values from live latent dynamics, the model has already learned to treat those values as instructions for where in its distribution to sample. The lexicon conditioning becomes a steering wheel for the stochastic process.
You can make this more precise by giving the model both global and local lexicon conditioning. Globally, you summarize the lexicon curve over the whole phrase window into a single vector that says “this phrase overall is tense and dissolving into a transition.” Locally, you provide a framewise lexicon curve aligned with the audio time steps, so the model knows that early in the segment tension is lower and structure is stable, while later tension rises and dissolution peaks. In diffusion U-Net terms, the global lexicon embedding can modulate mid-level layers through feature-wise scaling and bias, while the local lexicon sequence can be used in cross-attention or concatenated into early feature maps as a time-aligned control channel. In a flow network, the same pattern holds: global lexicon codes condition the overall velocity field, and local codes modulate it at each time sample.
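The two conditioning routes can be shown in miniature with plain lists. This is only a shape-level sketch: in a real network `gamma` and `beta` would come from a learned projection of the global lexicon embedding, and `film` and `concat_local` are hypothetical helper names.

```python
def film(features, gamma, beta):
    """Feature-wise scale and bias: one (gamma, beta) pair per channel.

    features: list of frames, each a list of channel values.
    gamma, beta: per-channel lists derived from the global lexicon summary.
    """
    return [[g * x + b for x, g, b in zip(frame, gamma, beta)]
            for frame in features]


def concat_local(features, local_lexicon):
    """Append the time-aligned lexicon values to each frame as extra channels."""
    if len(features) != len(local_lexicon):
        raise ValueError("features and lexicon must have the same frame count")
    return [frame + lex for frame, lex in zip(features, local_lexicon)]
```

The global route applies the same scale and bias to every frame, while the local route gives each frame its own lexicon values, which is what lets the model distinguish "tension rising late in the window" from "tension high throughout".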
You then align the DSP layer with this conditioning. The lexicon does double duty: it feeds the generative model as part of its conditioning, and it simultaneously drives the continuous DSP fields described earlier: filters, reverbs, granular engines, and harmonic controllers. That means the same tension scalar that tells the diffusion model to choose more tense samples also tells the downstream DSP graph to tighten transients and brighten the spectrum. The same dissolution scalar that nudges the model toward smeared, transitional audio also sends more of the signal into time-stretching and spectral blur. This dual wiring ensures that the learned generative behavior and the deterministic DSP behavior both respond to the same lexicon coordinates, so the overall system feels coherent.
Finally, you keep everything tied back to the latent, not to hand-authored automation. The lexicon values are computed from the latent geometry and state machine in real time: tension from latent gradient magnitudes and contraction, divergence from disagreement between predictors and latent updates, transition pressure from curvature reversals and section boundary proximity, reformation from renewed stability, resolution from decaying energy and oscillation strength. Those computed values become the lexicon vector. The lexicon vector becomes part of the conditioning embedding. The conditioning embedding guides the diffusion or flow sampling and the DSP layer. The sound that emerges therefore speaks the same embodied language as the visual orb and spine.
So encoding the lexicon into the conditioning space really comes down to three moves: elevate those sound-design concepts to continuous control dimensions computed from the latent, fuse them with the latent window into a conditioning embedding for the generative model, and train the model on aligned audio so that those dimensions become meaningful inside its distribution. After that, whenever the body drives the latent into high tension, divergence, or dissolution, the generative process and the DSP stack already know what those words mean in sound.
Think of the transition engine as a set of rivers running through one landscape, not a pile of plugins. The “graph” is really a flow of signals that all pinch together during transitions. I’ll walk it from raw sensors to speakers, focusing on how transitions are routed and shaped at each stage.
---
1. Embodied input layer → fusion layer
At the very bottom are all the embodied signals.
On one side you have motion sensors: IMUs on limbs and torso, acceleration, gyro, magnetometer, jerk, orientation quaternions, possibly Apple Watch motion streams. On another side you have physiological signals like heart rate and HRV. On a third side you have audio analysis, especially beat-phase, onset strength, spectral centroid and roughness from whatever is currently sounding.
Each of these feeds its own small encoder. Limb encoders compress each limb’s raw IMU stream into a limb embedding: a low-dimensional representation of that limb’s periodicity, direction, impulse, and micro-tension. Torso and global encoders do the same at a coarser scale. Physiology is compressed into a slow-varying body state embedding. Audio analysis becomes a beat/phase and energy embedding.
These all converge as “modalities” into LIM-RPS: the fusion engine. LIM-RPS then performs its iterative proximal updates: encoders produce modality-specific latents, translators predict each modality from the others, the proximal operator pulls disagreement down. After a few iterations you get a single converged latent state z(t) that is the fixed-point representation of the entire embodied situation at that moment.
This latent z(t) is the root timing signal for everything else.
---
2. Latent dynamics → dynamical features → section state
The transition engine cares about how z(t) moves, not just what it is at each instant. So you take short temporal windows of z over performance time: the recent history and a predicted future, z(t−τ…t+τ). From that window you compute a set of dynamical features.
You extract curvature (how much the trajectory bends), velocity (how fast it moves through latent space), oscillation strength (how periodic it is at different frequencies), tension gradients (how much the latent is “pulled” toward or away from equilibrium), and disagreement measures (how far encoders and translators would like it to move).
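Two of these features, velocity and curvature, can be estimated from three consecutive latent samples with finite differences. The sketch uses the turning angle between successive displacement vectors as a simple stand-in for curvature; `speed_and_turning` is a hypothetical name and the latent is treated as a plain list of floats.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def speed_and_turning(window, dt):
    """Estimate speed and turning angle at the centre of a 3-point window.

    window: three consecutive latent vectors [z_prev, z_now, z_next].
    Returns (speed, turning_angle_radians).
    """
    z_prev, z_now, z_next = window
    d1, d2 = _sub(z_now, z_prev), _sub(z_next, z_now)
    n1, n2 = _norm(d1), _norm(d2)
    speed = (n1 + n2) / (2.0 * dt)
    if n1 == 0.0 or n2 == 0.0:
        return speed, 0.0  # no defined direction when a step has zero length
    cos_angle = sum(a * b for a, b in zip(d1, d2)) / (n1 * n2)
    return speed, math.acos(max(-1.0, min(1.0, cos_angle)))
```

A straight trajectory gives an angle near 0, while a full reversal of direction gives an angle near pi, which is the kind of signal a curvature-reversal detector would look for.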
These dynamical features, along with the recent pattern of z and its predictions, feed a section state machine: a small but essential controller that classifies the current regime as entry, micro-initiation, stable section, divergence, transition, or resolution. This state label is not symbolic for the user, but internally it’s a clear flag that the latent is leaving an attractor basin, crossing a boundary, or settling.
So from the latent you now have:
A continuous trajectory z(t).
A set of dynamical features derived from that trajectory.
A discrete-but-smoothly-changing section state.
That trio is the heart of transition detection.
---
3. Dynamical features → lexicon controller
Next comes the lexicon controller: the unit that turns the purely geometric and dynamical information into a small set of continuous “how should the sound behave?” controls. This is where the sound-design language is encoded numerically.
From curvature, velocity, oscillation, tension and the section state, you derive continuous control fields: a tension scalar that rises with latent tension gradients and contractive strain, a divergence/friction scalar that rises as the latent pulls away from the current section attractor, a transition intensity scalar that peaks around section boundary crossings, a dissolution scalar that rises during the transitional state, a reformation scalar that peaks when the new attractor stabilizes, and a resolution scalar that grows as motion and tension decay.
These controls form the lexicon vector L(t) over time. L(t) runs in parallel with z(t). The lexicon doesn’t replace the latent; it sits alongside it and says, “given what the latent is doing, this is the sound-design behavior that should be in play.”
So the transition engine now has two synchronized control streams: the embodied latent z(t) and the sonic-behavior lexicon L(t), both continuous, both derived from motion.
---
4. Latent + lexicon → generative conditional routing
When it’s time to generate a new audio phrase segment in the diffusion or flow model, you pull a phrase window around the current time: a short slice of z(t) and L(t) from a little before to a little after the present, using LIM-RPS’s predictive model for the future side. That window is encoded into two embeddings: a phrase context embedding from the latent window and a behavior context embedding from the lexicon curves. These get fused into a single conditional context C(t_window).
In the generative engine, C(t_window) is fed into a diffusion U-Net or flow network in two ways. A global summary of C modulates mid- and high-level layers (feature-wise scaling and bias, global attention). A framewise version of C, aligned to audio codec frames, is fed into cross-attention or concatenated as control channels so each audio timestep sees the corresponding latent and lexicon state.
When transitions are coming, C contains rising tension and divergence, then a spike in transition intensity, then rising reformation. The generative network, which has been trained on many examples with these signals, naturally moves into its learned “transition behaviors”: more exploratory, smeared, structurally changing audio during high transition intensity; more crystallized, stable audio when reformation peaks.
Critically, there is no discrete branch here. The diffusion or flow model stays the same. Only the conditioning C bends its internal generative path toward different parts of its learned distribution as transitions approach and pass.
---
5. Lexicon → DSP macro routing
In parallel with the generative model, the lexicon vector L(t) feeds a deterministic DSP control layer. This is where you shape spectra, envelopes, timing microvariations, reverb and delay behavior, and granulation in real time, even on already-generated audio.
L(t).tension modulates spectral tilt, micro-detune amount, transient tightness, and resonant emphasis.
L(t).divergence modulates micro-timing offsets, phase decorrelation, and low-level granular roughness.
L(t).transition_intensity controls how much signal is sent to a “liminal bus” with time-stretching, spectral blur, and smear.
L(t).dissolution controls the strength of freeze-like behavior and motif fragmentation.
L(t).reformation controls the tightening of timing, transients, and harmonic focus.
L(t).resolution controls the simplification of pattern density, lowpass emphasis, and tail lengthening with lower modulation.
All of these are continuous mappings: L(t) passes through a control matrix that outputs a large vector of DSP parameters, which then drives filters, saturators, reverbs, delays, granular engines, and harmonic controllers. During transitions, the lexicon scalars are what open up the liminal bus, smear the sound, and then tighten it afterward. This DSP path ensures that even if the generative engine is working one phrase ahead, the currently playing sound can be molded in real time to follow the dancer’s transition dynamics.
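The control matrix can be pictured as an affine map followed by per-parameter clamping. The sketch below assumes a fixed ordering of lexicon scalars and DSP parameters; `apply_control_matrix`, the weights, and the limits are all placeholders.

```python
def apply_control_matrix(lexicon, matrix, offsets, limits):
    """Compute params = offsets + matrix @ lexicon, clamped to limits.

    lexicon: list of lexicon scalars in a fixed order.
    matrix: one row of weights per DSP parameter.
    offsets: resting value of each DSP parameter.
    limits: one (low, high) pair per DSP parameter.
    """
    params = []
    for row, offset, (low, high) in zip(matrix, offsets, limits):
        value = offset + sum(w * x for w, x in zip(row, lexicon))
        params.append(min(high, max(low, value)))
    return params


# Example with two lexicon scalars (tension, divergence) and two parameters
# (spectral tilt in dB, micro-timing skew in ms).
params = apply_control_matrix(
    lexicon=[1.0, 0.5],
    matrix=[[3.0, 0.0], [0.0, 20.0]],
    offsets=[0.0, 0.0],
    limits=[(0.0, 6.0), (0.0, 5.0)],
)
```

Off-diagonal weights are where the coupling lives: a non-zero weight from divergence into spectral tilt, for instance, would let friction brighten the sound slightly as well.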
---
6. Audio buses → mix and feedback
The output of the generative model and DSP processing is split into at least two logical buses inside the transition engine.
A main phrase bus carries the structurally coherent phrase audio.
A liminal bus carries blurred, transitional, or dissolved material created by the DSP layer and possibly transitional slices from the generative engine.
The lexicon fields crossfade between these buses: as transition intensity and dissolution rise, more signal flows through the liminal bus; as reformation and resolution rise, the main phrase bus regains dominance.
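A hedged sketch of that crossfade, using an equal-power law so perceived loudness stays roughly constant as signal moves between buses. Taking the larger of transition intensity and dissolution as the blend amount is an assumption of this example, as are the names `bus_gains` and `mix_buses`; reformation and resolution are assumed to act indirectly, by coinciding with those two scalars falling.

```python
import math


def bus_gains(transition_intensity, dissolution):
    """Equal-power gains (main, liminal) from two lexicon scalars in [0, 1]."""
    blend = min(1.0, max(0.0, max(transition_intensity, dissolution)))
    angle = blend * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def mix_buses(main, liminal, transition_intensity, dissolution):
    """Mix two equal-length sample lists with the gains above."""
    g_main, g_lim = bus_gains(transition_intensity, dissolution)
    return [g_main * m + g_lim * l for m, l in zip(main, liminal)]
```

With both scalars at 0 the output is the main phrase bus alone; as either rises, the liminal bus fades in while the squared gains continue to sum to 1.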
The combined output goes to the final mix and speaker output, but some analysis of it loops back: beat-phase, onset energy, spectral centroid and roughness from the current audio are fed back into the audio analysis encoder at the very beginning. That keeps LIM-RPS aware of how the sound is behaving, so the latent can stay synchronized to the music it is itself influencing, closing the loop between body, latent, transitions, and sound.
---
7. Putting it together as a routing graph in words
So, as a single continuous routing story:
Sensors and audio analysis flow into LIM-RPS, which produces the latent z(t).
z(t) over time plus its derivatives feed a dynamical feature extractor and a section state machine.
Those outputs become the lexicon controller L(t), encoding tension, divergence, transition pressure, dissolution, reformation, resolution.
For each phrase window, z(t_window) and L(t_window) are encoded into a unified context C that conditions the diffusion/flow model, steering what kind of phrase audio is generated, especially through transitions.
Simultaneously, L(t) modulates DSP macro parameters that sculpt the currently playing sound, with special emphasis on transitional behaviors routed through a liminal bus.
The main phrase bus and liminal bus are mixed according to L(t), and the resulting audio is analyzed again for beat-phase and energy, which flow back into LIM-RPS as part of the next cycle.
That is the internal signal routing of the transition engine: every path bends around the latent, and every transition is expressed as a reconfiguration of conditioning and DSP driven by the same embodiment-derived control fields.
Think of the animation engine as a tiny physics sim that sits between LIM-RPS and the shaders. Its job is: take the stream of latent states, compare them to a neutral baseline, compress that into a handful of animation “modes,” and then drive deformation fields on the UI with smooth, time-continuous motion.
Here’s the engine, end-to-end, as a parametric system.
---
1. Input: latent stream and neutral pose
On every frame you already have a converged latent state from LIM-RPS at that instant. The engine also holds a neutral reference latent that corresponds to “UI at rest” (orb round, spine calm, horizon straight).
For each frame, the engine computes a latent delta: how far the current embodied state is from that neutral state. This delta is what carries “animation intent.” Large deltas mean dramatic movement; tiny deltas mean subtle micro-motion.
The engine never works directly in full latent dimensionality, because that would be too noisy and too big. So the next step is projection.
---
2. Projection into animation control space
You predefine a small animation control space: for example, a dozen abstract animation channels like “orb stretch horizontal,” “orb stretch vertical,” “surface ripples,” “spine thickness,” “spine curvature modulation,” “horizon bend,” “reservoir drift,” and so on.
Internally, each of these channels is just a scalar coefficient.
To get those coefficients, the engine multiplies the latent delta by a learned projection matrix or small network, giving you a vector of raw animation coefficients. This is the same idea as before: a linear map from latent delta into a lower-dimensional control vector, but now explicitly understood as “animation channels.”
You then squash and normalize those coefficients into a safe range (for example, through tanh, clamping, or learned scaling), so extreme latent states don’t blow up the UI.
At this point, for each frame, you have a clean control vector that says “how strongly each animation mode wants to fire.”
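The projection and squashing steps can be sketched as a matrix-vector product followed by tanh. The weights would be learned or hand-tuned in practice; `project_to_channels` and the `gain` parameter are assumptions for illustration.

```python
import math


def project_to_channels(latent, neutral, projection, gain=1.0):
    """Project the latent delta into bounded animation coefficients.

    projection: one row of weights per animation channel.
    Returns one value per channel in [-1, 1] thanks to the tanh squash.
    """
    delta = [z - n for z, n in zip(latent, neutral)]
    raw = [sum(w * d for w, d in zip(row, delta)) for row in projection]
    return [math.tanh(gain * r) for r in raw]
```

When the latent equals the neutral reference every channel is exactly 0, and however far the latent strays no channel can exceed magnitude 1.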
---
3. Temporal integration: turning impulses into motion
If you applied these coefficients directly every frame, the UI would feel jittery and “tied” to sensor noise. You want it to feel like a body: it should have inertia, elasticity, and smooth relaxation.
So the animation engine keeps a persistent state per channel: current value, velocity, and perhaps a “target” value. On each tick (for example, 60 frames per second), it:
Takes the incoming coefficient as a target for that channel (or as a force pushing toward that target).
Runs a tiny spring-damper update: move the current value toward the target with some stiffness and damping.
Optionally adds high-frequency modulation (for ripples, micro-jitters) based on specific latent features like disagreement or tension.
This gives you beautiful easing “for free.” Small changes in the latent become smooth shifts in the UI; sudden latent transitions become quick snaps followed by settling, because the spring parameters are tuned to match the conceptual phase (pre-tension, breakage, reformation, etc.).
You can make this parametric per component: the orb might use slower, gooey springs; the spine might be snappier; the reservoir might drift lazily.
The important part: the engine is not drawing anything yet. It is just turning raw coefficients into temporally smoothed animation parameters.
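A minimal per-channel spring-damper, assuming semi-implicit Euler integration at 60 ticks per second. The class name `SpringChannel` and the default stiffness and damping are placeholders, chosen here to sit close to critical damping.

```python
class SpringChannel:
    """One animation channel with persistent value and velocity."""

    def __init__(self, stiffness=120.0, damping=22.0):
        self.stiffness = stiffness  # higher = snappier response
        self.damping = damping      # higher = less overshoot
        self.value = 0.0
        self.velocity = 0.0

    def step(self, target, dt=1.0 / 60.0):
        """Advance one tick using semi-implicit Euler integration."""
        accel = (self.stiffness * (target - self.value)
                 - self.damping * self.velocity)
        self.velocity += accel * dt      # update velocity first...
        self.value += self.velocity * dt  # ...then position with the new velocity
        return self.value
```

A gooey orb channel might use a lower stiffness and a snappy spine channel a higher one, with each component constructing its own `SpringChannel` instances.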
---
4. Mapping channels to deformation fields
Now comes the spatial part. For each UI component, you predefine one or more deformation fields: functions that, given a point on that component (orb surface coordinate, spine parameter, horizon grid coordinate), tell you how that point would like to move for a unit amplitude of a particular animation channel.
For example:
An orb stretch mode might define a field that pulls points outward along an axis.
A ripple mode defines a radial sine wave in displacement amplitude centered on the orb.
A turbulence mode for the spine defines small random offsets along its length.
A horizon bend mode defines a smooth curve that pulls the horizon toward one side.
These fields are parametric: they’re represented as simple functions evaluated in shaders or as precomputed textures/vector fields in normalized coordinates.
On each frame, the engine evaluates the total deformation for a component by linearly mixing its basis fields with the current channel values. In other words: for every point on that component, you sum “channel value × field contribution.” That gives a displacement vector for that point. Add the displacement to the base geometry, and the component deforms.
Because the channel values are already smoothed over time, and the fields are smooth over space, the resulting deformation is fluid and natural.
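The linear mix of basis fields can be sketched on a 2D circle standing in for the orb outline. In the engine this would run in a shader; the field functions, the angular form of the ripple, and the name `deform_circle` are illustrative assumptions.

```python
import math


def stretch_field(angle):
    """Unit-amplitude horizontal stretch: push points outward along x."""
    return (math.cos(angle), 0.0)


def ripple_field(angle, lobes=6):
    """Unit-amplitude ripple: radial displacement varying sinusoidally."""
    r = math.sin(lobes * angle)
    return (r * math.cos(angle), r * math.sin(angle))


def deform_circle(n_points, channels, fields, radius=1.0):
    """Displace points on a circle by sum(channel value * field)."""
    points = []
    for i in range(n_points):
        angle = 2.0 * math.pi * i / n_points
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        for name, field in fields.items():
            dx, dy = field(angle)
            amount = channels.get(name, 0.0)  # missing channels contribute 0
            x += amount * dx
            y += amount * dy
        points.append((x, y))
    return points


FIELDS = {"stretch": stretch_field, "ripple": ripple_field}
outline = deform_circle(64, {"stretch": 0.5, "ripple": 0.1}, FIELDS)
```

Because the mix is linear, adding a new animation mode only means adding one more field function and one more channel value.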
---
5. Component-specific parameter bridges
Different components don’t need to use the same channels in the same way. The engine maintains a mapping between global animation channels and per-component parameters.
For the orb, a few channels might drive:
Overall scale in each axis.
Surface noise amplitude and frequency.
Glow intensity and pulse speed.
Contour sharpness (how blobby vs. taut it looks).
For the spine, channels might drive:
Thickness along its length.
Local curvature noise (turbulence).
Color gradient shifts.
Propagation speed of “events” moving along it.
For the horizon, channels might drive:
Global curvature.
Thickness and softness of its edges.
Parallax of elements inside the corridor.
The engine doesn’t care what these parameters mean artistically; it just maps each latent-derived channel into a component-local param that slots into the component’s own shader or drawing logic. That mapping can be a linear scaling, a non-linear curve, or even a small lookup table.
---
6. Phase-aware modulation
Because you also have the section state machine and lexicon (tension, divergence, transition intensity, reformation, resolution), the animation engine can modulate how much each channel is allowed to move in different phases.
For example:
In stable sections, you might clamp turbulence channels and emphasize rhythmic breathing.
During divergence, you might increase the gain on turbulence and ripple channels.
At transition peak, you briefly boost breakage modes (orb collapse, spine fracture) and then quickly ramp them down as reformation rises.
During resolution, you gradually lower all deformation amplitudes and move channels back toward neutral.
In implementation terms, this is just multiplying the channel values by phase-dependent scalars and feeding those through the existing spring integrators. It means the same underlying engine behaves very differently in each phase, without needing a separate animation system for each.
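The phase-dependent scaling might be sketched as a gain table blended by soft phase weights, so that a change of section state ramps the gains instead of switching them. The table values, channel names, and `apply_phase_gains` are assumptions for illustration.

```python
PHASE_GAINS = {
    "stable": {"turbulence": 0.2, "ripple": 0.5, "breakage": 0.0},
    "divergence": {"turbulence": 1.0, "ripple": 1.0, "breakage": 0.2},
    "transition": {"turbulence": 1.0, "ripple": 1.0, "breakage": 1.0},
    "resolution": {"turbulence": 0.3, "ripple": 0.4, "breakage": 0.0},
}


def apply_phase_gains(channels, phase_weights):
    """Scale channel targets by a weighted blend of per-phase gains.

    phase_weights: dict of phase name -> weight, assumed to sum to 1.
    Channels without an entry in the table pass through with gain 1.
    """
    out = {}
    for name, value in channels.items():
        gain = sum(w * PHASE_GAINS[p].get(name, 1.0)
                   for p, w in phase_weights.items())
        out[name] = value * gain
    return out
```

The scaled values would then be passed as targets to the spring integrators, so the phase gains shape where each channel is heading while the springs still control how it gets there.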
---
7. The engine loop, conceptually
Every frame, the engine does the same simple dance:
Read the latest latent and section/lexicon fields.
Compute the latent delta to the neutral latent.
Project that delta into a small animation control vector.
Update each animation channel toward its new target using spring-like temporal integration, modulated by phase.
For each component, use the current channel values to compute local deformation parameters.
Apply the deformation fields with those parameters in the shaders to move vertices/pixels.
Because everything is parametric, you can tune it experimentally: adjust projection weights, spring constants, field shapes, and phase gains until the visuals feel exactly like the latent physics they are representing.
The result is that the UI doesn’t feel like it’s being “driven by numbers.” It feels like a soft body being pushed around by the same invisible dynamics that govern the music and the motion. Which is the whole point: one physics, three manifestations — body, sound, and interface.
Source Anchor
Comp-Core/core/audio-media/cc-echelon/docs/ui/12.md
Detected Structure
Method · Evaluation · Architecture