Memory-Augmented Equilibrium Control (MAEC)
A New Control-Theoretic Framework for Embodied Generative Systems
Version 1.0 | December 2025
---
Abstract
This document formalizes Memory-Augmented Equilibrium Control (MAEC), a control-theoretic framework for real-time embodied creative systems. MAEC addresses a class of problems where traditional control theory and reinforcement learning fail: continuous, non-episodic systems that must maintain expressive viability while generating novel outputs. Unlike RL, MAEC has no scalar reward function, no policy optimization loop, and no episodic resets. Instead, it preserves dynamic equilibrium through memory-conditioned selection of speculative futures.
MAEC was developed in the context of Computational Choreography—a system where human motion generates music in real time—but applies broadly to any domain where trajectory matters more than static output.
---
Table of Contents
1. [Motivation and Scope](#1-motivation-and-scope)
2. [Why Existing Paradigms Fail](#2-why-existing-paradigms-fail)
3. [The MAEC Framework](#3-the-maec-framework)
4. [The Four Components](#4-the-four-components)
5. [The Three Pillars](#5-the-three-pillars)
6. [Mathematical Formulation](#6-mathematical-formulation)
7. [The Convergence Principle](#7-the-convergence-principle)
8. [Implementation in Computational Choreography](#8-implementation-in-computational-choreography)
9. [Generalization Beyond Music](#9-generalization-beyond-music)
10. [Comparison to Related Work](#10-comparison-to-related-work)
11. [Conclusion](#11-conclusion)
---
1. Motivation and Scope
1.1 The Problem Space
Embodied creative systems—computational choreography, motion-conditioned music generation, interactive performance engines—present control requirements that differ fundamentally from those assumed by classical control theory and reinforcement learning:
| Requirement | Classical Control | Reinforcement Learning | MAEC |
|---|---|---|---|
| Target | Fixed trajectory | Scalar reward | Viable equilibrium region |
| Failure Mode | Recoverable error | Episodic reset | Immediate perceptual breakdown |
| Time Horizon | Finite | Episodes | Continuous, non-terminating |
| Output | Deterministic | Policy-derived | Selected from generated candidates |
| Learning | Offline | Policy updates | Memory-conditioned selection |
1.2 What Makes These Systems Different
These systems must:
- Operate continuously in real time under tight stability constraints
- Produce novel, expressive outputs that cannot be specified by a fixed target
- Never fail catastrophically—instability manifests immediately as loss of flow
- Adapt to individual style without retraining the generative model
- Maintain coherence across multiple dimensions simultaneously (phase, energy, tension, stability)
1.3 The Core Insight
> Control is about preserving a viable expressive equilibrium, not optimizing toward a goal.
This single reframing changes the mathematics, the architecture, and the learning logic.
The system does not ask:
- "What action gives the highest reward?"
- "What policy converges fastest?"
It asks:
- "What kinds of futures can this system survive without losing flow?"
---
2. Why Existing Paradigms Fail
2.1 Classical Control Theory Assumptions
Classical control assumes:
- A system has a state
- There is a desired target or trajectory
- Control inputs push the system toward that target while minimizing error
Why this fails for embodied creativity: There is no fixed target. The "goal" is to remain expressive, which cannot be encoded as a reference trajectory.
2.2 Reinforcement Learning Assumptions
Reinforcement learning modifies classical control by:
- Replacing explicit targets with a reward signal
- Treating control as a policy that maximizes long-term reward
- Assuming exploration is acceptable and failure is tolerable during learning
Why this fails:
1. No scalar reward exists. Embodied creativity fails in qualitatively distinct ways that share no common scale:
   - Phase drift
   - Over-saturation
   - Loss of responsiveness
   - Fatigue
   - Mechanical stiffness
   - Boredom

   No scalar reward can preserve the geometry of these failure modes.
2. Failure is not tolerable. The dancer, the body, the live system must remain coherent now. There are no episode resets.
3. Policy optimization collapses diversity. RL converges to modes that maximize reward, destroying the expressive range that makes the system valuable.
2.3 Model Predictive Control Assumptions
MPC uses forward simulation and optimizes a cost function over a receding horizon.
Why this fails: MAEC does not optimize. It selects among viable futures using soft priors and equilibrium constraints, allowing multiple futures to remain admissible simultaneously.
2.4 The Irreducibility of MAEC
You might ask: couldn't we just define a clever reward?
No. A reward collapses all of the following into a single scalar:
- Multiple dimensions of viability
- Multiple time scales
- Multiple notions of failure
MAEC preserves geometry by:
- Keeping the Conductor local and continuous
- Keeping memory contextual and episodic
- Keeping generation plural and speculative
That structure cannot be reduced to a reward function without destroying what makes it work.
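A toy illustration of this collapse, as a sketch only: the failure-mode names come from Section 2.2, while the weights and magnitudes are made-up assumptions. Two qualitatively different breakdowns receive the same scalar, so anything that sees only the scalar cannot tell them apart.

```python
# Illustrative only: weights and magnitudes below are assumptions.
FAILURE_DIMS = ("phase_drift", "over_saturation", "loss_of_responsiveness")
WEIGHTS = (1.0, 1.0, 1.0)  # hypothetical reward weights


def scalar_reward(failure_vector):
    """Collapse a per-dimension failure vector into one scalar (negative cost)."""
    return -sum(w * f for w, f in zip(WEIGHTS, failure_vector))


# Two qualitatively different breakdowns...
drifting = (0.75, 0.0, 0.0)    # only phase drift
saturated = (0.0, 0.75, 0.0)   # only over-saturation

# ...become indistinguishable once reduced to a scalar.
same_score = scalar_reward(drifting) == scalar_reward(saturated)
```

Changing the weights only moves the problem: some other pair of distinct failure vectors will then share a score.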
---
3. The MAEC Framework
3.1 Definition
Memory-Augmented Equilibrium Control (MAEC) is a control-theoretic framework for systems in which:
1. Control is defined as preservation of a viable equilibrium region, not optimization toward a goal
2. Learning occurs through memory-augmented selection, not policy parameter updates
3. Futures are generated speculatively and filtered, not directly commanded
4. Stability is a first-class constraint, not a penalty term
3.2 Key Properties
| Property | Description |
|---|---|
| No explicit reward | Success is defined by equilibrium preservation, not scalar maximization |
| No goal state | The system inhabits a manifold of acceptable behaviors |
| No policy optimization loop | The controller and generator remain fixed during operation |
| Closed-loop, real-time, embodied | Continuous sensing and actuation without episodic boundaries |
| Memory affects selection, not execution | Experience biases choices without overwriting generation |
3.3 The MAEC Equation
At its simplest, MAEC can be expressed as:

    selected_future = SELECT(
        candidates  = GENERATE(current_state, context),
        priors      = RETRIEVE(memory, current_state),
        constraints = EQUILIBRIUM_BOUNDS(current_state)
    )

Where:
- `GENERATE` produces multiple speculative futures
- `RETRIEVE` queries episodic memory for relevant experience
- `EQUILIBRIUM_BOUNDS` defines the admissible region for the current state
- `SELECT` chooses from candidates biased by priors within constraints
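The expression above can be sketched in runnable form. Everything here is a toy stand-in: `generate`, `retrieve`, and `equilibrium_bounds` are hypothetical placeholders for the real components, and a future is reduced to a short list of floats. Selection is shown as a highest-prior pick among admissible candidates; sampling in proportion to the prior would be equally consistent with the description.

```python
import random
from typing import Callable, List, Optional, Sequence

Future = List[float]  # stand-in for a short-horizon trajectory


def select(
    candidates: Sequence[Future],
    prior: Callable[[Future], float],
    is_admissible: Callable[[Future], bool],
) -> Optional[Future]:
    """Pick the admissible candidate with the highest prior weight.

    Returns None when no candidate lies inside the equilibrium bounds,
    leaving the fallback behavior to the caller.
    """
    viable = [f for f in candidates if is_admissible(f)]
    if not viable:
        return None
    return max(viable, key=prior)


def generate(state: float, k: int, rng: random.Random) -> List[Future]:
    """Toy GENERATE: propose k futures as small random walks from `state`."""
    futures = []
    for _ in range(k):
        x, path = state, []
        for _ in range(4):
            x += rng.uniform(-0.3, 0.3)
            path.append(x)
        futures.append(path)
    return futures


def retrieve(remembered_center: float) -> Callable[[Future], float]:
    """Toy RETRIEVE: a soft prior favoring futures that end near a remembered value."""
    return lambda f: 1.0 / (1.0 + abs(f[-1] - remembered_center))


def equilibrium_bounds(low: float, high: float) -> Callable[[Future], bool]:
    """Toy EQUILIBRIUM_BOUNDS: every sample of the future stays in [low, high]."""
    return lambda f: all(low <= x <= high for x in f)


rng = random.Random(0)
candidates = generate(state=0.5, k=8, rng=rng)
chosen = select(candidates, retrieve(0.6), equilibrium_bounds(0.0, 1.0))
```

Note that the constraints act as a hard filter while the prior only ranks what survives, which mirrors the split between first-class stability and soft memory bias.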
---
4. The Four Components
An MAEC system is composed of four interacting subsystems operating at distinct time scales:
4.1 State Estimator
Role: Maps raw embodied signals into a compact latent representation describing the current expressive regime.
Key Insight: This state is not a positional description of the body. It is a regime descriptor encoding:
- Stability
- Tension
- Phase alignment
- Responsiveness
- Saturation risk
These variables evolve continuously and define the system's location within a manifold of viable behaviors.
In Computational Choreography: The LIM-RPS (Latent Integrating Model with Relaxation-Pursuit-Stability) state engine produces:
- `x_fast`: Reactive latent (responds quickly to motion)
- `y_slow`: Equilibrium latent (evolves slowly, defines "home")
- Beat phase and tempo estimates
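The internals of LIM-RPS are not described in this document, so the following is only a generic two-timescale filter that illustrates the relationship between a fast reactive latent and a slow equilibrium latent. The class name, the smoothing factors, and the update rule are all assumptions.

```python
class TwoTimescaleLatent:
    """Generic fast/slow exponential filter (NOT the LIM-RPS update rule)."""

    def __init__(self, alpha_fast: float = 0.5, alpha_slow: float = 0.02):
        # Hypothetical smoothing factors; a larger alpha responds more quickly.
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.x_fast = 0.0
        self.y_slow = 0.0

    def update(self, observation: float) -> tuple:
        # The fast latent chases the raw observation.
        self.x_fast += self.alpha_fast * (observation - self.x_fast)
        # The slow latent drifts toward the fast one, defining a moving "home".
        self.y_slow += self.alpha_slow * (self.x_fast - self.y_slow)
        return self.x_fast, self.y_slow


latent = TwoTimescaleLatent()
for _ in range(10):
    x_fast, y_slow = latent.update(1.0)  # sudden step in the input
```

After ten frames of a step input, `x_fast` has nearly reached the new value while `y_slow` has barely moved, so motion is still interpreted relative to the old equilibrium.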
4.2 Generative Future Proposer
Role: Produces multiple speculative future trajectories conditioned on current state.
Key Insight: These are not actions in the classical sense. They are short-horizon futures representing different plausible continuations. The proposer maintains diversity while respecting the local geometry of the state space.
In Computational Choreography: CC-MotionGen is a 116M-parameter diffusion model that generates eight or more candidate motion trajectories (25D × T frames) conditioned on audio features.
4.3 Memory-Conditioned Selector
Role: Evaluates proposed futures using both instantaneous constraints and episodic memory.
Key Insight: Memory is organized as a structured archive of prior phrases annotated with outcome statistics:
- Sanity pass rate
- Musicality scores
- Saturation incidence
- Diversity contribution
Retrieval produces soft priors that bias selection without collapsing diversity.
In Computational Choreography: RAG++ MotionPhrase Service retrieves similar past experiences and builds:
- `context_vec`: Global conditioning for FiLM modulation
- `PrototypeCurves`: Time-varying energy/density/tension/stability targets
- `warm_start_latent`: Optional diffusion initialization
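One way to picture an outcome-annotated archive and the soft priors it yields is sketched below. The schema, the `outcome_score` aggregate, and the `temperature` and `floor` parameters are assumptions for illustration, not the RAG++ implementation. The aggregate is used only to weight retrieval; it is not a reward that any component optimizes.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PhraseRecord:
    """One archived phrase with its outcome statistics (hypothetical schema)."""
    phrase_id: str
    regime: Tuple[float, ...]        # regime state when the phrase was played
    sanity_pass_rate: float          # all statistics assumed to lie in [0, 1]
    musicality: float
    saturation_incidence: float
    diversity_contribution: float

    def outcome_score(self) -> float:
        # Assumed aggregate: good outcomes raise the score, saturation lowers it.
        good = (self.sanity_pass_rate + self.musicality
                + self.diversity_contribution) / 3.0
        return good * (1.0 - self.saturation_incidence)


def soft_priors(memory: List[PhraseRecord],
                query: Tuple[float, ...],
                temperature: float = 0.25,
                floor: float = 0.05) -> Dict[str, float]:
    """Weight each record by regime similarity times outcome score.

    The floor keeps every record's weight non-zero, so the prior biases
    selection without ruling anything out.
    """
    raw = {}
    for rec in memory:
        similarity = math.exp(-math.dist(rec.regime, query) / temperature)
        raw[rec.phrase_id] = floor + similarity * rec.outcome_score()
    total = sum(raw.values())
    return {pid: w / total for pid, w in raw.items()}


memory = [
    PhraseRecord("a", (0.8, 0.2), 0.9, 0.8, 0.1, 0.5),
    PhraseRecord("b", (0.2, 0.9), 0.9, 0.8, 0.1, 0.5),
]
priors = soft_priors(memory, query=(0.8, 0.2))
```

Record `a` was logged in a regime close to the query and therefore dominates, but `b` keeps a small non-zero weight, which is what "without collapsing diversity" amounts to in this toy.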
4.4 Equilibrium Controller
Role: Operates continuously, shaping the admissible region of future space by enforcing local constraints.
Key Insight: The controller does not select a single trajectory. It defines the region within which selection is allowed to occur. Constraints include:
- Smoothness (jerk bounds)
- Phase coherence (velocity-position consistency)
- Bounded energy (value clamping)
- Temporal continuity (quaternion stability)
In Computational Choreography: The Conductor (EchelonControlSurface) outputs semantic controls:
- `tempo_nudge`, `swing_amount`
- `density`, `tension`, `stability`, `raw_energy`
- `follow_vs_lead`
---
5. The Three Pillars
5.1 Pillar 1: State is Not Position—It is Regime
In computational choreography, the state is not where the body is, but how it is behaving.
The Conductor estimates:
- Stability
- Tension
- Phase coherence
- Responsiveness
- Saturation risk
These are not task variables. They are regime descriptors.
Control-theoretic implication: You are not stabilizing a point—you are stabilizing a manifold of acceptable behaviors.
This immediately separates MAEC from RL, which treats state as something to be escaped or progressed through toward reward. In MAEC, the state is something to inhabit.
5.2 Pillar 2: Control Shapes Admissible Futures, Not Actions
In RL, an action is:
- "Apply torque"
- "Choose token"
- "Move left"
In MAEC, control does not pick an action. It shapes:
- How aggressive transitions are allowed to be
- How much novelty is tolerable
- How tightly motion should phase-lock
- Whether the system should lead or follow
This is closer to modulating the curvature of future trajectories than commanding movement.
The Conductor outputs constraints and biases, not commands.
This makes MAEC a second-order control system:
- First order: motion unfolds
- Second order: the space of motion is shaped
RL does not do this. RL chooses actions; it does not sculpt future possibility space.
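The difference between commanding an action and shaping a region can be sketched as follows. The mapping from regime descriptors to bounds is entirely hypothetical (names, coefficients, and the two bounded quantities are assumptions); the point is only that the controller's output is a set of limits, and the same proposed future can be admissible in one regime and inadmissible in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmissibleRegion:
    """Bounds handed to the selector; nothing here commands a motion."""
    max_step: float      # how aggressive a transition may be
    max_novelty: float   # how far a future may stray from remembered phrases


def shape_region(stability: float, tension: float) -> AdmissibleRegion:
    """Hypothetical mapping from two regime descriptors (both in [0, 1]) to bounds.

    A stable regime tolerates more novelty; a tense one permits sharper transitions.
    """
    return AdmissibleRegion(
        max_step=0.1 + 0.4 * tension,
        max_novelty=0.2 + 0.6 * stability,
    )


def admissible(step_size: float, novelty: float, region: AdmissibleRegion) -> bool:
    """True if a proposed future stays inside the shaped region."""
    return step_size <= region.max_step and novelty <= region.max_novelty


calm = shape_region(stability=0.9, tension=0.2)
shaky = shape_region(stability=0.2, tension=0.2)

# The same proposed future, evaluated under two different regimes.
same_future = dict(step_size=0.15, novelty=0.5)
allowed_when_calm = admissible(region=calm, **same_future)
allowed_when_shaky = admissible(region=shaky, **same_future)
```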
5.3 Pillar 3: Learning Modifies Selection Pressure, Not the Controller
This is the most radical departure.
In RL:
- Experience updates the policy
- The controller itself changes
In MAEC:
- The Conductor remains stable
- MotionGen remains expressive
- Learning lives in RAG++, which biases which futures are chosen
So experience does not rewrite behavior. It rewrites what is preferred among viable behaviors.
This is why the system improves without collapsing diversity.
It learns:
> "When the system feels like this, these kinds of futures tended to preserve equilibrium."
That is not reward maximization. That is survivability-weighted memory recall.
---
6. Mathematical Formulation
6.1 State Space
Let $\mathcal{S}$ be the regime state space with dimensions:
- Stability $s \in [0, 1]$
- Tension $\tau \in [0, 1]$
- Phase coherence $\phi \in [0, 1]$
- Responsiveness $\rho \in [0, 1]$
- Saturation risk $\sigma \in [0, 1]$
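The current state $s_t \in \mathcal{S} = [0, 1]^5$ is the full regime vector (not to be confused with the scalar stability coordinate $s$). It is estimated by the State Estimator from raw sensor inputs.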
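A minimal container for a point in this state space might look like the sketch below. The class and field names are illustrative choices; only the five dimensions and their $[0, 1]$ ranges come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RegimeState:
    """Point in the regime state space; every coordinate lies in [0, 1]."""
    stability: float
    tension: float
    phase_coherence: float
    responsiveness: float
    saturation_risk: float

    def __post_init__(self):
        # Reject any coordinate outside the unit interval.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name}={value} is outside [0, 1]")

    def as_vector(self):
        """Return the coordinates as a tuple, in declaration order."""
        return tuple(getattr(self, f.name) for f in fields(self))


state = RegimeState(0.8, 0.3, 0.9, 0.7, 0.1)
```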
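6.2 Future Space
A future is a short-horizon trajectory of $T$ motion frames:
$$f = (m_1, m_2, \dots, m_T) \in \mathcal{F}$$
where each $m_i \in \mathbb{R}^{25}$ is a motion frame.
6.3 Generation
The proposer produces $K$ candidate futures:
$$\{f_1, \dots, f_K\} = G(s_t, c_t), \quad f_k \in \mathcal{F}$$
where $G$ is the generative proposer (`GENERATE` in Section 3.3) and $c_t$ is the conditioning context (audio features).
6.4 Memory Retrieval
$$\pi = R(\mathcal{M}, s_t)$$
where $R$ is the retrieval operator (`RETRIEVE` in Section 3.3), $\mathcal{M}$ is the episodic memory and $\pi$ is a prior distribution over futures.
6.5 Selection
$$f^* = \arg\max_{f \in \{f_1, \dots, f_K\}} \; \pi(f) \cdot \mathbb{1}_{E}(f, s_t)$$
where $\mathbb{1}_{E}(f, s_t)$ is the equilibrium indicator function that is 1 if $f$ satisfies all equilibrium constraints given state $s_t$ and 0 otherwise.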
6.6 Equilibrium Constraints
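The equilibrium region $E(s_t)$ is defined by the following constraints on the position $p$, velocity $v$, and rotation quaternion $q$ of each frame, with tolerances $\epsilon_v$, $\epsilon_j$, and $\epsilon_q$:
- Velocity coherence: $\|v - \dot{p}\| < \epsilon_v$
- Jerk bounds: $\|\dddot{p}\| < \epsilon_j$
- Quaternion continuity: $q_t \cdot q_{t+1} > \epsilon_q$
- Phase monotonicity: $\phi_{t+1} > \phi_t$ (mod 1), where $\phi_t$ here denotes the beat-aligned phase of the frame at time $t$, not the phase-coherence coordinate of Section 6.1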
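These four checks can be sketched with finite differences over a sampled trajectory. The function names, the default tolerances, and the frame interval `dt` are assumptions. The phase check uses one possible reading of the (mod 1) condition: the wrapped forward step must be positive and shorter than half a cycle.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def velocity_coherent(positions, velocities, dt, eps_v):
    """||v_t - (p_{t+1} - p_t)/dt|| < eps_v for every consecutive pair."""
    for t in range(len(positions) - 1):
        p_dot = tuple(d / dt for d in _sub(positions[t + 1], positions[t]))
        if _norm(_sub(velocities[t], p_dot)) >= eps_v:
            return False
    return True


def jerk_bounded(positions, dt, eps_j):
    """Third finite difference of position stays below eps_j."""
    for t in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[t:t + 4]
        jerk = tuple((d - 3 * c + 3 * b - a) / dt ** 3
                     for a, b, c, d in zip(p0, p1, p2, p3))
        if _norm(jerk) >= eps_j:
            return False
    return True


def quaternions_continuous(quats, eps_q):
    """Dot product of consecutive unit quaternions stays above eps_q."""
    return all(sum(a * b for a, b in zip(q0, q1)) > eps_q
               for q0, q1 in zip(quats, quats[1:]))


def phase_monotone(phases):
    """Phase advances each frame, allowing wrap-around at 1 (assumed reading)."""
    return all(0.0 < (b - a) % 1.0 < 0.5 for a, b in zip(phases, phases[1:]))


def in_equilibrium(positions, velocities, quats, phases, dt,
                   eps_v=0.1, eps_j=50.0, eps_q=0.9):
    """Indicator for E: True only if all four constraints hold."""
    return (velocity_coherent(positions, velocities, dt, eps_v)
            and jerk_bounded(positions, dt, eps_j)
            and quaternions_continuous(quats, eps_q)
            and phase_monotone(phases))
```

A trajectory that moves at constant velocity with a fixed orientation and a steadily advancing phase passes all four checks; reversing the phase or flipping the quaternion sign between frames fails.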
---
7. The Convergence Principle
7.1 The Closed-Loop Convergence Theorem
Statement: A motion-conditioned generative music system will converge toward a specific human's musical identity over time without collapsing into repetition or instability if and only if four conditions are simultaneously true:
1. The present is stabilized before the future is explored
2. Futures are generated, not selected directly
3. Memory evaluates outcomes, not intentions
4. Memory influences selection, not raw generation
7.2 Condition 1: Stabilize Before Exploring
The Conductor exists to answer a single question at every moment:
> "Is the current trajectory stable enough to support exploration?"
Creative systems fail when they explore futures while the present is already unstable. That causes runaway behavior: tempo explosions, saturation, incoherence.
LIM-RPS enforces a fixed-point equilibrium. Motion is always interpreted relative to a slowly evolving center of gravity.
Without this, memory would amplify noise.
7.3 Condition 2: Generate, Don't Choose Directly
The system never asks: "Which phrase should I play next?"
Instead: "What are several plausible short-horizon futures given this body, this moment, and these constraints?"
Selection without generation produces brittle systems. Generation without selection produces chaos. MAEC separates them cleanly.
7.4 Condition 3: Evaluate Outcomes, Not Intentions
RAG++ does not reward phrases because they were "meant" to be expressive.
It rewards them because, historically, when phrases like this occurred:
- The system remained stable
- Saturation did not occur
- Musicality metrics held
- Repetition was avoided
- The body naturally continued moving
Memory is grounded in consequences, not goals. That makes the system robust.
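Grounding memory in consequences means the archive is updated from what was observed after a phrase played, not from what the phrase was intended to do. A minimal sketch of such bookkeeping follows; the class, its fields, and the choice of running means are assumptions.

```python
from dataclasses import dataclass


@dataclass
class OutcomeStats:
    """Running outcome statistics for one archived phrase (hypothetical)."""
    plays: int = 0
    stable_rate: float = 0.0       # fraction of plays that stayed stable
    saturation_rate: float = 0.0   # fraction of plays that saturated

    def record(self, stayed_stable: bool, saturated: bool) -> None:
        """Fold one observed outcome into the running means."""
        self.plays += 1
        n = self.plays
        self.stable_rate += (float(stayed_stable) - self.stable_rate) / n
        self.saturation_rate += (float(saturated) - self.saturation_rate) / n


stats = OutcomeStats()
stats.record(stayed_stable=True, saturated=False)
stats.record(stayed_stable=True, saturated=True)
stats.record(stayed_stable=False, saturated=False)
stats.record(stayed_stable=True, saturated=False)
```

After these four plays the record reflects three stable outcomes and one saturation, regardless of how expressive the phrase was "meant" to be.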
7.5 Condition 4: Influence Selection, Not Generation
RAG++ never tells MotionGen what to output. It only changes the landscape in which futures are evaluated.
This means:
- MotionGen remains expressive
- The Conductor remains authoritative
- Exploration never stops
- Preference accumulates slowly
Because memory does not overwrite generation, the system cannot collapse into loops.
Because memory biases selection, the system cannot remain generic.
This is the balance point most generative systems miss.
---
8. Implementation in Computational Choreography
8.1 Component Mapping
| MAEC Component | CC Implementation | Location |
|---|---|---|
| State Estimator | LIM-RPS + EchelonControlSurface | `cc_core/equilibria/` |
| Generative Proposer | CC-MotionGen (116M diffusion) | `cc_motiongen/model/` |
| Memory Selector | RAG++ MotionPhrase Service | `cc_core/policy/rag_motionphrase/` |
| Equilibrium Controller | Temporal coherence losses + Decoder | `cc_motiongen/training/`, `model/decoder.py` |
8.2 Data Flow
┌─────────────────────────────────────────────────────────────────────┐
│ MAEC Runtime Loop │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Body Motion → Mocopi/IMU sensors │
│ │ │
│ ▼ │
│ 2. State Estimator: LIM-RPS computes x_fast, y_slow, beat_phase │
│ │ │
│ ▼ │
│ 3. Conductor: Computes regime descriptors (stability, tension...) │
│ │ │
│ ├────────────────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ 4. CC-MotionGen: Generate K 5. RAG++: Retrieve similar │
│ candidate futures past experiences │
│ │ │ │
│ │ ▼ │
│ │ 6. PriorBuilder: Create priors │
│ │ (context_vec, curves) │
│ │ │ │
│ └────────────┬───────────┘ │
│ │ │
│ ▼ │
│ 7. Selection: Score candidates against priors + constraints │
│ │ │
│ ▼ │
│ 8. Decoder: Map to semantic motion (25D) │
│ │ │
│ ▼ │
│ 9. Audio Renderer: Motion → Sound │
│ │ │
│ ▼ │
│ 10. Log outcomes → Update MotionPhrase Library │
│ │
└─────────────────────────────────────────────────────────────────────┘
8.3 The 25-Dimensional Motion Representation
Each motion frame is a 25-dimensional vector:
| Dims | Name | Description |
|---|---|---|
| 0-2 | Position | (x, y, z) spatial coordinates |
| 3-5 | Velocity | (vx, vy, vz) linear velocity |
| 6-8 | Acceleration | (ax, ay, az) linear acceleration |
| 9-12 | Quaternion | (w, x, y, z) rotation quaternion |
| 13-15 | Angular Velocity | (wx, wy, wz) rotational velocity |
| 16 | Phase | [0, 1] beat-aligned phase |
| 17-24 | Style | 8D learned style embedding |
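The layout in the table can be captured as named slices, which avoids scattering index arithmetic through the code. The names `FRAME_LAYOUT` and `unpack_frame` are illustrative and are not taken from the repository.

```python
from typing import Dict, Sequence, Tuple

# Slice boundaries follow the table above (end index is exclusive).
FRAME_LAYOUT: Dict[str, slice] = {
    "position": slice(0, 3),
    "velocity": slice(3, 6),
    "acceleration": slice(6, 9),
    "quaternion": slice(9, 13),
    "angular_velocity": slice(13, 16),
    "phase": slice(16, 17),
    "style": slice(17, 25),
}
FRAME_DIM = 25


def unpack_frame(frame: Sequence[float]) -> Dict[str, Tuple[float, ...]]:
    """Split one 25-D motion frame into its named fields."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    return {name: tuple(frame[s]) for name, s in FRAME_LAYOUT.items()}


# A dummy frame whose values equal their own indices, for easy inspection.
frame = [float(i) for i in range(FRAME_DIM)]
parts = unpack_frame(frame)
```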
---
9. Generalization Beyond Music
9.1 The Universal Pattern
Across domains, the system does the following:
1. Observe an embodied agent in motion
2. Estimate its current dynamical regime
3. Generate multiple plausible future trajectories
4. Retrieve memories of what worked in similar regimes
5. Bias selection toward futures that preserved flow and expressivity
6. Execute one future
7. Continue without resetting equilibrium
This is a general theory of embodied creativity.
9.2 Domain Mappings
| Domain | State Variables | Generated Futures | Memory Content |
|---|---|---|---|
| Music/Dance | Energy, tension, phase | Motion trajectories | Phrase outcomes |
| Drawing | Pressure, continuity, spatial flow | Stroke sequences | Brush histories |
| Filmmaking | Camera energy, cut frequency | Camera paths | Shot histories |
| Robotics | Stability, responsiveness | Motion plans | Task outcomes |
| Conversation | Engagement, coherence, pacing | Response candidates | Dialog histories |
9.3 Why Music First
Music is the easiest place to see MAEC because rhythm, phase, and energy are already explicit. The body's state is legible. The failure modes are perceptually immediate.
But the architecture is substrate-agnostic. Only two things change:
- The sensors that define the latent state
- The renderer that realizes the selected future
Everything else—the math, the logic, the control structure—stays the same.
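In code, this substrate-agnostic claim corresponds to keeping the control step fixed and swapping only two plug-in interfaces. The sketch below uses `typing.Protocol`; the interface names, method names, and the toy drawing-domain classes are all hypothetical.

```python
from typing import List, Protocol, Sequence


class Sensor(Protocol):
    """Domain-specific input: turns raw signals into a regime vector."""
    def read_regime(self) -> Sequence[float]: ...


class Renderer(Protocol):
    """Domain-specific output: realizes the selected future."""
    def render(self, future: Sequence[float]) -> None: ...


def run_step(sensor: Sensor, renderer: Renderer, propose, select) -> Sequence[float]:
    """One domain-agnostic step: sense, propose, select, render."""
    regime = sensor.read_regime()
    future = select(propose(regime), regime)
    renderer.render(future)
    return future


# Toy drawing-domain plug-ins (hypothetical).
class PenSensor:
    def read_regime(self) -> Sequence[float]:
        return (0.4, 0.9)  # e.g. pressure, continuity


class StrokeLog:
    def __init__(self) -> None:
        self.strokes: List[Sequence[float]] = []

    def render(self, future: Sequence[float]) -> None:
        self.strokes.append(future)


log = StrokeLog()
run_step(
    PenSensor(), log,
    propose=lambda regime: [[r * 0.5 for r in regime], [r * 1.0 for r in regime]],
    select=lambda candidates, regime: candidates[0],
)
```

Replacing `PenSensor` and `StrokeLog` with motion-capture and audio equivalents would leave `run_step` untouched.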
---
10. Comparison to Related Work
10.1 MAEC vs. Reinforcement Learning
| Aspect | RL | MAEC |
|---|---|---|
| Objective | Maximize scalar reward | Preserve equilibrium |
| Learning | Policy gradient updates | Memory accumulation |
| Exploration | Action execution | Speculative generation |
| Episodes | Finite, reset allowed | Infinite, no resets |
| Failure | Suboptimal reward | Perceptual breakdown |
10.2 MAEC vs. Model Predictive Control
| Aspect | MPC | MAEC |
|---|---|---|
| Future simulation | Deterministic rollouts | Stochastic generation |
| Optimization | Cost minimization | Constraint satisfaction + selection |
| Horizon | Fixed receding | Variable, context-dependent |
| Memory | None | Episodic, outcome-annotated |
10.3 MAEC vs. Retrieval-Augmented Generation
| Aspect | RAG | MAEC (RAG++) |
|---|---|---|
| Retrieved content | Text/documents | Outcome-annotated trajectories |
| Injection point | Generation input | Selection criteria |
| Closed-loop | No | Yes |
| Real-time | No | Yes |
---
11. Conclusion
11.1 Summary
Memory-Augmented Equilibrium Control (MAEC) formalizes a class of systems in which:
- Control is defined as preservation of expressive viability, not optimization
- Learning occurs through memory-conditioned selection, not policy updates
- Futures are generated speculatively and filtered, not commanded directly
- Stability is a first-class constraint, not a penalty term
11.2 Why This Matters
MAEC enables:
- Stable, adaptive, creative behavior in real-time embodied systems
- Personalization without retraining the generative model
- Continuous operation without episodic collapse
- Multi-dimensional coherence that resists scalar reduction
11.3 The Broader Implication
Choreography is not about reaching a pose. It is about staying alive inside motion.
This architecture encodes that truth computationally.
The result is not a smarter agent. It is a system that knows how not to break itself while creating.
That's a new control-theoretic category.
---
References
1. LIM-RPS: Latent Integrating Model with Relaxation-Pursuit-Stability (internal)
2. CC-MotionGen: Audio-Conditioned Motion Diffusion (this repository)
3. RAG++ MotionPhrase: Retrieval-as-Policy-Memory (this repository)
4. Diffusion Models: Ho et al., "Denoising Diffusion Probabilistic Models" (2020)
5. FiLM Conditioning: Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer" (2018)
---
Document generated for the Computational Choreography project.
Last updated: December 2025