Memory-Augmented Equilibrium Control (MAEC)

A New Control-Theoretic Framework for Embodied Generative Systems

Version 1.0 | December 2025

---

Abstract

This document formalizes Memory-Augmented Equilibrium Control (MAEC), a control-theoretic framework for real-time embodied creative systems. MAEC addresses a class of problems where traditional control theory and reinforcement learning fail: continuous, non-episodic systems that must maintain expressive viability while generating novel outputs. Unlike RL, MAEC has no scalar reward function, no policy optimization loop, and no episodic resets. Instead, it preserves dynamic equilibrium through memory-conditioned selection of speculative futures.

MAEC was developed in the context of Computational Choreography—a system where human motion generates music in real time—but applies broadly to any domain where trajectory matters more than static output.

---

Table of Contents

1. [Motivation and Scope](#1-motivation-and-scope)
2. [Why Existing Paradigms Fail](#2-why-existing-paradigms-fail)
3. [The MAEC Framework](#3-the-maec-framework)
4. [The Four Components](#4-the-four-components)
5. [The Three Pillars](#5-the-three-pillars)
6. [Mathematical Formulation](#6-mathematical-formulation)
7. [The Convergence Principle](#7-the-convergence-principle)
8. [Implementation in Computational Choreography](#8-implementation-in-computational-choreography)
9. [Generalization Beyond Music](#9-generalization-beyond-music)
10. [Comparison to Related Work](#10-comparison-to-related-work)
11. [Conclusion](#11-conclusion)

---

1. Motivation and Scope

1.1 The Problem Space

Embodied creative systems—computational choreography, motion-conditioned music generation, interactive performance engines—present control requirements that differ fundamentally from those assumed by classical control theory and reinforcement learning:

| Aspect | Classical Control | Reinforcement Learning | MAEC |
| --- | --- | --- | --- |
| Target | Fixed trajectory | Scalar reward | Viable equilibrium region |
| Failure Mode | Recoverable error | Episodic reset | Immediate perceptual breakdown |
| Time Horizon | Finite | Episodes | Continuous, non-terminating |
| Output | Deterministic | Policy-derived | Selected from generated candidates |
| Learning | Offline | Policy updates | Memory-conditioned selection |

1.2 What Makes These Systems Different

These systems must:

- Operate continuously in real time under tight stability constraints
- Produce novel, expressive outputs that cannot be specified by a fixed target
- Never fail catastrophically: instability manifests immediately as loss of flow
- Adapt to individual style without retraining the generative model
- Maintain coherence across multiple dimensions simultaneously (phase, energy, tension, stability)

1.3 The Core Insight

> Control is about preserving a viable expressive equilibrium, not optimizing toward a goal.

This single reframing changes the mathematics, the architecture, and the learning logic.

The system does not ask:
- "What action gives the highest reward?"
- "What policy converges fastest?"

It asks:
- "What kinds of futures can this system survive without losing flow?"

---

2. Why Existing Paradigms Fail

2.1 Classical Control Theory Assumptions

Classical control assumes:
- A system has a state
- There is a desired target or trajectory
- Control inputs push the system toward that target while minimizing error

Why this fails for embodied creativity: There is no fixed target. The "goal" is to remain expressive, which cannot be encoded as a reference trajectory.

2.2 Reinforcement Learning Assumptions

Reinforcement learning modifies classical control by:
- Replacing explicit targets with a reward signal
- Treating control as a policy that maximizes long-term reward
- Assuming exploration is acceptable and failure is tolerable during learning

Why this fails:

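1. No scalar reward exists. Embodied creativity fails along several distinct dimensions that cannot be traded off against one another:
   - Phase drift
   - Over-saturation
   - Loss of responsiveness
   - Fatigue
   - Mechanical stiffness
   - Boredom

   A scalar reward maps all of these onto one number, so it cannot preserve the geometry of the failure modes: two very different breakdowns can receive the same score.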

2. Failure is not tolerable. The dancer, the body, the live system must remain coherent now. There are no episode resets.

3. Policy optimization collapses diversity. RL converges to modes that maximize reward, destroying the expressive range that makes the system valuable.

2.3 Model Predictive Control Assumptions

MPC uses forward simulation and optimizes a cost function over a receding horizon.

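Why this fails: MPC still commits to the single trajectory that minimizes a scalar cost, so it inherits the reward problem described in Section 2.2. MAEC does not optimize in this sense. It selects among viable futures using soft priors and equilibrium constraints, allowing multiple futures to remain admissible simultaneously.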

2.4 The Irreducibility of MAEC

You might ask: couldn't we just define a clever reward?

No. A reward collapses all of the following into a single scalar:
- Multiple dimensions of viability
- Multiple time scales
- Multiple notions of failure

MAEC preserves geometry by:
- Keeping the Conductor local and continuous
- Keeping memory contextual and episodic
- Keeping generation plural and speculative

That structure cannot be reduced to a reward function without destroying what makes it work.
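
A small worked example makes the collapse concrete. The two-dimensional failure vector $d = (d_{\text{phase}}, d_{\text{sat}})$, the equal weights, and the per-dimension bound of 0.5 are all illustrative assumptions, not values taken from the system:

$$r(d) = -(0.5\, d_{\text{phase}} + 0.5\, d_{\text{sat}})$$

$$d_A = (0.8,\ 0.0),\quad d_B = (0.0,\ 0.8),\quad d_C = (0.4,\ 0.4) \quad\Rightarrow\quad r(d_A) = r(d_B) = r(d_C) = -0.4$$

If viability requires each component to stay below 0.5, then $d_C$ is viable while $d_A$ and $d_B$ are two different breakdowns (pure phase drift and pure saturation). The scalar $r$ assigns all three the same value, so it cannot tell them apart.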

---

3. The MAEC Framework

3.1 Definition

Memory-Augmented Equilibrium Control (MAEC) is a control-theoretic framework for systems in which:

1. Control is defined as preservation of a viable equilibrium region, not optimization toward a goal
2. Learning occurs through memory-augmented selection, not policy parameter updates
3. Futures are generated speculatively and filtered, not directly commanded
4. Stability is a first-class constraint, not a penalty term

3.2 Key Properties

| Property | Description |
| --- | --- |
| No explicit reward | Success is defined by equilibrium preservation, not scalar maximization |
| No goal state | The system inhabits a manifold of acceptable behaviors |
| No policy optimization loop | The controller and generator remain fixed during operation |
| Closed-loop, real-time, embodied | Continuous sensing and actuation without episodic boundaries |
| Memory affects selection, not execution | Experience biases choices without overwriting generation |

3.3 The MAEC Equation

At its simplest, MAEC can be expressed as:

selected_future = SELECT(
    candidates = GENERATE(current_state, context),
    priors     = RETRIEVE(memory, current_state),
    constraints = EQUILIBRIUM_BOUNDS(current_state)
)

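Where:
- `GENERATE` produces multiple speculative futures
- `RETRIEVE` queries episodic memory for relevant experience
- `EQUILIBRIUM_BOUNDS` defines the region of admissible futures for the current state
- `SELECT` chooses from the candidates, biased by the priors, within the constraints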
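
The schematic above can be turned into a minimal Python sketch. It is only an illustration of the control flow: `maec_step`, the three callables it receives, and the toy one-dimensional "energy" futures are hypothetical names and stand-ins, not part of the Computational Choreography code. Returning `None` when nothing is admissible is also an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # regime state
F = TypeVar("F")  # candidate future


def maec_step(
    state: S,
    context: object,
    generate: Callable[[S, object], Sequence[F]],
    retrieve: Callable[[S], Callable[[F], float]],
    admissible: Callable[[F, S], bool],
) -> Optional[F]:
    """One SELECT step: generate candidates, filter by constraints, rank by prior."""
    candidates = generate(state, context)
    prior = retrieve(state)
    viable = [f for f in candidates if admissible(f, state)]
    if not viable:
        return None  # assumption: the text does not specify a fallback
    return max(viable, key=prior)


# Toy stand-ins: a "future" is a single target energy level in [0, 1].
def toy_generate(state: float, context: object) -> list:
    return [state - 0.5, state - 0.2, state + 0.1, state + 0.4]


def toy_retrieve(state: float) -> Callable[[float], float]:
    preferred = 0.6  # pretend memory says energy near 0.6 worked well here
    return lambda f: 1.0 / (1.0 + abs(f - preferred))


def toy_admissible(f: float, state: float) -> bool:
    # Stay in range and do not jump more than 0.3 away from the current state.
    return 0.0 <= f <= 1.0 and abs(f - state) <= 0.3


chosen = maec_step(0.5, None, toy_generate, toy_retrieve, toy_admissible)
nothing = maec_step(0.5, None, toy_generate, toy_retrieve, lambda f, s: False)
```

The generator and the constraint check never see the memory, and the memory never edits a candidate; it only ranks candidates that are already admissible.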

---

4. The Four Components

An MAEC system is composed of four interacting subsystems operating at distinct time scales:

4.1 State Estimator

Role: Maps raw embodied signals into a compact latent representation describing the current expressive regime.

Key Insight: This state is not a positional description of the body. It is a regime descriptor encoding:
- Stability
- Tension
- Phase alignment
- Responsiveness
- Saturation risk

These variables evolve continuously and define the system's location within a manifold of viable behaviors.

In Computational Choreography: The LIM-RPS (Latent Integrating Model with Relaxation-Pursuit-Stability) state engine produces:
- `x_fast`: Reactive latent (responds quickly to motion)
- `y_slow`: Equilibrium latent (evolves slowly, defines "home")
- Beat phase and tempo estimates
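
The internals of LIM-RPS are not described in this document, so the following is only a sketch of the general idea of a fast reactive latent paired with a slow equilibrium latent. It assumes plain exponential smoothing with two hypothetical rates (`alpha_fast`, `alpha_slow`) and a scalar observation; the real state engine may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class TwoTimescaleState:
    x_fast: float = 0.0  # reactive latent: follows the observation quickly
    y_slow: float = 0.0  # equilibrium latent: drifts slowly, defines "home"


def update(state: TwoTimescaleState, observation: float,
           alpha_fast: float = 0.5, alpha_slow: float = 0.02) -> TwoTimescaleState:
    """Advance both latents by one step (rates are illustrative assumptions)."""
    x_fast = state.x_fast + alpha_fast * (observation - state.x_fast)
    y_slow = state.y_slow + alpha_slow * (x_fast - state.y_slow)
    return TwoTimescaleState(x_fast, y_slow)


# A sudden jump in the input: the fast latent follows, the slow one barely moves.
state = TwoTimescaleState()
for _ in range(10):
    state = update(state, 1.0)

deviation = state.x_fast - state.y_slow  # how far the present is from "home"
```

The gap between the two latents gives a simple measure of how far the current motion has departed from its slowly evolving center.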

4.2 Generative Future Proposer

Role: Produces multiple speculative future trajectories conditioned on current state.

Key Insight: These are not actions in the classical sense. They are short-horizon futures representing different plausible continuations. The proposer maintains diversity while respecting the local geometry of the state space.

In Computational Choreography: CC-MotionGen is a 116M-parameter diffusion model that generates at least eight candidate motion trajectories (each T frames of 25 dimensions) conditioned on audio features.

4.3 Memory-Conditioned Selector

Role: Evaluates proposed futures using both instantaneous constraints and episodic memory.

Key Insight: Memory is organized as a structured archive of prior phrases annotated with outcome statistics:
- Sanity pass rate
- Musicality scores
- Saturation incidence
- Diversity contribution

Retrieval produces soft priors that bias selection without collapsing diversity.

In Computational Choreography: RAG++ MotionPhrase Service retrieves similar past experiences and builds:
- `context_vec`: Global conditioning for FiLM modulation
- `PrototypeCurves`: Time-varying energy/density/tension/stability targets
- `warm_start_latent`: Optional diffusion initialization

4.4 Equilibrium Controller

Role: Operates continuously, shaping the admissible region of future space by enforcing local constraints.

Key Insight: The controller does not select a single trajectory. It defines the region within which selection is allowed to occur. Constraints include:
- Smoothness (jerk bounds)
- Phase coherence (velocity-position consistency)
- Bounded energy (value clamping)
- Temporal continuity (quaternion stability)

In Computational Choreography: The Conductor (EchelonControlSurface) outputs semantic controls:
- `tempo_nudge`, `swing_amount`
- `density`, `tension`, `stability`, `raw_energy`
- `follow_vs_lead`

---

5. The Three Pillars

5.1 Pillar 1: State is Not Position—It is Regime

In computational choreography, the state is not where the body is, but how it is behaving.

The Conductor estimates:
- Stability
- Tension
- Phase coherence
- Responsiveness
- Saturation risk

These are not task variables. They are regime descriptors.

Control-theoretic implication: You are not stabilizing a point—you are stabilizing a manifold of acceptable behaviors.

This immediately separates MAEC from RL, which treats state as something to be escaped or progressed through toward reward. In MAEC, the state is something to inhabit.

5.2 Pillar 2: Control Shapes Admissible Futures, Not Actions

In RL, an action is:
- "Apply torque"
- "Choose token"
- "Move left"

In MAEC, control does not pick an action. It shapes:
- How aggressive transitions are allowed to be
- How much novelty is tolerable
- How tightly motion should phase-lock
- Whether the system should lead or follow

This is closer to modulating the curvature of future trajectories than commanding movement.

The Conductor outputs constraints and biases, not commands.

This makes MAEC a second-order control system:
- First order: motion unfolds
- Second order: the space of motion is shaped

RL does not do this. RL chooses actions; it does not sculpt future possibility space.

5.3 Pillar 3: Learning Modifies Selection Pressure, Not the Controller

This is the most radical departure.

In RL:
- Experience updates the policy
- The controller itself changes

In MAEC:
- The Conductor remains stable
- MotionGen remains expressive
- Learning lives in RAG++, which biases which futures are chosen

So experience does not rewrite behavior. It rewrites what is preferred among viable behaviors.

This is why the system improves without collapsing diversity.

It learns:
> "When the system feels like this, these kinds of futures tended to preserve equilibrium."

That is not reward maximization. That is survivability-weighted memory recall.

---

6. Mathematical Formulation

6.1 State Space

Let $\mathcal{S}$ be the regime state space with dimensions:
- Stability $s \in [0, 1]$
- Tension $\tau \in [0, 1]$
- Phase coherence $\phi \in [0, 1]$
- Responsiveness $\rho \in [0, 1]$
- Saturation risk $\sigma \in [0, 1]$

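The current state $s_t \in \mathcal{S} = [0, 1]^5$ collects these five quantities at time $t$ and is estimated by the State Estimator from raw sensor inputs. The subscripted $s_t$ denotes the whole state vector, not the scalar stability coordinate $s$.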

6.2 Future Space

Let $\mathcal{F}$ be the space of K-step future trajectories: $$f \in \mathcal{F}: f = (m_1, m_2, \ldots, m_K)$$

where each $m_i \in \mathbb{R}^{25}$ is a motion frame.

6.3 Generation

The Generative Future Proposer produces $N$ candidate futures: $$\{f_1, f_2, \ldots, f_N\} = G(s_t, c_t)$$

where $c_t$ is the conditioning context (audio features).

6.4 Memory Retrieval

The Memory-Conditioned Selector retrieves relevant experience: $$\pi = R(s_t, \mathcal{M})$$

where $\mathcal{M}$ is the episodic memory and $\pi$ is a prior distribution over futures.
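
One way to picture $R$ is as a nearest-neighbour lookup over outcome-annotated entries. The sketch below assumes a two-dimensional regime vector for brevity, a hypothetical `phrase_kind` label that groups futures, a single aggregate `outcome` score per entry, and a small `floor` weight so that no kind of future is ever ruled out by memory alone. None of these choices is taken from the RAG++ implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class MemoryEntry:
    regime: Tuple[float, ...]  # regime state when the phrase was played
    phrase_kind: str           # hypothetical label for the kind of future
    outcome: float             # aggregate outcome score in [0, 1]


def retrieve_prior(state: Sequence[float], memory: Sequence[MemoryEntry],
                   k: int = 3, floor: float = 0.05) -> Callable[[str], float]:
    """Return a soft prior over phrase kinds from the k nearest memories."""
    nearest = sorted(memory, key=lambda e: math.dist(state, e.regime))[:k]
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for entry in nearest:
        totals[entry.phrase_kind] = totals.get(entry.phrase_kind, 0.0) + entry.outcome
        counts[entry.phrase_kind] = counts.get(entry.phrase_kind, 0) + 1
    means = {kind: totals[kind] / counts[kind] for kind in totals}

    def prior(phrase_kind: str) -> float:
        # Unseen kinds keep a small non-zero weight, so diversity is not erased.
        return max(floor, means.get(phrase_kind, floor))

    return prior


memory = [
    MemoryEntry((0.80, 0.20), "sustain", 0.9),
    MemoryEntry((0.70, 0.30), "sustain", 0.7),
    MemoryEntry((0.75, 0.25), "accent", 0.4),
    MemoryEntry((0.10, 0.90), "accent", 1.0),  # far away: a different regime
]
prior = retrieve_prior((0.8, 0.2), memory)
```

The distant entry does not influence the result even though its outcome score is the highest in memory, which reflects the idea that experience is recalled by regime rather than ranked globally.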

6.5 Selection

The selected future satisfies: $$f^* = \arg\max_{f \in \{f_1, \ldots, f_N\}} \left[ \pi(f) \cdot \mathbb{1}_{E}(f, s_t) \right]$$

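where $\mathbb{1}_{E}(f, s_t)$ is the equilibrium indicator function: it equals 1 if $f$ satisfies all equilibrium constraints given state $s_t$ and 0 otherwise. If no candidate is admissible, the objective is zero for every $f$ and the maximizer is not unique; this formulation leaves that case unspecified.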
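
As a worked instance of the selection rule, with purely illustrative numbers, take $N = 3$ candidates:

$$\pi(f_1) = 0.5,\quad \pi(f_2) = 0.3,\quad \pi(f_3) = 0.2, \qquad \mathbb{1}_{E}(f_1, s_t) = 0,\quad \mathbb{1}_{E}(f_2, s_t) = 1,\quad \mathbb{1}_{E}(f_3, s_t) = 1$$

$$\pi(f_i) \cdot \mathbb{1}_{E}(f_i, s_t) = (0,\ 0.3,\ 0.2) \quad\Rightarrow\quad f^* = f_2$$

The candidate that memory prefers most, $f_1$, is excluded because it leaves the equilibrium region; the prior only ranks what the constraints have already admitted.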

6.6 Equilibrium Constraints

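The equilibrium region $E(s_t)$ is defined by the following constraints, where $p$, $v$, and $q$ are the position, velocity, and quaternion components of a motion frame (Section 8.3):
- Velocity coherence: $\|v - \dot{p}\| < \epsilon_v$
- Jerk bounds: $\|\dddot{p}\| < \epsilon_j$
- Quaternion continuity: $q_t \cdot q_{t+1} > \epsilon_q$
- Phase monotonicity: $\theta_{t+1} > \theta_t$, where $\theta$ is the unwrapped beat phase (its value modulo 1 is the phase component of the frame, and it is distinct from the phase-coherence coordinate $\phi$ of Section 6.1)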
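
The constraints can be checked on a discrete trajectory with finite differences. The sketch below is an assumption-laden illustration: the function name `is_admissible`, the threshold values, the forward-difference scheme, and the use of already unwrapped phase values are all choices made here, not details of the Conductor.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def diff(seq: Sequence[Vec], dt: float) -> List[Vec]:
    """Forward finite difference of a sequence of equal-length vectors."""
    return [tuple((b - a) / dt for a, b in zip(u, w)) for u, w in zip(seq, seq[1:])]


def norm(v: Vec) -> float:
    return math.sqrt(sum(c * c for c in v))


def is_admissible(positions, velocities, quaternions, phases, dt,
                  eps_v=0.5, eps_j=50.0, eps_q=0.9) -> bool:
    """Equilibrium indicator for one candidate (thresholds are hypothetical)."""
    p_dot = diff(positions, dt)
    # Velocity coherence: reported velocity must match differentiated position.
    for v, pd in zip(velocities, p_dot):
        if norm(tuple(a - b for a, b in zip(v, pd))) >= eps_v:
            return False
    # Jerk bound: third finite difference of position.
    if any(norm(j) >= eps_j for j in diff(diff(p_dot, dt), dt)):
        return False
    # Quaternion continuity: consecutive rotations must stay close.
    for q0, q1 in zip(quaternions, quaternions[1:]):
        if sum(a * b for a, b in zip(q0, q1)) <= eps_q:
            return False
    # Phase monotonicity on the unwrapped phase.
    return all(b > a for a, b in zip(phases, phases[1:]))


dt = 0.1
positions = [(0.1 * t, 0.0, 0.0) for t in range(6)]  # constant velocity of 1.0
velocities = [(1.0, 0.0, 0.0)] * 6
identity = (1.0, 0.0, 0.0, 0.0)
quaternions = [identity] * 6
phases = [0.05 * t for t in range(6)]

smooth_ok = is_admissible(positions, velocities, quaternions, phases, dt)

# Same motion, but one frame's rotation jumps by 180 degrees.
flipped = quaternions[:3] + [(0.0, 1.0, 0.0, 0.0)] + quaternions[4:]
flip_ok = is_admissible(positions, velocities, flipped, phases, dt)
```

Each check returns as soon as a constraint is violated, so a candidate is either inside the region or out of it; there is no partial credit to trade against the prior.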

---

7. The Convergence Principle

7.1 The Closed-Loop Convergence Theorem

Statement: A motion-conditioned generative music system will converge toward a specific human's musical identity over time without collapsing into repetition or instability if and only if four conditions are simultaneously true:

1. The present is stabilized before the future is explored
2. Futures are generated, not selected directly
3. Memory evaluates outcomes, not intentions
4. Memory influences selection, not raw generation

7.2 Condition 1: Stabilize Before Exploring

The Conductor exists to answer a single question at every moment:

> "Is the current trajectory stable enough to support exploration?"

Creative systems fail when they explore futures while the present is already unstable. That causes runaway behavior: tempo explosions, saturation, incoherence.

LIM-RPS enforces a fixed-point equilibrium. Motion is always interpreted relative to a slowly evolving center of gravity.

Without this, memory would amplify noise.

7.3 Condition 2: Generate, Don't Choose Directly

The system never asks: "Which phrase should I play next?"

Instead: "What are several plausible short-horizon futures given this body, this moment, and these constraints?"

Selection without generation produces brittle systems. Generation without selection produces chaos. MAEC separates them cleanly.

7.4 Condition 3: Evaluate Outcomes, Not Intentions

RAG++ does not reward phrases because they were "meant" to be expressive.

It rewards them because, historically, when phrases like this occurred:
- The system remained stable
- Saturation did not occur
- Musicality metrics held
- Repetition was avoided
- The body naturally continued moving

Memory is grounded in consequences, not goals. That makes the system robust.

7.5 Condition 4: Influence Selection, Not Generation

RAG++ never tells MotionGen what to output. It only changes the landscape in which futures are evaluated.

This means:
- MotionGen remains expressive
- The Conductor remains authoritative
- Exploration never stops
- Preference accumulates slowly

Because memory does not overwrite generation, the system cannot collapse into loops.
Because memory biases selection, the system cannot remain generic.

This is the balance point most generative systems miss.

---

8. Implementation in Computational Choreography

8.1 Component Mapping

| MAEC Component | CC Implementation | Location |
| --- | --- | --- |
| State Estimator | LIM-RPS + EchelonControlSurface | `cc_core/equilibria/` |
| Generative Proposer | CC-MotionGen (116M diffusion) | `cc_motiongen/model/` |
| Memory Selector | RAG++ MotionPhrase Service | `cc_core/policy/rag_motionphrase/` |
| Equilibrium Controller | Temporal coherence losses + Decoder | `cc_motiongen/training/`, `model/decoder.py` |

8.2 Data Flow

┌─────────────────────────────────────────────────────────────────────┐
│                        MAEC Runtime Loop                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. Body Motion → Mocopi/IMU sensors                                │
│                     │                                               │
│                     ▼                                               │
│  2. State Estimator: LIM-RPS computes x_fast, y_slow, beat_phase   │
│                     │                                               │
│                     ▼                                               │
│  3. Conductor: Computes regime descriptors (stability, tension...)  │
│                     │                                               │
│                     ├────────────────────────┐                      │
│                     │                        │                      │
│                     ▼                        ▼                      │
│  4. CC-MotionGen: Generate N     5. RAG++: Retrieve similar        │
│     candidate futures               past experiences               │
│                     │                        │                      │
│                     │                        ▼                      │
│                     │            6. PriorBuilder: Create priors     │
│                     │               (context_vec, curves)           │
│                     │                        │                      │
│                     └────────────┬───────────┘                      │
│                                  │                                  │
│                                  ▼                                  │
│  7. Selection: Score candidates against priors + constraints       │
│                                  │                                  │
│                                  ▼                                  │
│  8. Decoder: Map to semantic motion (25D)                          │
│                                  │                                  │
│                                  ▼                                  │
│  9. Audio Renderer: Motion → Sound                                  │
│                                  │                                  │
│                                  ▼                                  │
│  10. Log outcomes → Update MotionPhrase Library                     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

8.3 The 25-Dimensional Motion Representation

Each motion frame is a 25-dimensional vector:

| Dims | Name | Description |
| --- | --- | --- |
| 0-2 | Position | (x, y, z) spatial coordinates |
| 3-5 | Velocity | (vx, vy, vz) linear velocity |
| 6-8 | Acceleration | (ax, ay, az) linear acceleration |
| 9-12 | Quaternion | (w, x, y, z) rotation quaternion |
| 13-15 | Angular Velocity | (wx, wy, wz) rotational velocity |
| 16 | Phase | Beat-aligned phase in [0, 1] |
| 17-24 | Style | 8D learned style embedding |
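
The layout in the table can be written down as named slices, which keeps index arithmetic out of downstream code. The names `FRAME_LAYOUT` and `unpack_frame` are hypothetical helpers introduced for this sketch; only the index ranges come from the table.

```python
from typing import Dict, Sequence, Tuple

FRAME_DIM = 25

# Index ranges copied from the table (Python slices exclude the end index).
FRAME_LAYOUT: Dict[str, slice] = {
    "position": slice(0, 3),
    "velocity": slice(3, 6),
    "acceleration": slice(6, 9),
    "quaternion": slice(9, 13),         # (w, x, y, z)
    "angular_velocity": slice(13, 16),
    "phase": slice(16, 17),
    "style": slice(17, 25),
}


def unpack_frame(frame: Sequence[float]) -> Dict[str, Tuple[float, ...]]:
    """Split one 25-D motion frame into its named components."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    return {name: tuple(frame[s]) for name, s in FRAME_LAYOUT.items()}


# A dummy frame whose values equal their own indices makes the layout visible.
frame = [float(i) for i in range(FRAME_DIM)]
fields = unpack_frame(frame)
```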

---

9. Generalization Beyond Music

9.1 The Universal Pattern

Across domains, the system does the following:

1. Observe an embodied agent in motion
2. Estimate its current dynamical regime
3. Generate multiple plausible future trajectories
4. Retrieve memories of what worked in similar regimes
5. Bias selection toward futures that preserved flow and expressivity
6. Execute one future
7. Continue without resetting equilibrium

We propose this loop as a general pattern for embodied creative systems.

9.2 Domain Mappings

| Domain | State Variables | Generated Futures | Memory Content |
| --- | --- | --- | --- |
| Music/Dance | Energy, tension, phase | Motion trajectories | Phrase outcomes |
| Drawing | Pressure, continuity, spatial flow | Stroke sequences | Brush histories |
| Filmmaking | Camera energy, cut frequency | Camera paths | Shot histories |
| Robotics | Stability, responsiveness | Motion plans | Task outcomes |
| Conversation | Engagement, coherence, pacing | Response candidates | Dialog histories |

9.3 Why Music First

Music is the easiest place to see MAEC because rhythm, phase, and energy are already explicit. The body's state is legible. The failure modes are perceptually immediate.

But the architecture is substrate-agnostic. Only two things change:
- The sensors that define the latent state
- The renderer that realizes the selected future

Everything else—the math, the logic, the control structure—stays the same.
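
In code, the substrate-agnostic claim amounts to two narrow interfaces at the edges of the loop. The sketch below uses `typing.Protocol` for them; `Sensor`, `Renderer`, `run_once`, and the fake implementations are hypothetical names chosen for illustration, and the `choose_future` callable stands in for the whole estimate, generate, retrieve, and select pipeline.

```python
from typing import Callable, List, Protocol, Sequence


class Sensor(Protocol):
    def read(self) -> Sequence[float]:
        """Return the latest raw embodied signal."""
        ...


class Renderer(Protocol):
    def render(self, future: Sequence[float]) -> None:
        """Realize the selected future in the target medium."""
        ...


def run_once(sensor: Sensor, renderer: Renderer,
             choose_future: Callable[[Sequence[float]], Sequence[float]]) -> Sequence[float]:
    """One pass through the loop; only the two edge objects are domain-specific."""
    raw = sensor.read()
    future = choose_future(raw)
    renderer.render(future)
    return future


class FakeImu:
    def read(self) -> Sequence[float]:
        return [0.2, 0.4]


class ListRenderer:
    def __init__(self) -> None:
        self.played: List[List[float]] = []

    def render(self, future: Sequence[float]) -> None:
        self.played.append(list(future))


renderer = ListRenderer()
run_once(FakeImu(), renderer, lambda raw: [2 * v for v in raw])
```

Swapping `FakeImu` for a stylus sensor or `ListRenderer` for a camera rig would leave `run_once` untouched, which is the sense in which only the sensors and the renderer change.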

---

10. Comparison to Related Work

10.1 MAEC vs. Reinforcement Learning

| Aspect | RL | MAEC |
| --- | --- | --- |
| Objective | Maximize scalar reward | Preserve equilibrium |
| Learning | Policy gradient updates | Memory accumulation |
| Exploration | Action execution | Speculative generation |
| Episodes | Finite, reset allowed | Infinite, no resets |
| Failure | Suboptimal reward | Perceptual breakdown |

10.2 MAEC vs. Model Predictive Control

| Aspect | MPC | MAEC |
| --- | --- | --- |
| Future simulation | Deterministic rollouts | Stochastic generation |
| Optimization | Cost minimization | Constraint satisfaction + selection |
| Horizon | Fixed receding | Variable, context-dependent |
| Memory | None | Episodic, outcome-annotated |

10.3 MAEC vs. Retrieval-Augmented Generation

| Aspect | RAG | MAEC (RAG++) |
| --- | --- | --- |
| Retrieved content | Text/documents | Outcome-annotated trajectories |
| Injection point | Generation input | Selection criteria |
| Closed-loop | No | Yes |
| Real-time | No | Yes |

---

11. Conclusion

11.1 Summary

Memory-Augmented Equilibrium Control (MAEC) formalizes a class of systems in which:

- Control is defined as preservation of expressive viability, not optimization
- Learning occurs through memory-conditioned selection, not policy updates
- Futures are generated speculatively and filtered, not commanded directly
- Stability is a first-class constraint, not a penalty term

11.2 Why This Matters

MAEC enables:
- Stable, adaptive, creative behavior in real-time embodied systems
- Personalization without retraining the generative model
- Continuous operation without episodic collapse
- Multi-dimensional coherence that resists scalar reduction

11.3 The Broader Implication

Choreography is not about reaching a pose. It is about staying alive inside motion.

This architecture encodes that truth computationally.

The result is not a smarter agent. It is a system that knows how not to break itself while creating.

That is a new control-theoretic category.

---

References

1. LIM-RPS: Latent Integrating Model with Relaxation-Pursuit-Stability (internal)
2. CC-MotionGen: Audio-Conditioned Motion Diffusion (this repository)
3. RAG++ MotionPhrase: Retrieval-as-Policy-Memory (this repository)
4. Diffusion Models: Ho et al., "Denoising Diffusion Probabilistic Models" (2020)
5. FiLM Conditioning: Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer" (2018)

---

Document generated for the Computational Choreography project.
Last updated: December 2025

Source Anchor

projects/Documentation/01-architecture/MAEC_FRAMEWORK.md
