working paper, preprint structure candidate, score 96

Recursive Polymodal Synthesis: A Framework for Real-Time Computational Choreography Through Multi-Modal Sensor Fusion

Extracted abstract or opening context

We present Recursive Polymodal Synthesis (RPS), a framework for real-time computational choreography that achieves robust multi-modal sensor fusion through iterative proximal updates with spectral norm constraints, and couples the resulting embodied state to a phrase-conditioned spectrogram diffusion backend for audio generation. The system integrates kinematic, physiological, and rhythmic data streams into a unified embodied representation that drives either smooth control signals or direct audio synthesis in real time. Our approach addresses three fundamental challenges in embodied interaction systems: maintaining cross-modal coherence under partial observability, generating temporally coherent responses at multiple timescales, and operating within strict latency budgets. Through modality-specific encoders, cross-modal translators, and proximal fixed-point iteration, we obtain high cross-modal coherence on synthetic validation data, while the phrase-conditioned diffusion and conductor stack achieves library-faithful audio generation, evaluated with objective and perceptual metrics on real recordings. End-to-end, the system processes sensor inputs with 15–40 ms control latency and supports bar-ahead audio rendering with a prebuffer of approximately 0.5–1.0 s for stage performance. We report cross-modal coherence, beat-alignment error, key stability, spectral bandwidth and flatness, Fréchet Audio Distance, and human listening tests, and analyze trade-offs between model capacity, computational efficiency, and perceived expressivity. The framework is extensible to additional modalities and to applications beyond computational choreography, including human–robot interaction, adaptive gaming interfaces, and assistive technologies.
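
The abstract names proximal fixed-point iteration with spectral norm constraints but does not give the update rule, so the following is only a minimal sketch of one common way such a scheme can look, not the authors' implementation. It assumes a linear coupling matrix `W` rescaled to spectral norm below 1, a soft-thresholding proximal operator, and simple concatenation of per-modality feature vectors; the function names, the choice of proximal operator, and all parameter values are hypothetical.

```python
import numpy as np

def project_spectral_norm(W, max_norm=0.9):
    """Rescale W so its largest singular value is at most max_norm (assumed constraint)."""
    sigma = np.linalg.norm(W, 2)  # matrix 2-norm = largest singular value
    return W if sigma <= max_norm else W * (max_norm / sigma)

def soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_1 (an assumed choice of prox)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fuse(features, W, lam=0.01, tol=1e-6, max_iter=500):
    """Iterate z <- prox(W z + u) until successive iterates differ by less than tol.

    features: list of 1-D arrays, one per modality (hypothetical encoder outputs).
    Returns the fused state and the number of iterations used.
    """
    u = np.concatenate(features)
    z = np.zeros_like(u)
    for k in range(max_iter):
        z_next = soft_threshold(W @ z + u, lam)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k + 1
        z = z_next
    return z, max_iter

# Usage with random stand-ins for kinematic, physiological, and rhythmic features.
rng = np.random.default_rng(0)
W = project_spectral_norm(rng.normal(size=(6, 6)), max_norm=0.9)
features = [rng.normal(size=2) for _ in range(3)]
z, iters = fuse(features, W)
```

Because soft-thresholding is non-expansive and `W` has spectral norm below 1, the update in this sketch is a contraction, so the iteration converges to a unique fixed point from any starting state. That property is what makes a bounded iteration count, and hence a fixed latency budget, plausible in a setup of this kind.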

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.