Recursive Polymodal Synthesis for Real-Time Embodied Interaction: A Contraction-Based Framework with Provable Convergence
We present a mathematically rigorous framework for multi-modal sensor fusion in real-time embodied interaction systems, coupled to a phrase-conditioned spectrogram diffusion backend that generates audio directly. Our approach, termed Recursive Polymodal Synthesis (RPS), addresses the challenge of fusing heterogeneous sensor modalities, which differ in noise characteristics, sampling rates, and semantics, into a coherent internal representation suitable for generative control. The key innovation is a
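The extracted abstract stops before stating the RPS update rule, so the following is only a generic illustration of what a contraction-based recursive fusion step can look like, not the authors' method. It assumes a scalar state, known per-modality noise variances, inverse-variance gains rescaled so their total stays below 1, and sampling-rate differences modeled as integer reporting periods. The names `Modality`, `fusion_gains`, `contraction_factor`, `fuse_step`, and `run`, and the example modalities, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Modality:
    """Hypothetical sensor stream (illustrative assumption, not from the paper)."""
    name: str
    variance: float  # assumed-known measurement noise variance
    period: int      # reports once every `period` ticks (models differing sampling rates)


def fusion_gains(modalities: Sequence[Modality], total_gain: float = 0.5) -> list[float]:
    """Inverse-variance weights rescaled so they sum to total_gain, which must lie in (0, 1]."""
    if not 0.0 < total_gain <= 1.0:
        raise ValueError("total_gain must lie in (0, 1]")
    inv = [1.0 / m.variance for m in modalities]
    s = sum(inv)
    return [total_gain * v / s for v in inv]


def contraction_factor(readings: Sequence[Optional[float]], gains: Sequence[float]) -> float:
    """Lipschitz constant of one update: 1 minus the total gain of modalities that reported."""
    return 1.0 - sum(g for z, g in zip(readings, gains) if z is not None)


def fuse_step(state: float, readings: Sequence[Optional[float]], gains: Sequence[float]) -> float:
    """One recursive update x <- x + sum_i g_i (z_i - x), summed over modalities that reported.

    For fixed readings this map is (1 - G) x + sum_i g_i z_i with G the active total gain,
    so it is a contraction whenever 0 < G <= 1; its fixed point is the weighted mean of the
    active readings.
    """
    return state + sum(g * (z - state) for z, g in zip(readings, gains) if z is not None)


def run(modalities: Sequence[Modality], signal: float, ticks: int, x0: float = 0.0) -> float:
    """Drive the update with a constant, noise-free signal and return the final state."""
    gains = fusion_gains(modalities)
    x = x0
    for k in range(ticks):
        # A modality reports only on ticks that are multiples of its period.
        readings = [signal if k % m.period == 0 else None for m in modalities]
        x = fuse_step(x, readings, gains)
    return x


mods = [Modality("imu", 0.01, 1), Modality("camera", 0.04, 3), Modality("mic", 0.25, 5)]
print(run(mods, signal=1.0, ticks=60))
```

Under these assumptions the tracking error is multiplied by `contraction_factor` on every tick. A modality that reports on every tick (here the hypothetical `imu`) keeps that factor strictly below 1 throughout, which is what guarantees geometric convergence; slower modalities only tighten the factor on the ticks when they report.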