The Anticipatory Transformer: Geometry-Steered Attention for Trajectory-Aware Reasoning

Extracted abstract or opening context

Standard transformers attend based on position encodings (sinusoidal, RoPE, ALiBi) that encode *where* tokens are in a sequence but not *what the sequence is doing* as a geometric process. I introduce the Anticipatory Transformer, a modified transformer architecture in which seven geometric scalars derived from Anticipation Geometry (commitment, uncertainty, transition pressure, recovery margin, phase stiffness, novelty, stability) steer the multi-head attention mechanism via an additive bias. The trajectory bias is computed by a learned network that maps the seven scalars at each position to per-head, position-dependent attention biases, enabling different heads to specialize to different geometric dimensions of the reasoning trajectory. I also introduce the CommitmentGate, a threshold-based mechanism that determines *when* to emit tokens: when the model's predicted commitment is below a learned threshold, it buffers hidden states and defers emission, enabling variable-rate generation that mirrors the deliberative pauses of human reasoning. The architecture further incorporates a dual-pathway design: a fast pathway with local windowed attention (128-token window, updated every token) for high-frequency pattern capture, and a slow pathway with global attention (full context) for long-range dependency modeling. In smoke tests on a 678,206-parameter model trained for 50 steps on synthetic data, the commitment gate achieves +0.93 correlation with the commitment scalar, attention heads specialize to 3 out of 4 unique dominant scalars, scalar prediction MSE drops from 0.15 to 0.07, and the orthogonality penalty converges to 0.005. I present this as a complete, implemented architecture with preliminary validation, not as a benchmark-breaking result.
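The buffering behaviour of the CommitmentGate can be illustrated with a minimal sketch. Everything named here is hypothetical and is not the implementation behind the reported results: the class `CommitmentGateSketch`, the fixed `threshold` (learned in the architecture, hard-coded here), the choice to flush the whole buffer once commitment clears the threshold, and the `max_buffer` cap are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommitmentGateSketch:
    """Hypothetical stand-in for the CommitmentGate (illustration only)."""
    threshold: float = 0.5   # learned in the architecture; fixed here (assumption)
    max_buffer: int = 8      # assumed cap so emission cannot be deferred forever
    buffer: List[List[float]] = field(default_factory=list)

    def step(self, hidden: List[float], commitment: float) -> Optional[List[List[float]]]:
        """Buffer `hidden`; release all buffered states once commitment is high enough."""
        self.buffer.append(hidden)
        if commitment >= self.threshold or len(self.buffer) >= self.max_buffer:
            ready, self.buffer = self.buffer, []
            return ready
        return None  # commitment below threshold: defer emission


gate = CommitmentGateSketch(threshold=0.5)
commitments = [0.2, 0.3, 0.8, 0.9, 0.1]  # made-up predicted commitment per step
# Each hidden state is a one-element vector holding its step index.
emitted = [gate.step([float(t)], c) for t, c in enumerate(commitments)]
```

In this toy run the first two steps are deferred, the third step releases three buffered states at once, and the final low-commitment step remains in the buffer, which is one way to read "variable-rate generation".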
I argue that the trajectory-bias mechanism is suited for three application domains where standard position encodings are insufficient: agent reasoning over multi-step plans, multi-hop knowledge graph traversal, and real-time motion-to-audio synthesis.
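As a rough sketch of how an additive trajectory bias could enter attention, the example below maps the seven scalars at each position to one bias value per head and adds it to the attention logits before the softmax. Several simplifications are assumptions rather than details from the abstract: the learned network is replaced by a single linear map (`head_weights`), the bias depends only on the key position, attention is causal, and the scalar values and base logits are random or constant placeholders.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

N_SCALARS = 7  # commitment, uncertainty, transition pressure, recovery margin,
               # phase stiffness, novelty, stability


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def trajectory_bias(scalars: Matrix, head_weights: Matrix) -> Matrix:
    """bias[h][j] = head_weights[h] . scalars[j] (linear stand-in for the learned network)."""
    return [[sum(w * s for w, s in zip(w_h, s_j)) for s_j in scalars]
            for w_h in head_weights]


def biased_attention(logits: List[Matrix], bias: Matrix) -> List[Matrix]:
    """Add the per-head, per-key bias to causal logits, then softmax each query row."""
    out = []
    for h, head in enumerate(logits):
        rows = []
        for i, row in enumerate(head):
            visible = [row[j] + bias[h][j] for j in range(i + 1)]  # keys 0..i only
            rows.append(softmax(visible) + [0.0] * (len(row) - i - 1))
        out.append(rows)
    return out


random.seed(0)
seq_len, n_heads = 4, 2
scalars = [[random.random() for _ in range(N_SCALARS)] for _ in range(seq_len)]
head_weights = [[random.uniform(-1, 1) for _ in range(N_SCALARS)] for _ in range(n_heads)]
# Constant base logits, so any non-uniform attention comes from the bias alone.
logits = [[[0.0] * seq_len for _ in range(seq_len)] for _ in range(n_heads)]

weights = biased_attention(logits, trajectory_bias(scalars, head_weights))
```

Because each head has its own weight vector over the seven scalars, two heads given identical content logits can still distribute attention differently, which is the kind of per-head specialization the abstract describes.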

Promotion decision

What has to happen next

Convert into the standard paper schema, add citations, and render a draft PDF.
