Mohamed Diomande

Extracted abstract or opening context

**AGP: Anticipatory Geometry Partitioning for Semantically Routed Distributed Transformer Inference on Heterogeneous Apple Silicon**

**AGP: A Hierarchical Transformer Architecture for Routed Hidden-State Transfer**

We introduce **Anticipatory Geometry Partitioning (AGP)**, a transformer-centered systems architecture that treats intermediate hidden states not as disposable internals of a monolithic forward pass, but as operational interfaces for conditional computation, semantic control, and cross-device continuation. In standard decoder-only transformer inference, every token traverses essentially the same depth, on the same device, under the same compute budget, regardless of whether the model already possesses a sufficient internal estimate of the answer. AGP challenges that assumption. It augments a base language model with a learned control stack that predicts, from intermediate representations, whether the current state should be accepted locally, resumed from a later boundary, revived locally, or escalated to a deeper corrective path.

In our current implementation, the base model is a Thunder-trained `Gemma 4 E2B` backbone in `MLX`; the control stack consists of route, vitality, and earliest-layer heads; and the transfer stack consists of a learned same-host latent adapter that reconstructs final hidden-state targets from selected source-layer states.

The central claim of AGP is that transformer hidden states can become **typed scheduling objects**. Rather than spending full-depth computation uniformly, the system learns a small policy over latent sufficiency.
This policy is informed by an anticipation-inspired control geometry: the model does not merely ask “what token comes next,” but “what kind of state is this, how alive is it, how far is it from semantic sufficiency, and what is the cheapest safe continuation path?”

In the full architecture, these judgments are made by a hierarchy of lightweight heads that estimate route action, vitality, and acceptable boundary. The route layer determines whether computation should terminate locally, continue locally from a late layer, or escalate to a corrective path. The vitality layer estimates whether the latent state is healthy, weak, or in need of revival. The boundary layer estimates where a continuation boundary should be drawn. Together, these components convert a decoder transformer into a runtime that allocates compute according to the geometry of the current hidden state rather than the static index of the final layer.

AGP is designed explicitly for heterogeneous Apple hardware. The dense model and trainable adapters live in `MLX`, taking advantage of unified-memory GPU execution on Apple silicon. Two hosts, `Mac4` and `Mac5`, are connected through `Thunderbolt 5`, which supplies the trans
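
To make the control stack described in the abstract more concrete, the following is a minimal sketch of how the outputs of route, vitality, and boundary heads might be combined into a single continuation decision. It is not the AGP implementation. The names `Route`, `HeadOutputs`, and `choose_continuation`, the 0-to-1 vitality scale, and both thresholds are hypothetical and chosen only for illustration. The sketch uses plain Python rather than `MLX`, so the heads themselves are not modeled, only the policy over their outputs.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """The four continuation outcomes named in the abstract."""
    ACCEPT_LOCAL = "accept_local"    # state judged sufficient; stop here
    RESUME_LATE = "resume_late"      # continue from a later layer boundary
    REVIVE_LOCAL = "revive_local"    # weak state; revive on the same host
    ESCALATE = "escalate"            # hand off to a deeper corrective path


@dataclass
class HeadOutputs:
    """Hypothetical container for one token's control-head predictions."""
    route_probs: dict     # Route -> probability (assumed route head output)
    vitality: float       # assumed scale: 0.0 (unhealthy) .. 1.0 (healthy)
    boundary_layer: int   # assumed boundary head output: layer to resume at


def choose_continuation(out, num_layers, revive_below=0.3, accept_above=0.8):
    """Return (route, resume_layer); thresholds are illustrative assumptions."""
    # Start from the route head's most probable action.
    route = max(out.route_probs, key=out.route_probs.get)

    # Assumption: a low-vitality state is never accepted as-is.
    if route is Route.ACCEPT_LOCAL and out.vitality < revive_below:
        route = Route.REVIVE_LOCAL

    # Assumption: an unconfident accept falls back to late-layer continuation.
    if route is Route.ACCEPT_LOCAL and out.route_probs[route] < accept_above:
        route = Route.RESUME_LATE

    if route is Route.RESUME_LATE:
        # Clamp the predicted boundary into the valid layer range.
        resume = min(max(out.boundary_layer, 0), num_layers - 1)
        return route, resume
    return route, None
```

In this sketch the vitality estimate acts as a veto on early acceptance, and the boundary estimate is consulted only when the chosen route is a late-layer continuation. A real policy could instead be learned end to end or weigh the three signals jointly.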

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.