AGP TRIBE-Inspired Concrete Spec

Extracted abstract or opening context

This document turns the earlier TRIBE V2 analogy into a direct AGP design. The goal is not to imitate the neuroscience output target. The goal is to adopt the training and systems pattern that made TRIBE effective: mostly frozen encoders, a learned temporal fusion core, identity-conditioned routing, a low-rank bottleneck, and multiple specialized prediction heads. In the AGP stack, those ideas become a unified architecture for language, motion, trajectory reasoning, semantic projection, and cross-host transfer.

The core model family remains a transformer. The base language trunk is `Gemma 4 E2B`, trained first as a domain-adapted backbone. Everything else wraps around that trunk as a typed latent architecture. The system should therefore be described as a hierarchical transformer system with multimodal fusion and hardware-aware routing rather than as a brand-new model class.

The first module is `TextBackboneGemma`. This is the stage-one `Gemma 4 E2B` LoRA backbone trained on your high-signal corpus. It owns token embeddings, the decoder stack, and the primary hidden-state manifold that later modules read. It runs in `MLX` on GPU during training. In the near term it lives on `Mac4` for stage-one backbone work, then on both `Mac4` and `Mac5` during Thunder data-parallel training. During inference it becomes the main language trunk and its intermediate hidden states feed the rest of the AGP stack.

The second module is `SensorFrontEnd`. This is the non-text multimodal encoder family for motion and embodied state. It includes `PoseEncoder` for MediaPipe or body landmarks, `IMUEncoder` for iPhone motion streams, `MocopiEncoder` for skeleton data, and later optional `AudioEventEncoder` and `VisionSceneEncoder` for environmental context. In the Echelon world, these modules replace TRIBE’s frozen video and audio encoders. They should stay mostly frozen or lightly adapted once they are stable. Their role is not to generate language. Their role is to produce temporally aligned evidence streams for the shared fusion model.

The third module is `TraceEncoder`. This is the structured behavioral encoder for agent trajectories. It takes tool calls, file diffs, command traces, patch summaries, git context, and dialogue turns, and maps them into a temporal sequence suitable for fusion. In KARL terms, this is the front-end that converts behavior into a cognitive trajectory. In the AGP stack it acts like the behavioral sibling of the motion encoders.
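
As a rough illustration of the `TraceEncoder` front-end described above, the following sketch shows one way heterogeneous agent events could be put into a single time-ordered sequence before any learned encoding. It is a minimal sketch under stated assumptions, not the spec's implementation: the names `EventKind`, `TraceEvent`, `TraceStep`, and `to_temporal_sequence` are hypothetical, the timestamp unit (seconds) is assumed, and the payloads are made-up examples.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    # Hypothetical type ids for the event sources the spec lists.
    TOOL_CALL = 0
    FILE_DIFF = 1
    COMMAND = 2
    PATCH_SUMMARY = 3
    GIT_CONTEXT = 4
    DIALOGUE_TURN = 5


@dataclass(frozen=True)
class TraceEvent:
    timestamp: float  # assumed unit: seconds since session start
    kind: EventKind
    payload: str      # raw text; a real encoder would tokenize or embed it


@dataclass(frozen=True)
class TraceStep:
    index: int    # position in the time-ordered sequence
    kind_id: int  # integer type id, usable as a type-embedding lookup key
    dt: float     # seconds elapsed since the previous step (0.0 for the first)
    payload: str


def to_temporal_sequence(events):
    """Sort events by time and attach a type id and inter-event gap to each."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    steps = []
    prev_t = None
    for i, ev in enumerate(ordered):
        dt = 0.0 if prev_t is None else ev.timestamp - prev_t
        steps.append(TraceStep(i, ev.kind.value, dt, ev.payload))
        prev_t = ev.timestamp
    return steps


# Illustrative input, deliberately out of order.
events = [
    TraceEvent(12.5, EventKind.FILE_DIFF, "edit src/app.py"),
    TraceEvent(3.0, EventKind.DIALOGUE_TURN, "user asks for a fix"),
    TraceEvent(7.0, EventKind.TOOL_CALL, "grep 'TODO'"),
]
steps = to_temporal_sequence(events)
for step in steps:
    print(step.index, step.kind_id, step.dt, step.payload)
```

Carrying an explicit inter-event gap (`dt`) alongside the type id is one plausible way to give a downstream fusion model timing information comparable to what the motion encoders provide; the actual representation used in the AGP stack is not specified in this abstract.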

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.