AGP TRIBE-Inspired Concrete Spec

This document turns the earlier TRIBE V2 analogy into a direct AGP design. The goal is not to imitate the neuroscience output target. The goal is to adopt the training and systems pattern that made TRIBE effective: mostly frozen encoders, a learned temporal fusion core, identity-conditioned routing, a low-rank bottleneck, and multiple specialized prediction heads. In the AGP stack, those ideas become a unified architecture for language, motion, trajectory reasoning, semantic projection, and cross-host transfer.

Date: `2026-04-16`

The core model family remains a transformer. The base language trunk is `Gemma 4 E2B`, trained first as a domain-adapted backbone. Everything else wraps around that trunk as a typed latent architecture. The system should therefore be described as a hierarchical transformer system with multimodal fusion and hardware-aware routing rather than as a brand-new model class.

The first module is `TextBackboneGemma`. This is the stage-one `Gemma 4 E2B` LoRA backbone trained on your high-signal corpus. It owns token embeddings, the decoder stack, and the primary hidden-state manifold that later modules read. It runs in `MLX` on GPU during training. In the near term it lives on `Mac4` for stage-one backbone work, then on both `Mac4` and `Mac5` during Thunder data-parallel training. During inference it becomes the main language trunk and its intermediate hidden states feed the rest of the AGP stack.

The second module is `SensorFrontEnd`. This is the non-text multimodal encoder family for motion and embodied state. It includes `PoseEncoder` for MediaPipe or body landmarks, `IMUEncoder` for iPhone motion streams, `MocopiEncoder` for skeleton data, and later optional `AudioEventEncoder` and `VisionSceneEncoder` for environmental context. In the Echelon world, these modules replace TRIBE’s frozen video and audio encoders. They should stay mostly frozen or lightly adapted once they are stable. Their role is not to generate language. Their role is to produce temporally aligned evidence streams for the shared fusion model.
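One way to picture "temporally aligned evidence streams" is to resample each sensor stream onto a shared time grid. The sketch below uses a zero-order hold (carry the latest sample forward); the function name `align_to_grid`, the sample layout, and the choice of zero-order hold are illustrative assumptions, not something the spec prescribes.

```python
from bisect import bisect_right

def align_to_grid(samples, grid):
    """Resample a time-sorted list of (t, value) pairs onto `grid`.

    Each grid point takes the most recent sample at or before it
    (zero-order hold). Grid points before the first sample get None.
    """
    times = [t for t, _ in samples]
    aligned = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of the latest sample with t <= g
        aligned.append(samples[i][1] if i >= 0 else None)
    return aligned

# Hypothetical pose stream sampled at irregular times.
pose = [(0.0, "pose_a"), (0.1, "pose_b"), (0.25, "pose_c")]
grid = [0.0, 0.1, 0.2, 0.3]
pose_on_grid = align_to_grid(pose, grid)
```

Running every encoder output through the same grid gives the fusion model one row per time step regardless of each sensor's native rate.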

The third module is `TraceEncoder`. This is the structured behavioral encoder for agent trajectories. It takes tool calls, file diffs, command traces, patch summaries, git context, and dialogue turns, and maps them into a temporal sequence suitable for fusion. In KARL terms, this is the front-end that converts behavior into a cognitive trajectory. In the AGP stack it acts like the behavioral sibling of the motion encoders.
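As a rough illustration of the input side of `TraceEncoder`, heterogeneous trace events can be normalized into one time-ordered sequence before any learned encoding happens. The event kinds mirror the inputs listed above; the `TraceEvent` type, its fields, and the toy `(time, kind_id, payload_length)` features are hypothetical placeholders for whatever the real encoder consumes.

```python
from dataclasses import dataclass

# Event kinds taken from the inputs named in the text; the ordering is arbitrary.
EVENT_KINDS = ("tool_call", "file_diff", "command", "patch_summary",
               "git_context", "dialogue")

@dataclass(frozen=True)
class TraceEvent:
    t: float      # seconds since session start (assumed time base)
    kind: str     # one of EVENT_KINDS
    payload: str  # raw text of the event

def to_sequence(events):
    """Sort events by time and emit (time, kind_id, payload_length) triples."""
    kind_id = {kind: i for i, kind in enumerate(EVENT_KINDS)}
    ordered = sorted(events, key=lambda e: e.t)
    return [(e.t, kind_id[e.kind], len(e.payload)) for e in ordered]

events = [
    TraceEvent(2.0, "file_diff", "+ fix off-by-one"),
    TraceEvent(0.5, "dialogue", "please fix the loop"),
    TraceEvent(1.2, "tool_call", "grep -n range"),
]
sequence = to_sequence(events)
```

A real encoder would replace the payload length with a learned embedding of the payload; the point here is only the typed, time-sorted shape of the sequence.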

The fourth module is `IdentityBank`. This is the AGP analogue to TRIBE’s subject embeddings. It should contain at least four learned embedding families. `UserEmbedding` captures your overall behavioral and semantic baseline. `SessionEmbedding` captures short-term context and active mode. `ProjectEmbedding` captures the current project’s style and domain prior. `BodySignatureEmbedding` captures movement-specific baseline when the system is operating in embodied mode. These embeddings are injected into the fusion core and router so the same raw evidence can be interpreted differently depending on who is acting, what project is active, and what behavioral regime is current.
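A minimal sketch of how the four embedding families might be combined into a single conditioning vector, assuming a simple sum and randomly initialized stand-ins for learned embeddings. The class layout, the `condition` method, and the summation are assumptions for illustration; the spec only states that the embeddings are injected into the fusion core and router.

```python
import random

class IdentityBank:
    """Four embedding tables keyed by id; lookups are summed into one vector."""

    FAMILIES = ("user", "session", "project", "body_signature")

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._tables = {family: {} for family in self.FAMILIES}

    def _lookup(self, family, key):
        table = self._tables[family]
        if key not in table:
            # Stand-in for a learned embedding: small random initialization.
            table[key] = [self._rng.gauss(0.0, 0.02) for _ in range(self.dim)]
        return table[key]

    def condition(self, user, session, project, body_signature=None):
        ids = {"user": user, "session": session, "project": project,
               "body_signature": body_signature}
        out = [0.0] * self.dim
        for family, key in ids.items():
            if key is None:  # e.g. no body signature outside embodied mode
                continue
            vec = self._lookup(family, key)
            out = [a + b for a, b in zip(out, vec)]
        return out

bank = IdentityBank(dim=8)
text_mode = bank.condition("u1", "s1", "agp")
embodied_mode = bank.condition("u1", "s1", "echelon", body_signature="b1")
```

The same user and session produce different conditioning vectors once the project or body signature changes, which is the property the paragraph above relies on.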

The fifth module is `TemporalFusionCore`. This is the direct AGP analogue of TRIBE’s temporal fusion transformer. It receives aligned sequences from `TextBackboneGemma`, `SensorFrontEnd`, and `TraceEncoder`, plus the relevant identity embeddings. It should be a modest transformer, trained after the stage-one backbone, and responsible for producing a shared latent over time rather than for generating text directly. This is where cross-modal interaction happens. It is also where modality dropout should be introduced so the fused latent remains robust when motion, audio, tool context, or some other evidence stream disappears.
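Modality dropout can be sketched as randomly blanking whole input streams during training while guaranteeing that at least one survives. The version below zeroes dropped streams; masking them out of attention would be an equally valid choice. The function name, the dictionary layout, and the keep-at-least-one rule are assumptions for illustration.

```python
import random

def modality_dropout(streams, p_drop, rng):
    """Zero out entire streams with probability p_drop each.

    `streams` maps a modality name to a list of per-step feature vectors.
    At least one modality is always kept so the fused input is never empty.
    Returns the (possibly zeroed) streams and the list of kept names.
    """
    names = list(streams)
    kept = [name for name in names if rng.random() >= p_drop]
    if not kept:
        kept = [rng.choice(names)]
    out = {}
    for name, seq in streams.items():
        if name in kept:
            out[name] = seq
        else:
            out[name] = [[0.0] * len(step) for step in seq]
    return out, kept

streams = {
    "text": [[0.1, 0.2], [0.3, 0.4]],
    "motion": [[1.0, 1.0], [1.0, 1.0]],
    "trace": [[0.5, 0.0], [0.0, 0.5]],
}
dropped, kept = modality_dropout(streams, p_drop=0.3, rng=random.Random(0))
```

Dropout of this kind applies only during training; at inference a missing stream would simply arrive as zeros or be masked, which is the situation the training-time dropout prepares for.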

The sixth module is `RouteMoE`. This is the soft expert router. It replaces brittle threshold logic with learned weighting over expert subpaths. Its job is not limited to one domain. In the motion stack it performs soft sigil routing over `stabilize`, `transition`, `recover`, and related claim families. In the language stack it performs routing over route actions such as `accept_local`, `continue_local`, `revive_local`, and `escalate`. In the semantic stack it performs soft activation over primitive and invariant experts. This is the place where TRIBE’s per-parcel ensemble logic becomes one of the central AGP ideas.
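The difference between threshold logic and soft routing can be shown in a few lines: instead of picking one expert when a score crosses a cutoff, the router blends all expert outputs by softmax weight. The route action names come from the text; the logits, the toy expert outputs, and the function names are hypothetical.

```python
import math

ROUTE_ACTIONS = ("accept_local", "continue_local", "revive_local", "escalate")

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_route(logits, expert_outputs):
    """Blend expert outputs by softmax weight instead of a hard threshold."""
    weights = softmax(logits)
    dim = len(expert_outputs[0])
    mixed = [sum(w * out[i] for w, out in zip(weights, expert_outputs))
             for i in range(dim)]
    return weights, mixed

# One toy output vector per route action, in ROUTE_ACTIONS order.
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights, mixed = soft_route([2.0, 0.5, 0.1, -1.0], expert_outputs)
best_action = ROUTE_ACTIONS[weights.index(max(weights))]
```

Because the weights vary smoothly with the logits, small changes in evidence shift the blend gradually rather than flipping a decision at a cutoff.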

The seventh module is `CompressionBottleneck`. This is a low-rank bottleneck that sits after temporal fusion and before downstream specialist heads. It should force the system to retain only predictive, transferable structure. In stages two and three this helps denoise routing and semantic decisions. In later transfer stages this bottleneck becomes the canonical learned packet that can be transported across hosts. This is also the exact place where `TurboQuant` will plug in. First the bottleneck is learned in floating-point form. Then TurboQuant compresses it for cross-host transport.
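A low-rank bottleneck amounts to a down-projection to `rank` dimensions followed by an up-projection back to the model width, which costs `2 * d_model * rank` parameters instead of `d_model * d_model`. The sketch below uses hand-written matrices in place of learned weights, and the dimensions in `param_counts` are illustrative values, not figures from the spec. Quantization is deliberately left out, since the text places it after the floating-point bottleneck is learned.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LowRankBottleneck:
    """Down-project to `rank` dims, then up-project back to `d_model` dims."""

    def __init__(self, down, up):
        self.down = down  # shape (rank, d_model)
        self.up = up      # shape (d_model, rank)

    def encode(self, latent):
        return matvec(self.down, latent)

    def decode(self, code):
        return matvec(self.up, code)

def param_counts(d_model, rank):
    """Parameters in a full square projection versus the low-rank pair."""
    return {"full": d_model * d_model, "low_rank": 2 * d_model * rank}

# Toy weights: keep the first two of four coordinates (stand-in for learned weights).
down = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0]]
up = [[1.0, 0.0],
      [0.0, 1.0],
      [0.0, 0.0],
      [0.0, 0.0]]
bottleneck = LowRankBottleneck(down, up)
code = bottleneck.encode([1.0, 2.0, 3.0, 4.0])
reconstruction = bottleneck.decode(code)
counts = param_counts(d_model=1024, rank=64)
```

The `code` vector is the part that would later become the transportable packet; whatever does not pass through it is lost, which is how the bottleneck forces the model to keep only what the downstream heads need.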

The eighth module family is `SpecialistHeads`. These heads all consume the fused latent, the route weights, and the bottlenecked state. The first head is `VitalityHead`, which predicts whether the current hidden state is alive, weak, dead, or needs revival. The second is `RouteHead`, which predicts whether the system should stay local, continue deeper, revive locally, or escalate. The third is `SemanticHead`, which predicts primitives, invariants, and bundle neighborhoods aligned with the semantic kernel. The fourth is `EquilibriumHead`, which predicts motion equilibrium and downstream embodied control targets for Echelon. The fifth is `TrajectoryRewardHead`, which predicts behavioral quality metrics and likely effective next actions for KARL-style trajectory evaluation. The sixth is `TransferHead`, which produces the learned continuation packet used by AGP-PTP.

The ninth module is `TransferAdapter`. This is the first explicitly distributed systems module. It wraps the compression bottleneck and the transfer head into an encode/decode path. On the sending side it consumes the fused latent, route state, vitality state, and optional semantic state, and emits a compact packet. On the receiving side it reconstructs a continuation-ready latent. This is the module that makes cross-host continuation meaningful. It should not be trained until the stage-one backbone and the route/vitality heads are already stable.
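The encode/decode path can be illustrated at the wire level with a small binary packet that carries route state, vitality state, and the bottlenecked latent. This covers only serialization, not the learned reconstruction. The `ContinuationPacket` type, the header layout, and the float32 body are assumptions for illustration; the actual AGP-PTP packet format is not specified here.

```python
import struct
from dataclasses import dataclass

ROUTES = ("accept_local", "continue_local", "revive_local", "escalate")
VITALITY = ("alive", "weak", "dead", "needs_revival")

@dataclass(frozen=True)
class ContinuationPacket:
    route: str
    vitality: str
    code: tuple  # bottlenecked latent as floats

def encode_packet(packet):
    """Pack as: route id (u8), vitality id (u8), length (u16), float32 body."""
    header = struct.pack("<BBH", ROUTES.index(packet.route),
                         VITALITY.index(packet.vitality), len(packet.code))
    body = struct.pack(f"<{len(packet.code)}f", *packet.code)
    return header + body

def decode_packet(data):
    """Inverse of encode_packet."""
    route_id, vitality_id, n = struct.unpack_from("<BBH", data, 0)
    code = struct.unpack_from(f"<{n}f", data, 4)
    return ContinuationPacket(ROUTES[route_id], VITALITY[vitality_id], tuple(code))

# Values chosen to be exactly representable in float32.
packet = ContinuationPacket("escalate", "weak", (0.5, -1.25, 2.0))
wire = encode_packet(packet)
restored = decode_packet(wire)
```

With a 4-byte header and 4 bytes per latent dimension, packet size scales linearly with the bottleneck rank, which is one reason the rank matters for cross-host transport.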

The tenth module is `EnginePlacementLayer`. This is not a neural module but a runtime specification for where each module should live. During early training, `TextBackboneGemma`, `TemporalFusionCore`, `RouteMoE`, and the specialist heads live on GPU through `MLX`. During stage-one Thunder training, `Mac4` and `Mac5` both host the text backbone in data parallel and train through the Thunder ring backend. During later inference, `RouteHead`, `VitalityHead`, and potentially `SemanticHead` become `ANE` candidates, because they are small, repeated, and projection-heavy. `CompressionBottleneck` and `TransferAdapter` become `TurboQuant` candidates because they define the state we want to transport over `Thunderbolt 5`.
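Since `EnginePlacementLayer` is a runtime specification rather than a network, it can be written down as plain data. The mapping below restates the placements named in the paragraph above; the key strings, the `placement` helper, and the fallback to `mlx_gpu` at inference time are illustrative assumptions.

```python
# Placement during early training, as stated in the text.
TRAINING_PLACEMENT = {
    "TextBackboneGemma": "mlx_gpu",
    "TemporalFusionCore": "mlx_gpu",
    "RouteMoE": "mlx_gpu",
    "SpecialistHeads": "mlx_gpu",
}

# Candidate placements for later inference, as stated in the text.
INFERENCE_CANDIDATES = {
    "RouteHead": "ane",
    "VitalityHead": "ane",
    "SemanticHead": "ane",  # described only as a potential candidate
    "CompressionBottleneck": "turboquant",
    "TransferAdapter": "turboquant",
}

def placement(module, phase):
    """Return the runtime target for a module in a given phase."""
    if phase == "training":
        # None means the text does not state a training placement.
        return TRAINING_PLACEMENT.get(module)
    if phase == "inference":
        # Assumed fallback: modules without a stated candidate stay on GPU.
        return INFERENCE_CANDIDATES.get(module, "mlx_gpu")
    raise ValueError(f"unknown phase: {phase}")
```

Keeping placement as data makes it possible to change where a head runs without touching the module definitions.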

The stage layout should now be explicit. `Stage 1` is `TextBackboneGemma` only, trained as a Thunder LoRA backbone on `Mac4 + Mac5`. `Stage 2` adds `TemporalFusionCore` and `IdentityBank`, initially in text-plus-trace mode before full multimodal motion input is required. `Stage 3` adds `VitalityHead` and `RouteHead`. `Stage 4` adds `SemanticHead` and kernel-aligned supervision. `Stage 5` adds `CompressionBottleneck` and `TransferAdapter`. `Stage 6` introduces `ANE` deployment for shallow heads and `TurboQuant` transport for the bottleneck packet.
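The stage layout reads naturally as a cumulative curriculum: each stage adds modules on top of everything introduced earlier. A small sketch of that bookkeeping follows, assuming that earlier modules stay in the stack once added; the names `STAGE_ADDS` and `modules_at` are hypothetical.

```python
# Modules introduced at each stage, following the stage layout in the text.
STAGE_ADDS = {
    1: ("TextBackboneGemma",),
    2: ("TemporalFusionCore", "IdentityBank"),
    3: ("VitalityHead", "RouteHead"),
    4: ("SemanticHead",),
    5: ("CompressionBottleneck", "TransferAdapter"),
    6: (),  # deployment only: ANE for shallow heads, TurboQuant transport
}

def modules_at(stage):
    """Return every module present in the stack at the given stage."""
    if stage not in STAGE_ADDS:
        raise ValueError(f"unknown stage: {stage}")
    active = []
    for s in range(1, stage + 1):
        active.extend(STAGE_ADDS[s])
    return active

stage_three = modules_at(3)
```

Stage 6 adds no modules, so `modules_at(6)` matches `modules_at(5)`; the difference between the two is where the modules run, not which ones exist.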

The hardware map follows directly from that curriculum. `Mac4` is the primary training coordinator and one of the two Thunder ranks for stage one. `Mac5` is the second Thunder rank and later becomes the first corrective/deeper host for transfer experiments. `Thunderbolt 5` is the training and transport fabric between them. `MLX` on GPU is the primary training runtime. `ANE` is deferred until the shallow heads stabilize, because it is a deployment optimization and not a wise first training surface. `TurboQuant` is deferred until the transfer packet exists, because quantizing an undefined latent is premature.

The training objective also becomes cleaner under this architecture. Stage one still optimizes standard next-token SFT loss for `TextBackboneGemma`. Stage two adds temporal-fusion alignment losses across text and trajectory inputs plus identity-conditioning regularization. Stage three adds vitality and route classification losses. Stage four adds semantic multi-label losses and sparse regularization. Stage five adds transfer reconstruction and continuation-fidelity losses. The important change from the old AGP framing is that the fusion core and identity bank become first-class modules rather than diffuse ideas.
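Read as a curriculum, the objective at each stage is a weighted sum of the loss terms introduced up to that point. The sketch below assumes earlier terms stay active in later stages and that unspecified weights default to 1.0; neither is stated in the spec, and the loss names are shorthand labels for the terms described above.

```python
# Loss terms introduced at each stage (labels are shorthand, not official names).
STAGE_LOSSES = {
    1: ("next_token_sft",),
    2: ("fusion_alignment", "identity_regularization"),
    3: ("vitality_ce", "route_ce"),
    4: ("semantic_multilabel", "sparsity"),
    5: ("transfer_reconstruction", "continuation_fidelity"),
}

def total_loss(stage, values, weights=None):
    """Weighted sum of all loss terms active at `stage`.

    `values` maps loss name to its current scalar value.
    `weights` optionally maps loss name to a coefficient (default 1.0).
    """
    weights = weights or {}
    total = 0.0
    for s in range(1, stage + 1):
        for name in STAGE_LOSSES.get(s, ()):
            total += weights.get(name, 1.0) * values[name]
    return total

# Hypothetical loss values for a stage-two step.
values = {"next_token_sft": 2.0, "fusion_alignment": 0.5,
          "identity_regularization": 0.25}
stage_two_loss = total_loss(2, values)
```

Setting a weight to zero is one simple way to freeze a term's influence without removing it from the schedule.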

The immediate practical implication is simple. The very next real training job should still be the stage-one `Gemma 4 E2B` backbone, but it should now be understood as the bottom layer of this full architecture, not as a disconnected LoRA run. Once that stage-one adapter exists, the rest of this TRIBE-inspired stack has a concrete trunk to attach to.

Promotion Decision

Attach run IDs, datasets, metrics, and reproduction commands.

Source Anchor

Comp-Core/docs/research/agp-tribe-inspired-concrete-spec.md
