AGP TurboQuant + Apple Neural Engine Performance Plan

Extracted abstract or opening context

This plan turns the local TurboQuant and Apple Neural Engine research into an executable AGP performance lane. The goal is not to add accelerator names to the paper. The goal is to prove which parts of AGP become faster, smaller, or more energy efficient when the system uses the right engine for the right class of computation.

- `Desktop/cog-rlm/scripts/turboquant.py`
- `Desktop/cog-rlm/scripts/ane_bridge.py`
- `Desktop/cog-rlm/scripts/ane_lora_mil.py`
- `Desktop/cog-rlm/scripts/ane_trainer.py`
- `Desktop/cog-rlm/scripts/ane_whisper_spike.py`
- `Desktop/cog-rlm/scripts/ane_mlx_train.py`

TurboQuant belongs at the compression boundary, not at the acoustic model boundary. It should compress embedding indexes, hidden-state transfer packets, and AGP-PTP payloads. Its role is state mobility under bounded distortion.

The Apple Neural Engine belongs at the reflex boundary, not at the full-transformer training boundary. It should run compact projection-heavy heads: route, vitality, semantic projection, sigil or partition classifiers, and later frozen-forward projection kernels if the private MIL path remains stable. Its role is low-power repeated inference, not replacing MLX/GPU training.

- `4-bit`: about `0.993` cosine on real `768D` embeddings.
- `4-bit`: about `0.95-0.98` recall@10 on small real samples.
- `4-bit`: about `5.9x` compression versus fp32.
- `8-bit`: effectively lossless retrieval behavior in the small eval.
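The three metrics quoted above (cosine after reconstruction, recall@10, and compression versus fp32) can be measured with a very small harness. The sketch below is an illustration under stated assumptions, not the code in `turboquant.py`: it substitutes plain per-vector uniform scalar quantization for TurboQuant, and every function name and the `overhead_bytes` parameter are hypothetical. It uses only the Python standard library so it runs anywhere.

```python
import math
import random


def quantize(vec, bits):
    """Uniform per-vector scalar quantization (a stand-in, NOT TurboQuant).

    Returns integer codes plus the offset and step needed to reconstruct.
    """
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k):
    """Indices of the k vectors in `index` most similar to `query`."""
    order = sorted(range(len(index)),
                   key=lambda i: cosine(query, index[i]),
                   reverse=True)
    return order[:k]


def recall_at_k(queries, exact_index, approx_index, k=10):
    """Fraction of exact top-k neighbours also found in the approximate index."""
    hits = 0
    for q in queries:
        truth = set(top_k(q, exact_index, k))
        found = set(top_k(q, approx_index, k))
        hits += len(truth & found)
    return hits / (k * len(queries))


def compression_ratio(dim, bits, overhead_bytes):
    """fp32 bytes divided by packed bytes; overhead_bytes is a hypothetical
    per-vector cost (scales, norms, headers) that the source does not specify."""
    fp32_bytes = dim * 4
    packed_bytes = math.ceil(dim * bits / 8) + overhead_bytes
    return fp32_bytes / packed_bytes


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Gaussian vector; the figures in the text come from real embeddings.
    vec = [rng.gauss(0.0, 1.0) for _ in range(768)]
    for bits in (4, 8):
        approx = dequantize(*quantize(vec, bits))
        print(bits, round(cosine(vec, approx), 4),
              round(compression_ratio(768, bits, 8), 2))
```

With zero overhead, 4-bit codes on `768D` vectors give a nominal 8x reduction over fp32. The reported figure of about `5.9x` is lower, which would be consistent with some per-vector overhead, though the source does not say what that overhead is. Numbers from synthetic vectors should not be compared directly with the real-embedding results.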

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.
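One way to make this checklist mechanical is a small record that reports which evidence fields are still empty. This is a minimal sketch only: the `EvidenceRecord` class and its field names are hypothetical and are not the paper schema mentioned on this page. The empty strings are deliberate placeholders, and the only metric values shown are the ones already quoted in the abstract.

```python
from dataclasses import dataclass, asdict


@dataclass
class EvidenceRecord:
    """Hypothetical container for the four items the checklist asks for."""
    run_id: str
    dataset: str
    metrics: dict
    repro_command: str

    def missing_fields(self):
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def is_complete(self):
        """True only when every field has been filled in."""
        return not self.missing_fields()


# Placeholders stay empty until a real run supplies them.
record = EvidenceRecord(
    run_id="",
    dataset="",
    metrics={"bits": 4, "cosine": 0.993},  # values quoted in the abstract
    repro_command="",
)
print(record.missing_fields())
```

A check like this could gate promotion, so an item is not promoted while any field is still a placeholder.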

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.