ARCHITECTURE LENS — what we judge every paper against
# ARCHITECTURE LENS — what we judge every paper against
> Version 1.0 — 2026-06-12. This is the profile of Mohamed's live systems.
> Every daily paper is mapped against these domains. If a paper doesn't touch
> any domain, it is SKIP unless it is foundational enough to create a new domain
> (in which case: propose the new domain in the report).
## How to use this file
For each candidate paper, answer four questions in order:
1. **FIT** — which domain(s) below does it touch, and which named system of ours is the counterpart?
2. **DELTA** — what does the paper do that our counterpart does NOT do (and vice versa)?
3. **VERDICT** — one of:
   - `ABSORB` — their technique is better on some axis; name the exact file/module where it lands and what changes.
   - `TEST` — we built something comparable; define the head-to-head (their benchmark or ours, what metric, what would count as a win).
   - `RIVAL` — we already built something arguably ahead or different-but-stronger; write the claim with evidence (commits, metrics, dates).
   - `WATCH` — relevant, no action yet; state the trigger that would upgrade it to ABSORB/TEST.
   - `SKIP` — no domain fit.
4. **ACTION** — for ABSORB/TEST: one concrete first step small enough to start the same week.
A verdict without a named system, a named file, or a falsifiable claim is invalid.
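The validity rule above can be checked mechanically. The following is a minimal sketch, not part of the lens tooling: the `Verdict` record, its field names, and the `validate` function are hypothetical, and the per-verdict requirements are one reading of the list above (ABSORB needs a file, TEST/RIVAL/WATCH need a written claim or trigger, ABSORB/TEST need a first step).

```python
from dataclasses import dataclass
from typing import Optional

VERDICTS = {"ABSORB", "TEST", "RIVAL", "WATCH", "SKIP"}


@dataclass
class Verdict:
    # All field names are hypothetical, chosen for illustration only.
    kind: str
    system: Optional[str] = None       # named counterpart system (FIT)
    target_file: Optional[str] = None  # file/module where an ABSORB lands
    claim: Optional[str] = None        # head-to-head, evidence-backed claim, or trigger
    action: Optional[str] = None       # first step for ABSORB/TEST


def validate(v: Verdict) -> list[str]:
    """Return a list of problems; an empty list means the verdict is valid."""
    if v.kind not in VERDICTS:
        return [f"unknown verdict: {v.kind}"]
    if v.kind == "SKIP":
        # SKIP means no domain fit, so there is no counterpart to name.
        return []
    errors = []
    if not v.system:
        errors.append("missing named system")
    if v.kind == "ABSORB" and not v.target_file:
        errors.append("ABSORB needs a named file/module")
    if v.kind in {"TEST", "RIVAL", "WATCH"} and not v.claim:
        errors.append(f"{v.kind} needs a falsifiable claim or trigger")
    if v.kind in {"ABSORB", "TEST"} and not v.action:
        errors.append(f"{v.kind} needs a concrete first step")
    return errors


# Example: an ABSORB verdict that names a system but nothing else.
incomplete = Verdict(kind="ABSORB", system="KARL reward engine")
problems = validate(incomplete)
```

Running a check like this before a report is filed would reject verdicts that name no landing site or no testable claim, which is the failure the rule is meant to prevent.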
---
## D1 — Agent skills: libraries, induction, typing, routing
Our systems: SOOP-2 skills operating system (296 typed skills, `SKILL_TYPES_v1` 6-category algebra, `skill-typecheck` linter), SEA two-tier router (Tier 1 recall@30 = 1.00 on 214 skills, Tier 2 twin-primary scorer on Mac4:8100), `skill-forge` (auto-generates skills from session pattern mining), Cortex rule promotion.
Where they live: `[home-path]`, `[home-path]`, `[home-path]`.
What would beat us: automatic skill induction with verified composition guarantees; skill libraries that self-prune by utility; routing that beats recall@30=1.00 at lower cost; typed-composition checking richer than our 6-category algebra.
Known rivals already mapped: SkillDAG, SkillOpt, MUSE-Autoskill, GraphOfSkills, SkillsBench (see `Desktop/code4ai-analysis/ANALYSIS.md`).
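For head-to-head comparisons against the router, it helps to pin down the metric. The sketch below uses the standard definition of recall@k (the fraction of relevant items that appear in the top k results), averaged over queries. Whether SEA Tier 1 computes recall@30 in exactly this way is an assumption; the function names are illustrative.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant skills that appear in the top-k ranked results."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    hits = len(relevant & set(ranked[:k]))
    return hits / len(relevant)


def mean_recall_at_k(queries: list[tuple[list[str], set[str]]], k: int = 30) -> float:
    """Average recall@k over (ranked_results, relevant_set) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)
```

A rival router would be scored on the same query set with the same k, so that a claim of matching recall at lower cost compares like with like.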
## D2 — Trajectory reward, agentic RL, SFT from agent traces
Our systems: KARL reward engine (6-signal composite, 3203 rescored records, score-at-emit via `flows-karl-writer`), cognitive twin SFT pipeline (`cognitive-forge`, 1049 SFT examples from distilled trajectories), trajectory cards from gateway events.
Where they live: `Desktop/karl/`, `[home-path]`.
What would beat us: process-reward models that outperform composite heuristic signals; trajectory filtering/credit-assignment methods with measured downstream SFT gains; online RL from agent traces that doesn't need human labels.
## D3 — Agent memory, context management, continual learning
Our systems: Cortex (rule promotion/decay), Memory Guardian (invariance-locked files), file-based auto-memory with topic files + index, sleep-sync synthesis, context-recovery MCP, post-compaction continuation.
Where they live: `[home-path]`, `[home-path]`, `[home-path]`.
What would beat us: memory consolidation with measured retrieval gains over append-only topic files; continual-learning methods that beat re-prompting; graph/episodic memory beating our flat index at recall.
## D4 — Multi-agent orchestration, harness design, autonomous loops
Our systems: the mesh (Mac1-5 + cloud-vm + K11, tmux pane orchestration, NATS JetStream events), ELP-2 everlasting-loop supervisor (scoreboard + dispatcher + deadman cron), Pulse autonomous dev sessions, chain skills (`chain:full-omega` = crucible→omega→hydra), Prefect flow migration, multi-wake `/loop` autopilot (shipped skate-wind phases 0-3 unattended).
Where they live: `[home-path]`, `[home-path]`, `[home-path]`.
Operating thesis (from code4AI ingest): intelligence is migrating into the deterministic harness; minimal harness ≥ heavyweight frameworks.
What would beat us: harness designs with measured reliability gains over deterministic supervisors; multi-agent coordination that beats single-agent-with-good-harness on real tasks; verified inter-agent protocols.
## D5 — Speech, low-resource ASR, phonology, writing systems
Our systems: N'Ko program — AGP 20
Where they live: `Desktop/nko-brain-scanner/`, `Desktop/nko-acoustic-coding/`, `Desktop/NKO-CONSOLIDATION.md`.
What would beat us: low-resource ASR below ~20
## D6 — On-device inference, quantization, NPU serving
Our systems: ADR-001 ANE + TurboQuant serving plan for N'Ko Whisper (AGP-aware quantization as differentiator), whisper.cpp xcframework path in NKoScribe, train/serve feature-extractor consistency law (1500-frame vs 375-frame anchor bug).
What would beat us: quantization schemes beating TurboQuant-class methods on ASR; NPU-targeted attention variants; sub-1GB speech models at our quality bar.
## D7 — Motion, pose, real-time generative visuals
Our systems: MotionMix (multi-iPhone camera mesh + latent diffusion control), SAN (audio-mel ↔ pose trajectories, 22.5K aligned frames staged), LUME stem-live (body → music on K11), BodyBurst Unity VFX from mocopi bones, MediaPipe Femto Bolt pipeline, Brush 3DGS on AMD, the 7-scalar anticipation geometry (hidden unifier across motion/speech/vocabulary).
What would beat us: real-time body-conditioned generation under our latency; audio↔motion joint models beating per-modality pipelines; cheap 3DGS/4D capture beating Brush-on-780M.
## D8 — Evaluation, LLM-as-judge, benchmarks for agents
Our systems: SEA Tier 2 twin scorer, meta-review fixpoint chains (round-1 → contrarian → round-2), meta:amr adversarial debate, KARL reward as offline judge, recall benchmarks.
What would beat us: judge-reliability methods that beat adversarial-debate synthesis; agent benchmarks our chains should be run against (these are TEST candidates by default — we can submit our harness).
## D9 — Proactivity, ambient agents, when-to-speak
Our systems: `cortex:watch` silent-default monitor (O1/O2 from the Google Labs proactivity paper closed; ~95
What would beat us: learned when-to-notify policies with measured fatigue reduction; anything beyond threshold functions.
## D10 — Representation: meaning/form disentanglement, interlingua
Our systems: N'Ko writable-interlingua thesis (z_meaning shared / z_form=N'Ko, sound-isomorphism vs meaning axis), the SOUND→MEANING ladder (pivot-lang scaffold → learned latent engine → disentanglement).
What would beat us / what we need: content-form factorization results in speech or text; multilingual meaning-space papers with probing evidence; anything making the meaning axis trainable at Manding data budgets.
---
## Standing priorities (tie-breakers when triage is full)
1. D2 + D8 (KARL/judging) — the active training loop, highest absorption surface.
2. D1 + D4 (skills/harness) — the operating thesis; we believe we're near-SOTA, so RIVAL/TEST verdicts are valuable here.
3. D5 + D10 (N'Ko) — flagship research program; absorb anything that moves the meaning axis.
4. D3, D7, D6, D9 — steady-state; absorb opportunistically.
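The tie-breaker order above can be expressed as a sort key. This is a sketch under two assumptions that the lens does not state: a paper touching several domains takes the best (lowest) tier among them, and papers with no domain are dropped before ordering because they are SKIP. The names `PRIORITY_TIER` and `triage_order` are hypothetical.

```python
# Tier numbers follow the standing priorities list: 1 is highest.
PRIORITY_TIER = {
    "D2": 1, "D8": 1,
    "D1": 2, "D4": 2,
    "D5": 3, "D10": 3,
    "D3": 4, "D7": 4, "D6": 4, "D9": 4,
}


def best_tier(domains: list[str]) -> int:
    """Best (lowest) tier among the domains a paper touches (assumption)."""
    return min(PRIORITY_TIER[d] for d in domains)


def triage_order(papers: list[tuple[str, list[str]]]) -> list[str]:
    """Order paper titles by tier; papers with no domain are left out as SKIP."""
    with_fit = [(title, domains) for title, domains in papers if domains]
    # sorted() is stable, so papers in the same tier keep their input order.
    ranked = sorted(with_fit, key=lambda p: best_tier(p[1]))
    return [title for title, _ in ranked]


queue = triage_order([
    ("paper-memory", ["D3"]),
    ("paper-skills", ["D1", "D9"]),
    ("paper-reward", ["D2"]),
    ("paper-unrelated", []),
])
```

Because the sort is stable, any secondary ordering (for example by publication date) can be applied to the input list first and will survive within each tier.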
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
sota-loop/lens/ARCHITECTURE_LENS.md