# SOOP-2 Architecture
> Skills Operating System Phase 2
> Status: APPROVED DESIGN | Date: 2026-05-12 | Version: 1.0

This document is the authoritative architecture reference for SOOP-2. It describes what exists today (SEA at ~40% shipped), what SOOP-2 adds, and exactly how the pieces connect. Every section maps to at least one acceptance criterion from the launch checkpoint.
The Rail document at `Desktop/crucible-output/soop-2/03-rail/EXECUTION-PLAN.md` is the execution companion. This document explains _why_. The Rail explains _when_ and _who_.
---
## Table of Contents
1. System layers (top to bottom)
2. Type system as cross-cutting concern
3. The 6 categories with examples
4. Meta-review contrarian as a first-class SEA entity
5. Silent return type
6. Closed feedback loop topology
7. Mesh assignment
8. EW invariant alignment
9. Anti-patterns to avoid
10. Architecture decisions log
---
## 1. System layers (top to bottom)
The full stack from user intent to feedback reward has 10 layers. Each layer is described with its current state, what SOOP-2 changes, and its integration surface.
---
### Layer 1: Slash command / user intent
What it is: The entry point. Mohamed types `/meta-review`, `/cortex:audit`, `/tie:gen`, etc. in a Claude Code session. Claude Code's slash-command resolver looks up the skill from `[home-path]`.
Current state: 213 skills exist under `[home-path]`. Skill listing has a char budget; skills past the tail get dropped (the "dropped skills" problem). No type metadata exists in frontmatter today.
SOOP-2 changes: After SOOP-2, every SKILL.md has 7 typed frontmatter fields (see Section 2). The listing budget problem is partially mitigated by the Tier 1 retrieval shim (Layer 3), which injects relevant skills on demand without relying on the listing window.
Integration point: Slash command invocation is the trigger for Layers 2 and 3 (pre-hook routing) and Layers 8-10 (post-hook feedback capture).
---
### Layer 2: Pre-invoke hook chain
What it is: `[home-path]` contains `UserPromptSubmit` and `PreToolUse` hooks that fire before Claude processes the prompt. The context budget hook and memory guardian hook live here.
Current state: `context-budget/budget.py` fires on every submit. `memory-guardian/guardian.py` blocks protected file violations. `plan-review-gate/twin_inject.py` injects the cognitive twin opinion as a Stop hook (fires _after_ response, not before).
SOOP-2 changes: No structural changes to this layer. The silent return path (Section 5) adds a swallow step inside `twin_inject.py` — if the skill returns `{"silent": true}`, the Stop hook logs to `silent_returns.jsonl` and emits nothing to the session.
Integration point: Feeds into Layer 3 (Tier 1 retrieval runs as part of the UserPromptSubmit hook) and Layer 7 (feedback capture fires in the Stop hook).
---
### Layer 3: Tier 1 router (embedding similarity)
What it is: `[home-path]` embeds the user query and does cosine-similarity retrieval against the 213-skill embedding index. Returns top-K=30 candidate skills, ranked by score.
Current state: 60
SOOP-2 changes (Track C):
- Re-embed all 213 SKILL.md files (not just 13) using query-dominant text. Output: `skills_embeddings_v2.parquet`.
- Extend ranking score: `score = α · cosine_similarity + β · type_compatibility`. Default α=0.7, β=0.3. Knob lives in `[home-path]`.
- Build labeled benchmark: ≥50 labeled queries, gold-set top-K, stored at `[home-path]`.
- Gate: recall@30 ≥ 0.80 on bench before shipping.
Integration point: Output (top-K=30 candidates) feeds Layer 4 (Tier 2 scorer). The type compatibility weight, computed from the frontmatter type fields defined in Section 2, is the new ranking signal.
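The blended score and the recall gate can be sketched as below. This is a minimal illustration, not the router's actual code: the function names are hypothetical, `type_compatibility` is assumed to be a value in [0, 1], and the gate is assumed to use the mean recall@30 across benchmark queries (the text does not say how per-query recall is aggregated).

```python
from typing import Iterable, Sequence

ALPHA = 0.7  # weight on cosine similarity (default stated above)
BETA = 0.3   # weight on type compatibility (default stated above)


def blended_score(cosine_similarity: float, type_compatibility: float,
                  alpha: float = ALPHA, beta: float = BETA) -> float:
    """score = alpha * cosine_similarity + beta * type_compatibility."""
    return alpha * cosine_similarity + beta * type_compatibility


def recall_at_k(ranked: Sequence[str], gold: Iterable[str], k: int = 30) -> float:
    """Fraction of gold-set skills that appear in the top-k of the ranking."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    return len(gold_set.intersection(ranked[:k])) / len(gold_set)


def passes_gate(per_query_recall: Sequence[float], threshold: float = 0.80) -> bool:
    """Assumed gate: mean recall@30 over the benchmark must reach the threshold."""
    if not per_query_recall:
        return False
    return sum(per_query_recall) / len(per_query_recall) >= threshold
```

Keeping α and β as parameters mirrors the "knob" described above, so the weighting can be changed without touching the ranking code.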
---
### Layer 4: Tier 2 scorer (LLM re-ranking)
What it is: Takes top-K=30 candidates from Layer 3 and LLM-scores each against the session context, returning a ranked shortlist (usually top-2 for injection).
Current state: MiniMax-M2.5 (229B TQ1_0) at `localhost:18080`, P50 latency 3.8s. Sequential scoring of 50 candidates would take approximately 190s (50 × 3.8s); the top-K=30 set from Layer 3 would take about 114s. This latency is accepted today because injection is async (Stop hook, not blocking the main response).
SOOP-2 changes (Track D):
- Cognitive twin at `mac4:8100` replaces MiniMax as primary scorer.
- Twin sees session geometry (7 scalars) + inscription sequence + type signature of each candidate skill.
- MiniMax becomes fallback: if twin responds with 5xx or times out, MiniMax handles the batch for that window.
- Circuit breaker: if twin has been down >60s, auto-fallback for that window, log the flip to `[home-path]`.
- Ship condition: twin recall ≥ MiniMax recall on 100-query side-by-side.
Integration point: Output (shortlist ≤2 skills) feeds Layer 5 (injection compositor). Circuit-breaker state feeds Layer 8 (the pattern learner needs to know when the twin was degraded).
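One way to read the fallback and circuit-breaker rules is sketched below. The class and function names are hypothetical, any exception raised by the twin call stands in for a 5xx or timeout, and the `probe_interval_s` retry while the breaker is open is an assumption (the text only says fallback applies "for that window"). Logging the flip is omitted.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Scorer = Callable[[Sequence[str]], List[float]]


class TwinCircuitBreaker:
    """Skips the twin once it has been failing for more than `max_down_s` seconds."""

    def __init__(self, max_down_s: float = 60.0, probe_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_down_s = max_down_s
        self.probe_interval_s = probe_interval_s  # assumed: retry cadence while open
        self._clock = clock
        self._down_since: Optional[float] = None  # start of the current outage, if any
        self._last_probe = float("-inf")

    def allow_twin(self) -> bool:
        if self._down_since is None:
            return True
        now = self._clock()
        if now - self._down_since <= self.max_down_s:
            return True  # outage shorter than the limit: keep trying the twin
        if now - self._last_probe >= self.probe_interval_s:
            self._last_probe = now  # breaker is open: allow one probe per interval
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self._down_since = None
        elif self._down_since is None:
            self._down_since = self._clock()


def score_batch(candidates: Sequence[str], twin: Scorer, minimax: Scorer,
                breaker: TwinCircuitBreaker) -> Tuple[List[float], bool]:
    """Return (scores, twin_degraded); MiniMax handles the batch when the twin cannot."""
    if breaker.allow_twin():
        try:
            scores = twin(candidates)
        except Exception:
            breaker.record(ok=False)
        else:
            breaker.record(ok=True)
            return scores, False
    return minimax(candidates), True
```

The returned `twin_degraded` flag is the value the activation log would record for the event.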
---
### Layer 5: Injection compositor
What it is: The gatekeeper for what actually gets injected into the session footer. Enforces budget limits so injection doesn't overwhelm context.
Current state: `twin_inject.py` Stop hook does single-injection. The full compositor (2-injection max, 600 char budget, cooldown, family limits) was designed in the SEA v1 Evo³ analysis but the code location is unclear per the audit.
SOOP-2 changes:
- Implement full compositor rules: max 2 injections per message, 600 char combined budget, per-skill cooldown (configurable), per-family limit (max 1 from same skill family in a single injection).
- Silent return handling: if skill returns `{"silent": true}`, compositor swallows the injection entirely and logs to `silent_returns.jsonl`. No footer, no noise.
- Contrarian round-2 output flows through compositor with higher priority flag (it is not filtered by cooldown since it is a second-round response to an already-injected skill).
Integration point: Output is the injected footer text. The compositor decision (inject / silent / skip) is the event that Layer 7 records.
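The compositor rules above could be combined roughly as follows. This is a sketch under assumptions: the `Candidate` shape, the 300-second default cooldown, and the policy of skipping (rather than truncating) a candidate that would exceed the budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_INJECTIONS = 2   # per message
CHAR_BUDGET = 600    # combined characters across injections


@dataclass
class Candidate:
    slug: str
    family: str
    text: str
    silent: bool = False             # skill returned {"silent": true}
    contrarian_round2: bool = False  # exempt from cooldown, considered first


def compose(candidates: List[Candidate], last_fired: Dict[str, float],
            now: float, cooldown_s: float = 300.0) -> List[Candidate]:
    """Select the injections for one message; candidates arrive ranked best-first."""
    chosen: List[Candidate] = []
    used_chars = 0
    families = set()
    # Stable sort: contrarian round-2 output moves to the front, ranking otherwise kept.
    for cand in sorted(candidates, key=lambda c: not c.contrarian_round2):
        if len(chosen) >= MAX_INJECTIONS:
            break
        if cand.silent:
            continue  # swallowed entirely; the caller logs it to silent_returns.jsonl
        on_cooldown = now - last_fired.get(cand.slug, float("-inf")) < cooldown_s
        if on_cooldown and not cand.contrarian_round2:
            continue
        if cand.family in families:
            continue  # at most one skill per family
        if used_chars + len(cand.text) > CHAR_BUDGET:
            continue  # would exceed the combined character budget
        chosen.append(cand)
        used_chars += len(cand.text)
        families.add(cand.family)
    return chosen
```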
---
### Layer 6: Skill execution (the skill itself)
What it is: The actual skill logic. For Claude Code skills, this is the SKILL.md file and whatever tools/scripts it invokes. For SEA-side skills, this is `[home-path]` Python.
Current state: Skills run and produce text. No structured output contract exists. No skill declares whether it commutes with other skills, whether it is idempotent, or whether it can stay silent.
SOOP-2 changes:
- Type system (Section 2) adds structured contracts to every SKILL.md.
- Silent-capable skills (Section 5) return `{"silent": true, "reason": "..."}` when conditions are met.
- Contrarian (Section 4) is a new first-class entity, not an ad-hoc invocation of an existing skill.
- Effect tracking: skills declare `effects: [memory:write, prompt-log:read]` etc. The compositor and router use this to avoid scheduling conflicting effectors in the same window.
Integration point: Output goes to Layer 5 (compositor). Activation event goes to Layer 7 (activation log, which also holds the M1 feedback collector's reaction signals). Type signature is read by Layers 3 and 4.
---
### Layer 7: Activation log (per-skill state)
What it is: Per-skill append-only event log. Each skill has its own `activation-log.jsonl`. The aggregate view (`[home-path]`) is the pattern learner's input.
Current state: `twin-opinions.jsonl`, `twin-feedback.jsonl`, `skill-executions.jsonl` exist as partial implementations. Not all skills write to these. Log rotation not implemented (audit notes "~50MB at scale" risk).
SOOP-2 changes (Track E.1):
- Standardize event schema (see Section 4 for the exact event format).
- All 213 skills write to `[home-path]` on invocation.
- Reaction signals (thumbs-up, thumbs-down, dismiss) write to `[home-path]` (M1 feedback collector).
- Log rotation: windowed append, daily files, 30-day retention, gzip archive.
Integration point: Feeds Layer 8 (pattern learner) and Layer 9 (KARL evolution worm reads skill-activation correlation).
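A possible shape for the rotation rule (daily files, 30-day retention, gzip archive) is sketched below. The `activations-YYYY-MM-DD.jsonl` file naming is an assumption made for illustration, as is the choice to compress every file older than today and delete anything older than the retention window.

```python
import gzip
import json
import shutil
from datetime import date, timedelta
from pathlib import Path

PREFIX = "activations-"  # hypothetical file-name prefix


def daily_log_path(log_dir: Path, day: date) -> Path:
    return log_dir / f"{PREFIX}{day.isoformat()}.jsonl"


def append_event(log_dir: Path, event: dict, today: date) -> None:
    """Append one JSON event to today's file (append-only)."""
    with daily_log_path(log_dir, today).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def rotate(log_dir: Path, today: date, retention_days: int = 30) -> None:
    """Gzip files from earlier days and delete anything past the retention window."""
    cutoff = today - timedelta(days=retention_days)
    for path in sorted(log_dir.glob(f"{PREFIX}*.jsonl*")):
        try:
            day = date.fromisoformat(path.name[len(PREFIX):len(PREFIX) + 10])
        except ValueError:
            continue  # not one of our daily files
        if day < cutoff:
            path.unlink()
        elif day < today and path.suffix == ".jsonl":
            with path.open("rb") as src, gzip.open(f"{path}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()
```

Passing `today` explicitly keeps the function deterministic, which makes it easy to exercise from a nightly cron wrapper or a test.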
---
### Layer 8: Pattern learner + threshold calibrator
What it is: Offline nightly process that reads activation logs + reaction signals, updates per-skill hot/cold topic maps, and auto-tunes activation thresholds.
Current state: Not built. Hot/cold topic state.json exists as a concept but is not populated automatically.
SOOP-2 changes (Tracks E.2 + E.3):
- M2 pattern learner: nightly cron reads `reactions.jsonl` + `activations.jsonl`, updates `[home-path]` with hot/cold topics and session-recency weights.
- M3 threshold calibrator: reads M2 output, auto-tunes `activation_threshold` in each skill's config. Skills that get consistent thumbs-down on certain query types see their threshold rise (they fire less often). Skills that get thumbs-up see their threshold lowered.
- KARL integration (Track E.4): `evolution_worm.py` lens scores (residual lens, shipping lens, etc.) feed into M2 as skill-relevance signals. KARL knows what mode the work is in; M2 uses that to weight which skills are relevant.
Integration point: state.json output is read by Layer 3 (type compatibility weight in Tier 1 ranking uses hot/cold topic map). LSE reward loop (Layer 9) reads M3 calibration output.
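The M3 tuning direction described above (thumbs-down raises the threshold, thumbs-up lowers it) can be illustrated with a small update rule. The step size, the minimum-reaction cutoff, and the linear form are assumptions for this sketch; the document does not specify the calibrator's formula, and a real calibrator would presumably apply it per skill and per query type.

```python
def calibrate_threshold(current: float, thumbs_up: int, thumbs_down: int,
                        step: float = 0.05, min_reactions: int = 5) -> float:
    """Nudge an activation threshold toward firing less (or more) often.

    A higher threshold means the skill fires less often.
    """
    total = thumbs_up + thumbs_down
    if total < min_reactions:
        return current  # too little signal; leave the threshold alone
    net_negative = (thumbs_down - thumbs_up) / total  # in [-1, 1]
    # Mostly thumbs-down -> threshold rises; mostly thumbs-up -> threshold falls.
    return min(1.0, max(0.0, current + step * net_negative))
```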
---
### Layer 9: KARL / LSE reward loop
What it is: KARL is the trajectory intelligence system (`[home-path]`). LSE is the Learning Signal Extractor. Together they close the loop: skill activations that correlate with positive session outcomes get higher reward; skill activations that correlate with nothing useful get lower reward.
Current state: KARL evolution worm runs (2,550 / 3,011 records). `lse_stages.py --invariance` provides should-not-fire rules. But the bridge from KARL lens scores to per-skill activation weights is not closed.
SOOP-2 changes (Track E.5):
- Wire `lse_stages.py --invariance` output into M3 (threshold calibrator). Invariance violations add hard-floor should-not-fire rules that override M3's statistical tuning.
- Wire `evolution_worm.py` 5-lens output into M2 (pattern learner) so skill relevance tracks session trajectory geometry.
- This closes the feedback loop from user reaction through pattern learning through threshold calibration through KARL trajectory scoring.
Integration point: Reward signal updates Layer 4 (twin fine-tuning, long cycle) and Layer 8 (M3 calibration, short cycle). Twin weights update at mac4:8100 on a longer cadence (weekly or on KARL milestone).
---
### Layer 10: Cognitive twin (mac4:8100)
What it is: The cognitive twin is a fine-tuned model at `mac4:8100` that has been trained on 106K examples of Mohamed's session patterns. It replaces MiniMax as the Tier 2 scorer.
Current state: Running at mac4:8100. P50 latency measured against MiniMax baseline (3.8s). Migration from MiniMax to twin as primary is partial.
SOOP-2 changes (Track D):
- Twin becomes primary Tier 2 scorer (MiniMax = fallback).
- Twin input now includes the type signature of each candidate skill, not just text description.
- Twin fine-tuning cadence wired to KARL/LSE reward loop (Layer 9): as skill relevance signals accumulate, twin weights update to reflect them.
Integration point: Twin output (ranked skill shortlist + confidence scores) feeds Layer 5 (compositor). Twin feedback log feeds Layer 8 (M2 pattern learner). Circuit-breaker status feeds Layer 7 (activation log records degraded-mode events).
---
## 2. Type system as cross-cutting concern
The type system is not a Layer — it is a cross-cutting schema that every other layer reads. Without it, Tier 1 has no type compatibility signal, the compositor cannot detect conflicting effectors, and the linter cannot enforce correctness at authoring time.
### The 7 frontmatter fields
Every SKILL.md gains these 7 fields in its YAML frontmatter block:
```yaml
category: generator | transformer | reducer | distributor | effector | auditor
input_type: "∅ | M | M^n | M × State | State"
output_type: "M | M^n | M' | State' | Report | ∅"
effects: []  # list of: memory:write, memory:read, prompt-log:read, prompt-log:write, fs:write, fs:read, network:call, supabase:write, supabase:read, tmux:send, git:commit
idempotent: true | false
commutes_with: []  # list of skill slugs that can safely fire in same window
silent_capable: false  # true if skill can return {"silent": true}
```
These fields are not documentation. They are machine-readable. Every downstream system reads them:
Tier 1 router (Layer 3): `type_compatibility_weight` uses `input_type` + `output_type` to score how well a candidate skill matches the current context shape. A Generator skill ranks higher when context has `∅` (open prompt, no prior artifact). A Reducer ranks higher when context has `M^n` (multiple prior outputs to consolidate).
Tier 2 scorer (Layer 4): Twin receives type signatures as structured input, not just text. This gives the twin a structured feature the text description alone does not capture.
Injection compositor (Layer 5): `commutes_with` controls whether two skills can fire in the same window. `effects` detects conflicts: two Effectors that both write to `supabase` cannot be scheduled together without ordering.
Linter (Track A.2): `[home-path]` validates all 7 fields exist, that `category` is one of the 6 valid values, that `input_type` and `output_type` use the defined vocabulary, and that `effects` entries are from the allowed class list. Runs in <3s on 226 files (213 Claude Code skills + 13 SEA skills).
Meta-review composer (Section 4): When meta-review assembles its reviewer panel, it queries for skills with `category: auditor` or `category: reducer`. The type system makes this query mechanical rather than name-based.
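The linter checks listed above might look like the following sketch. It assumes the frontmatter has already been parsed into a dict (YAML parsing is omitted), the function name is hypothetical, and it treats `A | B` in a type field as a list of alternatives that must each be in the vocabulary. The vocabularies are copied from this section.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("category", "input_type", "output_type", "effects",
                   "idempotent", "commutes_with", "silent_capable")
CATEGORIES = {"generator", "transformer", "reducer", "distributor", "effector", "auditor"}
TYPE_VOCAB = {"∅", "M", "M^n", "M'", "M × State", "State", "State'", "Report"}
EFFECT_VOCAB = {"memory:write", "memory:read", "prompt-log:read", "prompt-log:write",
                "fs:write", "fs:read", "network:call", "supabase:write",
                "supabase:read", "tmux:send", "git:commit"}


def lint_frontmatter(fm: Dict[str, Any]) -> List[str]:
    """Return a list of error strings; an empty list means the frontmatter passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fm]
    if "category" in fm and fm["category"] not in CATEGORIES:
        errors.append(f"invalid category: {fm['category']}")
    for key in ("input_type", "output_type"):
        if key in fm:
            # Assumption: "A | B" lists alternatives; each must be a known symbol.
            for symbol in str(fm[key]).split("|"):
                if symbol.strip() not in TYPE_VOCAB:
                    errors.append(f"invalid {key} symbol: {symbol.strip()}")
    for effect in fm.get("effects") or []:
        if effect not in EFFECT_VOCAB:
            errors.append(f"invalid effect: {effect}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets one run report every issue in a file.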
### Type vocabulary (exact strings for the linter)
input_type / output_type vocabulary:
| Symbol | Meaning |
|---|---|
| `∅` | No input or output (Generator creates from nothing; silent Auditor returns nothing) |
| `M` | Single message or artifact |
| `M^n` | Multiple messages or artifacts (n ≥ 2) |
| `M'` | Transformed message (Transformer output, different from input) |
| `M × State` | Message plus current state (Effector input) |
| `State` | Pure state (Auditor reading environment) |
| `State'` | Modified state (Effector output) |
| `Report` | Structured diagnostic output (Auditor output) |
effects vocabulary:
```
memory:write      - modifies [home-path] or agent-memory/ files
memory:read       - reads memory files
prompt-log:read   - reads session prompt logs
prompt-log:write  - writes to prompt logs
fs:write          - writes to filesystem outside memory
fs:read           - reads from filesystem
network:call      - makes HTTP / API calls
supabase:write    - writes to Supabase tables
supabase:read     - reads from Supabase
tmux:send         - sends keystrokes to tmux pane
git:commit        - commits to git
```
An empty `effects: []` is valid for pure Generators and pure Transformers that operate only in the prompt context.
### Idempotency and commutativity
`idempotent: true` means calling the skill twice in the same context produces the same result as calling it once. This is used by the pattern learner (M2) to avoid double-counting in feedback signals, and by the compositor to decide whether a skill can be re-injected after a cooldown refresh.
`commutes_with` is a whitelist. Two skills commute if swapping their order produces identical results. Non-commutative pairs (the default) are ordered by the compositor using the dependency graph embedded in their type signatures: `Generator → Transformer → Reducer` is the canonical order. Any chain that inverts this order is flagged by the linter as a composition warning, not an error (hybrid skills exist; see Section 3).
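The composition warning for chains that invert the canonical order can be sketched as below. The function name and the `(slug, category)` chain representation are hypothetical; categories outside the canonical `Generator → Transformer → Reducer` order are ignored here.

```python
from typing import List, Tuple

CANONICAL_RANK = {"generator": 0, "transformer": 1, "reducer": 2}


def composition_warnings(chain: List[Tuple[str, str]]) -> List[str]:
    """chain is a list of (slug, category) pairs in execution order."""
    ranked = [(slug, CANONICAL_RANK[cat]) for slug, cat in chain if cat in CANONICAL_RANK]
    warnings = []
    for (prev_slug, prev_rank), (slug, rank) in zip(ranked, ranked[1:]):
        if rank < prev_rank:
            # A warning, not an error: hybrid skills may legitimately invert the order.
            warnings.append(f"{slug} follows {prev_slug}, inverting the canonical order")
    return warnings
```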
---
## 3. The 6 categories with examples
The 6 categories come from the operator algebra proposed by the CUHK + Google Labs papers and adapted to the 213-skill stack. Each category has a type signature, a semantic role, common composition partners, and notes on hybrid handling.
---
### Category 1: Generator
Type signature: `∅ → M`
Definition: Produces new artifacts from nothing (or from a bare seed prompt). No required prior artifact. The input is the empty context or a minimal directive. The output is a first draft, raw idea, or initial structure.
Examples from the 213:
- `tie:gen` — generates a raw tie thread from a topic
- `creative:forge` — generates a new creative brief
- `evo-cubed` — generates multiple evolutionary branches from a problem seed
Common composition partners: Generators almost always feed Transformers (`M → M'`) or Reducers (`M^n → M`). The canonical pipeline is `Generator → Transformer → Reducer → Effector`.
Hybrid handling: A skill that is labeled `Generator` but also reads from memory (e.g., reads past session state to seed generation) has `effects: [memory:read]`. This is valid. The category captures the _output shape_, not the effect set. A skill that both reads and writes non-trivially (e.g., generates and immediately saves to Supabase) should be labeled its dominant role in `category` and list both effects.
Linter rule: If `category: generator` and `input_type` is not `∅`, the linter emits a warning (not an error). Some generators accept a seed `M` — they are `M → M'` which is technically a Transformer. If the author believes the skill's role is generative despite accepting a seed, keep `generator` and document the reason in a comment field.
---
### Category 2: Transformer
Type signature: `M → M'`
Definition: Takes an existing artifact and produces a modified version. The input and output are both artifacts, but the output is different (refined, reformatted, restructured, translated).
Examples from the 213:
- `tie:ref` — refines a tie thread draft
- `pwr:morph` — restructures a power-law prompt for a different platform
- Any rewrite or editing skill
Common composition partners: Transformers chain with each other (`M → M' → M''`) or feed Reducers. Multiple Transformers in sequence is a valid pipeline, provided each step's output type matches the next step's input type.
Hybrid handling: A Transformer that also logs to memory (e.g., stores a "before" snapshot before transforming) has `effects: [memory:write]` with `category: transformer`. The compositor treats this as a Transformer for ordering purposes but flags the memory write when checking effect conflicts with other memory-writing skills in the same window.
Idempotency: Most Transformers are NOT idempotent. Applying a rewrite skill twice produces different results (the second run rewrites the already-rewritten artifact). Set `idempotent: false`.
---
### Category 3: Reducer
Type signature: `M^n → M`
Definition: Consolidates multiple artifacts or viewpoints into one. Takes n inputs and produces 1 output. The reduction can be synthesis, summarization, voting, averaging, or principled selection.
Examples from the 213:
- `meta-review` — synthesizes n domain-reviewer opinions into one analysis
- `meta:amr` — multi-agent reducer, similar to meta-review but with explicit agent attribution
- `syn:fusion` — fuses multiple code or design proposals
Common composition partners: Reducers almost always come _after_ a Generator + multiple Transformer passes. The typical pipeline is a fan-out (Distributor) followed by a Reducer. The Reducer's output can feed an Effector.
Why meta-review is a Reducer: It is not an Auditor. meta-review consumes n reviewer opinions (`M^n`) and produces one synthesis (`M`). The contrarian (Section 4) is an Auditor (`State → Report`) that plugs into the meta-review pipeline as a required round-2 pass after the first Reducer output.
Hybrid handling: A Reducer that writes the synthesis to memory has `effects: [memory:write]`. Common and valid.
---
### Category 4: Distributor
Type signature: `M → M^n`
Definition: Takes one artifact and fans it out into multiple parallel streams, agents, or viewpoints. The inverse of Reducer. Used to launch parallel review, parallel generation, or platform-specific adaptation passes.
Examples from the 213:
- `divergent-rail` — sends one plan to n parallel sub-agents for execution
- `tie:dist` — distributes a tie thread to n platforms with platform-specific adaptation
Common composition partners: Distributors almost always feed Transformers (n parallel transformations) which then feed a Reducer. The full fan-out/fan-in is `Distributor → n × Transformer → Reducer`.
Idempotency: Distributors are idempotent if the fan-out is deterministic (same seed → same n branches every time). Mark `idempotent: true` only if the distribution logic is deterministic.
Conflict rule: A Distributor plus an Effector in the same window is dangerous. The Distributor spawns n branches; the Effector fires on the _current_ state before branches complete. The compositor blocks this combination unless the Effector's `commutes_with` list explicitly includes the Distributor slug.
---
### Category 5: Effector
Type signature: `M × State → State'`
Definition: Makes a side effect that changes the world outside the prompt context. Writes files, commits to git, posts to Supabase, sends keystrokes to tmux, makes network calls. The output is a new state, not a new artifact.
Examples from the 213:
- `pulse` — dispatches a task to a mesh agent (tmux send)
- `ops:deploy` — deploys to a service
- `ops:git` — commits and pushes git changes
Common composition partners: Effectors come last in a pipeline. They are the "ship it" step after Generator + Transformer + Reducer passes have produced a verified artifact. Running an Effector before the artifact is ready is an error.
Idempotency: Most Effectors are NOT idempotent. A git commit fired twice creates two commits. Running `ops:deploy` twice deploys twice. Mark `idempotent: false` unless the Effector explicitly checks "already deployed" state and no-ops.
Conflict rule: Two Effectors in the same injection window are allowed ONLY if their `effects` sets are disjoint. Two skills with `effects: [supabase:write]` cannot fire together. The compositor enforces this. If they must fire in sequence, they need separate injection windows (cooldown).
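The two conflict rules (Distributor plus Effector, and Effector plus Effector) could be checked with something like the sketch below. The `Skill` dataclass and `window_conflict` function are hypothetical names; only the rules themselves come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    slug: str
    category: str
    effects: List[str] = field(default_factory=list)
    commutes_with: List[str] = field(default_factory=list)


def window_conflict(a: Skill, b: Skill) -> Optional[str]:
    """Return a reason the pair cannot share an injection window, or None if it can."""
    # Rule 1: Distributor + Effector is blocked unless the Effector's
    # commutes_with list explicitly includes the Distributor slug.
    for dist, eff in ((a, b), (b, a)):
        if dist.category == "distributor" and eff.category == "effector":
            if dist.slug not in eff.commutes_with:
                return f"{eff.slug} may fire before {dist.slug} branches complete"
    # Rule 2: two Effectors may share a window only if their effects are disjoint.
    if a.category == "effector" and b.category == "effector":
        shared = sorted(set(a.effects) & set(b.effects))
        if shared:
            return f"effectors share effects: {', '.join(shared)}"
    return None
```

Returning the reason string makes the compositor's decision easy to log alongside the skipped injection.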
---
### Category 6: Auditor
Type signature: `State → Report`
Definition: Reads the current state (session context, filesystem, logs, memory) and produces a diagnostic report. Does not modify state (pure read). Output is a structured report, not an artifact to be transformed further.
Examples from the 213:
- `cortex:audit` — audits session memory health
- `pae:status` — checks PAE system status
- `cortex:status` — checks cortex state
- `meta:adversarial` / contrarian — audits a prior synthesis and argues against it
Why the contrarian is an Auditor: The contrarian takes the current state (a synthesis produced by meta-review's first pass) and produces a Report (the dissent). It does not transform the synthesis. It does not produce a new synthesis. It produces a structured critique that feeds back into the meta-review pipeline as round-2 input for the domain reviewers.
Silent capability: Auditors are the primary candidates for `silent_capable: true`. An Auditor that finds nothing to report should return `{"silent": true}`. Emitting a report that says "everything is fine" is noise. Section 5 covers the full silent path.
Idempotency: Auditors _should_ be idempotent. Running `cortex:audit` twice on unchanged state should produce the same report. Mark `idempotent: true` for Auditors that are read-only and deterministic.
---
## 4. Meta-review contrarian as a first-class SEA entity
The contrarian is the fix for the meta-review fixpoint problem.
### The fixpoint problem
Today: meta-review assembles 6 parallel domain reviewers, collects their opinions, and runs a synthesizer Reducer. If you invoke `meta-review` a second time on the same artifact, you get approximately the same synthesis. `meta-review ∘ meta-review` is degenerately idempotent: the synthesizer re-averages the same 6 opinions and produces no new information. The composition is not a meaningful fixpoint.
The fix: add a 7th seat — the contrarian. The contrarian reads the first-pass synthesis and argues it is wrong. The 6 reviewers respond in round 2. The second synthesis incorporates that dissent. Now `meta-review(meta-review(x))` converges only when the contrarian has nothing new to say. That is a meaningful fixpoint.
### The contrarian pipeline
```
1. meta-review fires (Reducer, round 1)
   - 6 domain reviewers run in parallel
   - Synthesizer produces first_synthesis
2. Contrarian fires (Auditor, round 2)
   - Input: first_synthesis + original artifact
   - Task: find ONE strong argument that the synthesis is wrong or incomplete
   - Output: {"dissent": "...", "severity": "high|medium|low"} OR {"silent": true}
3. If contrarian returns {"silent": true}:
   - STOP. first_synthesis is the final output.
   - This is the ∅-stop condition.
4. If contrarian returns a dissent:
   - 6 domain reviewers receive first_synthesis + dissent as context
   - Round-2 reviewer responses address the contrarian's argument
   - Synthesizer produces second_synthesis
5. second_synthesis is the final output.
   - The round-2 contrarian pass is NOT repeated (one adversarial round, not an infinite loop).
```
### Why the ∅-stop condition is meaningful
The contrarian can stay silent only if it cannot find a genuine new failure mode. This is a stricter condition than "the synthesis looks OK." The contrarian is explicitly prompted with an adversarial mandate: find a reason this is wrong. If it cannot, that silence is informative. The pipeline terminates knowing that an adversarial pass found nothing.
A contrarian that always finds something is a broken contrarian (it has no stopping criterion). A contrarian that always stays silent is a broken contrarian (it is not actually adversarial). The `severity` field helps tune this: if the contrarian returns only `low` severity dissents on a second invocation, the meta-review composer can treat that as ∅ for practical purposes.
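The five pipeline steps can be expressed as a short control-flow sketch. The callables (`reviewers`, `synthesize`, `contrarian`) are hypothetical stand-ins for the real skills, and the reviewers run sequentially here for simplicity even though the design runs them in parallel.

```python
from typing import Any, Callable, Dict, List, Optional

# A reviewer takes the artifact plus optional round-2 context and returns an opinion.
Reviewer = Callable[[str, Optional[Dict[str, Any]]], str]


def run_meta_review(artifact: str, reviewers: List[Reviewer],
                    synthesize: Callable[[List[str]], str],
                    contrarian: Callable[[str, str], Dict[str, Any]]) -> str:
    # Round 1: collect reviewer opinions and reduce them to first_synthesis.
    first_synthesis = synthesize([review(artifact, None) for review in reviewers])

    # Contrarian pass: returns one dissent or {"silent": true}.
    verdict = contrarian(first_synthesis, artifact)
    if verdict.get("silent"):
        return first_synthesis  # the ∅-stop condition

    # Round 2: reviewers respond to the dissent, then a second synthesis is produced.
    context = {"first_synthesis": first_synthesis, "dissent": verdict["dissent"]}
    second_synthesis = synthesize([review(artifact, context) for review in reviewers])
    return second_synthesis  # the contrarian is not invoked a second time
```

Because the contrarian is called exactly once, the function always terminates after at most two synthesis passes.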
### Contrarian as first-class SEA entity
The contrarian is NOT an invocation of `meta:adversarial`. It is its own skill.
SKILL.md registration:
```yaml
slug: meta:contrarian
category: auditor
input_type: "M × State"  # M = first_synthesis; State = original artifact
output_type: "Report | ∅"
effects: []
idempotent: false
commutes_with: []
silent_capable: true
```
It lives at `[home-path]` (its own file, not a section of meta-review's SKILL.md). It is independently invokable for other use cases where an adversarial pass is needed outside of meta-review.
### state.json schema (per skill)
Each skill that has SEA wiring maintains a `state.json` at `[home-path]`.
For the contrarian, the state.json tracks:
```json
{
  "slug": "meta:contrarian",
  "version": 1,
  "created": "2026-05-12T00:00:00Z",
  "last_updated": "2026-05-12T00:00:00Z",
  "activation_threshold": 0.65,
  "hot_topics": [],
  "cold_topics": [],
  "session_count": 0,
  "silent_rate": 0.0,
  "dissent_severity_distribution": {
    "high": 0,
    "medium": 0,
    "low": 0
  },
  "notes": "Track silent_rate: if >0.8, threshold needs lowering (contrarian is too shy). If <0.05, threshold needs raising (contrarian cries wolf)."
}
```
For review-class skills (meta-review, meta:amr, syn:fusion), the `state.json` adds a `contrarian_wired` boolean that the compositor checks before injecting the round-2 contrarian. If `contrarian_wired: false`, the skill runs in legacy mode (no round-2 adversarial pass).
### activation-log.jsonl event format
Every SEA skill writes to the aggregate activation log at `[home-path]`. Each line is one JSON event:
```json
{
  "ts": "2026-05-12T04:00:00Z",
  "session_id": "87a17b2c",
  "slug": "meta:contrarian",
  "category": "auditor",
  "trigger": "meta-review round-2",
  "input_shape": "M × State",
  "output": "dissent | silent",
  "silent": false,
  "dissent_severity": "medium",
  "injection_fired": true,
  "reaction": null,
  "twin_confidence": 0.82,
  "twin_degraded": false
}
```
The `reaction` field is null at write time and updated by M1 (feedback collector) when the user thumbs-up or thumbs-down the injection. The `twin_degraded` flag marks events where MiniMax fallback was used instead of the cognitive twin.
---
## 5. Silent return type
### Why silent is not mute
Mute already exists: a skill can be configured to not inject into the session footer at all. Mute is a configuration flag set by the user or the compositor. It means "never inject this skill."
Silent is a runtime decision made by the skill itself at invocation time. A `silent_capable` skill evaluates the current context and decides: "I have nothing worth saying." It returns `{"silent": true}` and the compositor swallows the injection. The skill still _ran_. The decision to not inject is the skill's own judgment, not a blanket config.
The Google Labs paper (arXiv 2605.06717) names this O2: "stay-silent as an explicit action." Every skill producing output today is an O2 failure. Becoming Level-3 proactive requires skills that can decide to do nothing.
### Frontmatter schema for silent
The `silent_capable` field in frontmatter is a boolean that signals to the compositor that this skill may return `{"silent": true}`. The compositor does not filter based on this field alone — it is informational for the router and useful for the linter to warn if a skill is marked `silent_capable: false` but returns silent in practice (a type lie).
The full silent return object:
```json
{
  "silent": true,
  "reason": "No anomalies detected in session memory. State is clean.",
  "slug": "cortex:audit",
  "ts": "2026-05-12T04:00:00Z"
}
```
The `reason` field is required. It is logged to `[home-path]` for pattern analysis. If a skill is silent 95
### Router treatment
When Tier 1 retrieves a skill with `silent_capable: true`, it scores it the same as any other skill. Silent capability does not penalize or boost ranking. The router cannot predict whether the skill will be silent at runtime (that depends on state that the router does not have access to).
Tier 2 (twin scorer) similarly scores the skill on relevance, not on its silent history. However, if the twin's training data includes many sessions where `cortex:audit` returned silent and was marked neutral (no thumbs reaction), the twin will learn to score it lower in contexts where it is likely to be silent. This is an emergent behavior of the feedback loop, not a hard rule.
### The 5 skills with silent paths (Track G.2)
These are the first 5 skills typed as `silent_capable: true` in SOOP-2:
| Skill slug | Category | Why it gets silent |
|---|---|---|
| `cortex:audit` | Auditor | Should stay silent if session memory is clean and no anomalies detected |
| `cortex:status` | Auditor | Should stay silent if cortex is healthy and no degradation signals |
| `cortex:watch` | Auditor | Should stay silent if no new watch-trigger events in the current window |
| `pae:status` | Auditor | Should stay silent if PAE system is fully operational with no alerts |
| `pae:xpoll` | Auditor | Should stay silent if cross-poll finds no new signals since last activation |
All 5 are Auditors. This is the expected pattern: Auditors that monitor health should not inject a "health report" when health is good. The injection is signal; silence is the expected steady state.
Other skills may gain `silent_capable: true` in future passes (especially in the contrarian category, as described in Section 4). The 5 listed above are the SOOP-2 gate condition for acceptance criterion 8.
### End-to-end silent flow
```
Skill invokes → evaluates state → returns {"silent": true, "reason": "..."}
        ↓
twin_inject.py Stop hook receives output
        ↓
Checks: output["silent"] == True?
        ↓
Yes → log to silent_returns.jsonl → emit NOTHING to session footer
        ↓
Activation log records: {"output": "silent", "injection_fired": false}
        ↓
M1 feedback collector marks: no reaction expected (silent events are not reaction-soliciting)
```
The key implementation point: the Stop hook (`twin_inject.py`) must explicitly handle `{"silent": true}` before it tries to format and inject the output. The current hook assumes all output is injectable text. Track G.3 patches this.
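A minimal sketch of that explicit check is shown below. It is not the actual `twin_inject.py` patch: the function name and the `silent_log` argument are hypothetical, and rejecting a silent return without a `reason` is one possible reading of "the `reason` field is required".

```python
import json
from pathlib import Path
from typing import Any, Optional


def handle_skill_output(output: Any, silent_log: Path) -> Optional[str]:
    """Return footer text to inject, or None when the skill chose to stay silent."""
    if isinstance(output, dict) and output.get("silent") is True:
        if not output.get("reason"):
            raise ValueError("silent return requires a non-empty 'reason'")
        # Log the silent return for later pattern analysis, then emit nothing.
        with silent_log.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(output, ensure_ascii=False) + "\n")
        return None
    # Non-silent output: pass text through, serialize anything structured.
    return output if isinstance(output, str) else json.dumps(output, ensure_ascii=False)
```

A caller would inject the footer only when the return value is not `None`, and record `injection_fired: false` in the activation log otherwise.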
---
## 6. Closed feedback loop topology
The feedback loop is the hardest architectural challenge in SOOP-2. It connects user reaction (a behavioral signal) through software layers back to model weights (the twin). The full loop is M1 → M2 → M3 → KARL → twin, spanning 5 distinct processes.
ASCII diagram
USER REACTION
(thumbs up/down/dismiss on injected footer)
|
v
M1: Feedback Collector
[home-path]
(event: {ts, slug, reaction, session_id, context_hash})
|
|------ also reads ------> KARL evolution_worm.py
| (5 lens scores per session window)
v
M2: Pattern Learner (nightly cron)
Reads: reactions.jsonl + activations.jsonl + KARL lens scores
Writes: [home-path]
(hot_topics, cold_topics, session_recency_weight updated per skill)
|
v
M3: Threshold Calibrator
Reads: state.json from M2
Reads: lse_stages.py --invariance output (hard should-not-fire rules)
Writes: per-skill activation_threshold in config
(thumbs-down pattern on topic X → threshold rises for skill S on topic X)
|
|
+------------------+
|                  |
v                  v
TIER 1 ROUTER      KARL LSE REWARD LOOP
(reads hot/cold    lse_stages.py --reward
topic weights      feeds skill-activation
for re-ranking)    correlation into
                   twin retraining queue
                   |
                   v
                   TWIN WEIGHTS UPDATE
                   mac4:8100 retraining
                   (weekly cadence or
                   on KARL milestone)
                   |
                   v
                   IMPROVED TIER 2 SCORER
                   (twin scores skills better
                   because it has learned which
                   activations led to positive
                   user reactions in this topology)
Loop timescales
The loop operates at three distinct timescales:
Short cycle (session): M1 captures reactions in real time. Activation log is written per invocation. No delay.
Medium cycle (nightly): M2 and M3 run nightly. Threshold changes take effect the next session after the nightly cron completes. Changes are visible within 24 hours.
Long cycle (weekly / KARL milestone): Twin weight updates happen when KARL accumulates enough LSE reward signal to justify a retrain. This is not on a fixed schedule; it is triggered by a milestone condition in `lse_stages.py`. Twin improvements take effect when mac4:8100 loads the updated weights.
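The short cycle amounts to one append per reaction. A minimal sketch, assuming the event fields listed in the diagram (`ts`, `slug`, `reaction`, `session_id`, `context_hash`). The reaction labels, the truncated SHA-256 used for `context_hash`, and the function name `record_reaction` are assumptions for illustration, not the M1 design.

```python
import hashlib
import json
import time
from pathlib import Path

# Assumed labels for thumbs up / thumbs down / dismiss.
VALID_REACTIONS = {"up", "down", "dismiss"}


def record_reaction(path, slug, reaction, session_id, context, ts=None):
    """Append one reaction event to a JSONL file and return the event."""
    if reaction not in VALID_REACTIONS:
        raise ValueError(f"unknown reaction: {reaction!r}")
    event = {
        "ts": time.time() if ts is None else ts,
        "slug": slug,
        "reaction": reaction,
        "session_id": session_id,
        # Hash the context so the log never stores raw session text.
        "context_hash": hashlib.sha256(context.encode("utf-8")).hexdigest()[:16],
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

Because the write is append-only, the nightly M2 run can read the file without coordinating with live sessions.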
What closes the loop
The loop was "designed but not closed" per the audit. The specific gaps:
1. `lse_stages.py --invariance` exists but its output is not read by M3. Track E.5 closes this.
2. `evolution_worm.py` lens scores exist but are not fed to M2. Track E.4 closes this.
3. `reactions.jsonl` does not exist yet (M1 is not built). Track E.1 creates it.
4. M2 pattern learner is not built. Track E.2 creates it.
5. M3 threshold calibrator is not built. Track E.3 creates it.
All 5 gaps close in Phase 4 of the Rail plan (after typing foundation and router rebuild are stable).
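Gaps 1 and 5 meet in the calibrator: it has to combine learned reaction patterns with the hard should-not-fire rules. The sketch below shows one way that step could look. Everything in it is an assumption: the `(skill, topic)` keys, the step size, the ratio cut-offs, and the convention that a threshold of 1.0 means "never fires". The real M3 and the `lse_stages.py --invariance` output format are not specified here.

```python
def calibrate_thresholds(thresholds, reaction_counts, should_not_fire,
                         step=0.05, min_events=5, default=0.5):
    """Return new per-(skill, topic) thresholds; the input dict is not mutated."""
    updated = dict(thresholds)
    for key, counts in reaction_counts.items():
        up, down = counts.get("up", 0), counts.get("down", 0)
        total = up + down
        if total < min_events:
            continue  # too little signal; keep the last calibrated value
        current = updated.get(key, default)
        down_ratio = down / total
        if down_ratio > 0.5:
            # Thumbs-down pattern: make the skill harder to trigger on this topic.
            updated[key] = min(1.0, current + step)
        elif down_ratio < 0.2:
            # Mostly positive reactions: relax the threshold slightly.
            updated[key] = max(0.0, current - step)
    # Hard invariance rules always win over learned values.
    for key in should_not_fire:
        updated[key] = 1.0
    return updated
```

Applying the invariance rules last means a learned adjustment can never reopen a pairing that the rules have closed.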
---
7. Mesh assignment
Mac3 is deprecated. All tasks previously assigned to Mac3 in SEA docs are reassigned.
Mac3 deprecation status
Mac3 still appears in these files:
- `Desktop/skill-entity-architecture/CREATIVE_EVOLUTION_SEA_v1.md` (§2, STEP 6)
- `Desktop/skill-entity-architecture/phase0-mac3-availability.md`
- `Desktop/skill-entity-architecture/mac3-worker-config/`
- `Desktop/skill-entity-architecture/MIGRATION-GUIDE.md`
Track H (Rail plan) purges all of these. After SOOP-2, `grep -ri "mac3"` on the SEA repo returns zero results.
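For a check that can run inside a test suite, the same condition can be expressed in Python. This is a sketch equivalent in spirit to the `grep -ri` above, not the Track H tooling; the function name `find_mac3_references` is hypothetical.

```python
from pathlib import Path


def find_mac3_references(root):
    """Return (path, line_number, line) for every case-insensitive 'mac3' hit."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so binary files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if "mac3" in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits
```

Like `grep -ri`, this inspects file contents only. A leftover file or directory whose name contains "mac3" needs a separate check.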
Component mesh assignments (post-SOOP-2)
| Component | Mesh Node | Notes |
|---|---|---|
| Tier 1 router (sea_skill_injector.py) | mac1 (clawdbot host) | Runs as Discord bot process; local to mac1 |
| Tier 2 scorer (twin client) | mac4:8100 | Cognitive twin, already running |
| MiniMax fallback scorer | localhost:18080 | Runs on same machine as Tier 1 (mac1 or developer machine) |
| Activation log writer | mac1 (hooks) | Stop hook fires on claude session host |
| M1 Feedback collector | mac1 | Part of clawdbot pipeline |
| M2 Pattern learner (nightly cron) | mac1 | Cron on mac1 processes the logs |
| M3 Threshold calibrator | mac1 | Same cron as M2 |
| KARL evolution worm | mac4 | `[home-path]` |
| LSE reward loop | mac4 | `[home-path]` |
| Silent return log | mac1 | Written by Stop hook |
| Contrarian agent | mac1 (hooks) | Invoked as part of meta-review pipeline, runs on session host |
| Linter (skill-typecheck) | local / mac1 | Runs in pre-commit hook, also in CI |
Previously Mac3-assigned tasks
The SEA v1 design assigned creative content generation and serenity/ambient processing to Mac3. Per user instruction (recorded in the operator theory memory file: "Mac3 dropped from execution plans this session per user instruction"), these reassign as follows:
| Former Mac3 role | New assignment |
|---|---|
| Creative content generation worker | Not a SEA routing role; runs on session host as standard skill invocation |
| Serenity/ambient processing | Deprecated with Mac3 |
| Phase 0 Mac3 availability check | Deleted (Track H.2) |
| Mac3 worker config | Deleted (Track H.2) |
---
8. EW invariant alignment
SOOP-2 must not break any of the 4 EW invariants. This section maps each invariant to the specific SOOP-2 design decisions that preserve it, and flags one area of caution.
Invariant 1: No-Absorbing-States
Statement: The system must not enter a state from which it cannot recover. No dead-end state, no irrecoverable failure mode.
How SOOP-2 preserves it:
- Tier 2 twin circuit breaker: if twin is down >60s, the system falls back to MiniMax. The fallback is explicit, logged, and automatic. The system does not halt waiting for the twin.
- Silent return path: if a skill returns `{"silent": true}`, the system continues normally. Silent is not an error state.
- Contrarian ∅-stop: if the contrarian finds nothing, meta-review terminates cleanly. There is no infinite adversarial loop.
- M2/M3 cron failure: if the nightly cron fails, thresholds hold at their last calibrated value. The system degrades gracefully to yesterday's calibration, not to a broken state.
- Feedback loop M1 failure: if reactions.jsonl cannot be written (disk full, permission error), the Stop hook logs the error and exits cleanly. The main session is not affected.
Caution: The twin weight update (long cycle) is a manual process triggered by a KARL milestone. If the twin diverges significantly from actual behavior patterns and no milestone fires, the Tier 2 scorer degrades slowly. This is not an absorbing state, but it is a slow-drift risk. Mitigation: M3 threshold calibrator operates independently of twin weights and can compensate for drift in the medium cycle.
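The twin circuit breaker in the first bullet can be sketched as follows. The 60-second limit comes from the text. The periodic probe, its 30-second interval, and all class and function names are assumptions, added so that the fallback itself cannot become an absorbing state.

```python
import time


class TwinCircuitBreaker:
    """Tracks twin availability; opens after the twin has been down too long."""

    def __init__(self, max_down_seconds=60.0, probe_interval=30.0, clock=time.monotonic):
        self.max_down_seconds = max_down_seconds
        self.probe_interval = probe_interval
        self._clock = clock
        self._down_since = None
        self._last_attempt = None

    def allow_twin_call(self):
        if self._down_since is None:
            return True
        now = self._clock()
        if now - self._down_since <= self.max_down_seconds:
            return True  # still inside the grace window
        # Circuit open: probe occasionally so recovery stays possible.
        if now - self._last_attempt >= self.probe_interval:
            self._last_attempt = now
            return True
        return False

    def record_success(self):
        self._down_since = None

    def record_failure(self):
        now = self._clock()
        if self._down_since is None:
            self._down_since = now
        self._last_attempt = now


def score(breaker, twin_scorer, fallback_scorer, payload):
    """Return (score, source); never blocks waiting for the twin."""
    if breaker.allow_twin_call():
        try:
            result = twin_scorer(payload)
            breaker.record_success()
            return result, "twin"
        except Exception:
            breaker.record_failure()
    return fallback_scorer(payload), "minimax"
```

Passing the clock in as a parameter makes the 60-second behavior testable without sleeping.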
Invariant 2: Memory Guardian
Statement: Protected files cannot be shrunk below their minimums. `memory/active-tasks.md`, `SOUL.md`, `AGENTS.md`, `MEMORY.md` are additive-only.
How SOOP-2 preserves it:
- No SOOP-2 component writes to any protected file.
- The linter writes to `[home-path]` (new directory).
- The typing pass writes to SKILL.md frontmatter (not protected files).
- The audit output file `skills-typing-audit-2026-05-25.md` writes to `[home-path]` (project memory, not the protected workspace memory files).
- Mac3 purge (Track H) deletes files in `Desktop/skill-entity-architecture/` (not protected directories).
No conflict. SOOP-2 is additive on the protected files front.
Invariant 3: Voice rules (no em dashes, no AI-isms)
Statement: All outreach, emails, messages, and copy must follow Mo's voice: no em dashes, no "leverage", "delve", "craft", "seamless", "excited to share", "thrilled", etc.
How SOOP-2 preserves it:
- SOOP-2 is infrastructure. It does not produce user-facing copy.
- The contrarian's dissent text is internal to the meta-review pipeline, not outreach.
- Skill descriptions in SKILL.md are technical documentation, not marketing copy.
- This ARCHITECTURE.md document avoids em dashes and AI-isms throughout (verified during authoring).
Note: If any SKILL.md description is found to use AI-isms during the typing pass (Track B), the linter should flag it. This is currently not a linter rule, but it is worth adding as an optional check in Track A.2.
Invariant 4: 200K context ceiling
Statement: Hard context budget per session is 200K tokens. Save-then-forget protocol. Subagents for heavy lifting.
How SOOP-2 preserves it:
- Mass typing (Track B) explicitly uses subagent-per-batch policy. 213 SKILL.md hand-edits in one session would exhaust context.
- The Rail plan notes "Compaction risk: High" for Phase 2 and mandates the subagent approach.
- Bench runs (Track C) produce small output (a precision/recall number). They do not dump 213 embeddings into context.
- The feedback loop cron (M2/M3) runs offline as a background process. It does not run in Claude session context.
- This ARCHITECTURE.md document is written to the filesystem (not held in context) so it can be referenced without re-loading.
No conflict. SOOP-2's high-context-risk operations are explicitly scheduled as subagent or offline processes.
---
9. Anti-patterns to avoid
These are explicit forbids. Each is derived from a real failure mode observed in the SEA audit or the operator algebra design review.
---
Forbid 1: Untyped SKILL.md files
Do not ship a SKILL.md without the 6 type fields. An untyped skill is invisible to the type compatibility weight in Tier 1, invisible to the twin's structured input, and invisible to the compositor's conflict detection.
Untyped skills are not "lower priority" skills. They are broken skills from the router's perspective. The linter exits non-zero on them. The mass typing pass (Track B) exists precisely to eliminate this state.
What counts as a violation: Any SKILL.md missing one or more of: `category`, `input_type`, `output_type`, `effects`, `idempotent`, `commutes_with`, `silent_capable`.
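A minimal sketch of that violation check, using the field names exactly as listed above. It assumes frontmatter is delimited by `---` lines with top-level `key: value` entries. The parser is deliberately naive and hypothetical; it is not the skill-typecheck linter.

```python
REQUIRED_FIELDS = (
    "category", "input_type", "output_type", "effects",
    "idempotent", "commutes_with", "silent_capable",
)


def frontmatter_keys(text):
    """Collect top-level keys between the opening and closing '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Skip blanks, indented continuation lines, list items, and comments.
        if not line or line[0].isspace() or line[0] in "-#":
            continue
        if ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # unterminated frontmatter counts as untyped


def missing_fields(text):
    """Return the required fields absent from a SKILL.md, in declaration order."""
    present = frontmatter_keys(text)
    return [field for field in REQUIRED_FIELDS if field not in present]
```

A linter built on this would exit non-zero whenever `missing_fields` returns a non-empty list for any file.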
---
Forbid 2: Generator + Effector hybrid without primary category
A skill that both generates content AND fires a side effect (write to git, post to Supabase) is a hybrid. Hybrids are allowed. But the SKILL.md must declare a `category` that is the dominant role, and the other role must be declared in `effects`.
Do not label a hybrid as `category: generator` with no mention of the effector behavior. The compositor will treat it as a pure Generator and may schedule it alongside another Effector that conflicts. The `effects` list is the disclosure mechanism.
Correct pattern:
category: generator
effects: [supabase:write]
This tells the compositor: "This is a Generator that also writes to Supabase. Check effect conflicts before scheduling alongside other supabase-writing skills."
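A sketch of the conflict check this disclosure enables. It assumes effects are `resource:action` strings as in the example above, and that two skills conflict when both write to the same resource. The function names are hypothetical and the compositor's real rule may be richer.

```python
def write_targets(skill):
    """Resources a skill writes to, parsed from 'resource:action' effect strings."""
    targets = set()
    for effect in skill.get("effects", []):
        resource, _, action = effect.partition(":")
        if action == "write":
            targets.add(resource)
    return targets


def effect_conflicts(skill_a, skill_b):
    """Return the sorted resources that both skills write to."""
    return sorted(write_targets(skill_a) & write_targets(skill_b))
```

A hybrid that omits `supabase:write` from `effects` yields an empty conflict list here, which is exactly the failure the forbid describes.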
---
Forbid 3: meta-review without contrarian after SOOP-2 ships
After Track F completes, `meta-review` has a round-2 contrarian pass built in. Do not bypass it by invoking `meta:adversarial` manually as a workaround. The structured pipeline (Reducer → Auditor → Reducer) is the correct form.
If a specific invocation needs to skip the contrarian (e.g., time-sensitive, user explicitly waives round-2), the SKILL.md should expose a `--no-contrarian` flag, and that flag must be used explicitly. Bypassing the contrarian silently is a type lie.
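The structured pipeline and the explicit waiver can be sketched as one function. This assumes the round-1 and round-2 Reducer share one callable and that the contrarian returns an empty result when it finds nothing. The names `meta_review`, `reduce_fn`, and `contrarian_fn` are hypothetical, and the `no_contrarian` parameter stands in for the `--no-contrarian` flag.

```python
def meta_review(inputs, reduce_fn, contrarian_fn, no_contrarian=False):
    """Reducer -> Auditor (contrarian) -> Reducer, with a single adversarial round."""
    first = reduce_fn(inputs, None)
    if no_contrarian:
        # The waiver is recorded in the result, never applied silently.
        return {"synthesis": first, "rounds": 1, "contrarian": "waived"}
    dissent = contrarian_fn(first)
    if not dissent:
        # Empty-set stop: nothing to contest, terminate cleanly.
        return {"synthesis": first, "rounds": 1, "contrarian": "empty"}
    second = reduce_fn(inputs, dissent)
    return {"synthesis": second, "rounds": 2, "contrarian": "applied"}
```

The function never calls `contrarian_fn` a second time, which is what keeps the adversarial pass from recursing.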
---
Forbid 4: Mac3 in any SEA config or doc after Track H ships
Zero Mac3 references. If you are writing new SEA documentation and type "mac3", stop. Mac3 is gone. Use mac4 (cognitive twin), mac1 (clawdbot host), or mac5 (if applicable to the specific task).
The Track H grep (H.5) confirms zero residual refs before SOOP-2 ships. Any future addition of Mac3 references is a regression.
---
Forbid 5: Feedback loop short-circuit (skipping M2/M3 and directly editing thresholds)
The threshold for each skill should only change through the M3 calibrator reading M2's pattern-learner output. Do not manually edit `activation_threshold` in `state.json` to "quick-fix" a skill that is firing too often.
Manual edits are overwritten by the nightly cron. They also bypass the invariance rules from `lse_stages.py`. If a skill is firing too often, the correct fix is to add a negative reaction signal to reactions.jsonl (M1) and let M2/M3 handle the calibration on the next cron run.
Exception: during SOOP-2 development, before M2/M3 exist, manual threshold edits are allowed as temporary scaffolding. Mark them with a comment: `# TEMP: remove after E.3 ships`.
---
Forbid 6: Injecting contrarian output without a prior Reducer output
The contrarian is round-2. It has no valid input if meta-review's round-1 Reducer has not produced `first_synthesis`. Do not invoke `meta:contrarian` as a standalone first-pass skill. Its `input_type: "M × State"` requires M (the synthesis) to exist.
The compositor enforces this via dependency ordering. But if someone invokes `meta:contrarian` directly via slash command without a prior meta-review result, the skill must detect the missing input and either error clearly or request the missing synthesis before proceeding.
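The "error clearly" branch could look like the sketch below. It assumes the skill receives session state as a dict holding `first_synthesis`. The exception type, the function name, and the `find_dissent` callable are hypothetical.

```python
class MissingSynthesisError(ValueError):
    """Raised when the contrarian is invoked without a round-1 synthesis."""


def run_contrarian(state, find_dissent):
    """Guard the input type 'M x State': M (the synthesis) must exist first."""
    synthesis = state.get("first_synthesis")
    if not synthesis:
        raise MissingSynthesisError(
            "meta:contrarian requires first_synthesis from meta-review round 1"
        )
    # An empty list means the contrarian found nothing to contest.
    return list(find_dissent(synthesis, state))
```

Raising a dedicated exception lets a slash-command wrapper turn the failure into a prompt for the missing synthesis instead of a stack trace.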
---
Forbid 7: Silent skills that are always silent (threshold too high)
A skill marked `silent_capable: true` that returns silent >80
M3 detects this pattern: if `silent_rate > 0.8` for any skill, it raises the threshold so the skill fires only in contexts where it is more likely to have something to say. Do not keep a skill with `silent_rate: 0.95` deployed at a low threshold, because it wastes Tier 2 scoring cycles.
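A sketch of how `silent_rate` could be derived from activation log records. It assumes each record carries a `slug` and the `"output": "silent"` marker used by the activation log in Section 5. The function names are hypothetical.

```python
from collections import Counter


def silent_rates(records):
    """Map each skill slug to the fraction of its activations that were silent."""
    total = Counter()
    silent = Counter()
    for record in records:
        total[record["slug"]] += 1
        if record.get("output") == "silent":
            silent[record["slug"]] += 1
    return {slug: silent[slug] / total[slug] for slug in total}


def always_silent(records, limit=0.8):
    """Return slugs whose silent rate strictly exceeds the limit."""
    return sorted(slug for slug, rate in silent_rates(records).items() if rate > limit)
```

The comparison is strict, matching `silent_rate > 0.8`: a skill that is silent exactly 80 percent of the time is not flagged.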
---
10. Architecture decisions log
| # | Decision | Rationale |
|---|---|---|
| 1 | Type system = 6 frontmatter fields, not a separate schema file | Frontmatter fields are co-located with the skill, version-controlled with the skill, and readable by any tool without a separate schema lookup. |
| 2 | Contrarian is its own SKILL.md, not a mode of meta-review | First-class SEA entity status means the contrarian has its own type signature, state.json, activation log, and can be invoked independently. A mode flag inside meta-review would hide it from the router and the feedback loop. |
| 3 | Silent return is a runtime decision, not a config flag | Mute (config) is a blanket suppression. Silent (runtime) is the skill's own judgment about the current context. They solve different problems. The Google O2 criterion requires runtime silent, not config mute. |
| 4 | Twin replaces MiniMax as Tier 2 primary, MiniMax as fallback | Twin is trained on Mohamed's patterns and produces better relevance scores. MiniMax is a safety net. Degraded mode (MiniMax only) is not the target operating mode. |
| 5 | Tier 1 recall target = 80 | |
| 6 | Mac3 purged, not archived | Mac3 never had live workers in SEA. The references are dead documentation creating confusion. Archiving them preserves the confusion. Deletion is correct. |
| 7 | Feedback loop M2/M3 runs nightly, not real-time | Real-time threshold calibration would create unstable behavior (a skill fires, gets a thumbs-down, its threshold rises immediately, affecting the next invocation in the same session). Nightly cadence gives a stable session experience. |
| 8 | ∅-stop condition is one adversarial round, not recursive | Recursive adversarial passes (contrarian responds to second synthesis, which triggers a third pass, etc.) would create unpredictable latency and divergence risk. One round of adversarial plus one round-2 synthesis is the correct scope. |
| 9 | type_compatibility_weight is tunable (default α=0.7, β=0.3) | Different skill domains have different type-sensitivity. Creative skills benefit from type weighting; ops skills are better distinguished by text similarity alone. A global tunable knob lets the router be calibrated per domain without code changes. |
| 10 | Linter runs in <3s on 226 files | Pre-commit hooks that run >3s get disabled by developers. The 3s constraint is not a performance aspiration; it is the condition under which the linter will actually be used. Track A.4 includes a timing test. |
| 11 | SEA covers Claude Code skills (213), not just Clawdbot skills (13) | The audit found the surface mismatch: SEA was routing Discord/Clawdbot injections, while Claude Code skills routed via slash-command listing. SOOP-2 extends SEA to both surfaces. The 213 Claude Code skills are the higher-priority surface. |
| 12 | Activation log uses JSONL, not a database | JSONL is appendable, grepable, and readable without a running database. At scale (50MB risk), windowed rotation is simpler than managing schema migrations. If query needs grow, parquet conversion of JSONL is a one-line script. |
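Decision 9 names two knobs but does not spell out how they combine. One plausible reading, shown here purely as an assumption, is a linear blend in which α weights text similarity and β weights type compatibility. The function names and the [0, 1] input range are hypothetical.

```python
def tier1_score(text_similarity, type_compatibility, alpha=0.7, beta=0.3):
    """Assumed blend: alpha * text similarity + beta * type compatibility."""
    for value in (text_similarity, type_compatibility):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")
    return alpha * text_similarity + beta * type_compatibility


def rank(candidates, alpha=0.7, beta=0.3):
    """Sort candidate skills by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: tier1_score(c["text_similarity"], c["type_compatibility"], alpha, beta),
        reverse=True,
    )
```

Under this reading, setting `beta=0` reduces the router to pure text similarity, which is the behavior the rationale describes for ops skills.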
---
Appendix: File and path index
Key paths introduced or modified by SOOP-2:
[home-path] — 213 Claude Code skills (typing pass target)
[home-path] — NEW: linter
[home-path] — NEW: linter CLI entry point
[home-path] — NEW: inventory + category map
[home-path] — NEW: meta:contrarian skill file
[home-path] — SEA production code
[home-path] — modified: type_compatibility_weight
[home-path] — modified: α, β knobs
[home-path] — NEW: labeled recall benchmark
[home-path] — pattern: per-skill state
[home-path] — activation event log
[home-path] — NEW: M1 reaction signals
[home-path] — NEW: silent event log
[home-path] — NEW: circuit breaker events
[home-path] — modified: silent return handling
Desktop/skill-entity-architecture/CREATIVE_EVOLUTION_SEA_v1.md — modified: Mac3→Mac4
Desktop/skill-entity-architecture/phase0-mac3-availability.md — DELETED
Desktop/skill-entity-architecture/mac3-worker-config/ — DELETED
Desktop/skill-entity-architecture/MIGRATION-GUIDE.md — modified: Mac3 refs
[home-path] — NEW: final audit
---
Summary: what SOOP-2 is and is not
SOOP-2 is:
- A type system applied to all 213 skills, making composition mechanical instead of ad-hoc
- A recall improvement on Tier 1 routing (60
- A twin promotion (MiniMax → twin as primary Tier 2 scorer)
- A closed feedback loop (user reactions → pattern learner → threshold calibration → KARL/LSE → twin)
- A contrarian primitive inside meta-review with a meaningful ∅-stop condition
- A silent return type for Auditors that have nothing to say
- A Mac3 purge
SOOP-2 is not:
- A rewrite of SEA architecture (the two-tier routing, injection compositor, and skill state documents remain as designed)
- A new model training run (twin weight updates happen as a downstream effect of the feedback loop, not as an SOOP-2 deliverable)
- A UI or dashboard (Phase 4.1 dashboard is outside SOOP-2 scope)
- A multi-surface unification of Clawdbot + Claude Code routing (SOOP-2 extends SEA to Claude Code skills but does not merge the surfaces into one pipeline)
Source Anchor
crucible-output/soop-2/02-architecture/ARCHITECTURE.md