Grand Diomande Research · Full HTML Reader

# Stage 1, Path C: THE LIVING DOCUMENT
## Zero-Training Twin via Dynamically-Assembled System Prompt
Evolution: Evo-Cubed | Stage 1C of 3
Date: 2026-03-07
Builds on: Stage 0 Research (RAG = 9x value, existing KB + graph infrastructure)

---

## The Thesis

What if the Cognitive Twin requires zero training? What if instead of fine-tuning a model to be Mo, we assemble a system prompt so comprehensive that ANY model becomes Mo for the duration of the conversation?

The benchmark already proves this partially: Qwen3-235B-A22B with RAG context hits 93.6

Path C closes the personality gap through feedback-driven personality accumulation — a system that learns Mo's style from corrections rather than training data.

---

## The Zero-Training Architecture

### How It Works

```
User Query
  │
  ├── 1. PERSONALITY LAYER (static + accumulated)
  │     ├── Core identity document (500 words, hand-written once)
  │     ├── Communication style rules (concise, direct, no filler)
  │     ├── 20 few-shot examples of Mo's responses
  │     └── Accumulated corrections ("Mo would say X not Y")
  │
  ├── 2. KNOWLEDGE LAYER (dynamic, per-query)
  │     ├── RAG retrieval (top 10 KB entries, temporal-weighted)
  │     ├── Graph traversal (multi-hop for relational queries)
  │     └── Real-time state (active tasks, machine status, recent commits)
  │
  ├── 3. CONTEXT LAYER (session-specific)
  │     ├── Current conversation history
  │     ├── Active project context (from CLAUDE.md)
  │     └── Time-of-day behavioral adjustments
  │
  └── 4. MODEL LAYER (any model, no LoRA)
        └── Send assembled prompt to whichever model is available
            ├── Qwen3-235B on Together AI (primary)
            ├── Qwen3.5-35B on Mac4/exo (local fallback)
            ├── Claude Opus (escalation)
            └── Gemini 2.5 Pro (alternative)
```
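The "temporal-weighted" retrieval in the knowledge layer could look roughly like the following. This is a minimal sketch, not the existing RAG code: it assumes each hit carries a similarity score and a last-updated timestamp, and it applies recency as an exponential decay. The `KBHit` shape, the `temporal_rank` name, and the 30-day half-life are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class KBHit:
    """One retrieved knowledge-base entry (hypothetical shape)."""
    text: str
    similarity: float      # retriever relevance score, assumed to lie in [0, 1]
    updated_at: datetime   # when the entry was last written


def temporal_rank(hits, now, top_k=10, half_life_days=30.0):
    """Rank hits by similarity multiplied by an exponential recency decay."""
    def score(hit):
        # Age in days, clamped at zero so future timestamps are not boosted.
        age_days = max((now - hit.updated_at).total_seconds() / 86400.0, 0.0)
        # The weight halves every `half_life_days` (an assumed tuning knob).
        return hit.similarity * 0.5 ** (age_days / half_life_days)

    return sorted(hits, key=score, reverse=True)[:top_k]


now = datetime(2026, 3, 7)
hits = [
    KBHit("old but on-topic", 0.90, now - timedelta(days=120)),
    KBHit("recent and close", 0.80, now - timedelta(days=2)),
]
print([hit.text for hit in temporal_rank(hits, now)])
```

With these inputs the two-day-old entry outranks the four-month-old one despite its lower raw similarity, which is the behavior a temporal weight is meant to produce.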

### The Key Insight: Model Portability
Fine-tuned models lock you to one architecture. A dynamically-assembled prompt works with ANY model. When Qwen4 drops, the twin migrates instantly — no retraining. When Claude gets cheaper, switch. When a local model gets fast enough, run locally. The twin's identity lives in the prompt, not the weights.
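One way to keep the prompt independent of the model is a small registry of backends that share one call signature. The sketch below is illustrative only: the registry, the `register` decorator, and the stub backends are hypothetical, and the stubs stand in for real provider clients, which are not shown in this note.

```python
from typing import Callable, Dict

# A backend is any callable taking (system_prompt, user_query) and returning text.
ModelBackend = Callable[[str, str], str]

BACKENDS: Dict[str, ModelBackend] = {}


def register(name: str):
    """Decorator that adds a backend to the registry under `name`."""
    def decorator(fn: ModelBackend) -> ModelBackend:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("stub-a")
def stub_a(system_prompt: str, query: str) -> str:
    # Placeholder for a hosted model client.
    return f"[stub-a] {query}"


@register("stub-b")
def stub_b(system_prompt: str, query: str) -> str:
    # Placeholder for a local model client.
    return f"[stub-b] {query}"


def call_model(model: str, system_prompt: str, query: str) -> str:
    """Send the same assembled prompt to whichever backend is named."""
    try:
        backend = BACKENDS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None
    return backend(system_prompt, query)


# Switching models is a one-string change; the prompt itself is untouched.
prompt = "You are Mo. Be concise and direct."
print(call_model("stub-a", prompt, "What should we ship first?"))
print(call_model("stub-b", prompt, "What should we ship first?"))
```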

---

## The Personality Accumulation System

### How Style Is Captured Without Training

Step 1: Seed Document
Write a 500-word "Mo Identity Document" capturing:
- How Mo structures arguments (claim → evidence → action)
- Vocabulary patterns (technical precision, no hedging, specific examples)
- Response length preferences (short for simple questions, detailed for architecture)
- Cultural references and humor style
- Decision-making patterns (bias toward action, "ship then iterate")

Step 2: Correction Loop
When the twin responds in a way that doesn't sound like Mo:

```
User: "That doesn't sound like me. I would say it more like: [correction]"
Twin: "Noted. Updating personality model."

→ System stores: {
    "trigger": "user asked about project prioritization",
    "twin_said": "I believe we should carefully evaluate the options...",
    "mo_said": "Ship the one that's closest to done. Evaluate after.",
    "rule_extracted": "Mo prefers action over deliberation. Short, decisive answers for prioritization."
  }
```
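The stored record above maps naturally onto a small typed structure. The sketch below reuses the four field names from the example; the `Correction` class, its JSON helpers, and the `as_style_example` rendering are hypothetical and only show one way such a record might be turned into a prompt line.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Correction:
    """One captured style correction, using the field names from the example."""
    trigger: str
    twin_said: str
    mo_said: str
    rule_extracted: str

    def as_style_example(self) -> str:
        """Render the correction as a single line for the personality layer."""
        return (
            f"When {self.trigger}: say \"{self.mo_said}\" "
            f"not \"{self.twin_said}\". Rule: {self.rule_extracted}"
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Correction":
        return cls(**json.loads(raw))


example = Correction(
    trigger="user asked about project prioritization",
    twin_said="I believe we should carefully evaluate the options...",
    mo_said="Ship the one that's closest to done. Evaluate after.",
    rule_extracted="Mo prefers action over deliberation.",
)
print(example.as_style_example())
```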

Step 3: Accumulation
Over time, the correction database grows. Before each query, the system retrieves the 5 most relevant corrections and injects them as "style examples" in the prompt. The twin gets more accurate with every correction — without ever training a model.
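Retrieving "the 5 most relevant corrections" can be sketched as a ranked lookup over stored triggers. The version below is an assumption-heavy stand-in: it scores relevance with plain token overlap (Jaccard similarity) so that it runs with no dependencies, where a real system would more likely use embeddings. The `CorrectionDB` here is an in-memory toy, not the Supabase-backed store.

```python
import re


def _tokens(text):
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class CorrectionDB:
    """In-memory toy store (hypothetical), keyed on each correction's trigger."""

    def __init__(self):
        self._rows = []

    def add(self, trigger, rule):
        self._rows.append({"trigger": trigger, "rule": rule})

    def relevant(self, query, top_k=5):
        """Return up to `top_k` corrections whose trigger overlaps the query."""
        scored = [(jaccard(query, row["trigger"]), row) for row in self._rows]
        scored = [pair for pair in scored if pair[0] > 0.0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [row for _, row in scored[:top_k]]


db = CorrectionDB()
db.add("user asked about project prioritization", "Short, decisive answers.")
db.add("user requested a code review", "Lead with the blocking issue.")
for row in db.relevant("which project should ship first"):
    print(row["rule"])
```

Because corrections with zero overlap are dropped, an unrelated query injects no style examples instead of five irrelevant ones.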

### The Math
- Fine-tuning requires ~18,000 examples and produces a fixed-point-in-time model
- Personality accumulation requires ~200 corrections and produces an ever-improving dynamic prompt
- At 5 corrections per day of active use, 200 corrections = 40 days
- But the system is useful from correction #1 — each correction immediately improves the next response
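The 40-day figure follows directly from the two numbers above. A tiny helper makes the arithmetic explicit and lets the correction rate be varied; the function name is illustrative.

```python
import math


def days_to_target(target_corrections: int, corrections_per_day: float) -> int:
    """Days of active use needed to accumulate the target number of corrections."""
    if corrections_per_day <= 0:
        raise ValueError("corrections_per_day must be positive")
    return math.ceil(target_corrections / corrections_per_day)


print(days_to_target(200, 5))  # 40
```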

---

## Implementation: One Week to Live Twin

### Day 1: Identity Document
- Write the 500-word Mo Identity Document
- Extract 20 diverse few-shot examples from existing conversations
- Define the communication style ruleset

### Day 2: Prompt Assembly Engine
Build a Python module that:
1. Accepts a user query
2. Retrieves relevant KB entries (existing RAG)
3. Traverses graph for relational context (existing Graph Kernel)
4. Retrieves relevant personality corrections (new, simple DB)
5. Assembles the complete system prompt
6. Sends to the configured model

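```python
class LivingTwin:
    """Builds a fresh system prompt for every query; no model weights are changed."""

    def __init__(self):
        # The helpers below are assumed to be defined elsewhere in the project:
        # the existing RAG and Graph Kernel clients, plus the new corrections DB.
        self.identity = load_identity_doc()
        self.rag = RAGClient()
        self.graph = GraphClient()
        self.corrections = CorrectionDB()

    def respond(self, query: str, model: str = "together/qwen3-235b"):
        # Knowledge layer: per-query retrieval and graph traversal.
        knowledge = self.rag.retrieve(query, top_k=10)
        graph_context = self.graph.traverse(query, max_hops=2)
        # Personality layer: the corrections most relevant to this query.
        style_corrections = self.corrections.relevant(query, top_k=5)
        # Real-time state: active tasks, machine status, recent commits.
        state = get_realtime_state()

        system_prompt = assemble_prompt(
            identity=self.identity,
            knowledge=knowledge,
            graph=graph_context,
            corrections=style_corrections,
            state=state,
        )

        # Model layer: any configured model receives the same assembled prompt.
        return call_model(model, system_prompt, query)
```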
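The `assemble_prompt` call is where the layers become one string. A minimal sketch follows, assuming the retrieved knowledge, graph context, and corrections arrive as lists of strings and the real-time state as a flat dict. The section titles and ordering are illustrative choices, not the project's actual prompt format.

```python
def assemble_prompt(identity, knowledge, graph, corrections, state):
    """Join the prompt layers into one system prompt with labeled sections."""

    def section(title, items):
        # Render a bulleted section; mark it explicitly when it is empty.
        lines = [f"- {item}" for item in items]
        body = "\n".join(lines) if lines else "(none)"
        return f"## {title}\n{body}"

    parts = [
        f"## Identity\n{identity.strip()}",
        section("Style corrections", corrections),
        section("Knowledge", knowledge),
        section("Graph context", graph),
        section("Current state", [f"{key}: {value}" for key, value in state.items()]),
    ]
    return "\n\n".join(parts)


prompt = assemble_prompt(
    identity="Mo is concise and direct. Bias toward action.\n",
    knowledge=["Example KB entry about the current project."],
    graph=[],
    corrections=["Prefer decisive answers for prioritization questions."],
    state={"active_task": "prompt assembly engine"},
)
print(prompt)
```

Placing identity and corrections ahead of retrieved knowledge is one design choice among several; the ordering that works best is likely to vary by model and is worth testing against the eval suite.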

### Day 3: Correction Capture Interface
- Add a `/twin-correct` slash command to Clawdbot
- When Mo says "I wouldn't say it like that", capture the correction
- Store in Supabase `twin_corrections` table
- Auto-extract style rules using Gemini Flash (cheap, fast)

### Day 4: Integration
- Wire into Clawdbot as a model option
- Add model fallback chain: Together AI → local exo → Claude escalation
- Test with the 39-question eval suite

### Day 5: Deploy + Iterate
- Route 10
- Activate correction loop
- Monitor accuracy and personality fidelity

---

## Multi-Model Flexibility

### The Zero-Lock-In Advantage

| Scenario | Fine-Tuned Twin | Living Document Twin |
|----------|----------------|---------------------|
| New model released | Retrain (days-weeks) | Switch model param (seconds) |
| Provider goes down | Stuck or rebuild | Route to any alternative |
| Better local model | Download + retrain | Download + use |
| Want to A/B test models | Train both ($$) | Change one config param |
| Price increase | Locked in | Switch to cheapest option |

### Model Quality Ladder

```
Query arrives → Check model availability:
  1. Together AI Qwen3-235B (free, best quality) → USE
  2. Local exo cluster Qwen3.5-35B (free, good quality) → USE
  3. Mac4 Ollama Gemma3-4B (free, basic quality) → USE for simple queries
  4. Claude Opus (expensive, highest quality) → ESCALATE complex queries
```

The Living Document twin doesn't care which model runs underneath. The identity is in the prompt.
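The ladder can be read as an ordered list of rungs tried in turn, with some rungs reserved for simple queries. The sketch below simplifies the policy to "first eligible rung that answers wins" and treats any exception as "unavailable"; the `Rung` type, the `is_simple` flag, and the stub backends are hypothetical, and complexity-based escalation is reduced to skipping the small model for non-simple queries.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rung:
    name: str
    call: Callable[[str, str], str]   # (system_prompt, query) -> reply
    simple_only: bool = False         # skip this rung for complex queries


class AllModelsFailed(RuntimeError):
    """Raised when every eligible rung errored out."""


def respond_with_fallback(
    ladder: List[Rung], system_prompt: str, query: str, is_simple: bool
) -> Tuple[str, str]:
    """Try each eligible rung in order; return (model name, reply)."""
    errors = []
    for rung in ladder:
        if rung.simple_only and not is_simple:
            continue
        try:
            return rung.name, rung.call(system_prompt, query)
        except Exception as exc:  # broad on purpose: any failure means "try next"
            errors.append(f"{rung.name}: {exc}")
    raise AllModelsFailed("; ".join(errors) or "no eligible rung")


def offline(system_prompt, query):
    raise ConnectionError("provider unreachable")


def small_local(system_prompt, query):
    return f"small: {query}"


def escalation(system_prompt, query):
    return f"large: {query}"


ladder = [
    Rung("primary", offline),
    Rung("small-local", small_local, simple_only=True),
    Rung("escalation", escalation),
]
print(respond_with_fallback(ladder, "You are Mo.", "status?", is_simple=True))
print(respond_with_fallback(ladder, "You are Mo.", "design the pipeline", is_simple=False))
```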

---

## Cost Analysis

| Component | Monthly Cost |
|-----------|--------------|
| Together AI Qwen3-235B inference | $0 (free tier) |
| Correction storage (Supabase) | $0 (existing plan) |
| Identity document maintenance | $0 (one-time + occasional updates) |
| Prompt assembly compute (Mac1) | $0 (local) |
| Total | $0/mo |

Not "$0-15/mo." Literally zero dollars. The most expensive component — the model — is free on Together AI's serverless tier.

---

## Risk Assessment

| Risk | Severity | Mitigation |
|------|----------|------------|
| Personality fidelity never reaches fine-tuned quality | Medium | 200+ corrections should close 90 |
| System prompt too long for smaller models | Low | Qwen3-235B has 262K context; even 35B has 32K |
| Correction loop requires active Mo participation | Medium | Only needed during first 40 days, then plateaus |
| Model API changes break prompt format | Low | Abstract model calls behind unified interface |
| "Uncanny valley" (close but not quite Mo) | Medium | Accept 90 |

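For the prompt-length risk, one possible guard is to drop lower-priority sections when the assembled prompt would exceed a model's context budget. The sketch below is an assumption throughout: it estimates tokens with a rough four-characters-per-token heuristic, not a real tokenizer, and the section names and priority order are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate: about 4 characters per token. Not a real tokenizer."""
    return (len(text) + 3) // 4


def fit_to_budget(sections, max_tokens):
    """Keep sections in priority order, skipping any that would overflow.

    `sections` is a list of (name, text) pairs, highest priority first.
    Returns the kept pairs and the estimated tokens used.
    """
    kept, used = [], 0
    for name, text in sections:
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            continue  # does not fit; a later, smaller section still might
        kept.append((name, text))
        used += cost
    return kept, used


sections = [
    ("identity", "a" * 40),      # about 10 tokens, always wanted
    ("knowledge", "b" * 400),    # about 100 tokens, dropped under a tight budget
    ("state", "c" * 20),         # about 5 tokens
]
kept, used = fit_to_budget(sections, max_tokens=20)
print([name for name, _ in kept], used)
```

A production version would count tokens with the target model's own tokenizer and might truncate a section in place of dropping it whole.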
---

## Why This Path Wins

Fine-tuning captures who Mo was. The Living Document captures who Mo is — right now, today, with corrections applied. It ships in one week, costs nothing, works with any model, and gets better with use instead of worse with time.

The question isn't "how do we train a model to be Mo?" The question is "how do we give any model enough context to be Mo?" And the answer is: assemble the right prompt.

---

Stage 1C of 3 — EXPLORE | Path C: The Living Document | ~1,300 words


Source Anchor

evo-cube-output/cognitive-twin-v9/stage1-path-c.md
