# Stage 1, Path C: THE LIVING DOCUMENT
## Zero-Training Twin via Dynamically-Assembled System Prompt
Evolution: Evo-Cubed | Stage 1C of 3
Date: 2026-03-07
Builds on: Stage 0 Research (RAG = 9x value, existing KB + graph infrastructure)
---
## The Thesis
What if the Cognitive Twin requires zero training? What if instead of fine-tuning a model to be Mo, we assemble a system prompt so comprehensive that ANY model becomes Mo for the duration of the conversation?
The benchmark already proves this partially: Qwen3-235B-A22B with RAG context hits 93.6
Path C closes the personality gap through feedback-driven personality accumulation — a system that learns Mo's style from corrections rather than training data.
---
## The Zero-Training Architecture
### How It Works
```
User Query
    │
    ├── 1. PERSONALITY LAYER (static + accumulated)
    │   ├── Core identity document (500 words, hand-written once)
    │   ├── Communication style rules (concise, direct, no filler)
    │   ├── 20 few-shot examples of Mo's responses
    │   └── Accumulated corrections ("Mo would say X not Y")
    │
    ├── 2. KNOWLEDGE LAYER (dynamic, per-query)
    │   ├── RAG retrieval (top 10 KB entries, temporal-weighted)
    │   ├── Graph traversal (multi-hop for relational queries)
    │   └── Real-time state (active tasks, machine status, recent commits)
    │
    ├── 3. CONTEXT LAYER (session-specific)
    │   ├── Current conversation history
    │   ├── Active project context (from CLAUDE.md)
    │   └── Time-of-day behavioral adjustments
    │
    └── 4. MODEL LAYER (any model, no LoRA)
        └── Send assembled prompt to whichever model is available
            ├── Qwen3-235B on Together AI (primary)
            ├── Qwen3.5-35B on Mac4/exo (local fallback)
            ├── Claude Opus (escalation)
            └── Gemini 2.5 Pro (alternative)
```
### The Key Insight: Model Portability
Fine-tuned models lock you to one architecture. A dynamically-assembled prompt works with ANY model. When Qwen4 drops, the twin migrates instantly — no retraining. When Claude gets cheaper, switch. When a local model gets fast enough, run locally. The twin's identity lives in the prompt, not the weights.
---
## The Personality Accumulation System
### How Style Is Captured Without Training
#### Step 1: Seed Document
Write a 500-word "Mo Identity Document" capturing:
- How Mo structures arguments (claim → evidence → action)
- Vocabulary patterns (technical precision, no hedging, specific examples)
- Response length preferences (short for simple questions, detailed for architecture)
- Cultural references and humor style
- Decision-making patterns (bias toward action, "ship then iterate")
#### Step 2: Correction Loop
When the twin responds in a way that doesn't sound like Mo:
```
User: "That doesn't sound like me. I would say it more like: [correction]"
Twin: "Noted. Updating personality model."
→ System stores: {
    "trigger": "user asked about project prioritization",
    "twin_said": "I believe we should carefully evaluate the options...",
    "mo_said": "Ship the one that's closest to done. Evaluate after.",
    "rule_extracted": "Mo prefers action over deliberation. Short, decisive answers for prioritization."
}
```
#### Step 3: Accumulation
Over time, the correction database grows. Before each query, the system retrieves the 5 most relevant corrections and injects them as "style examples" in the prompt. The twin gets more accurate with every correction — without ever training a model.
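The retrieval step can be sketched without any model at all. The snippet below is a minimal illustration, not the system's actual implementation: it assumes corrections are held in memory with the four fields shown in the stored record, and it scores relevance with plain word overlap (Jaccard similarity) between the query and each correction's `trigger`. The names `Correction`, `relevant_corrections`, and `format_style_examples` are hypothetical, and a real deployment would more likely use embedding similarity.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Correction:
    trigger: str         # what the user was asking about
    twin_said: str       # the response that missed the mark
    mo_said: str         # the corrected phrasing
    rule_extracted: str  # the generalized style rule


def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def relevant_corrections(
    query: str, corrections: list[Correction], top_k: int = 5
) -> list[Correction]:
    """Rank corrections by Jaccard overlap between the query and each trigger."""
    q = _tokens(query)

    def score(c: Correction) -> float:
        t = _tokens(c.trigger)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    ranked = sorted(corrections, key=score, reverse=True)
    # Drop corrections with no overlap at all so unrelated rules are not injected.
    return [c for c in ranked[:top_k] if score(c) > 0]


def format_style_examples(corrections: list[Correction]) -> str:
    """Render the selected corrections as a prompt section."""
    lines = ["Style corrections (prefer the second phrasing):"]
    for c in corrections:
        lines.append(
            f'- Instead of "{c.twin_said}" say "{c.mo_said}". Rule: {c.rule_extracted}'
        )
    return "\n".join(lines)
```

Filtering out zero-overlap corrections keeps the prompt short early on, when the database holds only a handful of entries and most of them are unrelated to the current query.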
### The Math
- Fine-tuning requires ~18,000 examples and produces a fixed-point-in-time model
- Personality accumulation requires ~200 corrections and produces an ever-improving dynamic prompt
- At 5 corrections per day of active use, 200 corrections = 40 days
- But the system is useful from correction #1 — each correction immediately improves the next response
---
## Implementation: One Week to Live Twin
### Day 1: Identity Document
- Write the 500-word Mo Identity Document
- Extract 20 diverse few-shot examples from existing conversations
- Define the communication style ruleset
### Day 2: Prompt Assembly Engine
Build a Python module that:
1. Accepts a user query
2. Retrieves relevant KB entries (existing RAG)
3. Traverses graph for relational context (existing Graph Kernel)
4. Retrieves relevant personality corrections (new, simple DB)
5. Assembles the complete system prompt
6. Sends to the configured model
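Step 5, assembling the prompt, might look something like the sketch below. It assumes each layer has already been reduced to plain strings (the actual return types of the RAG and graph clients are not specified here), and the section titles and ordering are illustrative choices rather than a fixed format. The keyword names match the `assemble_prompt` call used in this document.

```python
from __future__ import annotations


def assemble_prompt(
    identity: str,
    knowledge: list[str],
    graph: list[str],
    corrections: list[str],
    state: dict[str, str],
) -> str:
    """Join the layers into one system prompt, skipping empty sections."""
    sections = [
        ("IDENTITY", identity.strip()),
        # Corrections sit right after identity so style rules are read early.
        ("STYLE CORRECTIONS", "\n".join(f"- {c}" for c in corrections)),
        ("KNOWLEDGE", "\n".join(f"- {k}" for k in knowledge)),
        ("RELATED CONTEXT", "\n".join(f"- {g}" for g in graph)),
        ("CURRENT STATE", "\n".join(f"- {k}: {v}" for k, v in state.items())),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```

Skipping empty sections means a query with no graph hits or no matching corrections still produces a clean prompt instead of dangling headers.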
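```python
# Assumes load_identity_doc, RAGClient, GraphClient, CorrectionDB,
# get_realtime_state, assemble_prompt, and call_model are defined elsewhere.

class LivingTwin:
    """Assembles a fresh system prompt per query and sends it to any model."""

    def __init__(self):
        self.identity = load_identity_doc()  # static identity document
        self.rag = RAGClient()               # existing RAG
        self.graph = GraphClient()           # existing Graph Kernel
        self.corrections = CorrectionDB()    # new correction store

    def respond(self, query: str, model: str = "together/qwen3-235b"):
        # Gather the dynamic layers for this query.
        knowledge = self.rag.retrieve(query, top_k=10)
        graph_context = self.graph.traverse(query, max_hops=2)
        style_corrections = self.corrections.relevant(query, top_k=5)
        state = get_realtime_state()

        system_prompt = assemble_prompt(
            identity=self.identity,
            knowledge=knowledge,
            graph=graph_context,
            corrections=style_corrections,
            state=state,
        )
        return call_model(model, system_prompt, query)
```
### Day 3: Correction Capture Interface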
- Add a `/twin-correct` slash command to Clawdbot
- When Mo says "I wouldn't say it like that", capture the correction
- Store in Supabase `twin_corrections` table
- Auto-extract style rules using Gemini Flash (cheap, fast)
### Day 4: Integration
- Wire into Clawdbot as a model option
- Add model fallback chain: Together AI → local exo → Claude escalation
- Test with the 39-question eval suite
### Day 5: Deploy + Iterate
- Route 10
- Activate correction loop
- Monitor accuracy and personality fidelity
---
## Multi-Model Flexibility
### Model Quality Ladder
```
Query arrives → Check model availability:
1. Together AI Qwen3-235B (free, best quality) → USE
2. Local exo cluster Qwen3.5-35B (free, good quality) → USE
3. Mac4 Ollama Gemma3-4B (free, basic quality) → USE for simple queries
4. Claude Opus (expensive, highest quality) → ESCALATE complex queries
```
The Living Document twin doesn't care which model runs underneath. The identity is in the prompt.
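One way to express the ladder in code is a simple ordered fallback: try each backend in turn and move down a rung on failure. The sketch below is an assumption about how this could be wired, not the deployed logic. Backends are modeled as plain callables taking `(system_prompt, query)`, the stubs stand in for real API clients, and `call_with_fallback` and `AllBackendsFailed` are hypothetical names. Routing by query complexity (rungs 3 and 4) is left out.

```python
from __future__ import annotations

from typing import Callable

# A backend is any callable taking (system_prompt, query) and returning text.
Backend = Callable[[str, str], str]


class AllBackendsFailed(RuntimeError):
    """Raised when every rung of the ladder has failed."""


def call_with_fallback(
    backends: list[tuple[str, Backend]], system_prompt: str, query: str
) -> tuple[str, str]:
    """Try each backend in ladder order; return (backend_name, response)."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(system_prompt, query)
        except Exception as exc:  # any failure means "try the next rung"
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Stub backends for illustration only; real ones would wrap API clients.
def together_stub(system_prompt: str, query: str) -> str:
    raise ConnectionError("unavailable")


def exo_stub(system_prompt: str, query: str) -> str:
    return f"echo: {query}"


ladder = [("together", together_stub), ("exo", exo_stub)]
used, reply = call_with_fallback(ladder, "You are Mo.", "What ships first?")
```

Because the same assembled prompt is passed to every rung, swapping or reordering backends needs no change to the identity or knowledge layers.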
---
## Cost Analysis
| Component | Monthly Cost |
|---|---|
| Together AI Qwen3-235B inference | $0 (free tier) |
| Correction storage (Supabase) | $0 (existing plan) |
| Identity document maintenance | $0 (one-time + occasional updates) |
| Prompt assembly compute (Mac1) | $0 (local) |
| Total | $0/mo |
Not "$0-15/mo." Literally zero dollars. The most expensive component — the model — is free on Together AI's serverless tier.
---
## Risk Assessment
| Risk | Severity | Mitigation |
|---|---|---|
| Personality fidelity never reaches fine-tuned quality | Medium | 200+ corrections should close 90 |
| System prompt too long for smaller models | Low | Qwen3-235B has 262K context; even 35B has 32K |
| Correction loop requires active Mo participation | Medium | Only needed during first 40 days, then plateaus |
| Model API changes break prompt format | Low | Abstract model calls behind unified interface |
| "Uncanny valley" — close but not quite Mo | Medium | Accept 90 |
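For the "system prompt too long for smaller models" risk, one possible mitigation is to trim the assembled prompt to a per-model token budget before sending. The sketch below is illustrative only: it assumes sections arrive in priority order (most important first), and it estimates tokens at roughly four characters each, which is a crude heuristic and not a real tokenizer. `estimate_tokens` and `fit_to_budget` are hypothetical names.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_budget(
    sections: list[tuple[str, str]], max_tokens: int
) -> list[tuple[str, str]]:
    """Keep sections in priority order (first = most important) within the budget."""
    kept: list[tuple[str, str]] = []
    used = 0
    for name, text in sections:
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            continue  # skip what does not fit; a smaller later section may still fit
        kept.append((name, text))
        used += cost
    return kept
```

Putting identity and corrections first in the priority list means that on a small-context model the twin loses knowledge breadth before it loses its voice.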
---
## Why This Path Wins
Fine-tuning captures who Mo was. The Living Document captures who Mo is — right now, today, with corrections applied. It ships in one week, costs nothing, works with any model, and gets better with use instead of worse with time.
The question isn't "how do we train a model to be Mo?" The question is "how do we give any model enough context to be Mo?" And the answer is: assemble the right prompt.
---
Stage 1C of 3 — EXPLORE | Path C: The Living Document | ~1,300 words
Source Anchor
evo-cube-output/cognitive-twin-v9/stage1-path-c.md