# Evolution³ — Stage 0: Research Brief (V4)
## NUMU FARE — Rust Core + Native Thread Intelligence + openclaw Library Integration
### Generated: 2026-03-05 | Method: 3 parallel research agents + 2 prior evo-cube runs (V2 substrate, V3 personal OS)

---

## Foundation: What V2 and V3 Established

**V2 (Technical Substrate)** decided:
- Centralized TypeScript daemon with hot standby on Mac4
- TOML declarative manifests for service/flow/skill definitions
- WebSocket pub/sub event bus for inter-service messaging
- 43-task master plan, 264 hours, 10-16 weeks

**V3 (Personal Integration)** decided:
- Name: NUMU FARE (ߒߎߡߎ ߝߊ߬) — "The Forge That Crosses"
- CLI: `numu`, Fleet: `numu fleet`, Intelligence: "The Weave"
- 147 skills with heat tracking + memory spines (40 active, 107 cold storage)
- Adaptive enrichment curriculum (MASTER/CRAFTSMAN/JOURNEYMAN depth)
- ADI personas as forge temperatures
- 7-wave implementation plan (~7 weeks, score target 7.6→9.2)

V4 introduces three new dimensions that reshape the architecture:
1. The openclaw/openclaw library patterns — proven at scale, directly adoptable
2. Rust as the daemon language — performance, safety, deployment characteristics
3. Native Discord thread management — threads as first-class workspaces, not afterthoughts

---

## Research Dimension 1: openclaw/openclaw Library Patterns

### What It Is
A TypeScript/Node.js personal AI assistant platform (263K+ GitHub stars). It solves many of the same problems NUMU FARE faces: multi-model orchestration, skill/tool management, context enrichment, memory search, session management.

### 9 Adoptable Patterns

#### Pattern 1: Typed WebSocket Protocol (TypeBox)
- Every WS message has a runtime-validated schema via TypeBox
- Client and server share type definitions — no parsing ambiguity
- Messages have `type`, `correlationId`, `payload` structure
- Relevance: V2 specified WS event bus but left message schema open. This closes that gap.
- Adoption: Use serde + JSON schema in Rust, or TypeBox if staying TypeScript
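
As a rough illustration of the envelope described above, here is a minimal TypeScript sketch. To stay dependency-free it uses a hand-written type guard as a stand-in for a TypeBox schema; `WsEnvelope` and `parseEnvelope` are hypothetical names, not openclaw's API.

```typescript
// Envelope shape taken from the pattern above: type, correlationId, payload.
// WsEnvelope and parseEnvelope are illustrative names (assumptions).
interface WsEnvelope {
  type: string;
  correlationId: string;
  payload: unknown;
}

// Runtime check standing in for a TypeBox schema: returns a typed envelope,
// or null when the incoming frame is malformed.
function parseEnvelope(raw: string): WsEnvelope | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  if (typeof rec.type !== "string" || rec.type.length === 0) return null;
  if (typeof rec.correlationId !== "string") return null;
  if (!("payload" in rec)) return null;
  return { type: rec.type, correlationId: rec.correlationId, payload: rec.payload };
}
```

Both ends of the socket would import the same interface and run the same check, so a frame that fails validation is rejected before any handler sees it.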

#### Pattern 2: Subagent Registry with Announce Retry
- Each spawned agent gets a `runId` UUID
- Lifecycle: `pending → running → ended → announced`
- If announce fails, retry with exponential backoff (max 3 retries)
- 30-minute hard expiry — orphan detection kills stale agents
- `current_agents` array in parent tracks all children
- Relevance: NUMU currently has no subagent lifecycle tracking. `enriched_spawn.py` fires and forgets.
- Adoption: Direct port. The `SubagentRegistry` becomes a core NUMU module.
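
The lifecycle above can be sketched as a small in-memory registry. This is an assumption-laden outline, not openclaw's implementation: the 1-second base backoff, the reading of "max 3 retries" as three retries after the first failed attempt, and all method names are hypothetical.

```typescript
type RunState = "pending" | "running" | "ended" | "announced";

interface SubagentRun {
  runId: string;
  state: RunState;
  startedAt: number;        // ms since epoch
  announceFailures: number; // failed announce attempts so far
}

const MAX_ANNOUNCE_RETRIES = 3;         // from the pattern description
const HARD_EXPIRY_MS = 30 * 60 * 1000;  // 30-minute hard expiry
const BASE_BACKOFF_MS = 1000;           // assumed base delay

class SubagentRegistry {
  private runs = new Map<string, SubagentRun>();

  register(runId: string, now: number): SubagentRun {
    const run: SubagentRun = { runId, state: "pending", startedAt: now, announceFailures: 0 };
    this.runs.set(runId, run);
    return run;
  }

  // Only single forward steps along pending → running → ended → announced.
  advance(runId: string, next: RunState): void {
    const order: RunState[] = ["pending", "running", "ended", "announced"];
    const run = this.runs.get(runId);
    if (!run) throw new Error(`unknown run ${runId}`);
    if (order.indexOf(next) !== order.indexOf(run.state) + 1) {
      throw new Error(`illegal transition ${run.state} -> ${next}`);
    }
    run.state = next;
  }

  // Records one announce attempt for an ended run. Returns "announced" on
  // success, the backoff delay in ms before the next retry, or "gave-up".
  recordAnnounce(runId: string, succeeded: boolean): "announced" | "gave-up" | number {
    const run = this.runs.get(runId);
    if (!run || run.state !== "ended") throw new Error(`run ${runId} is not awaiting announce`);
    if (succeeded) {
      run.state = "announced";
      return "announced";
    }
    run.announceFailures += 1;
    if (run.announceFailures > MAX_ANNOUNCE_RETRIES) return "gave-up";
    return BASE_BACKOFF_MS * Math.pow(2, run.announceFailures - 1); // 1s, 2s, 4s
  }

  // Orphan sweep: remove runs past the hard expiry that never got announced.
  sweep(now: number): string[] {
    const expired: string[] = [];
    this.runs.forEach((run, id) => {
      if (run.state !== "announced" && now - run.startedAt >= HARD_EXPIRY_MS) expired.push(id);
    });
    expired.forEach((id) => this.runs.delete(id));
    return expired;
  }
}
```

Passing `now` in explicitly keeps the registry deterministic under test; a real daemon would call `sweep(Date.now())` on a timer.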

#### Pattern 3: Auth Profile Rotation with Cooldowns
- Ordered list of auth profiles (accounts)
- On rate limit or error: mark profile "bad" with cooldown timer
- On success: mark "good" with timestamp
- Exponential backoff on repeated failures
- Runtime snapshot persistence (survives restart)
- Relevance: NUMU has 2 Claude Max accounts with ACMP rotation. This formalizes it.
- Adoption: Direct port. Replace daemon_v3.js ACMP logic with proper rotation module.
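
A possible shape for the rotation module, sketched in TypeScript. The cooldown constants, the `AuthRotator` class, and the JSON snapshot format are all assumptions for illustration; the source only specifies ordered profiles, cooldowns, exponential backoff, and persistence.

```typescript
interface AuthProfile {
  id: string;
  failures: number;          // consecutive failures
  cooldownUntil: number;     // ms since epoch; 0 means available
  lastGoodAt: number | null; // timestamp of last success
}

const BASE_COOLDOWN_MS = 60 * 1000;      // assumed: 1 minute
const MAX_COOLDOWN_MS = 60 * 60 * 1000;  // assumed cap: 1 hour

class AuthRotator {
  private profiles: AuthProfile[];

  constructor(ids: string[]) {
    this.profiles = ids.map((id) => ({ id, failures: 0, cooldownUntil: 0, lastGoodAt: null }));
  }

  // First profile, in configured order, whose cooldown has elapsed.
  pick(now: number): string | null {
    const found = this.profiles.find((p) => p.cooldownUntil <= now);
    return found ? found.id : null;
  }

  markGood(id: string, now: number): void {
    const p = this.profiles.find((x) => x.id === id);
    if (!p) return;
    p.failures = 0;
    p.cooldownUntil = 0;
    p.lastGoodAt = now;
  }

  // Rate limit or error: cooldown doubles with each consecutive failure.
  markBad(id: string, now: number): void {
    const p = this.profiles.find((x) => x.id === id);
    if (!p) return;
    p.failures += 1;
    const cooldown = Math.min(BASE_COOLDOWN_MS * Math.pow(2, p.failures - 1), MAX_COOLDOWN_MS);
    p.cooldownUntil = now + cooldown;
  }

  // Snapshot and restore so rotation state survives a daemon restart.
  snapshot(): string {
    return JSON.stringify(this.profiles);
  }

  static restore(json: string): AuthRotator {
    const rotator = new AuthRotator([]);
    rotator.profiles = JSON.parse(json) as AuthProfile[];
    return rotator;
  }
}
```

With two accounts, a failure on the first sends traffic to the second until the first account's cooldown expires, and `pick` returns `null` when both are cooling down so the caller can queue the task.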

#### Pattern 4: Skills as SKILL.md Modules
- Each skill is a directory with a `SKILL.md` file
- YAML frontmatter declares: name, description, requirements, model preferences
- LLM selects skills at runtime based on task analysis
- Skill files are loaded dynamically, not hardcoded
- Relevance: NUMU already has 147 `[home-path]` files. The pattern matches exactly.
- Adoption: Already adopted. V4 adds the formalized metadata parsing that openclaw uses.
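
To make the metadata parsing concrete, here is a deliberately small sketch that splits a `SKILL.md` file into frontmatter and body. It handles only flat `key: value` lines and is not a YAML parser; a real implementation would use a YAML library. The function name and the sample keys are assumptions.

```typescript
interface ParsedSkill {
  meta: Record<string, string>; // flat frontmatter keys only
  body: string;                 // markdown after the closing ---
}

// Minimal frontmatter splitter. Assumes the file starts with a line that is
// exactly "---" and that the frontmatter block is closed by another "---".
// Nested YAML, lists, and quoted values are NOT handled.
function parseSkillFile(markdown: string): ParsedSkill | null {
  const lines = markdown.split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;

  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blanks and anything that is not "key: value"
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}
```

The returned `meta` map is what a skill selector would read (name, description, model preference) without loading the full skill body into context.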

#### Pattern 5: Lightweight Context Mode
- Cron/heartbeat runs skip loading AGENTS.md, SOUL.md, heavy context
- Task metadata flags: `lightweight: true` or context tier
- Saves 15-20K tokens per lightweight dispatch
- Relevance: Maps directly to V3's JOURNEYMAN/CRAFTSMAN/MASTER enrichment tiers.
- Adoption: Extend the V3 curriculum system with a `CRON` tier below JOURNEYMAN that loads almost nothing.

#### Pattern 6: Session-Level Model Override
- Any session can specify a model override that applies for its duration
- Does not affect other concurrent sessions
- Cascades to subagents unless they have their own override
- Relevance: NUMU's Discord already has model prefix overrides (gemini:, claude:, codex:). This makes them session-scoped.
- Adoption: Add session-level model persistence in the dispatch context.
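
One way to express the cascade rule is to walk up the session tree until an override is found. The sketch below assumes a `SessionCtx` object with an optional parent link; the type, both function names, and the prefix regex are illustrative rather than existing NUMU code.

```typescript
interface SessionCtx {
  id: string;
  parent?: SessionCtx;     // set for subagent sessions
  modelOverride?: string;  // session-scoped override, if any
}

// Prefixes already used in NUMU's Discord commands (gemini:, claude:, codex:).
const MODEL_PREFIX = /^(gemini|claude|codex):\s*/;

// Nearest override wins: the session's own, then its ancestors', then default.
function resolveModel(session: SessionCtx, defaultModel: string): string {
  let current: SessionCtx | undefined = session;
  while (current) {
    if (current.modelOverride) return current.modelOverride;
    current = current.parent;
  }
  return defaultModel;
}

// If the message starts with a model prefix, store it on the session and
// return the message without the prefix; otherwise return it unchanged.
function applyModelPrefix(session: SessionCtx, message: string): string {
  const match = MODEL_PREFIX.exec(message);
  if (!match) return message;
  const model = match[1];
  if (model === undefined) return message;
  session.modelOverride = model;
  return message.slice(match[0].length);
}
```

Because the override lives on the session object, a concurrent session with no parent link to it is unaffected and still resolves to the default model.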

#### Pattern 7: Hybrid Memory Search (BM25 + Vector + Temporal Decay + MMR)
- BM25 for keyword matching (fast, good for exact terms)
- Vector similarity for semantic matching
- Temporal decay weights recent memories higher
- MMR (Maximal Marginal Relevance) ensures diversity in results
- Relevance: RAG++ currently uses vector-only. Adding BM25 and MMR would improve retrieval quality.
- Adoption: Enhance RAG++ :8000 with BM25 fallback and MMR re-ranking.
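
The combination step can be sketched independently of how BM25 and vector scores are produced. The example below assumes both scores arrive already normalized to [0, 1]; the blend weight, 30-day half-life, and MMR lambda are placeholder values, not figures from openclaw or RAG++.

```typescript
interface Candidate {
  id: string;
  bm25: number;        // keyword score, assumed normalized to [0, 1]
  vector: number;      // cosine similarity to the query, in [0, 1]
  ageDays: number;     // age of the memory
  embedding: number[]; // used to measure redundancy between results
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Blend keyword and semantic scores, then apply exponential temporal decay:
// a memory that is one half-life old keeps half of its blended score.
function relevance(c: Candidate, alpha = 0.5, halfLifeDays = 30): number {
  const blended = alpha * c.bm25 + (1 - alpha) * c.vector;
  return blended * Math.pow(0.5, c.ageDays / halfLifeDays);
}

// Maximal Marginal Relevance: greedily pick the candidate that maximizes
// lambda * relevance - (1 - lambda) * (max similarity to anything picked).
function mmr(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const pool = candidates.slice();
  const picked: Candidate[] = [];
  while (picked.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      let maxSim = 0;
      for (const p of picked) maxSim = Math.max(maxSim, cosine(pool[i].embedding, p.embedding));
      const score = lambda * relevance(pool[i]) - (1 - lambda) * maxSim;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    picked.push(pool.splice(bestIdx, 1)[0]);
  }
  return picked;
}
```

With two near-duplicate memories and one weaker but distinct memory, `mmr(..., 2)` returns the strongest duplicate plus the distinct one, which is the diversity behaviour the pattern is after.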

#### Pattern 8: Session Compaction with Task Preservation
- When context exceeds 40
- Chunk → summarize → merge strategy
- Active tasks, todos, and code state are preserved through compaction
- Structural markers guide what to keep vs. compress
- Relevance: Claude Code already compacts, but NUMU's enriched context doesn't survive compaction well. This ensures critical state persists.
- Adoption: Add compaction-aware markers to enriched context injection.

#### Pattern 9: Cron Stagger
- Multiple scheduled tasks don't all fire at :00
- Stagger offsets prevent burst-loading the system
- Health checks verify pre-conditions before cron execution
- Relevance: NUMU's 50 Prefect flows sometimes cluster. This prevents burst overload.
- Adoption: Add stagger offsets to Prefect flow schedules.
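
One simple way to derive stagger offsets is to hash each flow name into a fixed window, so the offset is stable across restarts without any stored state. This is a sketch of that idea, not openclaw's mechanism; the FNV-1a hash operates on UTF-16 code units here, and the function names and 10-minute window are assumptions.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Deterministic offset in [0, windowSeconds) for a given flow name.
function staggerOffsetSeconds(flowName: string, windowSeconds: number): number {
  return fnv1a(flowName) % windowSeconds;
}

// Example: spread flows that would otherwise all fire at :00 across 10 minutes.
function staggeredSchedule(flowNames: string[], windowSeconds = 600): Map<string, number> {
  const schedule = new Map<string, number>();
  for (const flow of flowNames) schedule.set(flow, staggerOffsetSeconds(flow, windowSeconds));
  return schedule;
}
```

Hashing does not guarantee that two flows never share an offset, but across 50 flows it breaks up the burst without manual bookkeeping.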

---

## Research Dimension 2: Rust Implementation Assessment

### The Case FOR Rust

| Factor | Rust | TypeScript (Node.js) | TypeScript (Bun) |
| --- | --- | --- | --- |
| Memory | ~10-15 MB daemon | ~80-120 MB | ~40-60 MB |
| Startup | <100ms | 1-2s | 200-500ms |
| CPU (routing) | ~0.5ms per decision | ~5ms | ~3ms |
| Binary | Single static binary | node_modules + runtime | bun + node_modules |
| Crash safety | No null, no data races | Runtime exceptions | Runtime exceptions |
| Concurrency | Tokio (zero-cost async) | Event loop (single-threaded) | Event loop |
| Dependencies | `Cargo.lock` reproducible | npm (dependency hell) | npm (same) |

Crate Ecosystem (verified):
- `tokio` — Async runtime (the standard)
- `serenity` — Discord API library (feature-complete, threads, forums, voice)
- `reqwest` — HTTP client
- `serde` / `serde_json` / `toml` — Serialization
- `axum` — Web framework (from tokio team)
- `sqlx` — Async Postgres/SQLite
- `tracing` — Structured logging
- `clap` — CLI argument parsing
- `notify` — File watcher (for config hot-reload)
- `tera` / `minijinja` — Template engines
- `jsonschema` — JSON schema validation
- `tungstenite` / `tokio-tungstenite` — WebSocket

### The Case AGAINST Rust

| Factor | Impact |
| --- | --- |
| Learning curve | 2-4 months for borrow checker fluency. Mo is a solo operator. |
| Iteration speed | Compile times (30s-2min for full build) vs. instant reload in TS |
| Discord library maturity | `serenity` is good but community is smaller than discord.js |
| MCP integration | MCP SDKs are most established in TS/Python. The Rust SDK (`rmcp`) is newer, so some integration code may still need to be custom |
| Rapid prototyping | Rust's type system slows experimentation |
| Existing codebase | 4,671 lines of working JS/TS orchestration code. Rewrite cost is real. |

### Recommendation: Hybrid Approach

Keep TypeScript (Bun) for the daemon. The NUMU daemon is an orchestration layer — it reads config, dispatches tasks, manages lifecycle. It does NOT do heavy compute. The 80MB→40MB reduction from Node→Bun is sufficient. The iteration speed advantage of TypeScript matters for a solo operator who changes the daemon weekly.

Use Rust for performance-critical modules:
1. The Skill Matcher: 147 skills × embedding similarity on every dispatch. A Rust + SIMD implementation should keep this sub-millisecond.
2. The Memory Indexer: BM25 + vector hybrid search. Estimated to run 10-50x faster in Rust than in Python/TS; confirm with profiling before committing.
3. The Session Compactor: Token counting + chunk boundary detection. CPU-bound, benefits from Rust.
4. The Sentinel: Health check daemon that must be ultra-lightweight and never crash. Rust's memory safety, lack of a garbage collector, and small static binary suit this role.

Communication: Rust modules expose HTTP APIs (axum) or MCP tool interfaces. The TypeScript daemon calls them as needed. This is the same pattern as RAG++ (Python) being called by enriched_spawn.py (Python) from the daemon (TypeScript).

Migration Path:
- Month 1-3: Build in TypeScript/Bun. Ship fast. Prove architecture.
- Month 4-6: Profile hotspots. Rewrite hottest paths in Rust.
- Month 7+: Evaluate full Rust daemon if TypeScript becomes a bottleneck.

### If Building a Full Rust Daemon (Alternative Path)

    numu-daemon/
    ├── Cargo.toml                    # Workspace root
    ├── crates/
    │   ├── numu-core/                # Event bus, config, lifecycle
    │   │   ├── src/
    │   │   │   ├── config.rs         # TOML config parsing + hot-reload
    │   │   │   ├── bus.rs            # WebSocket pub/sub event bus
    │   │   │   ├── dispatch.rs       # Task routing + SmartRouter logic
    │   │   │   └── registry.rs       # Subagent registry (from openclaw pattern)
    │   ├── numu-discord/             # Discord gateway + thread intelligence
    │   │   ├── src/
    │   │   │   ├── gateway.rs        # Serenity-based bot
    │   │   │   ├── threads.rs        # Thread lifecycle manager
    │   │   │   ├── forums.rs         # Forum channel task boards
    │   │   │   └── reactions.rs      # Reaction reflex handlers
    │   ├── numu-skills/              # Skill manifest, heat tracking, matching
    │   │   ├── src/
    │   │   │   ├── manifest.rs       # 147-skill flat manifest
    │   │   │   ├── heat.rs           # Heat score calculation
    │   │   │   ├── matcher.rs        # SEA embedding matcher
    │   │   │   └── spine.rs          # Memory spine per skill
    │   ├── numu-intel/               # Intelligence pipeline
    │   │   ├── src/
    │   │   │   ├── rag.rs            # RAG++ client
    │   │   │   ├── memory.rs         # Hybrid BM25+vector search
    │   │   │   ├── enrichment.rs     # Adaptive curriculum builder
    │   │   │   └── compaction.rs     # Session compaction
    │   ├── numu-sentinel/            # Health monitoring + self-healing
    │   │   ├── src/
    │   │   │   ├── health.rs         # Fleet health checks
    │   │   │   ├── immune.rs         # Adaptive immunity
    │   │   │   └── launcher.rs       # Service lifecycle management
    │   └── numu-cli/                 # CLI entry point
    │       └── src/main.rs           # clap-based CLI

---

## Research Dimension 3: Native Discord Thread Intelligence

### Thread Types (Discord API)

| Type | ID | Description | Use in NUMU |
| --- | --- | --- | --- |
| PUBLIC_THREAD | 11 | Visible to channel members | Task workspaces |
| PRIVATE_THREAD | 12 | Invite-only | Sensitive ops (credentials, billing) |
| ANNOUNCEMENT_THREAD | 10 | In announcement channels | Release notes, status broadcasts |
| GUILD_FORUM | 15 | Forum channel with tags | Task boards, project tracking |

### Thread-as-Workspace Pattern

Current state: Clawdbot's `threadMode: 'per-task'` creates threads but treats them as flat message containers. No thread context awareness. No thread lifecycle. No cross-thread intelligence.

V4 Design — Thread Intelligence:

#### 1. Thread Context Injection
Before responding in a thread, read the last 50 messages to understand context:

    buildThreadContext(threadId) → {
      summary: AI-generated summary of thread so far,
      participants: who has contributed,
      artifacts: pinned messages, code blocks, links,
      status: inferred from last messages (active/blocked/waiting/done),
      parentChannel: which channel spawned this thread,
      relatedThreads: threads with similar topics (vector similarity)
    }

This context is injected into the enriched dispatch, making the agent "aware" of the thread's full history.
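
A typed version of that context object might look like the sketch below. The Discord fetch, the AI summarizer, and the vector lookup are left as inputs; the keyword rules in `inferStatus` are placeholder heuristics invented for illustration, and every name other than `buildThreadContext` and its fields is an assumption.

```typescript
type ThreadStatus = "active" | "blocked" | "waiting" | "done";

interface ThreadMessage {
  author: string;
  content: string;
  pinned: boolean;
}

interface ThreadContext {
  summary: string;
  participants: string[];
  artifacts: string[];
  status: ThreadStatus;
  parentChannel: string;
  relatedThreads: string[];
}

const CONTEXT_WINDOW = 50;           // last 50 messages, as stated above
const CODE_FENCE = "`".repeat(3);    // marker for code blocks in a message

// Placeholder heuristic: look only at the most recent message.
function inferStatus(recent: ThreadMessage[]): ThreadStatus {
  const last = recent.length > 0 ? recent[recent.length - 1].content.toLowerCase() : "";
  if (/\b(done|shipped|resolved)\b/.test(last)) return "done";
  if (/\bblocked\b/.test(last)) return "blocked";
  if (last.trim().endsWith("?")) return "waiting";
  return "active";
}

function buildThreadContext(
  messages: ThreadMessage[],
  parentChannel: string,
  summarize: (msgs: ThreadMessage[]) => string, // e.g. an LLM call
  relatedThreads: string[] = [],                // e.g. from a vector search
): ThreadContext {
  const recent = messages.slice(-CONTEXT_WINDOW);
  const participants = Array.from(new Set(recent.map((m) => m.author)));
  const artifacts = recent
    .filter((m) => m.pinned || m.content.includes(CODE_FENCE) || /https?:\/\//.test(m.content))
    .map((m) => m.content);
  return {
    summary: summarize(recent),
    participants,
    artifacts,
    status: inferStatus(recent),
    parentChannel,
    relatedThreads,
  };
}
```

Keeping the summarizer as a parameter makes the latency cost explicit: a lightweight dispatch could pass a cheap truncation function, while a MASTER-tier dispatch could pass a full LLM summary.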

#### 2. Forum Channels as Task Boards
Convert `#task-dispatch` from a flat channel to a `GUILD_FORUM` (type 15):

    Forum: #forge-tasks
    ├── Tags: [pending, in-progress, blocked, done, review]
    ├── Thread: "Deploy Spore v1.3 to TestFlight"
    │   ├── Tag: in-progress
    │   ├── Pinned: ExportOptions.plist config
    │   └── Messages: build log, error resolution, success confirmation
    ├── Thread: "Graph Kernel reconnection"
    │   ├── Tag: blocked
    │   ├── Pinned: Error 65 Supabase trace
    │   └── Messages: diagnosis attempts, workaround proposals
    └── Thread: "NUMU daemon scaffold"
        ├── Tag: pending
        └── Messages: architecture decisions, code snippets

Available tags per forum: Up to 20 tags, of which a single thread can carry at most 5 at once. Map to: priority (P0-P3), status (pending/active/blocked/done/review), domain (ios/infra/intel/discord/content), temperature (phoenix/sengoku/reddog/monkey).

#### 3. Thread Lifecycle Management

    CREATE → ACTIVE → [BLOCKED] → RESOLVED → ARCHIVED
       ↓         ↓          ↓           ↓
     Name it   Work in    Flag with   Pin summary
     Tag it    it         tag change  Archive thread
                                      Memory pipeline

- Auto-archive: Threads with no activity for 24h get tagged `stale`. After 72h, auto-archive with AI summary.
- Thread-to-Memory: On archive, generate summary → write to Obsidian vault → index in RAG++.
- Thread Resurrection: Reference an archived thread in a new message → auto-unarchive with context.
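
The lifecycle and idle rules above could be encoded roughly as follows. The transition table is an interpretation of the diagram: it assumes a blocked thread can return to active, and that stale active or blocked threads may be archived directly by the 72h rule. All identifiers are hypothetical.

```typescript
type ThreadState = "created" | "active" | "blocked" | "resolved" | "archived";

// Allowed moves. "archived" → "active" models thread resurrection.
const TRANSITIONS: Record<ThreadState, ThreadState[]> = {
  created: ["active"],
  active: ["blocked", "resolved", "archived"],  // archived via the 72h idle rule (assumption)
  blocked: ["active", "resolved", "archived"],  // unblock path is an assumption
  resolved: ["archived"],
  archived: ["active"],
};

function canTransition(from: ThreadState, to: ThreadState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

type IdleAction = "none" | "tag-stale" | "archive";

const HOUR_MS = 60 * 60 * 1000;
const STALE_AFTER_MS = 24 * HOUR_MS;   // tag `stale` after 24h of inactivity
const ARCHIVE_AFTER_MS = 72 * HOUR_MS; // auto-archive after 72h

// Decide what the periodic sweep should do with a thread.
function idleAction(lastActivityAt: number, now: number): IdleAction {
  const idle = now - lastActivityAt;
  if (idle >= ARCHIVE_AFTER_MS) return "archive";
  if (idle >= STALE_AFTER_MS) return "tag-stale";
  return "none";
}
```

A sweep job would call `idleAction` for each open thread and, on `"archive"`, run the summary and memory pipeline before archiving.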

#### 4. Cross-Thread Awareness

When dispatching a new task, check existing threads:
- BM25 keyword match on thread names + pinned messages
- Vector similarity on thread summaries
- If match score > 0.7: reply in existing thread instead of creating new one
- If match score 0.4-0.7: create new thread but link to related thread
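
The thresholds above translate into a small routing function. How the BM25 and vector scores are combined into a single match score is not specified in this brief, so the sketch takes a precomputed score per thread; the type and function names are assumptions.

```typescript
interface ThreadMatch {
  threadId: string;
  score: number; // combined BM25 + vector score in [0, 1], computed elsewhere
}

type ThreadRoute =
  | { action: "reply"; threadId: string }
  | { action: "create-linked"; relatedThreadId: string }
  | { action: "create" };

const REPLY_THRESHOLD = 0.7; // above this: reuse the existing thread
const LINK_THRESHOLD = 0.4;  // between 0.4 and 0.7: new thread, linked

function routeTask(matches: ThreadMatch[]): ThreadRoute {
  if (matches.length === 0) return { action: "create" };
  const best = matches.reduce((a, b) => (b.score > a.score ? b : a));
  if (best.score > REPLY_THRESHOLD) return { action: "reply", threadId: best.threadId };
  if (best.score >= LINK_THRESHOLD) return { action: "create-linked", relatedThreadId: best.threadId };
  return { action: "create" };
}
```

A score of exactly 0.7 falls in the "create and link" band here, matching the strict `> 0.7` wording above.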

#### 5. Thread Reply Intelligence

The bot should understand thread hierarchy:
- Direct reply to a specific message in a thread → response should reference that message
- Thread starter message → response addresses the original task
- Reaction to a message in a thread → triggers the reaction reflex (approve/reject/gold/priority)
- Bot can pin its own messages (artifacts, summaries, final results)

#### 6. Thread-Based Team Decomposition

Current: `/team <task>` creates subtask threads. V4 enhancement:

    /team "Deploy all Wave 3 apps"
      → Forum thread: "Team: Deploy all Wave 3 apps" [tag: in-progress]
        → Sub-thread 1: "Subtask: Build archives" [tag: pending]
        → Sub-thread 2: "Subtask: Upload to ASC" [tag: blocked-by:1]
        → Sub-thread 3: "Subtask: Submit for review" [tag: blocked-by:2]
        → Summary post: Progress table, updated automatically

#### 7. Webhook Threading

Discord webhooks accept a `thread_id` query parameter, which sends the message to a specific thread:

    POST /webhooks/{id}/{token}?thread_id={thread_id}

This means Prefect flows, Sentinel alerts, and build pipelines can post directly to their relevant thread instead of flooding `#heartbeat`.
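
As a sketch of how a flow or alert might target a thread, the helper below builds the request for Discord's Execute Webhook endpoint with `thread_id` set. The helper name and the returned request shape are assumptions; sending it is left to whatever HTTP client the caller uses.

```typescript
interface WebhookRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a webhook call that posts into one thread.
function buildThreadWebhookRequest(
  webhookId: string,
  webhookToken: string,
  threadId: string,
  content: string,
): WebhookRequest {
  const base =
    "https://discord.com/api/v10/webhooks/" +
    encodeURIComponent(webhookId) + "/" + encodeURIComponent(webhookToken);
  return {
    url: base + "?thread_id=" + encodeURIComponent(threadId),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  };
}
```

In a runtime with a global `fetch` (Bun, recent Node.js), the result can be passed along as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.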

### Rate Limits
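- Thread creation: 10 per channel per 10 minutes
- Message send: 5 per second per channel
- Thread list fetch: 50 per request (paginated; this is a page size rather than a rate limit)
- Archive/unarchive: Standard API rate (50/sec global)

Apart from the global 50 requests/sec limit, Discord does not publish fixed per-route numbers. Treat the figures above as working estimates and drive retries from the `X-RateLimit-*` response headers and the `retry_after` value on HTTP 429 responses.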

---

## Research Dimension 4: Integration with V3 Architecture

### Skills Integration (147 Skills → Thread-Aware)
- Each skill activation can create or reply in a thread
- Skill memory spines persist per-thread context
- Combinators (multi-skill squads) get their own threads
- SEA activation logging includes thread context

### Pane Awareness → Thread Awareness
- Pane registry tracks which Claude Code pane is working in which thread
- Cross-pollination detection works across threads (not just panes)
- Claim conflicts prevent two panes from modifying the same thread simultaneously

### Enrichment Engine → Thread Context Layer
Add an 8th layer to the 7-layer enrichment:

    Layer 1: Identity (AGENTS.md, SOUL.md)
    Layer 2: Project (CLAUDE.md, CHECKLIST.md)
    Layer 3: Skills (heat-tracked, spine-injected)
    Layer 4: Semantic Memory (RAG++ + CIA + BM25/MMR)
    Layer 5: Protocols (Research-First, CEF, TEAM)
    Layer 6: Quality Gates
    Layer 7: Race Protocol
    Layer 8: Thread Context (NEW: thread summary, participants, artifacts, related threads)

### Build Pipeline → Thread-Aware
- Each `numu shipyard build <app>` creates a thread for the build
- Build logs stream into the thread in real-time
- Errors pin themselves with stack traces
- Success pins the TestFlight URL

---

## Open Questions for Stage 1

1. Rust or TypeScript/Bun for the daemon? The hybrid approach is recommended, but each Stage 1 path should explore what a full commitment to one language looks like.
2. How deep should thread intelligence go? Thread context injection adds latency (50-message fetch + AI summary). What's the right depth vs. speed tradeoff?
3. Forum channels vs. regular channels? Forum channels (GUILD_FORUM) are powerful but change the UX significantly. Should all task channels be forums, or just `#forge-tasks`?
4. openclaw patterns: port or adapt? Some patterns (subagent registry, auth rotation) can be ported almost verbatim. Others (hybrid memory, compaction) need adaptation for NUMU's unique context.
5. Thread-to-memory pipeline: How much of a thread should be preserved? Full transcript? AI summary only? Key artifacts?

---

End of Stage 0 Research Brief (V4). Sourced from: openclaw/openclaw GitHub analysis, Rust crate ecosystem audit, Discord API thread documentation, V2 substrate decisions, V3 NUMU FARE architecture.
Ready for Stage 1: EXPLORE — 3 divergent paths incorporating Rust, threads, and openclaw patterns into NUMU FARE.

Source Anchor

evo-cube-output/v4-rust-threads/stage0-research-brief.md
