architecture | technical paper candidate | score 44

Stage 2: Compound -- KARL Phase 4+ Unified Architecture


Extracted abstract or opening context

We have a trajectory recording system (110 records), a reward engine (3-signal composite), a shadow vector router (10% cache hit rate), and a training pipeline that produced one adapter (KARL v2, loss 1.843, gemma-3-1b-4bit) from 35 SFT examples. The adapter exists but has never been evaluated for actual routing or planning quality. The finetune daemon on Mac5 is down. The promotion gate says the shadow router is not ready.

**The honest assessment**: We have a data collection system that works and a training pipeline that runs. We do not yet have evidence that the trained model improves anything. The gap between "model trained" and "model useful" is the gap this compound must close.

Before any data or algorithm work, the training infrastructure must be reliable and observable.

**Actions**:

1. SSH to Mac5, restart finetune daemon, create LaunchAgent `com.openclaw.finetune-daemon.plist` with auto-restart
2. Increase training hyperparameters: seq_len 256->512, LoRA rank 8->16, layers 4->8, batch_size 1->2
3. Create `[home-path]` with three evaluation functions:
   - `evaluate_routing_accuracy()`: Given test prompts, does the model predict the correct skill?
   - `evaluate_planning_quality()`: Given tasks, does the model generate tool plans that match reference?
   - `compare_adapters()`: Head-to-head comparison between two adapter versions
4. Build a held-out evaluation set: 20 real prompts with known-correct skills and tool plans, manually curated from the 34 high-reward trajectories. This set is NEVER used for training.

**Validation gate**: Finetune daemon responds on :9200. Evaluator runs successfully on KARL v2. Baseline metrics recorded.
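
The three evaluation functions named in the actions could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's evaluator: each adapter is assumed to be wrapped as two plain callables (prompt to skill, prompt to tool list), "matches reference" is approximated by a set-based F1 over tool names, and the names `EvalExample`, `Adapter`, `plan_f1`, and `assert_held_out` are hypothetical helpers introduced only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class EvalExample:
    """One held-out item (hypothetical schema)."""
    prompt: str
    skill: str                  # known-correct skill label
    tool_plan: tuple[str, ...]  # reference tool names


# Assumed interfaces: how an adapter is actually invoked is not specified
# in the source, so each model is represented as two plain callables.
Router = Callable[[str], str]
Planner = Callable[[str], Sequence[str]]


@dataclass(frozen=True)
class Adapter:
    route: Router
    plan: Planner


def assert_held_out(examples: Sequence[EvalExample],
                    training_prompts: Iterable[str]) -> None:
    """Raise if any evaluation prompt also appears in the training data."""
    leaked = {ex.prompt for ex in examples} & set(training_prompts)
    if leaked:
        raise ValueError(f"{len(leaked)} evaluation prompt(s) found in training data")


def evaluate_routing_accuracy(route: Router,
                              examples: Sequence[EvalExample]) -> float:
    """Fraction of prompts whose predicted skill equals the reference skill."""
    if not examples:
        raise ValueError("evaluation set is empty")
    hits = sum(route(ex.prompt) == ex.skill for ex in examples)
    return hits / len(examples)


def plan_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Set-based F1 over tool names. Ignores order; an assumed metric."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate_planning_quality(plan: Planner,
                              examples: Sequence[EvalExample]) -> float:
    """Mean plan F1 against the reference tool plans."""
    if not examples:
        raise ValueError("evaluation set is empty")
    return sum(plan_f1(plan(ex.prompt), ex.tool_plan) for ex in examples) / len(examples)


def compare_adapters(a: Adapter, b: Adapter,
                     examples: Sequence[EvalExample]) -> dict[str, float]:
    """Score both adapters on the same set; deltas are b minus a."""
    routing_a = evaluate_routing_accuracy(a.route, examples)
    routing_b = evaluate_routing_accuracy(b.route, examples)
    planning_a = evaluate_planning_quality(a.plan, examples)
    planning_b = evaluate_planning_quality(b.plan, examples)
    return {
        "routing_a": routing_a,
        "routing_b": routing_b,
        "routing_delta": routing_b - routing_a,
        "planning_a": planning_a,
        "planning_b": planning_b,
        "planning_delta": planning_b - planning_a,
    }
```

Running `compare_adapters` with KARL v2 against the unadapted base model on the 20-prompt held-out set would be one way to produce the baseline metrics the validation gate asks for. With only 20 examples, each prompt moves routing accuracy by 5 points, so small deltas between adapters should be read cautiously.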

Promotion decision

What has to happen next

Promote into a technical note or architecture paper with implementation anchors.

Why this is not always a full paper yet

Corpus pages are public-safe readers for discovered workspace artifacts. They are not automatically final papers. A corpus item becomes a polished paper only after the editable source, evidence checkpoints, references, figures, render path, and release status are attached through the paper schema.