# DEP Report: Cognitive Twin Pipeline + MiniMax Fleet Integration
Date: 2026-02-16
Auditor: Claw 🦞
Scope: `Desktop/cognitive-twin/pipeline/` + `[home-path]`

---

## 1. Structure (Score: 7/10)

### ✅ Strengths
- Clean separation: `pipeline/` for scripts, `output/` for results
- Registry file (`minimax-fleet/registry.json`) tracks instance metadata
- Single-file scorer is appropriately simple for the task

### ⚠️ Issues
- No `__init__.py` or module structure: fine for now, but limits importability
- No `requirements.txt`: the script uses only the stdlib (good), but that should be documented
- No README.md in `cognitive-twin/`: a new contributor can't onboard
- Output files not gitignored: JSONL scoring data could be large

### Recommendations
- [ ] Add `README.md` with pipeline overview, usage, and architecture
- [ ] Add `.gitignore` for `output/*.jsonl` (keep structure, ignore data)
- [ ] Add `CLAUDE.md` for sub-agent context

Structure Score: 7/10

---

## 2. Compilation / Runtime (Score: 8/10)

### ✅ Strengths
- Pure Python 3, stdlib only (no dependencies to break)
- `urllib.request` instead of `requests`: zero install required
- Parallel execution via `ThreadPoolExecutor`, matching llama.cpp's 4 slots
- Health check before starting: fails fast if MiniMax is down

### ⚠️ Issues
- SQL injection risk: `args.role` is interpolated directly into the SQL query (`f"role = '{args.role}'"`)
- No retry logic: network hiccups or slot contention cause permanent failures
- No checkpoint/resume: if the 3.5hr run crashes at 80%, the run has to start over
- Hardcoded DB path: breaks if `kimi_memory.db` moves
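
To illustrate the SQL injection issue, here is a minimal sketch of the parameterized form using `sqlite3` placeholders. The `messages` table, its `id`, `role`, and `content` columns, and the `fetch_turns` helper are assumptions for illustration; the real schema of `kimi_memory.db` is not shown in this report.

```python
import sqlite3


def fetch_turns(conn, role, limit=None):
    """Fetch (id, content) rows for one role using bound parameters.

    Table and column names are hypothetical; adapt them to the real schema.
    """
    sql = "SELECT id, content FROM messages WHERE role = ? ORDER BY id"
    params = [role]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    # The driver binds the values, so a quote inside `role` stays data
    # and never becomes part of the SQL text.
    return conn.execute(sql, params).fetchall()
```

With this shape, a value such as `user' OR '1'='1` simply matches no rows instead of rewriting the `WHERE` clause.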

### Recommendations
- [ ] CRITICAL: Parameterize SQL (use `?` placeholders, not f-strings)
- [ ] Add exponential backoff retry (3 attempts per turn)
- [ ] Add checkpoint file: write last processed ID, support `--resume`
- [ ] Make DB path configurable via `--db` argument
- [ ] Add `--output` flag to specify output path
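
The retry recommendation could look roughly like the sketch below: three attempts, with the delay doubling after each failure. The `with_retry` helper and its parameters are hypothetical, and the set of exceptions treated as retryable is an assumption about how the scorer's `urllib.request` calls fail.

```python
import time
import urllib.error


def with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a network-style error, wait and try again.

    Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised so the caller can log the turn as failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in the scorer it would wrap the single HTTP call made per turn.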

Compilation Score: 8/10
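
A checkpoint file for `--resume` can be very small. The sketch below assumes rows are `(id, content)` tuples with increasing integer IDs and that the checkpoint is a JSON file holding `last_id`; the file layout and function names are illustrative, not taken from `density_scorer.py`.

```python
import json
import os


def load_checkpoint(path):
    """Return the last processed ID, or 0 when no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)["last_id"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, last_id):
    """Write the checkpoint via a temp file so a crash never leaves it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"last_id": last_id}, fh)
    os.replace(tmp, path)  # atomic rename on the same filesystem


def pending(rows, last_id):
    """Keep only rows whose ID is greater than the checkpointed ID."""
    return [row for row in rows if row[0] > last_id]
```

Because turns finish out of order under `ThreadPoolExecutor`, the checkpoint should only advance to an ID once every lower ID has completed; otherwise a resume could skip turns that were still in flight.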

---

## 3. Integration (Score: 6/10)

### ✅ Strengths
- Clawdbot gateway properly configured with `minimax-fleet` provider
- `models.mode: "merge"` preserves all existing providers
- Alias `minimax` registered: accessible via `/model minimax`
- SSH tunnel verified and health-checked

### ⚠️ Issues
- Tunnel is ephemeral: dies on Mac sleep, SSH disconnect, or network change
- No auto-reconnect: if the tunnel drops mid-scoring, the run fails silently
- No monitoring: nothing alerts when the Vast.ai instance goes down
- No auto-shutdown: instance burns $0.77/hr even when idle
- Clawdbot end-to-end not verified: only the direct API was tested, never the path through the gateway

### Recommendations
- [ ] Create tunnel keepalive script with autossh or a cron watchdog
- [ ] Add Vast.ai balance monitor to heartbeat checks
- [ ] Add auto-shutdown script: stop instance after N hours idle
- [ ] Test `/model minimax` in a live Discord session to verify the full round-trip
- [ ] Add tunnel status to `memory/agent-capacity.json`
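
As a rough sketch of the cron-watchdog option, one tick can be a TCP probe of the tunnel's local port followed by a restart when the probe fails. Port 18080 comes from the architecture diagram; `tunnel_is_up`, `watch_once`, and the `restart` callback are hypothetical names, and the actual SSH command is deliberately left to the caller.

```python
import socket


def tunnel_is_up(host="127.0.0.1", port=18080, timeout=2.0):
    """Return True if a TCP connection to the tunnel's local port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_once(restart, check=tunnel_is_up):
    """Run one watchdog tick: call restart() only when the check fails."""
    if check():
        return "up"
    restart()
    return "restarted"
```

A successful connect only proves that something is listening locally, so a fuller watchdog would probably reuse the scorer's existing health check after the port probe.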

Integration Score: 6/10
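
The auto-shutdown recommendation reduces to an idle-time threshold. The sketch below shows that decision plus the cost of leaving the instance idle at the reported $0.77/hr. The one-hour default and both function names are assumptions, and how the last request time is tracked is left open.

```python
from datetime import datetime, timedelta


def should_shut_down(last_request_at, now, max_idle=timedelta(hours=1)):
    """Return True once the instance has been idle for at least max_idle."""
    return now - last_request_at >= max_idle


def idle_cost(idle, hourly_rate=0.77):
    """Dollar cost of an idle period at the given hourly rate."""
    return round(idle.total_seconds() / 3600 * hourly_rate, 2)
```

For scale, `idle_cost(timedelta(hours=8))` gives 6.16, so one overnight idle period costs more than twice the full scoring run.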

---

## 4. Content / Quality (Score: 7/10)

### ✅ Strengths
- Scoring prompt is well-designed: clear taxonomy, structured output
- JSON output format enables downstream pipeline consumption
- 100%
- Distribution looks realistic (50%

### ⚠️ Issues
- No ground truth validation: are the scores actually correct?
- No inter-rater reliability check: a subset should be scored with Claude and compared
- Reasoning overhead: model burns ~70%
- No score calibration: what makes a 7 vs. an 8? No reference examples
- Content truncation at 2000 chars: long technical messages lose context

### Recommendations
- [ ] Score 50 turns with Claude Sonnet → compare against MiniMax scores → measure agreement
- [ ] Add few-shot examples to the prompt (1 per density level)
- [ ] Increase content window to 4000 chars for long-form messages
- [ ] Log reasoning_content for audit trail (optional flag)
- [ ] Create `calibration_set.json` with human-verified reference scores
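
For the Claude-versus-MiniMax comparison, agreement can be summarized with a few simple statistics. This is a sketch that assumes both raters emit integer scores on the same scale for the same ordered list of turns; the `agreement` function and the tolerance of one point are illustrative choices, not part of the existing pipeline.

```python
def agreement(scores_a, scores_b, tolerance=1):
    """Summarize how closely two raters agree on the same turns."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,  # identical scores
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
        "mean_abs_diff": sum(diffs) / n,
    }
```

On a 50-turn calibration set, a low exact rate combined with a high within-tolerance rate would point to a calibration offset rather than a broken prompt.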

Content Score: 7/10

---

## 5. User Journey (Score: 5/10)

### ✅ Strengths
- CLI interface with clear flags (`--limit`, `--parallel`, `--dry-run`)
- Progress bar with real-time stats during execution
- Final summary with distribution chart

### ⚠️ Issues
- No way to monitor a running job besides running `tail -f` on the log
- No progress webhook: long runs should ping Discord
- No results viewer: scoring output is raw JSONL, with no summary tool
- No pipeline orchestration: density scoring is step 1, but steps 2-4 (WORMS, SFT export, training) don't exist yet
- No dashboard: results should be posted to #ct-corpus when done

### Recommendations
- [ ] Add `--notify` flag that posts completion to #ct-corpus
- [ ] Create `analyze_scores.py`: reads JSONL, generates distribution report
- [ ] Add `watch_run.sh` script for monitoring
- [ ] Plan next pipeline stages: WORMS augmentation → SFT export → training
- [ ] Post live progress to #ct-corpus thread every 500 turns
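
A first version of `analyze_scores.py` could be as small as the sketch below. It assumes each JSONL line is an object with a numeric score under a `density` key; that field name is a guess, since the output schema is not shown in this report.

```python
import json
from collections import Counter


def score_distribution(lines, field="density"):
    """Count scores per value; return (counts, number of unusable lines)."""
    counts = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            counts[json.loads(line)[field]] += 1
        except (json.JSONDecodeError, KeyError, TypeError):
            skipped += 1  # malformed JSON or missing/unusable score field
    return counts, skipped


def render(counts, width=40):
    """Render a text histogram, one row per score value."""
    total = sum(counts.values())
    rows = []
    for score in sorted(counts):
        share = counts[score] / total
        bar = "#" * round(share * width)
        rows.append(f"{score:>3} {bar:<{width}} {counts[score]} ({share:.1%})")
    return "\n".join(rows)
```

Typical use would be `counts, skipped = score_distribution(open(path, encoding="utf-8"))` followed by `print(render(counts))`, with `skipped` reported separately so parse failures stay visible.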

User Journey Score: 5/10

---

## 6. Deployment / Operations (Score: 5/10)

### ✅ Strengths
- Vast.ai instance details documented in `registry.json`
- Pipeline runs as a simple background process
- Cost model is clear ($0.77/hr, ~$2.67 for full user corpus)

### ⚠️ Issues
- No launchd/systemd service for the SSH tunnel
- No cost tracking: balance check is manual
- No data backup: scoring output lives only on local disk
- No CI/CD: no automated pipeline trigger
- Instance lifecycle is manual: start/stop via the Vast.ai web UI

### Recommendations
- [ ] Create `com.minimax-fleet.tunnel.plist` for persistent tunnel
- [ ] Add `vast_balance_check.sh` to heartbeat
- [ ] Commit output summaries (not full JSONL) to git
- [ ] Create `vast_ctl.sh`: start/stop/status wrapper for the instance
- [ ] Add fleet health to HEARTBEAT.md checks

Deployment Score: 5/10

---

## Overall DEP Score: 6.3/10

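| Category | Score | Weight | Weighted |
|----------|-------|--------|----------|
| Structure | 7 | 1.0 | 7.0 |
| Compilation | 8 | 1.5 | 12.0 |
| Integration | 6 | 1.5 | 9.0 |
| Content | 7 | 1.0 | 7.0 |
| User Journey | 5 | 1.0 | 5.0 |
| Deployment | 5 | 1.0 | 5.0 |
| **Total** | | **7.0** | **45.0 / 70 = 64.3%** |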
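
The weighted total can be reproduced with a few lines of arithmetic. The scores and weights below are copied from the table; the `weighted_total` helper is only an illustration of how the 45.0 / 70 figure arises.

```python
# Category: (score out of 10, weight), copied from the table.
SCORES = {
    "Structure": (7, 1.0),
    "Compilation": (8, 1.5),
    "Integration": (6, 1.5),
    "Content": (7, 1.0),
    "User Journey": (5, 1.0),
    "Deployment": (5, 1.0),
}


def weighted_total(scores):
    """Return (earned points, maximum points, earned / maximum)."""
    earned = sum(score * weight for score, weight in scores.values())
    possible = sum(10 * weight for _, weight in scores.values())
    return earned, possible, earned / possible


earned, possible, ratio = weighted_total(SCORES)
print(f"{earned} / {possible:.0f} = {ratio:.1%}")
```

Because Compilation and Integration carry a weight of 1.5, a one-point gain in either moves the total more than the same gain in any other category.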

---

## Priority Actions (Ranked)

### 🔴 Critical (do now)
1. Fix SQL injection in `density_scorer.py`: parameterize queries
2. Add checkpoint/resume: can't afford losing a 3.5hr run
3. Create tunnel keepalive: tunnel death = wasted compute

### 🟡 Important (do this week)
4. Validate against Claude: 50-turn calibration set
5. Add retry logic: 3 attempts with exponential backoff
6. Post results to #ct-corpus when scoring completes
7. Add auto-shutdown for Vast.ai instance

### 🟢 Nice to have
8. README.md + CLAUDE.md
9. Results analyzer script
10. Few-shot examples in prompt
11. Fleet health in heartbeat

---

## Architecture Notes

┌─────────────┐     SSH Tunnel      ┌──────────────────┐
│   Mac1 Air  │◄───────────────────►│  Vast.ai GPU     │
│  Clawdbot   │   localhost:18080   │  RTX PRO 6000    │
│  Pipeline   │                     │  MiniMax M2.5    │
└──────┬──────┘                     │  141 tok/s       │
       │                            │  $0.77/hr        │
       ▼                            └──────────────────┘
┌──────────────┐
│ kimi_memory  │
│   39K msgs   │──► density_scorer.py ──► scores.jsonl
│   (SQLite)   │         │
└──────────────┘         ▼
                   ┌──────────────┐
                   │  Next Steps: │
                   │  WORMS aug   │
                   │  SFT export  │
                   │  LoRA train  │
                   └──────────────┘

Source Anchor

cognitive-twin/DEP_REPORT.md
