# DEP Report: Cognitive Twin Pipeline + MiniMax Fleet Integration
Date: 2026-02-16
Auditor: Claw
Scope: `Desktop/cognitive-twin/pipeline/` + `[home-path]`
---
## 1. Structure (Score: 7/10)
### ✅ Strengths
- Clean separation: `pipeline/` for scripts, `output/` for results
- Registry file (`minimax-fleet/registry.json`) tracks instance metadata
- Single-file scorer is appropriately simple for the task
### ⚠️ Issues
- No `__init__.py` or module structure: fine for now, but it limits importability
- No `requirements.txt`: the script uses only the stdlib (good), but that should be documented
- No README.md in `cognitive-twin/`: a new contributor can't onboard
- Output files not gitignored: JSONL scoring data could be large
### Recommendations
- [ ] Add `README.md` with pipeline overview, usage, and architecture
- [ ] Add `.gitignore` for `output/*.jsonl` (keep structure, ignore data)
- [ ] Add `CLAUDE.md` for sub-agent context
Structure Score: 7/10
---
## 2. Compilation / Runtime (Score: 8/10)
### ✅ Strengths
- Pure Python 3, stdlib only (no dependencies to break)
- `urllib.request` instead of `requests`: zero install required
- Parallel execution via `ThreadPoolExecutor`, matching llama.cpp's 4 slots
- Health check before starting: fails fast if MiniMax is down
### ⚠️ Issues
- SQL injection risk: `args.role` is interpolated directly into the SQL query (`f"role = '{args.role}'"`)
- No retry logic: network hiccups or slot contention cause permanent failures
- No checkpoint/resume: if the 3.5hr run crashes at 80%, all completed work is lost
- Hardcoded DB path: breaks if `kimi_memory.db` moves
### Recommendations
- [ ] CRITICAL: Parameterize SQL (use `?` placeholders, not f-strings)
- [ ] Add exponential backoff retry (3 attempts per turn)
- [ ] Add checkpoint file: write last processed ID, support `--resume`
- [ ] Make DB path configurable via `--db` argument
- [ ] Add `--output` flag to specify output path
Compilation Score: 8/10
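
The SQL recommendation above can be sketched as follows. This is a minimal illustration, not the actual query in `density_scorer.py`: the `messages` table, its columns, and the `fetch_turns` helper are all assumptions made for the example. The point is only that the role value is passed as a bound parameter instead of being interpolated into the SQL text.

```python
import sqlite3


def fetch_turns(conn, role=None, limit=None):
    """Return (id, role, content) rows, optionally filtered by role.

    Hypothetical helper: the table and column names are assumptions.
    """
    sql = "SELECT id, role, content FROM messages"
    params = []
    if role is not None:
        # The value travels separately from the SQL text, so quotes or
        # SQL keywords inside `role` are treated as plain data.
        sql += " WHERE role = ?"
        params.append(role)
    sql += " ORDER BY id"
    if limit is not None:
        sql += " LIMIT ?"
        params.append(int(limit))
    return conn.execute(sql, params).fetchall()


# Demo against an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO messages (role, content) VALUES (?, ?)",
    [("user", "hello"), ("assistant", "hi"), ("user", "how does LoRA work?")],
)

user_rows = fetch_turns(conn, role="user")
# An injection attempt is compared literally against the role column.
hostile_rows = fetch_turns(conn, role="user' OR '1'='1")
```

With the f-string version, the hostile value would rewrite the `WHERE` clause and match every row; with placeholders it simply matches no role.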
---
## 3. Integration (Score: 6/10)
### ✅ Strengths
- Clawdbot gateway properly configured with `minimax-fleet` provider
- `models.mode: "merge"` preserves all existing providers
- Alias `minimax` registered: accessible via `/model minimax`
- SSH tunnel verified and health-checked
### ⚠️ Issues
- Tunnel is ephemeral: dies on Mac sleep, SSH disconnect, or network change
- No auto-reconnect: if the tunnel drops mid-scoring, the run fails silently
- No monitoring: nothing alerts when the Vast.ai instance goes down
- No auto-shutdown: instance burns $0.77/hr even when idle
- Clawdbot end-to-end not verified: only the direct API was tested, never through the gateway
### Recommendations
- [ ] Create tunnel keepalive script with autossh or a cron watchdog
- [ ] Add Vast.ai balance monitor to heartbeat checks
- [ ] Add auto-shutdown script: stop instance after N hours idle
- [ ] Test `/model minimax` in a live Discord session to verify the full round-trip
- [ ] Add tunnel status to `memory/agent-capacity.json`
Integration Score: 6/10
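
A watchdog for the tunnel needs some way to decide whether the far end is still reachable. The sketch below is one possible probe, under assumptions: the `/health` path and the `tunnel_is_healthy` name are illustrative (only the `localhost:18080` port comes from this report), and the `opener` parameter exists so the function can be exercised without a live tunnel.

```python
import urllib.error
import urllib.request


def tunnel_is_healthy(url="http://localhost:18080/health", timeout=5.0,
                      opener=urllib.request.urlopen):
    """Return True if the endpoint behind the tunnel answers with HTTP 200.

    The URL path is an assumption; adjust it to whatever health route
    the inference server actually exposes.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A cron job or the scorer's main loop could call this between batches and pause or restart the tunnel when it returns `False`, rather than letting turns fail silently.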
---
## 4. Content / Quality (Score: 7/10)
### ✅ Strengths
- Scoring prompt is well-designed: clear taxonomy, structured output
- JSON output format enables downstream pipeline consumption
- 100
- Distribution looks realistic (50
### ⚠️ Issues
- No ground truth validation: are the scores actually correct?
- No inter-rater reliability check: a subset should be scored with Claude and compared
- Reasoning overhead: model burns ~70
- No score calibration: what makes a 7 vs. an 8? There are no reference examples
- Content truncation at 2000 chars: long technical messages lose context
### Recommendations
- [ ] Score 50 turns with Claude Sonnet → compare against MiniMax scores → measure agreement
- [ ] Add few-shot examples to the prompt (1 per density level)
- [ ] Increase content window to 4000 chars for long-form messages
- [ ] Log reasoning_content for audit trail (optional flag)
- [ ] Create `calibration_set.json` with human-verified reference scores
Content Score: 7/10
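
The cross-model comparison recommended above needs an agreement metric. The following is a minimal sketch that assumes each rater's output has already been reduced to a `{turn_id: score}` mapping; the `agreement_report` name and the one-point tolerance are illustrative choices, not something the pipeline defines.

```python
def agreement_report(scores_a, scores_b, tolerance=1):
    """Compare two raters' scores for the same turns, keyed by turn ID.

    Only turns scored by both raters are considered.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    if not shared:
        raise ValueError("no turns were scored by both raters")
    diffs = [abs(scores_a[i] - scores_b[i]) for i in shared]
    n = len(diffs)
    return {
        "n": n,
        # Fraction of turns where both raters gave the same score.
        "exact": sum(d == 0 for d in diffs) / n,
        # Fraction of turns where the scores differ by at most `tolerance`.
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
        "mean_abs_diff": sum(diffs) / n,
    }


# Toy data, invented for the example.
minimax_scores = {1: 7, 2: 5, 3: 9, 4: 2}
claude_scores = {1: 7, 2: 6, 3: 6, 4: 2, 5: 8}
report = agreement_report(minimax_scores, claude_scores)
```

Exact agreement is a blunt measure on an ordinal scale, so the within-tolerance rate and mean absolute difference are reported alongside it. A chance-corrected statistic such as weighted Cohen's kappa would be a reasonable next step once a calibration set exists.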
---
## 5. User Journey (Score: 5/10)
### ✅ Strengths
- CLI interface with clear flags (`--limit`, `--parallel`, `--dry-run`)
- Progress bar with real-time stats during execution
- Final summary with distribution chart
### ⚠️ Issues
- No way to monitor a running job other than running `tail -f` on the log
- No progress webhook: long runs should ping Discord
- No results viewer: scoring output is raw JSONL, with no summary tool
- No pipeline orchestration: density scoring is step 1, but steps 2-4 (WORMS, SFT export, training) don't exist yet
- No dashboard: results should be posted to #ct-corpus when done
### Recommendations
- [ ] Add `--notify` flag that posts completion to #ct-corpus
- [ ] Create `analyze_scores.py`: reads JSONL, generates distribution report
- [ ] Add `watch_run.sh` script for monitoring
- [ ] Plan next pipeline stages: WORMS augmentation → SFT export → training
- [ ] Post live progress to #ct-corpus thread every 500 turns
User Journey Score: 5/10
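
The proposed `analyze_scores.py` could start as small as the sketch below. It assumes each JSONL record carries an integer score under a `density` key; that field name is a guess, since the output schema is not shown in this report, and the function names are likewise illustrative.

```python
import json
from collections import Counter


def score_distribution(lines, field="density"):
    """Count scores per value from JSONL lines.

    Blank lines are ignored; malformed records are counted in `skipped`.
    The default field name is an assumption about the output schema.
    """
    counts = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            counts[int(json.loads(line)[field])] += 1
        except (ValueError, KeyError, TypeError):
            skipped += 1
    return counts, skipped


def render_report(counts, width=40):
    """Render a text histogram, one row per score value."""
    total = sum(counts.values())
    peak = max(counts.values(), default=0)
    rows = []
    for score in sorted(counts):
        n = counts[score]
        bar = "#" * round(width * n / peak)
        rows.append(f"{score:>3} | {bar} {n} ({n / total:.1%})")
    return "\n".join(rows)
```

Because `score_distribution` accepts any iterable of lines, it works directly on an open file handle (`score_distribution(open(path, encoding="utf-8"))`) without loading a large JSONL file into memory.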
---
## 6. Deployment / Operations (Score: 5/10)
### ✅ Strengths
- Vast.ai instance details documented in `registry.json`
- Pipeline runs as a simple background process
- Cost model is clear ($0.77/hr, ~$2.67 for full user corpus)
### ⚠️ Issues
- No launchd/systemd service for the SSH tunnel
- No cost tracking: balance check is manual
- No data backup: scoring output lives only on local disk
- No CI/CD: no automated pipeline trigger
- Instance lifecycle is manual: start/stop via the Vast.ai web UI
### Recommendations
- [ ] Create `com.minimax-fleet.tunnel.plist` for persistent tunnel
- [ ] Add `vast_balance_check.sh` to heartbeat
- [ ] Commit output summaries (not full JSONL) to git
- [ ] Create `vast_ctl.sh`: start/stop/status wrapper for the instance
- [ ] Add fleet health to HEARTBEAT.md checks
Deployment Score: 5/10
---
## Overall DEP Score: 6.4/10
| Category | Score | Weight | Weighted |
|---|---|---|---|
| Structure | 7 | 1.0 | 7.0 |
| Compilation | 8 | 1.5 | 12.0 |
| Integration | 6 | 1.5 | 9.0 |
| Content | 7 | 1.0 | 7.0 |
| User Journey | 5 | 1.0 | 5.0 |
| Deployment | 5 | 1.0 | 5.0 |
| **Total** | | **7.0** | **45.0 / 70 = 64.3%** |
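
The total in the table can be reproduced with a few lines of arithmetic. The scores and weights below are copied from the rows above; the variable names are only for illustration.

```python
CATEGORIES = {
    # name: (score out of 10, weight)
    "Structure": (7, 1.0),
    "Compilation": (8, 1.5),
    "Integration": (6, 1.5),
    "Content": (7, 1.0),
    "User Journey": (5, 1.0),
    "Deployment": (5, 1.0),
}

# Each category contributes score * weight to the total.
weighted_sum = sum(score * weight for score, weight in CATEGORIES.values())
total_weight = sum(weight for _, weight in CATEGORIES.values())
# A perfect 10 in every category would earn 10 * total_weight points.
max_points = 10 * total_weight

percentage = 100 * weighted_sum / max_points
print(f"{weighted_sum} / {max_points:.0f} = {percentage:.1f}%")
```

Integration and Compilation carry 1.5x weight, so a one-point change in either moves the total by about 2.1 percentage points, versus about 1.4 for the other categories.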
---
## Priority Actions (Ranked)
### 🔴 Critical (do now)
1. Fix SQL injection in `density_scorer.py`: parameterize queries
2. Add checkpoint/resume: can't afford losing a 3.5hr run
3. Create tunnel keepalive: tunnel death = wasted compute
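
For the checkpoint/resume item, one possible shape is a small JSON file holding the last processed message ID. This is a sketch under assumptions: the file layout, the function names, and the idea that turns are processed in ascending ID order are all illustrative, not taken from the existing scorer.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last processed message ID, or 0 if no checkpoint exists."""
    try:
        with open(path, encoding="utf-8") as fh:
            return int(json.load(fh)["last_id"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path, last_id):
    """Write the checkpoint via a temp file so a crash never leaves it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_id": last_id}, fh)
    # Rename within the same directory replaces the old file in one step.
    os.replace(tmp_path, path)


def pending_turns(turns, last_id):
    """Skip turns already scored in a previous run; assumes ascending IDs."""
    return [t for t in turns if t["id"] > last_id]
```

One caveat for a parallel scorer: worker threads finish out of order, so the checkpoint should only advance to the highest ID below which every turn has completed. Otherwise a resume could skip turns that were still in flight when the run died.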
### 🟡 Important (do this week)
4. Validate against Claude: 50-turn calibration set
5. Add retry logic: 3 attempts with exponential backoff
6. Post results to #ct-corpus when scoring completes
7. Add auto-shutdown for Vast.ai instance
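
The retry item can be sketched as a small wrapper around whatever function sends one scoring request. The `with_retry` name, the one-second base delay, and the set of exceptions treated as transient are assumptions for the example; the `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.

```python
import time
import urllib.error


def with_retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a network error wait base_delay * 2**n, then try again.

    The final failure is re-raised so the caller can record the turn as failed.
    """
    for attempt in range(attempts):
        try:
            return call()
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a turn is tried up to three times with waits of 1 s and 2 s between attempts. Adding random jitter to the delay would help if all four parallel workers tend to hit slot contention at the same moment.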
### 🟢 Nice to have
8. README.md + CLAUDE.md
9. Results analyzer script
10. Few-shot examples in prompt
11. Fleet health in heartbeat
---
## Architecture Notes

    ┌─────────────┐     SSH Tunnel      ┌──────────────────┐
    │  Mac1 Air   │◄───────────────────►│  Vast.ai GPU     │
    │  Clawdbot   │   localhost:18080   │  RTX PRO 6000    │
    │  Pipeline   │                     │  MiniMax M2.5    │
    └──────┬──────┘                     │  141 tok/s       │
           │                            │  $0.77/hr        │
           ▼                            └──────────────────┘
    ┌──────────────┐
    │ kimi_memory  │
    │ 39K msgs     │──► density_scorer.py ──► scores.jsonl
    │ (SQLite)     │                                │
    └──────────────┘                                ▼
                                            ┌──────────────┐
                                            │ Next Steps:  │
                                            │ WORMS aug    │
                                            │ SFT export   │
                                            │ LoRA train   │
                                            └──────────────┘