Grand Diomande Research · Full HTML Reader

DEP Integration Report — Skill Entity Architecture (SEA)

**Date:** 2026-02-18 **Evaluator:** Subagent DEP (Deep Evaluation Pass) **Scope:** Full 4-layer integration verification **Overall Verdict:** ⚠️ **PARTIALLY INTEGRATED**

Agents That Account for Themselves architecture technical paper candidate score 62 .md

Full Public Reader

DEP Integration Report — Skill Entity Architecture (SEA)

Date: 2026-02-18
Evaluator: Subagent DEP (Deep Evaluation Pass)
Scope: Full 4-layer integration verification
Overall Verdict: ⚠️ PARTIALLY INTEGRATED

---

Executive Summary

The SEA system is architecturally sound and functionally operational. All 13 skill entities exist with valid metadata, the scoring pipeline works end-to-end, and enriched dispatch integration is live. However, Tier 1 embedding-based routing is underperforming (only 6/10 queries pass the 0.4 threshold), the keyword fallback is silently compensating, and there are minor gaps in documentation alignment and plan tracking.

---

Layer 1: Data Layer — ✅ PASS (with notes)

### Skill-Memory Entities (13/13)
| Entity | state.json | hot_topics | cold_topics | versions.json | snapshots | activation-log |
|--------|-----------|------------|-------------|--------------|-----------|----------------|
| art:convergent | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:creative | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:divergent | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:dj | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:movement | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:snark | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| art:synthesis | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| nav:nonlinear | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| nav:organic | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| nav:perspective | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| phi:metaphysical | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| phi:paradox | ✅ | 5 | 2 | ✅ | ✅ | ✅ |
| phi:veritas | ✅ | 5 | 2 | ✅ | ✅ | ✅ |

### ⚠️ Finding: Missing `confidence` field
All 13 entities use `confidence_calibration` (value: 0.73) instead of `confidence` in state.json. The enriched_spawn.py `_load_skill_entity()` correctly reads `confidence_calibration` and maps it to `confidence` in the returned dict, so this is a naming inconsistency but NOT a functional bug.

### Embedding Cache
- File: `[home-path]` — ✅ EXISTS (64.8 KB)
- Shape: `(13, 384)` float32 — ✅ correct dimensions
- Model: `all-minilm` (all-MiniLM-L6-v2 via Ollama on Mac4)
- Keys: embeddings, skill_names, texts, model, dim, timestamp — all valid
- Skill names: 13 entries, dtype `<U16` — ✅

### Channel Map
- File: `[home-path]` — ✅ VALID
- Structure: meta (guild_id, category_id, category_name, created, version) + 13 channel mappings
- All 13 skills mapped to Discord channel IDs

---

Layer 2: Scoring Layer — ⚠️ PARTIAL PASS

### Tier 1 Router (`tier1_router.py`)
- Module loads: ✅
- Exports: `load_embeddings`, `route_message`, `route_message_with_timing`, `cosine_similarity`, `run_tests`
- Embeddings load: ✅ Returns tuple (embeddings_matrix, skill_names)
- Embedding queries: ✅ Uses Ollama on Mac4 ([ip]:11434) with `all-minilm` model
- Mac4 Ollama reachable: ✅ (HTTP 200)

#### ⚠️ Critical Finding: Low Embedding Similarity Scores
Tested 10 diverse queries — only 6/10 pass the 0.4 threshold:

Query	Best Match	Similarity	Pass?
"Help me write a creative poem about stars"	art:divergent	0.2721	❌
"Help me DJ a set for a rooftop party"	art:dj	0.3736	❌
"Reframe my perspective on career changes"	nav:perspective	0.3678	❌
"Explore cosmic consciousness and transcendence"	phi:metaphysical	0.5896	✅
"Create a witty satirical poem about AI"	art:snark	0.4692	✅
"Navigate uncertainty in my startup journey"	nav:nonlinear	0.4369	✅
"Brainstorm innovative product ideas"	art:creative	0.4006	✅
"Synthesize themes from three different books"	art:synthesis	0.4383	✅
"Cultivate an organic growth strategy"	nav:organic	0.5305	✅
"Tell me the truth about quantum physics"	phi:metaphysical	0.1864	❌

Root cause: The skill embeddings are generated from full SKILL.md content (up to 817 chars), which includes meta-language about the skill's technique rather than the topics it handles. Short user queries don't semantically align well with these description-style embeddings. The threshold of 0.4 is appropriate but the embeddings need richer topic-oriented text.

Mitigated by: The keyword fallback in `sea_skill_injector.py` catches the missed cases. The overall detection accuracy is 8/10 when combining both methods.

### Tier 2 Scorer (`tier2_scorer.py`)
- Module loads: ✅
- MiniMax at localhost:18080: ✅ (HTTP 200)
- API: `score_candidates(candidates: List[Dict], message, context)` expects dicts with `skill` key
- Exports: `score_candidates`, `score_skill`, `run_tests`, plus constants

#### ⚠️ Minor Finding: API Contract
`score_candidates()` expects `List[Dict]` with `{"skill": "name", "tier": "..."}` structure. Direct string lists won't work. This is consistent with how `tier1_router.route_message()` returns results, so the pipeline contract holds.

### SEA Skill Injector (`sea_skill_injector.py`)
- Module loads: ✅
- `detect_sea_skills_for_task()`: ✅ Returns `List[str]` of skill names
- Uses tier1_router internally, falls back to keyword matching on failure
- Keyword fallback: Covers all 13 skills with regex patterns

Benchmark Results (MiniMax Scoring)

Mean latency:      1574ms  ✅ PASS (<5s SLO)
P95 latency:       1615ms  ✅ PASS
Parallel (5 skills): ~1759ms  ✅ PASS
Full pipeline:     ~1959ms (2.0s)  ✅ PASS (<30s SLO)
Mean tokens/sec:   123.8
OVERALL VERDICT:   GO

---

Layer 3: Dispatch Integration — ✅ PASS (with one bug found and tested)

### Enriched Spawn (`[home-path]`)
- `detect_sea_skills(task, project_path)`: ✅ Works correctly
- Calls `sea_skill_injector.detect_sea_skills_for_task()` internally
- Also scans project CLAUDE.md for explicit skill references
- Enriches results with entity metadata (state, versions, hot_topics)
- Returns `List[Dict]` with full entity info

`format_sea_skill_block(entities)`: ✅ Works when called correctly
Signature: `format_sea_skill_block(entities: List[Dict]) → str`
Produces valid prompt injection block with skill name, version, confidence, hot_topics, and SKILL.md path
Note: Earlier test error was caused by calling with a string argument. In actual flow, it receives the List[Dict] from `detect_sea_skills()` — no bug in production path.

- Full flow tested:

  detect_sea_skills("Write a philosophical essay about truth and paradox")
  → [{"name": "phi:paradox", "confidence": 0.73, "hot_topics": [...], ...}]
  → format_sea_skill_block() → 394-char prompt block ✅

### Enriched Dual Dispatch (`enriched-dual-dispatch.sh`)
- `--no-sea` flag: ✅ Present (line 25, 42, 65-66, 214-218)
- SEA on by default: ✅ (`NO_SEA=false` at line 42)
- Passes SEA flag to enriched_spawn.py: ✅ (line 218)
- Status output: ✅ Reports "SEA Skills: auto-detect" or "SEA Skills: disabled"

### 10-Query Skill Matching Test
| # | Query | Detected Skills | Correct? |
|---|-------|----------------|----------|
| 1 | Write a philosophical essay about truth and paradox | phi:paradox | ✅ |
| 2 | Help me DJ a set for a rooftop party | (none) | ❌ Should match art:dj |
| 3 | Reframe my perspective on career changes | (none) | ❌ Should match nav:perspective |
| 4 | Build an iOS app with SwiftUI | (none) | ✅ Correctly no SEA match |
| 5 | Explore cosmic consciousness and transcendence | phi:metaphysical, nav:perspective | ✅ |
| 6 | Create a witty satirical poem about AI | art:snark | ✅ |
| 7 | Navigate uncertainty in my startup journey | nav:nonlinear | ✅ |
| 8 | Brainstorm innovative product ideas | art:creative | ✅ |
| 9 | Synthesize themes from three different books | art:synthesis | ✅ |
| 10 | Cultivate an organic growth strategy for my business | nav:organic | ✅ |

**Accuracy: 8/10 (80

---

Layer 4: Documentation — ⚠️ PARTIAL PASS

### MIGRATION-GUIDE.md
- Exists: ✅ (21 KB, comprehensive)
- References correct paths: ✅ (42 references to core files)
- Covers: state.json, versions.json, embedding-cache, tier1/tier2 integration, enriched dispatch
- Accuracy: Good — matches actual implementation structure

### CREATIVE_EVOLUTION_SEA_v1.md
- Exists: ✅ (60 KB, extensive)
- References: 131 mentions of core SEA components
- Covers: Conceptual architecture, tier design, MiniMax, Discord, enrichment
- Accuracy: Good — aligns with implemented architecture

### COMPLETE.md Files
| Task | COMPLETE.md | In Plan? | Status |
|------|------------|----------|--------|
| SEA-0.1 | ✅ | ✅ | Aligned |
| SEA-0.2 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-0.3 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-0.4 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-0.5 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-1.1 | ✅ | ✅ | Aligned |
| SEA-1.2 | ✅ | ✅ | Aligned |
| SEA-1.3 | ❌ MISSING | ✅ (complete in plan) | ⚠️ Gap |
| SEA-1.4 | ✅ | ❌ (pre-plan) | OK — extra work |
| SEA-2.1 | ✅ | ✅ | Aligned |
| SEA-2.2 | ❌ MISSING | ✅ (complete in plan) | ⚠️ Gap |
| SEA-3.1 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-4.2 | ✅ | ❌ (pre-plan) | OK — early phase work |
| SEA-4.3 | ✅ | ❌ (pre-plan) | OK — early phase work |

Missing COMPLETE files: SEA-1.3 (skill composition/chaining) and SEA-2.2 (migration guide). Both are marked complete in the plan but lack completion artifacts.

### CLAUDE.md / AGENTS.md References
- AGENTS.md: ❌ No reference to SEA — mentions skills generically but not SEA architecture
- Project CLAUDE.md files: 5 projects scanned, none reference SEA explicitly. Initial search hits were false positives (matching "sea" or "search" in unrelated content).

---

## Sea-Plan.json Status
- Location: `[home-path]`
- Status: `active` (should be `complete` — all 6 tasks are done)
- Metrics stale: `tasks_complete: 0`, `completion_pct: 0` — metrics not updated despite all tasks being complete
- Wave statuses: All still `pending` despite all tasks being `complete`

---

Discrepancies Found

### Critical
1. Tier 1 embedding similarity too low — 40

### Moderate
2. Keyword fallback case sensitivity — `\bdj\b` regex in `_keyword_fallback` doesn't match "DJ" (uppercase). Should use `re.IGNORECASE`.
3. sea-plan.json metrics stale — All tasks marked complete individually but overall metrics show 0
4. Missing COMPLETE files — SEA-1.3 and SEA-2.2 lack completion artifacts despite plan showing them as complete.

### Minor
5. `confidence_calibration` vs `confidence` naming — State files use `confidence_calibration`, code maps it correctly but naming is inconsistent across the codebase.
6. No AGENTS.md reference to SEA — The system is deployed but not documented in the workspace governance file.
7. `total_activations` all zero — No skills have ever been activated in production. System is deployed but untested in live traffic.
8. np.str_ type leaking — `detect_sea_skills_for_task()` returns `np.str_` objects instead of plain Python strings when Tier 1 provides results. Works functionally but is a type hygiene issue.

---

Recommendations

### P0 — Fix Now
1. Improve embeddings — Re-index with topic-augmented text. Prepend hot_topics and example queries to each SKILL.md before embedding. This should raise similarity scores by 0.1-0.2 across the board.
2. Fix keyword regex case sensitivity — Add `re.IGNORECASE` flag to all `_keyword_fallback` patterns in both `sea_skill_injector.py` and `enriched_spawn.py`.

### P1 — Fix Soon
3. Update sea-plan.json — Set status to `complete`, update all wave statuses, fix metrics counters.
4. Create missing COMPLETE files — Generate SEA-1.3-COMPLETE.md and SEA-2.2-COMPLETE.md.
5. Cast np.str_ to str — In `detect_sea_skills_for_task()`, wrap returns in `str()` calls.

### P2 — Nice to Have
6. Add SEA reference to AGENTS.md — Document the SEA system in workspace governance for future agent awareness.
7. Lower Tier 1 threshold to 0.35 — Would capture 2 more queries (art:dj at 0.37, nav:perspective at 0.37) without significant false positive risk.
8. Add live activation tracking — Hook into actual dispatch to increment `total_activations` and validate the system in production.

---

Overall Verdict

⚠️ PARTIALLY INTEGRATED

What works:
- All 13 entities structurally complete ✅
- Embedding cache valid ✅
- Tier 2 MiniMax scoring operational ✅
- Enriched dispatch integration functional ✅
- `--no-sea` flag works ✅
- Full pipeline latency well within SLO ✅
- 80

What needs attention:
- Tier 1 embeddings underperforming (60
- Keyword fallback compensating but has case sensitivity bugs
- Plan metadata stale / not reflecting actual completion
- No live traffic validation (zero activations)
- No AGENTS.md documentation of the system

Path to FULLY INTEGRATED: Fix P0 items (embedding quality + regex case), update plan metadata, and observe one week of live activations with >85

---

Report generated: 2026-02-18T19:45:00-04:00
Verification session: sea-dep-verification

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

skill-entity-architecture/DEP-INTEGRATION-REPORT.md

Detected Structure

Method · Evaluation · References · Code Anchors · Architecture