research note | experiment writeup candidate | score 32
DLM Package Refactoring Audit
**Date:** 2025-12-08 **Status:** 🔍 In Progress **Purpose:** Comprehensive audit and refactoring plan to reduce technical debt
The DLM package has grown to **63,339 lines** across **154 Python files** with significant technical debt and consolidation opportunities.
1. **Large Files**: 10 files exceed 1,000 lines (should be <500)
2. **Duplicate Functionality**: Multiple embedders, loaders, and config systems
3. **Unclear Organization**: Mixed concerns (response/vangaurd, engine overlap)
4. **Legacy Code**: Old implementations (legacy_utils.py, deprecated embedders)
5. **Test Fragmentation**: Tests scattered across multiple locations
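
The size figures above (total lines, file count, files over 1,000 lines) can be re-measured as the refactor proceeds. The following is a minimal sketch, not the tooling used for this audit: the function names and the report layout are hypothetical, and only the 500-line target and 1,000-line threshold come from the list above.

```python
from pathlib import Path

# Thresholds taken from the audit: files should stay under 500 lines,
# and files over 1,000 lines count as "large".
TARGET_MAX = 500
LARGE_FILE = 1000


def count_lines(path: Path) -> int:
    """Return the number of lines in a text file, skipping undecodable bytes."""
    with path.open("r", encoding="utf-8", errors="ignore") as handle:
        return sum(1 for _ in handle)


def audit_file_sizes(root: Path) -> dict:
    """Collect line counts for every .py file under root (hypothetical helper)."""
    counts = {p: count_lines(p) for p in sorted(root.rglob("*.py"))}

    def over(limit: int) -> list:
        # (line_count, path) pairs above the limit, largest first.
        hits = [(n, p) for p, n in counts.items() if n > limit]
        return sorted(hits, key=lambda item: item[0], reverse=True)

    return {
        "total_files": len(counts),
        "total_lines": sum(counts.values()),
        "over_target": over(TARGET_MAX),
        "large": over(LARGE_FILE),
    }
```

Calling `audit_file_sizes(Path("dlm"))` (the package path is an assumption) returns the totals plus two lists sorted by size, so the largest files surface first.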
| Issue | Severity | Files Affected | Impact |
|-------|----------|----------------|--------|
| **Pydantic v2 Incompatibility** | 🚨 Critical | models/*.py, base.py | Blocks imports |
| **Duplicate Embedders** | ❌ High | engine/embedder.py (1604 lines), engine/ircp_embedder.py | Confusion |
| **Mega Files** | ⚠️ High | inference/artificial.py (3691 lines) | Unmaintainable |
| **Legacy Utils** | ⚠️ Medium | utils/legacy_utils.py (1518 lines) | Tech debt |
| **Test Scatter** | ⚠️ Medium | tests/, core/tests/ | Fragmentation |
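
The audit does not say which constructs cause the Pydantic v2 incompatibility. Assuming it stems from common v1-style idioms, a rough text scan like the one below could help locate candidates in `models/*.py` and `base.py`. The pattern table and `find_v1_idioms` are illustrative, and the scan is a heuristic: a match such as `.dict(` may belong to an unrelated object.

```python
import re

# Well-known Pydantic v1 idioms mapped to their v2 counterparts.
# Assumption: the blocking errors come from idioms like these.
V1_PATTERNS = {
    r"@validator\(": "@field_validator",
    r"@root_validator\b": "@model_validator",
    r"^\s*class Config\s*:": "model_config = ConfigDict(...)",
    r"\.dict\(": ".model_dump(",
    r"\.parse_obj\(": ".model_validate(",
}


def find_v1_idioms(source: str) -> list:
    """Return (line_number, stripped_line, suggested_v2_form) for each match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, replacement in V1_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, line.strip(), replacement))
    return findings
```

Each finding still needs manual review before any rewrite; the scan only narrows down where to look.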
**Current State:**
- `engine/embedder.py` (1604 lines) - Old implementation
- `engine/ircp_embedder.py` (300 lines) - Deprecated
- `core/embeddings.py` (295 lines) - NEW Week 2 implementation
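
One way to consolidate three embedder modules without breaking existing imports is to keep the old names as thin aliases that warn on use. The sketch below illustrates that pattern only: `Embedder`, `LegacyEmbedder`, and `deprecated_alias` are hypothetical names, and the audit does not show the real class names or interfaces.

```python
import warnings


class Embedder:
    """Stand-in for the new implementation (the audit places it in core/embeddings.py)."""

    def embed(self, text: str) -> list:
        # Placeholder logic only; the real embedding code is not shown in the audit.
        return [float(len(text))]


def deprecated_alias(new_cls, old_name: str):
    """Build a subclass of new_cls that warns each time it is instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not this shim
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = old_name
    _Deprecated.__qualname__ = old_name
    return _Deprecated


# Hypothetical legacy name that an old module could keep exporting.
LegacyEmbedder = deprecated_alias(Embedder, "LegacyEmbedder")
```

Because the alias subclasses the new class, existing `isinstance` checks keep passing while callers migrate, and the old modules can be deleted once the warnings stop appearing.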