IRCP & DLMDataLoader Integration - Quick Reference
File Locations
| Component | File Path |
|---|---|
| IRCP Trainer | `packages/ircp/training/icp_trainer.py` |
| IRCP Database Loader | `packages/ircp/data/database_loader.py` |
| IRCP Base Models | `packages/ircp/core/base_models.py` |
| DLM Data Loader | `packages/dlm/core/data_loader.py` |
| TPO Trainer | `packages/tpo/training/trainer.py` |
| Database Enhanced RCP | `packages/tpo/consolidation/knowledge_base/database_enhanced_rcp.py` |
Key Classes
IRCP Training
ICPTrainer (main trainer class)
- train()
- validate_epoch()
- train_epoch()
- _compute_loss() [5-component loss]
- save_checkpoint()
- export_model()
ICPDataset (PyTorch Dataset)
- __getitem__() returns dict with:
  - embedding: torch.Tensor
  - coordinates: torch.Tensor (4D)
  - target: torch.Tensor
  - message_id, conversation_id, author
IRCP Data Loading
DatabaseLoader
- get_conversation_ids()
- load_conversation()
- load_conversations_parallel()
- _load_coordinates_batch()
- _load_embeddings_batch()
- create_icp_dataset()
ConversationDataLoader (High-level interface)
- load_training_data() [returns train/val/test split]
- load_sample_data()
- get_statistics()
DLM Data Loading
DLMDataLoader (context manager enabled)
- get_conversation_ids()
- load_conversation()
- load_conversations() [iterator pattern]
- _load_coordinates_batch() [with caching]
- _load_embeddings_batch() [with caching]
- get_statistics()
- close()
Data Structure Compatibility
### Coordinates
| IRCP DLMCoordinates | DLM DLMCoordinate | Status |
|-------|-------|--------|
| x | x | Direct |
| y | y | Direct |
| z | z | Direct |
| t | t | Direct |
| depth | depth_level | Direct |
| sibling_count | n_parts | Semantic map |
| is_linear | (missing) | Need default |
| confidence | confidence | Direct |
### ConversationNode
Both packages define a ConversationNode, but each uses a different coordinate type:
- IRCP uses: `DLMCoordinates` (from ircp/core/base_models.py)
- DLM uses: `DLMCoordinate` (from dlm/core/coordinates.py)
Loss Function Components
1. Coordinate Prediction Loss (weight: 1.0) - MSE
2. Embedding Consistency Loss (weight: 0.1) - Cosine similarity
3. Conservation Constraint Loss (weight: 0.05) - Measure preservation
4. Topological Consistency Loss (weight: 0.1) - k-NN preservation
5. L2 Regularization (weight: 1e-5) - Parameter regularization
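The five weighted terms above combine into a single training objective. The following is a minimal sketch in plain Python, assuming the total is a simple weighted sum of scalar component values. The names `LOSS_WEIGHTS` and `combine_losses` are hypothetical and do not come from `icp_trainer.py`, where `_compute_loss()` presumably works on tensors.

```python
# Hypothetical sketch: the weights are copied from the list above; how
# ICPTrainer._compute_loss() actually combines them is an assumption.
LOSS_WEIGHTS = {
    "coordinate": 1.0,
    "embedding_consistency": 0.1,
    "conservation": 0.05,
    "topological": 0.1,
    "l2_regularization": 1e-5,
}


def combine_losses(components, weights=LOSS_WEIGHTS):
    """Return the weighted sum of the per-component loss values."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing loss components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)
```

Keeping the weights in one mapping makes it easy to log each weighted term separately when comparing training runs.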
Training Configuration Parameters
config = {
    "epochs": 50,
    "batch_size": 32,
    "learning_rate": 1e-4,
    "weight_decay": 1e-5,
    "optimizer": "adamw",   # adamw, adam, sgd
    "scheduler": "cosine",  # cosine, step, exponential
    "max_grad_norm": 1.0,
    "save_checkpoints": True,
    "output_dir": "./checkpoints",
}
Database Schema
### IRCP Expected Schema
- conversations: conversation_id, total_messages
- messages: message_id, conversation_id, parent_id, content, author, create_time, token_count
- dlm_coordinates: message_id, x_coord, y_coord, z_coord, t_coord, depth, sibling_order, sibling_count, is_linear
- embeddings: message_id, embedding_vector
### DLM Expected Schema
- conversations: conversation_id, total_messages
- messages: message_id, conversation_id, parent_id, content, author, create_time, token_count, end_turn, weight
- dlm_coordinates: message_id, x, y, z, t, n_parts, depth_level, sibling_index, confidence
- embeddings: message_id, embedding
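Critical Difference: The dlm_coordinates column names differ between the two schemas (x_coord vs. x, depth vs. depth_level, sibling_order vs. sibling_index, and so on). The embeddings column differs as well (embedding_vector vs. embedding).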
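Because the two schemas share table names and differ only in columns, a loader can inspect the table before choosing a query. The following is a minimal sketch, assuming a SQLite database (as the `db.sqlite` paths in this reference suggest). The function name `detect_coordinate_schema` is hypothetical and exists in neither package.

```python
import sqlite3


def detect_coordinate_schema(conn):
    """Return 'ircp', 'dlm', or 'unknown' based on dlm_coordinates columns."""
    rows = conn.execute("PRAGMA table_info(dlm_coordinates)").fetchall()
    columns = {row[1] for row in rows}  # row[1] holds the column name
    if {"x_coord", "y_coord", "z_coord", "t_coord"} <= columns:
        return "ircp"
    if {"x", "y", "z", "t"} <= columns:
        return "dlm"
    return "unknown"  # table missing or columns unrecognized
```

A caller could use the result to pick between `DatabaseLoader` and `DLMDataLoader` instead of failing on the first query.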
Integration Challenges
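### Challenge 1: Coordinate System
# Adapter: convert a DLM DLMCoordinate into an IRCP DLMCoordinates
from ircp.core.base_models import DLMCoordinates

def dlm_to_ircp_coordinates(dlm_coord) -> DLMCoordinates:
    return DLMCoordinates(
        x=dlm_coord.x,
        y=dlm_coord.y,
        z=dlm_coord.z,
        t=dlm_coord.t,
        depth=dlm_coord.depth_level,      # renamed field
        sibling_count=dlm_coord.n_parts,  # semantic mapping, not an exact equivalent
        confidence=dlm_coord.confidence,
        # is_linear has no DLM counterpart; supply a default here if DLMCoordinates requires one
        metadata={"sibling_index": dlm_coord.sibling_index},
    )
### Challenge 2: ConversationGraph Structure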
# IRCP approach
graph.edges = {parent_id: [child_ids]}
graph.reverse_edges = {child_id: parent_id}
# DLM approach
graph.root_ids = [root_ids]
# Has methods: get_children(), get_ancestors(), get_depth()
### Challenge 3: Database Column Names
- IRCP: x_coord, y_coord, z_coord, t_coord
- DLM: x, y, z, t
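One way to bridge the naming gap at query time is to alias the columns in the `SELECT` list. The following is a minimal sketch, assuming SQLite. `IRCP_TO_DLM_COLUMNS` and `select_coordinates_as_dlm` are hypothetical names introduced only for illustration.

```python
import sqlite3

# IRCP-schema column -> DLM-style name, taken from the two listings above.
IRCP_TO_DLM_COLUMNS = {
    "x_coord": "x",
    "y_coord": "y",
    "z_coord": "z",
    "t_coord": "t",
}


def select_coordinates_as_dlm(conn):
    """Read IRCP-schema rows and expose them under the DLM column names."""
    select_list = ", ".join(
        f"{src} AS {dst}" for src, dst in IRCP_TO_DLM_COLUMNS.items()
    )
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor.execute(f"SELECT message_id, {select_list} FROM dlm_coordinates")
    return [dict(row) for row in cursor]
```

The column names are interpolated from a fixed mapping rather than user input, so the f-string does not open an injection path here.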
Current Data Flow
IRCP Current:
Database → DatabaseLoader → ConversationGraph → ICPDataPoint
→ ICPDataset → DataLoader → Training Loop
Proposed with DLMDataLoader:
Database → DLMDataLoader → DLM ConversationGraph → Adapter
→ ICPDataPoint → ICPDataset → DataLoader → Training Loop
Integration Benefits
1. Unified caching (DLMDataLoader caches both coordinates and embeddings)
2. Context manager support (automatic cleanup)
3. Better logging integration
4. Reduced code duplication
5. Iterator pattern for memory efficiency
6. Flexible database schema handling
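Benefits 2 and 5 rest on two standard Python mechanisms: the context-manager protocol and generators. The following is a minimal sketch of how they combine, using a hypothetical `LazyConversationLoader` stand-in. It is not the real `DLMDataLoader`, whose internals are not shown in this reference; only the `messages` column names come from the schema listing.

```python
import sqlite3


class LazyConversationLoader:
    """Hypothetical stand-in that illustrates the pattern only."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions raised in the with-block

    def close(self):
        self.conn.close()

    def load_conversations(self, conversation_ids):
        """Yield one conversation's rows at a time instead of a full list."""
        for conv_id in conversation_ids:
            rows = self.conn.execute(
                "SELECT message_id, parent_id FROM messages "
                "WHERE conversation_id = ?",
                (conv_id,),
            ).fetchall()
            yield conv_id, rows
```

Because `load_conversations()` is a generator, only one conversation is held in memory at a time, and the `with` statement guarantees the connection is closed even if iteration raises.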
Adapter Layer Tasks
1. Convert DLMCoordinate → DLMCoordinates
2. Convert DLM ConversationGraph → IRCP ConversationGraph
3. Create ICPDataPoint from DLM nodes
4. Handle database schema differences
5. Provide statistics and validation
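Task 2 above can be sketched as a breadth-first walk from the DLM roots that fills the two IRCP edge maps. This is a sketch under stated assumptions: `root_ids` is a list of node ids and `get_children(node_id)` returns a list of child ids, which is a guess at the DLM signature. `dlm_graph_to_ircp_edges` and `StubDLMGraph` are hypothetical names.

```python
from collections import deque


def dlm_graph_to_ircp_edges(dlm_graph):
    """Build IRCP-style (edges, reverse_edges) from a DLM-style graph.

    Assumes dlm_graph.root_ids is a list of ids and that
    dlm_graph.get_children(node_id) returns a list of child ids.
    """
    edges = {}          # parent_id -> [child_ids]
    reverse_edges = {}  # child_id -> parent_id
    queue = deque(dlm_graph.root_ids)
    while queue:
        parent_id = queue.popleft()
        for child_id in dlm_graph.get_children(parent_id):
            edges.setdefault(parent_id, []).append(child_id)
            reverse_edges[child_id] = parent_id
            queue.append(child_id)
    return edges, reverse_edges


class StubDLMGraph:
    """Tiny stand-in used only to exercise the function above."""

    def __init__(self, root_ids, children):
        self.root_ids = root_ids
        self._children = children

    def get_children(self, node_id):
        return self._children.get(node_id, [])
```

The walk assumes each conversation is a tree, so no visited set is kept; a graph with cycles would need one.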
Testing Checklist
- [ ] Coordinate conversion preserves precision
- [ ] Embedding arrays identical between loaders
- [ ] Training loss curves within 1%
- [ ] Backward compatibility with existing code
- [ ] Performance metrics documented
- [ ] Edge cases handled (missing data, schema variants)
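The first and third checklist items reduce to numeric comparisons. The following is a minimal sketch of two helpers, assuming coordinates are plain `(x, y, z, t)` tuples and loss curves are lists of per-epoch floats. Both function names are hypothetical, and the tolerances are parameters for the caller to choose.

```python
import math


def coordinates_match(a, b, abs_tol=1e-9):
    """True if two coordinate tuples agree element-wise within abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=0.0, abs_tol=abs_tol) for u, v in zip(a, b)
    )


def curves_within(baseline, candidate, rel_tol):
    """True if every candidate point is within rel_tol of the baseline point.

    Uses math.isclose, so the tolerance is relative to the larger magnitude.
    """
    return len(baseline) == len(candidate) and all(
        math.isclose(c, b, rel_tol=rel_tol) for b, c in zip(baseline, candidate)
    )
```

For example, `curves_within(old_losses, new_losses, rel_tol=0.01)` would check two runs epoch by epoch, where `old_losses` and `new_losses` are placeholder names for the recorded curves.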
Performance Targets
- Data loading: 10-20
- Memory: Reduced with iterator pattern
- Quality: No regression in training results
- Maintainability: 50
Key Files to Modify
1. Create: `packages/ircp/data/dlm_adapter.py`
2. Create: `packages/ircp/data/dlm_data_loader.py`
3. Optional: Modify `packages/ircp/training/icp_trainer.py` (for loader flexibility)
Example Integration Flow
from dlm.core.data_loader import DLMDataLoader
from ircp.data.dlm_adapter import convert_dlm_to_icp_dataset
from ircp.training.icp_trainer import ICPTrainer

# Step 1: Use DLMDataLoader (db_path is the path to the database file)
with DLMDataLoader(db_path) as loader:
    conv_ids = loader.get_conversation_ids()
    # Materialize the graphs before the loader closes
    graphs = list(loader.load_conversations(conv_ids[:100]))

# Step 2: Convert to IRCP format
icp_data = convert_dlm_to_icp_dataset(graphs)

# Step 3: Train as normal (model and config are created elsewhere)
trainer = ICPTrainer(model, config)
results = trainer.train(icp_data[:80], icp_data[80:])
Database Query Differences
Getting Coordinates
-- IRCP expects
SELECT x_coord, y_coord, z_coord, t_coord FROM dlm_coordinates;
-- DLM expects
SELECT x, y, z, t FROM dlm_coordinates;
Getting Conversations
-- Both use the same approach (min_messages is a bound parameter)
SELECT conversation_id, total_messages FROM conversations
WHERE total_messages >= :min_messages
ORDER BY total_messages DESC;
Compatibility Matrix
| Feature | IRCP | DLM | Compatibility |
|---|---|---|---|
| Conversation loading | Yes | Yes | 100% |
| Parallel loading | Yes | Yes | 100% |
| Embedding caching | Yes | Yes | 100% |
| Coordinate caching | No | Yes | Improvement |
| Context manager | No | Yes | Improvement |
| Iterator pattern | No | Yes | Improvement |
| Logging integration | Standard | Enhanced | Improvement |
Common Gotchas
1. Schema Mismatch: The database uses the DLM schema, but IRCP expects different column names
   - Solution: Check the database schema and use the matching loader
2. Coordinate Precision: DLM coordinates may mix float and int values
   - Solution: Convert every coordinate to one consistent float type
3. Memory Issues: Large datasets combined with embedding caching
   - Solution: Stream conversations with the DLMDataLoader iterator pattern via load_conversations()
4. Missing Metadata: The is_linear field is not in DLM coordinates
   - Solution: Provide sensible defaults in the adapter
5. Timing: Timestamp formats may differ
   - Solution: Normalize to float (seconds since epoch) in the adapter
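The timestamp normalization in gotcha 5 can be sketched with the standard library alone. This is a minimal sketch, not the adapter's actual code: `normalize_timestamp` is a hypothetical name, and treating naive values as UTC is an assumption that should be checked against how `create_time` is really stored.

```python
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Convert a number, ISO 8601 string, or datetime to float epoch seconds.

    Assumption: naive datetimes and strings without an offset are UTC.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return value.timestamp()
    raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
```

Raising on unknown types, rather than returning a default, keeps silent data corruption out of the `t` coordinate.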
Command Reference
Load Data (Current)
python -c "
from ircp.data.database_loader import ConversationDataLoader
loader = ConversationDataLoader('path/to/db.sqlite')
train, val, test = loader.load_training_data()
"
Load Data (With DLMDataLoader)
python -c "
from dlm.core.data_loader import DLMDataLoader
from ircp.data.dlm_adapter import convert_dlm_to_icp_dataset
with DLMDataLoader('path/to/db.sqlite') as loader:
    graphs = list(loader.load_conversations())
data = convert_dlm_to_icp_dataset(graphs)
"
Next Steps
1. Implement adapter layer (Phase 1) - 2-3 hours
2. Test coordinate conversion - 1 hour
3. Test training equivalence - 2-3 hours
4. Create documentation - 1 hour
5. Performance benchmarking - 1-2 hours
Total Estimated Time: 7-10 hours
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/docs/guides/IRCP_INTEGRATION_QUICK_REFERENCE.md