DLM Core - Coordinate System
This module provides the foundational coordinate system that spatially represents conversation structures in a 5-dimensional space. It unifies the original DLM coordinate model with enhanced calculation methods from TPO's RCP system.
Unified coordinate system for the Discourse Latent Manifold (DLM)
Overview
The DLM coordinate system maps each message in a conversation to a point in 5D space:
- x (depth): How deep the message is in the conversation tree
- y (sibling order): Position among siblings at the same depth
- z (homogeneity): Semantic/structural similarity to siblings
- t (temporal): Time-based ordering of messages
- n_parts (complexity): Structural complexity of the message content
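As a rough illustration of the first two dimensions, the sketch below walks a tree in the dict format used throughout this document and records each message's raw depth and sibling index. The function name `raw_depth_and_order` is hypothetical and not part of `dlm.core.coordinates`; the real calculator may normalize or compute these values differently.

```python
from typing import Dict, Tuple


def raw_depth_and_order(tree: dict) -> Dict[str, Tuple[int, int]]:
    """Map each message id to (depth, sibling_index) via an iterative walk."""
    result: Dict[str, Tuple[int, int]] = {}
    stack = [(tree, 0, 0)]  # (node, depth, index among siblings)
    while stack:
        node, depth, index = stack.pop()
        result[node["id"]] = (depth, index)
        for i, child in enumerate(node.get("children", [])):
            stack.append((child, depth + 1, i))
    return result


example_tree = {
    "id": "root",
    "children": [
        {"id": "a", "children": [{"id": "a1", "children": []}]},
        {"id": "b", "children": []},
    ],
}
positions = raw_depth_and_order(example_tree)
```

An explicit stack is used instead of recursion so that very deep conversation trees do not hit Python's recursion limit.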
Core Components
1. DLMCoordinate
The `DLMCoordinate` class represents a single point in the 5D conversation space.
Basic Fields
from dlm.core.coordinates import DLMCoordinate
# Create a basic coordinate
coord = DLMCoordinate(
    x=1.0,      # Depth level
    y=0.5,      # Sibling position
    z=0.8,      # Homogeneity score
    t=0.3,      # Temporal ordering
    n_parts=3,  # Message complexity
)
Rich Metadata
DLMCoordinate also tracks structural metadata:
coord = DLMCoordinate(
    x=2.0, y=1.0, z=0.7, t=0.5, n_parts=2,
    # Tree structure
    depth_level=2,           # Integer depth
    sibling_index=1,         # Index among siblings
    sibling_count=3,         # Total siblings
    # Quality metrics
    homogeneity_score=0.75,  # Raw homogeneity
    confidence=0.95,         # Calculation confidence
    # Relationships
    parent="msg_001",                 # Parent message ID
    children=["msg_003", "msg_004"],  # Child message IDs
)
Distance Calculations
# Euclidean distance (3D spatial only: x, y, z)
distance = coord1.euclidean_distance_to(coord2)
# Full 5D distance (includes t and n_parts)
full_distance = coord1.distance_to(coord2)
# Cosine similarity
similarity = coord1.cosine_similarity_to(coord2)
Conversion Methods
# To dictionary
coord_dict = coord.to_dict()
# To tensor (x, y, z, t, n_parts)
tensor = coord.to_tensor() # Returns torch.Tensor
# To numpy array
array = coord.to_numpy()
# Backward compatibility
from dlm.models.chain import ChainCoordinate
old_coord = coord.to_chain_coordinate()  # DEPRECATED
2. DLMCoordinateCalculator
The calculator computes coordinates for entire conversation trees.
Basic Usage
from dlm.core.coordinates import DLMCoordinateCalculator
# Initialize calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,             # Normalize to [0,1]
    homogeneity_method="similarity_based",  # or "variance_based"
    use_cache=True,                         # Cache embeddings
)
# Conversation tree structure
tree = {
    "id": "msg_root",
    "content": "What is machine learning?",
    "children": [
        {
            "id": "msg_child1",
            "content": "Machine learning is...",
            "children": [],
        },
        {
            "id": "msg_child2",
            "content": "Another perspective...",
            "children": [],
        },
    ],
}
# Compute coordinates
coordinates = calc.compute_coordinates(tree)
# Returns: Dict[str, DLMCoordinate]
Advanced Options
import numpy as np
# With embeddings (for better homogeneity); "..." marks elided values
embeddings = {
    "msg_root": np.array([0.1, 0.2, ...]),
    "msg_child1": np.array([0.15, 0.18, ...]),
    # ...
}
# With timestamps
timestamps = {
    "msg_root": 1234567890.0,
    "msg_child1": 1234567920.0,
}
coordinates = calc.compute_coordinates(
    tree,
    embeddings=embeddings,
    timestamps=timestamps,
)
Configuration Options
calc = DLMCoordinateCalculator(
    # Normalization
    normalize_coordinates=True,
    x_bounds=(0.0, 1.0),
    y_bounds=(0.0, 1.0),
    z_bounds=(0.0, 1.0),
    # Homogeneity calculation
    homogeneity_method="similarity_based",  # "variance_based" or "similarity_based"
    # Performance
    use_cache=True,
    batch_size=32,
    # Temporal
    normalize_temporal=True,
    # Complexity
    count_paragraphs=True,
    count_sentences=True,
    count_code_blocks=True,
)
3. DLMCoordinateValidator
Validates coordinate correctness and relationships.
Basic Validation
from dlm.core.coordinates import DLMCoordinateValidator
# Validate coordinate values
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)
if not is_valid:
    print(f"Validation errors: {errors}")
Relationship Validation
# Validate tree structure and parent-child relationships
results = DLMCoordinateValidator.validate_relationships(coordinates)
print(f"Tree structure valid: {results['tree_structure']}")
print(f"Issues found: {results['issues']}")
Individual Coordinate Validation
# Validate single coordinate
is_valid = DLMCoordinateValidator.validate_coordinate_values(coord)
Migration Guide
From ChainCoordinate to DLMCoordinate
If you're migrating from the legacy `ChainCoordinate`:
# OLD WAY (deprecated)
from dlm.models.chain import ChainCoordinate
old_coord = ChainCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)
# NEW WAY (recommended)
from dlm.core.coordinates import DLMCoordinate
new_coord = DLMCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)
Automatic Conversion
# Convert from old to new
new_coord = DLMCoordinate.from_chain_coordinate(old_coord)
# Convert from new to old (for backward compatibility)
old_coord = new_coord.to_chain_coordinate()
Deprecation Timeline
- v0.9.x: `ChainCoordinate` emits deprecation warnings
- v1.0.0: `ChainCoordinate` will be removed
- Now: Migrate to `DLMCoordinate`
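The timeline above says the legacy class emits deprecation warnings in v0.9.x. The sketch below shows one common way such a warning is raised and how a caller can detect it in tests. `LegacyPoint` is a hypothetical stand-in, not the real `ChainCoordinate`, and the actual warning text and category used by the package are not stated in this document.

```python
import warnings


class LegacyPoint:
    """Hypothetical deprecated class; stands in for a legacy coordinate type."""

    def __init__(self, x: float, y: float) -> None:
        # stacklevel=2 attributes the warning to the caller, not this constructor
        warnings.warn(
            "LegacyPoint is deprecated; use the replacement class instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.x = x
        self.y = y


def count_deprecation_warnings() -> int:
    """Construct a LegacyPoint and count the DeprecationWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ensure the warning is not filtered out
        LegacyPoint(1.0, 2.0)
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

Running a test suite with `-W error::DeprecationWarning` turns such warnings into failures, which can help locate remaining uses of a legacy class before it is removed.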
Complete Workflow Example
from dlm.core.coordinates import (
    DLMCoordinate,
    DLMCoordinateCalculator,
    DLMCoordinateValidator,
)
# 1. Create calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,
    homogeneity_method="similarity_based",
)
# 2. Define conversation tree
tree = {
    "id": "msg_001",
    "content": "What is recursion?",
    "children": [
        {
            "id": "msg_002",
            "content": "Recursion is when a function calls itself.",
            "children": [
                {
                    "id": "msg_003",
                    "content": "Can you give an example?",
                    "children": [],
                }
            ],
        }
    ],
}
# 3. Compute coordinates
coordinates = calc.compute_coordinates(tree)
# 4. Validate
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)
assert is_valid, f"Validation failed: {errors}"
# 5. Access coordinates
root_coord = coordinates["msg_001"]
print(f"Root depth: {root_coord.x}")
print(f"Root position: ({root_coord.x}, {root_coord.y}, {root_coord.z})")
# 6. Calculate distances
coord_002 = coordinates["msg_002"]
coord_003 = coordinates["msg_003"]
distance = coord_002.euclidean_distance_to(coord_003)
print(f"Distance between msg_002 and msg_003: {distance}")
# 7. Export for visualization
import json
coord_data = {msg_id: coord.to_dict() for msg_id, coord in coordinates.items()}
with open("coordinates.json", "w") as f:
    json.dump(coord_data, f, indent=2)
Understanding Coordinate Dimensions
X - Depth Coordinate
- Range: Typically 0 to max_depth
- Meaning: Position in conversation hierarchy
- Root message: x = 0
- Direct child: x = 1
- Grandchild: x = 2
# Depth increases with tree depth. These are raw values; with
# normalize_coordinates=True they may be rescaled into x_bounds.
assert coordinates["msg_001"].x == 0  # Root
assert coordinates["msg_002"].x == 1  # Child
assert coordinates["msg_003"].x == 2  # Grandchild
Y - Sibling Order Coordinate
- Range: 0 to 1 (normalized) or 0 to sibling_count-1
- Meaning: Position among siblings at same depth
- First sibling: y = 0
- Last sibling: y = sibling_count - 1
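The two ranges listed above (raw index versus normalized) can be related by a simple rescaling. The helper below is a minimal sketch of that idea; `normalized_sibling_position` is a hypothetical name, and mapping an only child to 0.0 is an assumption rather than documented behavior.

```python
def normalized_sibling_position(sibling_index: int, sibling_count: int) -> float:
    """Rescale a raw sibling index to [0, 1]; an only child maps to 0.0."""
    if sibling_count < 1 or not 0 <= sibling_index < sibling_count:
        raise ValueError("sibling_index must satisfy 0 <= index < sibling_count")
    if sibling_count == 1:
        return 0.0  # assumption: avoid dividing by zero for a single child
    return sibling_index / (sibling_count - 1)


ys = [normalized_sibling_position(i, 3) for i in range(3)]
```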
# Siblings have different y values
tree_with_siblings = {
    "id": "root",
    "children": [
        {"id": "child1", "children": []},  # y = 0
        {"id": "child2", "children": []},  # y = 1
        {"id": "child3", "children": []},  # y = 2
    ],
}
Z - Homogeneity Coordinate
- Range: 0 to 1
- Meaning: Semantic similarity to siblings
- High z: Message is similar to its siblings
- Low z: Message is unique/diverse
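One plausible reading of "semantic similarity to siblings" is the mean cosine similarity between a message's embedding and those of its siblings, rescaled to [0, 1]. The sketch below implements that reading in plain Python. Both function names are hypothetical, and the rescaling and the value returned for a message without siblings are assumptions; the package's `similarity_based` method may differ.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_homogeneity(
    embedding: Sequence[float], siblings: Sequence[Sequence[float]]
) -> float:
    """Mean cosine similarity to siblings, rescaled from [-1, 1] to [0, 1]."""
    if not siblings:
        return 1.0  # assumption: no siblings means nothing to differ from
    mean_sim = sum(cosine(embedding, s) for s in siblings) / len(siblings)
    return (mean_sim + 1.0) / 2.0


same_direction = similarity_homogeneity([1.0, 0.0], [[2.0, 0.0]])
orthogonal = similarity_homogeneity([1.0, 0.0], [[0.0, 3.0]])
```

Under this sketch, a sibling pointing in the same direction yields the maximum score, while an orthogonal sibling lands at the midpoint of the range.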
# Calculation methods
calc_similarity = DLMCoordinateCalculator(homogeneity_method="similarity_based")
calc_variance = DLMCoordinateCalculator(homogeneity_method="variance_based")
# similarity_based: Uses embeddings and cosine similarity
# variance_based: Uses statistical variance in coordinates
T - Temporal Coordinate
- Range: 0 to 1 (normalized)
- Meaning: Time-based ordering of messages
- Earlier messages: Lower t values
- Later messages: Higher t values
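A normalized temporal coordinate of this kind is typically a min-max rescaling of the raw timestamps. The sketch below shows that calculation; `normalize_timestamps` is a hypothetical helper, and the handling of identical timestamps is an assumption.

```python
from typing import Dict


def normalize_timestamps(timestamps: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale timestamps to [0, 1]; identical timestamps all map to 0.0."""
    if not timestamps:
        return {}
    start = min(timestamps.values())
    span = max(timestamps.values()) - start
    if span == 0:
        return {msg_id: 0.0 for msg_id in timestamps}
    return {msg_id: (ts - start) / span for msg_id, ts in timestamps.items()}


t_values = normalize_timestamps(
    {"msg_001": 1609459200.0, "msg_002": 1609459260.0, "msg_003": 1609459320.0}
)
```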
# With explicit timestamps
timestamps = {
    "msg_001": 1609459200.0,  # 2021-01-01 00:00:00
    "msg_002": 1609459260.0,  # 2021-01-01 00:01:00
}
coordinates = calc.compute_coordinates(tree, timestamps=timestamps)
# Without timestamps, uses global message index
N_parts - Complexity Coordinate
- Range: 0 to max_complexity (integer)
- Meaning: Structural complexity of message
- Components counted:
- Paragraphs
- Sentences
- Code blocks
- List items
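To make the counted components concrete, the sketch below counts paragraphs, sentences, fenced code blocks, and list items in a Markdown-style message using regular expressions. `count_parts` and its counting rules are assumptions made for illustration; how the package actually detects each component, and how it combines them into `n_parts`, is not specified here.

```python
import re
from typing import Dict

FENCE = "`" * 3  # Markdown code fence, built this way to avoid nesting literal fences
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
LIST_ITEM = re.compile(r"\s*(?:[-*]|\d+\.)\s+")


def count_parts(text: str) -> Dict[str, int]:
    """Count structural components of a message (assumed counting rules)."""
    code_blocks = len(CODE_BLOCK.findall(text))
    lines = CODE_BLOCK.sub("", text).splitlines()  # ignore code when counting prose
    list_items = sum(1 for line in lines if LIST_ITEM.match(line))
    prose = "\n".join(line for line in lines if not LIST_ITEM.match(line))
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", prose) if p.strip()]
    sentences = sum(
        len([s for s in re.split(r"(?<=[.!?])\s+", p) if s]) for p in paragraphs
    )
    return {
        "paragraphs": len(paragraphs),
        "sentences": sentences,
        "code_blocks": code_blocks,
        "list_items": list_items,
    }


message = "\n".join([
    "First paragraph.",
    "",
    "Second one. It has two sentences.",
    "",
    "- item a",
    "- item b",
    "",
    FENCE,
    "print('hi')",
    FENCE,
])
counts = count_parts(message)
```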
simple_message = "Hello"  # n_parts = 1
complex_message = """
Paragraph 1.
Paragraph 2 with multiple sentences. Here's another.
def example():
    pass
"""  # n_parts = 6+ (paragraphs + sentences + code blocks)
Performance Optimization
Caching
# Enable caching for better performance
calc = DLMCoordinateCalculator(use_cache=True)
# Cached operations:
# - Embeddings
# - Similarity calculations
# - Intermediate coordinate calculations
Batch Processing
# Process multiple trees efficiently
trees = [tree1, tree2, tree3]
all_coordinates = {}
for i, tree in enumerate(trees):
    coords = calc.compute_coordinates(tree)
    all_coordinates[f"tree_{i}"] = coords
Memory Management
# Clear cache when needed
calc.clear_cache()
# Use smaller batch sizes for large trees
calc = DLMCoordinateCalculator(batch_size=16)
Type Safety
All classes use Pydantic for runtime validation:
from typing import Dict
from pydantic import ValidationError
try:
    # This will raise a validation error
    bad_coord = DLMCoordinate(x="not a number")
except ValidationError as e:
    print(f"Invalid coordinate: {e}")
# Type hints are fully supported
coordinates: Dict[str, DLMCoordinate] = calc.compute_coordinates(tree)
API Reference
DLMCoordinate
Fields:
- `x: float` - Depth coordinate
- `y: float` - Sibling order coordinate
- `z: float` - Homogeneity coordinate
- `t: float` - Temporal coordinate
- `n_parts: int` - Complexity measure
- `depth_level: int` - Integer depth
- `sibling_index: int` - Index among siblings
- `sibling_count: int` - Total sibling count
- `homogeneity_score: float` - Raw homogeneity
- `confidence: float` - Calculation confidence
- `parent: Optional[str]` - Parent message ID
- `children: List[str]` - Child message IDs
Methods:
- `distance_to(other) -> float` - Full 5D distance
- `euclidean_distance_to(other) -> float` - 3D spatial distance
- `cosine_similarity_to(other) -> float` - Cosine similarity
- `to_dict() -> Dict` - Convert to dictionary
- `to_tensor() -> torch.Tensor` - Convert to tensor
- `to_numpy() -> np.ndarray` - Convert to numpy array
- `from_chain_coordinate(coord) -> DLMCoordinate` - Convert from legacy
- `to_chain_coordinate() -> ChainCoordinate` - Convert to legacy (deprecated)
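The three distance-related methods listed above can be approximated in a few lines. The sketch below uses a hypothetical `Point5D` dataclass in place of `DLMCoordinate` and assumes unweighted Euclidean distance over all five values for the full distance; the real `distance_to` may weight or scale dimensions differently.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Point5D:
    """Hypothetical stand-in for DLMCoordinate with only the five core fields."""
    x: float
    y: float
    z: float
    t: float
    n_parts: int

    def spatial(self) -> Tuple[float, float, float]:
        return (self.x, self.y, self.z)

    def full(self) -> Tuple[float, float, float, float, float]:
        return (self.x, self.y, self.z, self.t, float(self.n_parts))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Unweighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


p = Point5D(0.0, 0.0, 0.0, 0.0, 1)
q = Point5D(3.0, 4.0, 0.0, 0.0, 13)
d3 = euclidean(p.spatial(), q.spatial())  # x, y, z only
d5 = euclidean(p.full(), q.full())        # also includes t and n_parts
```

In this example the two points differ by 12 in `n_parts`, so the 5D distance is much larger than the 3D one. This illustrates why an unnormalized integer dimension can dominate a combined distance.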
DLMCoordinateCalculator
Constructor Parameters:
- `normalize_coordinates: bool = True` - Normalize to [0,1]
- `homogeneity_method: str = "similarity_based"` - Homogeneity calculation
- `use_cache: bool = True` - Enable caching
- `batch_size: int = 32` - Batch processing size
- `x_bounds, y_bounds, z_bounds: Tuple[float, float]` - Normalization bounds
Methods:
- `compute_coordinates(tree, embeddings=None, timestamps=None) -> Dict[str, DLMCoordinate]`
- `clear_cache()` - Clear internal caches
DLMCoordinateValidator
Static Methods:
- `validate_coordinates(coords) -> Tuple[bool, List[str]]` - Validate all coordinates
- `validate_relationships(coords) -> Dict` - Validate tree structure
- `validate_coordinate_values(coord) -> bool` - Validate single coordinate
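The validation rules themselves are not listed in this document. As a minimal sketch of the kind of check a value validator might perform, the function below verifies that each coordinate is finite and that `n_parts` is a non-negative integer, returning the same `(is_valid, errors)` shape as `validate_coordinates`. It operates on plain dicts, and `validate_values` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def validate_values(coords: Dict[str, Dict[str, float]]) -> Tuple[bool, List[str]]:
    """Check that x, y, z, t are finite numbers and n_parts is a non-negative int."""
    errors: List[str] = []
    for msg_id, c in coords.items():
        for axis in ("x", "y", "z", "t"):
            value = c.get(axis)
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                errors.append(f"{msg_id}: {axis} is not a finite number")
        n_parts = c.get("n_parts")
        if not isinstance(n_parts, int) or n_parts < 0:
            errors.append(f"{msg_id}: n_parts must be a non-negative integer")
    return (not errors, errors)


ok, problems = validate_values({
    "msg_001": {"x": 0.0, "y": 0.0, "z": 1.0, "t": 0.0, "n_parts": 1},
    "msg_002": {"x": float("nan"), "y": 0.0, "z": 0.5, "t": 1.0, "n_parts": 2},
})
```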
Embedding System
IRCPEmbedder
Production-ready IRCP embedding provider with automatic caching and batch processing.
Features
- Automatic Caching: Configurable TTL-based caching for improved performance
- Efficient Batch Processing: Optimized batch embedding generation
- Coordinate Prediction: IRCP-specific 4D coordinate prediction
- Response Pattern Prediction: User response pattern analysis
- Confidence Estimation: Prediction confidence scores
- Fallback Mode: Graceful degradation when IRCP model unavailable
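The caching feature is described as capacity-bounded and TTL-based. The sketch below shows one way such a cache can work: least-recently-used eviction combined with per-entry expiry. `TTLCache` is a hypothetical class written for illustration, not the cache used inside `IRCPEmbedder`, and the injectable `clock` exists only to make expiry testable.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical LRU cache with per-entry expiry."""

    def __init__(
        self, capacity: int, ttl: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self.hits = 0
        self.misses = 0
        self._items: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self._items.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self.clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(capacity=2, ttl=10.0, clock=lambda: now[0])
cache.put("a", 1)
first = cache.get("a")   # entry is still fresh
now[0] = 11.0
second = cache.get("a")  # entry is older than the TTL
```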
Basic Usage
from dlm.core.embeddings import IRCPEmbedder
# Create embedder with caching
embedder = IRCPEmbedder(
    cache_capacity=512,
    cache_ttl=3600,  # 1 hour
    batch_size=32,
    enable_caching=True,
)
# Single embedding
embedding = embedder.generate_embeddings("Hello world")
# Returns: np.ndarray of shape (384,)
# Batch embeddings
embeddings = embedder.generate_embeddings(["Hi", "Hello", "Hey"])
# Returns: List[np.ndarray], each of shape (384,)
IRCP-Specific Features
# Predict IRCP coordinates
coords = embedder.predict_coordinates("Hello world")
# Returns: np.ndarray of shape (4,) - (x, y, z, t) coordinates
# Predict response patterns
patterns = embedder.predict_response_patterns("Hello world")
# Returns: np.ndarray of shape (384,)
# Estimate confidence
confidence = embedder.estimate_confidence("Hello world")
# Returns: float in [0, 1]
# Get all predictions at once (efficient)
results = embedder.predict_all("Hello world")
# Returns: dict with keys: embeddings, coordinates, response_patterns, confidence
Loading Trained Models
from pathlib import Path
# Load from checkpoint
embedder = IRCPEmbedder(
    model_path=Path("training/ircp/full_dataset/best_model.pt"),
    config_path=Path("training/ircp/full_dataset/inferred_config.json"),
    enable_caching=True,
)
# Or create with custom config
embedder = IRCPEmbedder(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    coordinate_dim=4,
    hidden_dim=512,
    dropout=0.1,
    freeze_encoder=True,
)
Caching Behavior
embedder = IRCPEmbedder(enable_caching=True)
# First call - cache miss
emb1 = embedder.generate_embeddings("test")
# Second call - cache hit (instant)
emb2 = embedder.generate_embeddings("test")
# Check cache statistics
stats = embedder.get_cache_stats()
print(f"Cache hits: {stats['hits']}, misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
Batch Processing
# Efficient batch processing
texts = ["Message 1", "Message 2", "Message 3", ...]
embeddings = embedder.generate_embeddings(texts)
# Batch IRCP predictions
results = embedder.predict_all(texts)
# results["coordinates"] is a list of 4D arrays
# results["confidence"] is a list of floats
Migration from dlm.engine.ircp_embedder
Old way (deprecated):
from dlm.engine.ircp_embedder import IRCPEmbeddingEngine
engine = IRCPEmbeddingEngine(
    model_path="path/to/model.pt",
    cache_embeddings=True,
)
emb = engine.generate_embedding("text", message_id="msg_001")
New way (recommended):
from dlm.core.embeddings import IRCPEmbedder
embedder = IRCPEmbedder(
    model_path="path/to/model.pt",
    enable_caching=True,
)
emb = embedder.generate_embeddings("text")
Performance Tips
1. Enable Caching: For repeated embeddings, caching provides ~100x speedup
2. Batch Processing: Process multiple texts at once for 3-5x speedup
3. Cache Tuning: Adjust `cache_capacity` based on your working set size
4. Device Selection: Use `device="cuda"` for GPU acceleration if available
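Batch processing in the sense of tip 2 amounts to splitting the input into fixed-size chunks and handing each chunk to the model in one call. The generator below sketches only the chunking step. `iter_batches` is a hypothetical helper, and whether `IRCPEmbedder` splits its input this way internally is not stated here.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


batches = list(iter_batches(["m1", "m2", "m3", "m4", "m5"], batch_size=2))
```

The final batch may be shorter than `batch_size`, so downstream code should not assume a fixed batch length.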
# Optimized for production
embedder = IRCPEmbedder(
    model_path="models/best_model.pt",
    enable_caching=True,
    cache_capacity=1024,  # Larger cache for production
    batch_size=64,        # Larger batches for throughput
    device="cuda",        # GPU if available
)
IRCP Theory Modules
Advanced IRCP components are available in `dlm.core.ircp`:
from dlm.core.ircp import (
    InverseAttentionMechanism,
    MeasurePreservingTransform,
    RingTopology,
    IRCP_AVAILABLE,  # Flag indicating if IRCP package is loaded
)
# Check IRCP availability
if IRCP_AVAILABLE:
    # Use IRCP-specific features
    attention = InverseAttentionMechanism(hidden_dim=384, num_heads=8)
else:
    # Fallback or warning
    print("IRCP package not available")
Testing
Run the test suite:
# Run all coordinate tests
pytest packages/dlm/core/tests/test_coordinates.py -v
# Run all embedding tests
pytest packages/dlm/core/tests/test_embeddings.py -v
# Run specific test class
pytest packages/dlm/core/tests/test_coordinates.py::TestDLMCoordinate -v
# Run all core tests
pytest packages/dlm/core/tests/ -v
# Run with coverage
pytest packages/dlm/core/tests/ --cov=dlm.core
Related Documentation
- [Phase 2.1: Coordinate Unification](../../../PHASE_2_1_COORDINATES.md) - Implementation plan
- [Phase 2.2: Embedding Integration](../../../PHASE_2_2_EMBEDDINGS.md) - Embedding integration plan
- [Integration Strategy](../../../DLM_FUSION_STRATEGY.md) - Overall fusion strategy
- [Response Module](../response/README.md) - Enhanced response generation
License
Part of the Computational Choreography DLM package.
Source Anchor
Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/packages/dlm/core/README.md