DLM Core - Coordinate System

This module provides the foundational coordinate system that spatially represents conversation structures in a 5-dimensional space. It unifies the original DLM coordinate model with enhanced calculation methods from TPO's RCP system.

Unified coordinate system for the Discourse Latent Manifold (DLM)

Overview

The DLM coordinate system maps each message in a conversation to a point in 5D space:

  • x (depth): How deep the message is in the conversation tree
  • y (sibling order): Position among siblings at the same depth
  • z (homogeneity): Semantic/structural similarity to siblings
  • t (temporal): Time-based ordering of messages
  • n_parts (complexity): Structural complexity of the message content

Core Components

1. DLMCoordinate

The `DLMCoordinate` class represents a single point in the 5D conversation space.

Basic Fields

python
from dlm.core.coordinates import DLMCoordinate

# Create a basic coordinate
coord = DLMCoordinate(
    x=1.0,        # Depth level
    y=0.5,        # Sibling position
    z=0.8,        # Homogeneity score
    t=0.3,        # Temporal ordering
    n_parts=3     # Message complexity
)

Rich Metadata

DLMCoordinate also tracks structural metadata:

python
coord = DLMCoordinate(
    x=2.0, y=1.0, z=0.7, t=0.5, n_parts=2,

    # Tree structure
    depth_level=2,           # Integer depth
    sibling_index=1,         # Index among siblings
    sibling_count=3,         # Total siblings

    # Quality metrics
    homogeneity_score=0.75,  # Raw homogeneity
    confidence=0.95,         # Calculation confidence

    # Relationships
    parent="msg_001",                    # Parent message ID
    children=["msg_003", "msg_004"]      # Child message IDs
)

Distance Calculations

python
# Euclidean distance (3D spatial only: x, y, z)
distance = coord1.euclidean_distance_to(coord2)

# Full 5D distance (includes t and n_parts)
full_distance = coord1.distance_to(coord2)

# Cosine similarity
similarity = coord1.cosine_similarity_to(coord2)
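As a rough illustration of what the three metrics measure, the sketch below implements them on plain tuples ordered as (x, y, z, t, n_parts). It is not the library's code: the function names, the unweighted 5D norm, and computing cosine similarity over all five components are assumptions made for this example.

```python
import math

def euclidean_3d(a, b):
    """Euclidean distance over the spatial components (x, y, z) only."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a[:3], b[:3])))

def distance_5d(a, b):
    """Unweighted Euclidean distance over (x, y, z, t, n_parts)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

p1 = (1.0, 0.5, 0.8, 0.3, 3)
p2 = (2.0, 1.0, 0.7, 0.5, 2)
print(euclidean_3d(p1, p2), distance_5d(p1, p2), cosine_similarity(p1, p2))
```

Because `n_parts` is an unbounded integer while the other components are usually normalized, an unweighted 5D distance like this one can be dominated by complexity differences; whether the real `distance_to` applies weights is not stated here.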

Conversion Methods

python
# To dictionary
coord_dict = coord.to_dict()

# To tensor (x, y, z, t, n_parts)
tensor = coord.to_tensor()  # Returns torch.Tensor

# To numpy array
array = coord.to_numpy()

# Backward compatibility
from dlm.models.chain import ChainCoordinate
old_coord = coord.to_chain_coordinate()  # DEPRECATED

2. DLMCoordinateCalculator

The calculator computes coordinates for entire conversation trees.

Basic Usage

python
from dlm.core.coordinates import DLMCoordinateCalculator

# Initialize calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,     # Normalize to [0,1]
    homogeneity_method="similarity_based",  # or "variance_based"
    use_cache=True                  # Cache embeddings
)

# Conversation tree structure
tree = {
    "id": "msg_root",
    "content": "What is machine learning?",
    "children": [
        {
            "id": "msg_child1",
            "content": "Machine learning is...",
            "children": []
        },
        {
            "id": "msg_child2",
            "content": "Another perspective...",
            "children": []
        }
    ]
}

# Compute coordinates
coordinates = calc.compute_coordinates(tree)
# Returns: Dict[str, DLMCoordinate]
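The calculator's traversal is not shown in this document, so the following is only a minimal sketch of how depth and sibling position could be derived from the nested `id`/`children` structure above. `walk_tree` and its tuple layout are hypothetical names used for illustration; they are not part of `dlm.core.coordinates`.

```python
def walk_tree(tree):
    """Map each message ID to (depth, sibling_index, sibling_count, parent_id)."""
    layout = {}
    # Each stack entry: (node, depth, index among siblings, sibling count, parent ID)
    stack = [(tree, 0, 0, 1, None)]
    while stack:
        node, depth, index, count, parent = stack.pop()
        layout[node["id"]] = (depth, index, count, parent)
        children = node.get("children", [])
        for i, child in enumerate(children):
            stack.append((child, depth + 1, i, len(children), node["id"]))
    return layout

tree = {
    "id": "msg_root",
    "children": [
        {"id": "msg_child1", "children": []},
        {"id": "msg_child2", "children": []},
    ],
}
layout = walk_tree(tree)
print(layout["msg_child2"])  # (1, 1, 2, 'msg_root')
```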

Advanced Options

python
import numpy as np

# With embeddings (for better homogeneity)
embeddings = {
    "msg_root": np.array([0.1, 0.2, ...]),
    "msg_child1": np.array([0.15, 0.18, ...]),
    # ...
}

# With timestamps
timestamps = {
    "msg_root": 1234567890.0,
    "msg_child1": 1234567920.0,
}

coordinates = calc.compute_coordinates(
    tree,
    embeddings=embeddings,
    timestamps=timestamps
)

Configuration Options

python
calc = DLMCoordinateCalculator(
    # Normalization
    normalize_coordinates=True,
    x_bounds=(0.0, 1.0),
    y_bounds=(0.0, 1.0),
    z_bounds=(0.0, 1.0),

    # Homogeneity calculation
    homogeneity_method="similarity_based",  # "variance_based" or "similarity_based"

    # Performance
    use_cache=True,
    batch_size=32,

    # Temporal
    normalize_temporal=True,

    # Complexity
    count_paragraphs=True,
    count_sentences=True,
    count_code_blocks=True
)
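To make the `*_bounds` options concrete, here is a small sketch of min-max rescaling into a target interval. Min-max scaling is an assumption about what "normalize" means here, and `rescale` is a hypothetical helper, not a function exported by the package.

```python
def rescale(values, bounds=(0.0, 1.0)):
    """Linearly map values so that min(values) -> bounds[0] and max(values) -> bounds[1]."""
    lo, hi = bounds
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Degenerate case: every value is identical, so pin them to the lower bound.
        return [lo for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]

depths = [0, 1, 2, 2]
print(rescale(depths))  # [0.0, 0.5, 1.0, 1.0]
```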

3. DLMCoordinateValidator

Validates coordinate correctness and relationships.

Basic Validation

python
from dlm.core.coordinates import DLMCoordinateValidator

# Validate coordinate values
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)

if not is_valid:
    print(f"Validation errors: {errors}")
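The exact rules the validator enforces are not listed in this document. The sketch below shows one plausible set of value checks that returns the same `(is_valid, errors)` shape, operating on plain dictionaries such as those produced by `to_dict()`. `check_values` and the specific rules (finite numbers, z within [0, 1], non-negative integer `n_parts`) are assumptions.

```python
import math

def check_values(coords):
    """Return (is_valid, errors) for a dict of message ID -> coordinate dict."""
    errors = []
    for msg_id, c in coords.items():
        for name in ("x", "y", "z", "t"):
            value = c.get(name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{msg_id}: {name} must be a number")
            elif not math.isfinite(value):
                errors.append(f"{msg_id}: {name} must be finite")
            elif name == "z" and not 0.0 <= value <= 1.0:
                errors.append(f"{msg_id}: z={value} is outside [0, 1]")
        n_parts = c.get("n_parts")
        if isinstance(n_parts, bool) or not isinstance(n_parts, int) or n_parts < 0:
            errors.append(f"{msg_id}: n_parts must be a non-negative integer")
    return (not errors, errors)

good = {"msg_001": {"x": 0.0, "y": 0.0, "z": 0.8, "t": 0.0, "n_parts": 1}}
print(check_values(good))  # (True, [])
```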

Relationship Validation

python
# Validate tree structure and parent-child relationships
results = DLMCoordinateValidator.validate_relationships(coordinates)

print(f"Tree structure valid: {results['tree_structure']}")
print(f"Issues found: {results['issues']}")
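A relationship check of this kind typically verifies that `parent` and `children` references agree with each other. The following sketch does that on plain dictionaries and returns the same two keys used above (`tree_structure`, `issues`); the function name and the single-root rule are assumptions, not the validator's documented behavior.

```python
def check_relationships(coords):
    """Check that parent/children references are mutually consistent."""
    issues = []
    for msg_id, c in coords.items():
        parent = c.get("parent")
        if parent is not None:
            if parent not in coords:
                issues.append(f"{msg_id}: parent {parent} is missing")
            elif msg_id not in coords[parent].get("children", []):
                issues.append(f"{msg_id}: not listed as a child of {parent}")
        for child in c.get("children", []):
            if child not in coords:
                issues.append(f"{msg_id}: child {child} is missing")
            elif coords[child].get("parent") != msg_id:
                issues.append(f"{child}: parent does not point back to {msg_id}")
    roots = [m for m, c in coords.items() if c.get("parent") is None]
    if len(roots) != 1:
        issues.append(f"expected exactly one root, found {len(roots)}")
    return {"tree_structure": not issues, "issues": issues}

consistent = {
    "msg_001": {"parent": None, "children": ["msg_002"]},
    "msg_002": {"parent": "msg_001", "children": []},
}
print(check_relationships(consistent))  # {'tree_structure': True, 'issues': []}
```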

Individual Coordinate Validation

python
# Validate single coordinate
is_valid = DLMCoordinateValidator.validate_coordinate_values(coord)

Migration Guide

From ChainCoordinate to DLMCoordinate

If you're migrating from the legacy `ChainCoordinate`:

python
# OLD WAY (deprecated)
from dlm.models.chain import ChainCoordinate
old_coord = ChainCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)

# NEW WAY (recommended)
from dlm.core.coordinates import DLMCoordinate
new_coord = DLMCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)

Automatic Conversion

python
# Convert from old to new
new_coord = DLMCoordinate.from_chain_coordinate(old_coord)

# Convert from new to old (for backward compatibility)
old_coord = new_coord.to_chain_coordinate()

Deprecation Timeline

  • v0.9.x: `ChainCoordinate` emits deprecation warnings
  • v1.0.0: `ChainCoordinate` will be removed
  • Now: Migrate to `DLMCoordinate`

Complete Workflow Example

python
from dlm.core.coordinates import (
    DLMCoordinate,
    DLMCoordinateCalculator,
    DLMCoordinateValidator
)

# 1. Create calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,
    homogeneity_method="similarity_based"
)

# 2. Define conversation tree
tree = {
    "id": "msg_001",
    "content": "What is recursion?",
    "children": [
        {
            "id": "msg_002",
            "content": "Recursion is when a function calls itself.",
            "children": [
                {
                    "id": "msg_003",
                    "content": "Can you give an example?",
                    "children": []
                }
            ]
        }
    ]
}

# 3. Compute coordinates
coordinates = calc.compute_coordinates(tree)

# 4. Validate
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)
assert is_valid, f"Validation failed: {errors}"

# 5. Access coordinates
root_coord = coordinates["msg_001"]
print(f"Root depth: {root_coord.x}")
print(f"Root position: ({root_coord.x}, {root_coord.y}, {root_coord.z})")

# 6. Calculate distances
coord_002 = coordinates["msg_002"]
coord_003 = coordinates["msg_003"]
distance = coord_002.euclidean_distance_to(coord_003)
print(f"Distance between msg_002 and msg_003: {distance}")

# 7. Export for visualization
import json
coord_data = {msg_id: coord.to_dict() for msg_id, coord in coordinates.items()}
with open("coordinates.json", "w") as f:
    json.dump(coord_data, f, indent=2)

Understanding Coordinate Dimensions

X - Depth Coordinate

  • Range: Typically 0 to max_depth
  • Meaning: Position in conversation hierarchy
  • Root message: x = 0
  • Direct child: x = 1
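  • Grandchild: x = 2

python
# Depth increases with tree depth. The exact values below assume unnormalized
# coordinates; with normalize_coordinates=True, x may be rescaled into x_bounds.
# depth_level always holds the integer depth.
assert coordinates["msg_001"].x == 0  # Root
assert coordinates["msg_002"].x == 1  # Child
assert coordinates["msg_003"].x == 2  # Grandchild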

Y - Sibling Order Coordinate

  • Range: 0 to 1 (normalized) or 0 to sibling_count-1
  • Meaning: Position among siblings at same depth
  • First sibling: y = 0
  • Last sibling: y = sibling_count - 1 (before normalization)

python
# Siblings have different y values
tree_with_siblings = {
    "id": "root",
    "children": [
        {"id": "child1", "children": []},  # y = 0
        {"id": "child2", "children": []},  # y = 1
        {"id": "child3", "children": []},  # y = 2
    ]
}
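The two ranges listed above (raw index versus normalized) can be related with a one-line formula. The sketch below assumes normalization divides the index by `sibling_count - 1` and that an only child gets 0; both choices, and the name `sibling_order`, are illustrative assumptions.

```python
def sibling_order(index, count, normalize=True):
    """Return the y value for the sibling at `index` out of `count` siblings."""
    if not 0 <= index < count:
        raise ValueError("index must satisfy 0 <= index < count")
    if not normalize:
        return float(index)
    if count == 1:
        return 0.0  # An only child has no spread to normalize over.
    return index / (count - 1)

print([sibling_order(i, 3) for i in range(3)])  # [0.0, 0.5, 1.0]
```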

Z - Homogeneity Coordinate

  • Range: 0 to 1
  • Meaning: Semantic similarity to siblings
  • High z: Message is similar to its siblings
  • Low z: Message differs from its siblings

python
# Calculation methods
calc_similarity = DLMCoordinateCalculator(homogeneity_method="similarity_based")
calc_variance = DLMCoordinateCalculator(homogeneity_method="variance_based")

# similarity_based: Uses embeddings and cosine similarity
# variance_based: Uses statistical variance in coordinates
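For intuition about the similarity-based option, here is a minimal sketch that averages a message's cosine similarity to its siblings and maps the result from [-1, 1] to [0, 1]. The mapping, the value returned when there are no siblings, and the function names are assumptions; the package's actual formula is not given in this document.

```python
import math

def _cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_homogeneity(target, siblings):
    """Mean cosine similarity to sibling embeddings, rescaled to [0, 1]."""
    if not siblings:
        return 1.0  # Assumption: a message with no siblings is trivially homogeneous.
    mean_sim = sum(_cosine(target, s) for s in siblings) / len(siblings)
    return (mean_sim + 1.0) / 2.0

print(similarity_homogeneity([1.0, 0.0], [[0.0, 1.0]]))  # 0.5
```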

T - Temporal Coordinate

  • Range: 0 to 1 (normalized)
  • Meaning: Time-based ordering of messages
  • Earlier messages: Lower t values
  • Later messages: Higher t values

python
# With explicit timestamps
timestamps = {
    "msg_001": 1609459200.0,  # 2021-01-01 00:00:00
    "msg_002": 1609459260.0,  # 2021-01-01 00:01:00
}

coordinates = calc.compute_coordinates(tree, timestamps=timestamps)

# Without timestamps, uses global message index
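The timestamp path and the index fallback can be sketched together as follows. This assumes min-max scaling over the conversation and falls back to message order whenever any timestamp is missing; `temporal_coords` is a hypothetical helper, and the real calculator may handle partial timestamps differently.

```python
def temporal_coords(message_ids, timestamps=None):
    """Map message IDs (given in traversal order) to t values in [0, 1]."""
    if not message_ids:
        return {}
    if timestamps and all(m in timestamps for m in message_ids):
        raw = [float(timestamps[m]) for m in message_ids]
    else:
        # Fallback: use each message's position in the traversal order.
        raw = [float(i) for i in range(len(message_ids))]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    return {m: ((v - lo) / span if span else 0.0) for m, v in zip(message_ids, raw)}

ids = ["msg_001", "msg_002", "msg_003"]
stamps = {"msg_001": 100.0, "msg_002": 130.0, "msg_003": 160.0}
print(temporal_coords(ids, stamps))  # {'msg_001': 0.0, 'msg_002': 0.5, 'msg_003': 1.0}
```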

N_parts - Complexity Coordinate

  • Range: 0 to max_complexity (integer)
  • Meaning: Structural complexity of message
  • Components counted:
      • Paragraphs
      • Sentences
      • Code blocks
      • List items

python
simple_message = "Hello"  # n_parts = 1
complex_message = """
Paragraph 1.

Paragraph 2 with multiple sentences. Here's another.

def example():
    pass

"""  # n_parts = 6+ (paragraphs + sentences + code blocks)

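The counted components can be approximated with a few regular expressions. The sketch below returns a per-component breakdown rather than a single number, because how the package combines the counts into `n_parts` is not specified here. `count_parts`, the sentence-splitting rule, and the assumption that code blocks are delimited by triple-backtick fences are all illustrative.

```python
import re

FENCE = "`" * 3  # Built indirectly so this example can itself sit inside a fence.

def count_parts(text):
    """Count paragraphs, sentences, fenced code blocks, and list items in text."""
    fence = re.escape(FENCE)
    code_blocks = re.findall(fence + r".*?" + fence, text, flags=re.DOTALL)
    prose = text
    for block in code_blocks:
        prose = prose.replace(block, "")  # Keep code out of the prose counts.
    paragraphs = [p for p in re.split(r"\n\s*\n", prose) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prose.strip()) if s.strip()]
    list_items = re.findall(r"^[ \t]*(?:[-*•]|\d+\.)[ \t]+", prose, flags=re.MULTILINE)
    return {
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "code_blocks": len(code_blocks),
        "list_items": len(list_items),
    }

sample = (
    "Paragraph 1.\n\n"
    "Paragraph 2 with multiple sentences. Here's another.\n\n"
    + FENCE + "python\ndef example():\n    pass\n" + FENCE + "\n"
)
print(count_parts(sample))
```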
Performance Optimization

Caching

python
# Enable caching for better performance
calc = DLMCoordinateCalculator(use_cache=True)

# Cached operations:
# - Embeddings
# - Similarity calculations
# - Intermediate coordinate calculations

Batch Processing

python
# Process multiple trees efficiently
trees = [tree1, tree2, tree3]

all_coordinates = {}
for i, tree in enumerate(trees):
    coords = calc.compute_coordinates(tree)
    all_coordinates[f"tree_{i}"] = coords

Memory Management

python
# Clear cache when needed
calc.clear_cache()

# Use smaller batch sizes for large trees
calc = DLMCoordinateCalculator(batch_size=16)

Type Safety

All classes use Pydantic for runtime validation:

python
from typing import Dict

from pydantic import ValidationError

try:
    # This will raise validation error
    bad_coord = DLMCoordinate(x="not a number")
except ValidationError as e:
    print(f"Invalid coordinate: {e}")

# Type hints are fully supported
coordinates: Dict[str, DLMCoordinate] = calc.compute_coordinates(tree)

API Reference

DLMCoordinate

Fields:
- `x: float` - Depth coordinate
- `y: float` - Sibling order coordinate
- `z: float` - Homogeneity coordinate
- `t: float` - Temporal coordinate
- `n_parts: int` - Complexity measure
- `depth_level: int` - Integer depth
- `sibling_index: int` - Index among siblings
- `sibling_count: int` - Total sibling count
- `homogeneity_score: float` - Raw homogeneity
- `confidence: float` - Calculation confidence
- `parent: Optional[str]` - Parent message ID
- `children: List[str]` - Child message IDs

Methods:
- `distance_to(other) -> float` - Full 5D distance
- `euclidean_distance_to(other) -> float` - 3D spatial distance
- `cosine_similarity_to(other) -> float` - Cosine similarity
- `to_dict() -> Dict` - Convert to dictionary
- `to_tensor() -> torch.Tensor` - Convert to tensor
- `to_numpy() -> np.ndarray` - Convert to numpy array
- `from_chain_coordinate(coord) -> DLMCoordinate` - Convert from legacy
- `to_chain_coordinate() -> ChainCoordinate` - Convert to legacy (deprecated)

DLMCoordinateCalculator

Constructor Parameters:
- `normalize_coordinates: bool = True` - Normalize to [0,1]
- `homogeneity_method: str = "similarity_based"` - Homogeneity calculation
- `use_cache: bool = True` - Enable caching
- `batch_size: int = 32` - Batch processing size
- `x_bounds, y_bounds, z_bounds: Tuple[float, float]` - Normalization bounds

Methods:
- `compute_coordinates(tree, embeddings=None, timestamps=None) -> Dict[str, DLMCoordinate]`
- `clear_cache()` - Clear internal caches

DLMCoordinateValidator

Static Methods:
- `validate_coordinates(coords) -> Tuple[bool, List[str]]` - Validate all coordinates
- `validate_relationships(coords) -> Dict` - Validate tree structure
- `validate_coordinate_values(coord) -> bool` - Validate single coordinate

Embedding System

IRCPEmbedder

Production-ready IRCP embedding provider with automatic caching and batch processing.

Features

  • Automatic Caching: Configurable TTL-based caching for improved performance
  • Efficient Batch Processing: Optimized batch embedding generation
  • Coordinate Prediction: IRCP-specific 4D coordinate prediction
  • Response Pattern Prediction: User response pattern analysis
  • Confidence Estimation: Prediction confidence scores
  • Fallback Mode: Graceful degradation when IRCP model unavailable

Basic Usage

python
from dlm.core.embeddings import IRCPEmbedder

# Create embedder with caching
embedder = IRCPEmbedder(
    cache_capacity=512,
    cache_ttl=3600,  # 1 hour
    batch_size=32,
    enable_caching=True
)

# Single embedding
embedding = embedder.generate_embeddings("Hello world")
# Returns: np.ndarray of shape (384,)

# Batch embeddings
embeddings = embedder.generate_embeddings(["Hi", "Hello", "Hey"])
# Returns: List[np.ndarray], each of shape (384,)

IRCP-Specific Features

python
# Predict IRCP coordinates
coords = embedder.predict_coordinates("Hello world")
# Returns: np.ndarray of shape (4,) - (x, y, z, t) coordinates

# Predict response patterns
patterns = embedder.predict_response_patterns("Hello world")
# Returns: np.ndarray of shape (384,)

# Estimate confidence
confidence = embedder.estimate_confidence("Hello world")
# Returns: float in [0, 1]

# Get all predictions at once (efficient)
results = embedder.predict_all("Hello world")
# Returns: dict with keys: embeddings, coordinates, response_patterns, confidence

Loading Trained Models

python
from pathlib import Path

# Load from checkpoint
embedder = IRCPEmbedder(
    model_path=Path("training/ircp/full_dataset/best_model.pt"),
    config_path=Path("training/ircp/full_dataset/inferred_config.json"),
    enable_caching=True
)

# Or create with custom config
embedder = IRCPEmbedder(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    coordinate_dim=4,
    hidden_dim=512,
    dropout=0.1,
    freeze_encoder=True
)

Caching Behavior

python
embedder = IRCPEmbedder(enable_caching=True)

# First call - cache miss
emb1 = embedder.generate_embeddings("test")

# Second call - cache hit (instant)
emb2 = embedder.generate_embeddings("test")

# Check cache statistics
stats = embedder.get_cache_stats()
print(f"Cache hits: {stats['hits']}, misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
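To show how capacity, TTL, and the hit/miss statistics fit together, here is a small stand-alone cache sketch. It is not the embedder's internal cache: `TTLCache`, its least-recently-used eviction policy, and the injectable `clock` parameter are assumptions chosen to mirror the `cache_capacity`, `cache_ttl`, and `get_cache_stats()` names used above.

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, capacity=512, ttl=3600.0, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None:
            value, expires = entry
            if self._clock() < expires:
                self._data.move_to_end(key)  # Mark as most recently used.
                self.hits += 1
                return value
            del self._data[key]  # Entry is stale; drop it and count a miss.
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self.ttl)
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # Evict the least recently used entry.

    def stats(self):
        total = self.hits + self.misses
        rate = self.hits / total if total else 0.0
        return {"hits": self.hits, "misses": self.misses, "hit_rate": rate}

demo = TTLCache(capacity=512, ttl=3600)
demo.get("test")               # Miss: nothing cached yet.
demo.put("test", [0.1, 0.2])
demo.get("test")               # Hit.
print(demo.stats())            # {'hits': 1, 'misses': 1, 'hit_rate': 0.5}
```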

Batch Processing

python
# Efficient batch processing
texts = ["Message 1", "Message 2", "Message 3", ...]
embeddings = embedder.generate_embeddings(texts)

# Batch IRCP predictions
results = embedder.predict_all(texts)
# results["coordinates"] is a list of 4D arrays
# results["confidence"] is a list of floats
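The `batch_size` parameter suggests that long inputs are split into fixed-size chunks before being passed to the model. A minimal sketch of that pattern follows; `batched`, `embed_in_batches`, and the stand-in `fake_embed` function are hypothetical and only demonstrate the chunking, not the embedder's actual implementation.

```python
def batched(items, batch_size=32):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_batch, batch_size=32):
    """Call `embed_batch` once per chunk and concatenate the results in order."""
    results = []
    for chunk in batched(texts, batch_size):
        results.extend(embed_batch(chunk))
    return results

def fake_embed(chunk):
    # Stand-in for a model call: "embeds" each text as its length.
    return [len(text) for text in chunk]

texts = ["Message 1", "Message 22", "Message 333"]
print(embed_in_batches(texts, fake_embed, batch_size=2))  # [9, 10, 11]
```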

Migration from dlm.engine.ircp_embedder

Old way (deprecated):

python
from dlm.engine.ircp_embedder import IRCPEmbeddingEngine

engine = IRCPEmbeddingEngine(
    model_path="path/to/model.pt",
    cache_embeddings=True
)
emb = engine.generate_embedding("text", message_id="msg_001")

New way (recommended):

python
from dlm.core.embeddings import IRCPEmbedder

embedder = IRCPEmbedder(
    model_path="path/to/model.pt",
    enable_caching=True
)
emb = embedder.generate_embeddings("text")

Performance Tips

1. Enable Caching: For repeated embeddings, caching provides ~100x speedup
2. Batch Processing: Process multiple texts at once for 3-5x speedup
3. Cache Tuning: Adjust `cache_capacity` based on your working set size
4. Device Selection: Use `device="cuda"` for GPU acceleration if available

python
# Optimized for production
embedder = IRCPEmbedder(
    model_path="models/best_model.pt",
    enable_caching=True,
    cache_capacity=1024,  # Larger cache for production
    batch_size=64,  # Larger batches for throughput
    device="cuda",  # GPU if available
)

IRCP Theory Modules

Advanced IRCP components are available in `dlm.core.ircp`:

python
from dlm.core.ircp import (
    InverseAttentionMechanism,
    MeasurePreservingTransform,
    RingTopology,
    IRCP_AVAILABLE,  # Flag indicating if IRCP package is loaded
)

# Check IRCP availability
if IRCP_AVAILABLE:
    # Use IRCP-specific features
    attention = InverseAttentionMechanism(hidden_dim=384, num_heads=8)
else:
    # Fallback or warning
    print("IRCP package not available")
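An availability flag like `IRCP_AVAILABLE` is commonly set by probing for an optional dependency at import time. The sketch below shows one way to do that with the standard library. How `dlm.core.ircp` actually sets its flag is not documented here, and the module names used are placeholders.

```python
import importlib.util

def optional_feature_available(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

# "json" ships with Python; the second name is a deliberately nonexistent placeholder.
JSON_AVAILABLE = optional_feature_available("json")
MISSING_AVAILABLE = optional_feature_available("hypothetical_missing_pkg_xyz")

if not MISSING_AVAILABLE:
    print("Optional package not available; using fallback mode")
```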

Testing

Run the test suite:

bash
# Run all coordinate tests
pytest packages/dlm/core/tests/test_coordinates.py -v

# Run all embedding tests
pytest packages/dlm/core/tests/test_embeddings.py -v

# Run specific test class
pytest packages/dlm/core/tests/test_coordinates.py::TestDLMCoordinate -v

# Run all core tests
pytest packages/dlm/core/tests/ -v

# Run with coverage
pytest packages/dlm/core/tests/ --cov=dlm.core

Related Documentation

  • [Phase 2.1: Coordinate Unification](../../../PHASE_2_1_COORDINATES.md) - Implementation plan
  • [Phase 2.2: Embedding Integration](../../../PHASE_2_2_EMBEDDINGS.md) - Embedding integration plan
  • [Integration Strategy](../../../DLM_FUSION_STRATEGY.md) - Overall fusion strategy
  • [Response Module](../response/README.md) - Enhanced response generation

License

Part of the Computational Choreography DLM package.

Source Anchor

Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/packages/dlm/core/README.md
