Grand Diomande Research · Full HTML Reader

DLM Core - Coordinate System

This module provides the foundational coordinate system that spatially represents conversation structures in a 5-dimensional space. It unifies the original DLM coordinate model with enhanced calculation methods from TPO's RCP system.

Unified coordinate system for the Discourse Latent Manifold (DLM)

Overview

The DLM coordinate system maps each message in a conversation to a point in 5D space:

x (depth): How deep the message is in the conversation tree
y (sibling order): Position among siblings at the same depth
z (homogeneity): Semantic/structural similarity to siblings
t (temporal): Time-based ordering of messages
n_parts (complexity): Structural complexity of the message content

Core Components

1. DLMCoordinate

The `DLMCoordinate` class represents a single point in the 5D conversation space.

Basic Fields

python

from dlm.core.coordinates import DLMCoordinate

# Create a basic coordinate
coord = DLMCoordinate(
    x=1.0,        # Depth level
    y=0.5,        # Sibling position
    z=0.8,        # Homogeneity score
    t=0.3,        # Temporal ordering
    n_parts=3     # Message complexity
)

Rich Metadata

DLMCoordinate also tracks structural metadata:

python

coord = DLMCoordinate(
    x=2.0, y=1.0, z=0.7, t=0.5, n_parts=2,

    # Tree structure
    depth_level=2,           # Integer depth
    sibling_index=1,         # Index among siblings
    sibling_count=3,         # Total siblings

    # Quality metrics
    homogeneity_score=0.75,  # Raw homogeneity
    confidence=0.95,         # Calculation confidence

    # Relationships
    parent="msg_001",                    # Parent message ID
    children=["msg_003", "msg_004"]      # Child message IDs
)

Distance Calculations

python

# Euclidean distance (3D spatial only: x, y, z)
distance = coord1.euclidean_distance_to(coord2)

# Full 5D distance (includes t and n_parts)
full_distance = coord1.distance_to(coord2)

# Cosine similarity
similarity = coord1.cosine_similarity_to(coord2)
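
To make the difference between the three measures concrete, here is a minimal library-free sketch. It assumes the full 5D distance is an unweighted Euclidean norm over (x, y, z, t, n_parts) and that cosine similarity is taken over the same five components; this README does not state either detail, and the function names below are hypothetical stand-ins, not the `DLMCoordinate` methods.

```python
import math
from typing import Sequence


def spatial_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over the first three components (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a[:3], b[:3])))


def full_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unweighted Euclidean distance over all five components (an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors laid out as (x, y, z, t, n_parts).
parent = (0.0, 0.0, 0.5, 0.0, 1)
child = (1.0, 0.0, 0.5, 1.0, 1)

print(spatial_distance(parent, child))   # only x differs among x, y, z
print(full_distance(parent, child))      # x and t both differ
print(cosine_similarity(parent, child))
```

Under this unweighted assumption, `n_parts` is an unbounded integer and can dominate the 5D distance when the other components are normalized to [0, 1], which is one reason to prefer the 3D distance for purely structural comparisons.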

Conversion Methods

python

# To dictionary
coord_dict = coord.to_dict()

# To tensor (x, y, z, t, n_parts)
tensor = coord.to_tensor()  # Returns torch.Tensor

# To numpy array
array = coord.to_numpy()

# Backward compatibility
from dlm.models.chain import ChainCoordinate
old_coord = coord.to_chain_coordinate()  # DEPRECATED

2. DLMCoordinateCalculator

The calculator computes coordinates for entire conversation trees.

Basic Usage

python

from dlm.core.coordinates import DLMCoordinateCalculator

# Initialize calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,     # Normalize to [0,1]
    homogeneity_method="similarity_based",  # or "variance_based"
    use_cache=True                  # Cache embeddings
)

# Conversation tree structure
tree = {
    "id": "msg_root",
    "content": "What is machine learning?",
    "children": [
        {
            "id": "msg_child1",
            "content": "Machine learning is...",
            "children": []
        },
        {
            "id": "msg_child2",
            "content": "Another perspective...",
            "children": []
        }
    ]
}

# Compute coordinates
coordinates = calc.compute_coordinates(tree)
# Returns: Dict[str, DLMCoordinate]

Advanced Options

python

import numpy as np

# With embeddings (for better homogeneity)
embeddings = {
    "msg_root": np.array([0.1, 0.2, ...]),
    "msg_child1": np.array([0.15, 0.18, ...]),
    # ...
}

# With timestamps
timestamps = {
    "msg_root": 1234567890.0,
    "msg_child1": 1234567920.0,
}

coordinates = calc.compute_coordinates(
    tree,
    embeddings=embeddings,
    timestamps=timestamps
)

Configuration Options

python

calc = DLMCoordinateCalculator(
    # Normalization
    normalize_coordinates=True,
    x_bounds=(0.0, 1.0),
    y_bounds=(0.0, 1.0),
    z_bounds=(0.0, 1.0),

    # Homogeneity calculation
    homogeneity_method="similarity_based",  # "variance_based" or "similarity_based"

    # Performance
    use_cache=True,
    batch_size=32,

    # Temporal
    normalize_temporal=True,

    # Complexity
    count_paragraphs=True,
    count_sentences=True,
    count_code_blocks=True
)

3. DLMCoordinateValidator

Validates coordinate correctness and relationships.

Basic Validation

python

from dlm.core.coordinates import DLMCoordinateValidator

# Validate coordinate values
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)

if not is_valid:
    print(f"Validation errors: {errors}")

Relationship Validation

python

# Validate tree structure and parent-child relationships
results = DLMCoordinateValidator.validate_relationships(coordinates)

print(f"Tree structure valid: {results['tree_structure']}")
print(f"Issues found: {results['issues']}")
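
As an illustration of what a relationship check could look for, the sketch below verifies that parent and child links agree with each other and that there is exactly one root. It works on plain dictionaries rather than `DLMCoordinate` objects; the function name `check_relationships` and the specific rules are assumptions, and only the `tree_structure` and `issues` result keys come from the example above.

```python
from typing import Any, Dict, List


def check_relationships(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Check that parent/children links are mutually consistent."""
    issues: List[str] = []

    roots = [mid for mid, node in nodes.items() if node["parent"] is None]
    if len(roots) != 1:
        issues.append(f"expected exactly one root, found {len(roots)}")

    for mid, node in nodes.items():
        parent = node["parent"]
        if parent is not None:
            if parent not in nodes:
                issues.append(f"{mid}: parent {parent} is missing")
            elif mid not in nodes[parent]["children"]:
                issues.append(f"{mid}: not listed as a child of {parent}")
        for child in node["children"]:
            if child not in nodes:
                issues.append(f"{mid}: child {child} is missing")
            elif nodes[child]["parent"] != mid:
                issues.append(f"{mid}: child {child} points to another parent")

    return {"tree_structure": not issues, "issues": issues}


good = {
    "msg_001": {"parent": None, "children": ["msg_002"]},
    "msg_002": {"parent": "msg_001", "children": []},
}
broken = {
    "msg_001": {"parent": None, "children": []},          # forgot to list msg_002
    "msg_002": {"parent": "msg_001", "children": []},
}

print(check_relationships(good))
print(check_relationships(broken))
```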

Individual Coordinate Validation

python

# Validate single coordinate
is_valid = DLMCoordinateValidator.validate_coordinate_values(coord)
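
The kind of per-coordinate checks involved can be sketched without the library. The rules below (finite numbers, non-negative depth, z within [0, 1], non-negative integer `n_parts`) are assumptions drawn from the ranges documented in this README, and `coordinate_errors` is a hypothetical helper that returns messages instead of a bare boolean.

```python
import math
from typing import Any, List, Mapping


def coordinate_errors(coord: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one coordinate (empty if none)."""
    errors: List[str] = []

    # Every continuous component must be a real, finite number.
    for name in ("x", "y", "z", "t"):
        value = coord.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            errors.append(f"{name} must be a finite number, got {value!r}")
    if errors:
        return errors  # range checks are meaningless on non-numeric values

    if coord["x"] < 0:
        errors.append("x (depth) must be non-negative")
    if not 0.0 <= coord["z"] <= 1.0:
        errors.append("z (homogeneity) must lie in [0, 1]")

    n_parts = coord.get("n_parts")
    if not isinstance(n_parts, int) or isinstance(n_parts, bool) or n_parts < 0:
        errors.append("n_parts must be a non-negative integer")
    return errors


good = {"x": 1.0, "y": 0.5, "z": 0.8, "t": 0.3, "n_parts": 3}
bad = {"x": 1.0, "y": 0.5, "z": 1.5, "t": 0.3, "n_parts": 3}

print(coordinate_errors(good))
print(coordinate_errors(bad))
```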

Migration Guide

From ChainCoordinate to DLMCoordinate

If you're migrating from the legacy `ChainCoordinate`:

python

# OLD WAY (deprecated)
from dlm.models.chain import ChainCoordinate
old_coord = ChainCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)

# NEW WAY (recommended)
from dlm.core.coordinates import DLMCoordinate
new_coord = DLMCoordinate(x=1, y=2, z=3, t=0.5, n_parts=2)

Automatic Conversion

python

# Convert from old to new
new_coord = DLMCoordinate.from_chain_coordinate(old_coord)

# Convert from new to old (for backward compatibility)
old_coord = new_coord.to_chain_coordinate()

Deprecation Timeline

v0.9.x: `ChainCoordinate` emits deprecation warnings
v1.0.0: `ChainCoordinate` will be removed
Now: Migrate to `DLMCoordinate`

Complete Workflow Example

python

from dlm.core.coordinates import (
    DLMCoordinate,
    DLMCoordinateCalculator,
    DLMCoordinateValidator
)

# 1. Create calculator
calc = DLMCoordinateCalculator(
    normalize_coordinates=True,
    homogeneity_method="similarity_based"
)

# 2. Define conversation tree
tree = {
    "id": "msg_001",
    "content": "What is recursion?",
    "children": [
        {
            "id": "msg_002",
            "content": "Recursion is when a function calls itself.",
            "children": [
                {
                    "id": "msg_003",
                    "content": "Can you give an example?",
                    "children": []
                }
            ]
        }
    ]
}

# 3. Compute coordinates
coordinates = calc.compute_coordinates(tree)

# 4. Validate
is_valid, errors = DLMCoordinateValidator.validate_coordinates(coordinates)
assert is_valid, f"Validation failed: {errors}"

# 5. Access coordinates
root_coord = coordinates["msg_001"]
print(f"Root depth: {root_coord.x}")
print(f"Root position: ({root_coord.x}, {root_coord.y}, {root_coord.z})")

# 6. Calculate distances
coord_002 = coordinates["msg_002"]
coord_003 = coordinates["msg_003"]
distance = coord_002.euclidean_distance_to(coord_003)
print(f"Distance between msg_002 and msg_003: {distance}")

# 7. Export for visualization
import json
coord_data = {msg_id: coord.to_dict() for msg_id, coord in coordinates.items()}
with open("coordinates.json", "w") as f:
    json.dump(coord_data, f, indent=2)

Understanding Coordinate Dimensions

X - Depth Coordinate

Range: Typically 0 to max_depth
Meaning: Position in conversation hierarchy
Root message: x = 0
Direct child: x = 1
Grandchild: x = 2

python

# Depth increases with tree depth. These are raw values; with
# normalize_coordinates=True they may be rescaled into x_bounds.
assert coordinates["msg_001"].x == 0  # Root
assert coordinates["msg_002"].x == 1  # Child
assert coordinates["msg_003"].x == 2  # Grandchild
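
The raw depth values above follow directly from the tree shape. A minimal sketch of that traversal, using the same tree layout as the workflow example (the helper name `message_depths` is hypothetical and is not part of `DLMCoordinateCalculator`):

```python
from typing import Any, Dict


def message_depths(tree: Dict[str, Any]) -> Dict[str, int]:
    """Map each message ID to its depth, with the root at depth 0."""
    depths: Dict[str, int] = {}
    stack = [(tree, 0)]  # iterative traversal avoids recursion limits on deep threads
    while stack:
        node, depth = stack.pop()
        depths[node["id"]] = depth
        for child in node.get("children", []):
            stack.append((child, depth + 1))
    return depths


tree = {
    "id": "msg_001",
    "children": [
        {"id": "msg_002", "children": [{"id": "msg_003", "children": []}]},
    ],
}

print(message_depths(tree))
```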

Y - Sibling Order Coordinate

Range: 0 to 1 (normalized) or 0 to sibling_count - 1 (unnormalized)
Meaning: Position among siblings at same depth
First sibling: y = 0
Last sibling: y = sibling_count - 1 (before normalization)

python

# Siblings have different y values
tree_with_siblings = {
    "id": "root",
    "children": [
        {"id": "child1", "children": []},  # y = 0
        {"id": "child2", "children": []},  # y = 1
        {"id": "child3", "children": []},  # y = 2
    ]
}
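
A short sketch of how sibling order could be derived from the tree, in both raw and normalized form. The normalization used here, `index / (sibling_count - 1)`, is an assumption chosen so the last sibling lands on 1; the README only states the [0, 1] range. `sibling_order` is a hypothetical helper.

```python
from typing import Any, Dict


def sibling_order(tree: Dict[str, Any], normalize: bool = False) -> Dict[str, float]:
    """Map each message ID to its position among its siblings."""
    order: Dict[str, float] = {tree["id"]: 0.0}  # the root has no siblings
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        for index, child in enumerate(children):
            if normalize:
                # An only child has nothing to be ordered against, so use 0.
                y = index / (len(children) - 1) if len(children) > 1 else 0.0
            else:
                y = float(index)
            order[child["id"]] = y
            stack.append(child)
    return order


tree_with_siblings = {
    "id": "root",
    "children": [
        {"id": "child1", "children": []},
        {"id": "child2", "children": []},
        {"id": "child3", "children": []},
    ],
}

print(sibling_order(tree_with_siblings))
print(sibling_order(tree_with_siblings, normalize=True))
```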

Z - Homogeneity Coordinate

Range: 0 to 1
Meaning: Semantic similarity to siblings
High z: Message is similar to its siblings
Low z: Message is unique/diverse

python

# Calculation methods
calc_similarity = DLMCoordinateCalculator(homogeneity_method="similarity_based")
calc_variance = DLMCoordinateCalculator(homogeneity_method="variance_based")

# similarity_based: Uses embeddings and cosine similarity
# variance_based: Uses statistical variance in coordinates
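
One plausible reading of the `similarity_based` method is sketched below: each message's z is its mean cosine similarity to its siblings' embeddings, shifted from [-1, 1] into [0, 1]. The averaging, the rescaling, and the value given to an only child are all assumptions made for illustration; the actual calculation may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def homogeneity(embeddings: Dict[str, Sequence[float]], siblings: List[str]) -> Dict[str, float]:
    """Score each sibling by how similar it is, on average, to the others."""
    scores: Dict[str, float] = {}
    for mid in siblings:
        others = [other for other in siblings if other != mid]
        if not others:
            scores[mid] = 1.0  # assumption: an only child is trivially homogeneous
            continue
        mean_sim = sum(cosine(embeddings[mid], embeddings[o]) for o in others) / len(others)
        scores[mid] = (mean_sim + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    return scores


# Toy 2D embeddings: child1 and child2 agree, child3 points elsewhere.
embeddings = {"child1": [1.0, 0.0], "child2": [1.0, 0.0], "child3": [0.0, 1.0]}
scores = homogeneity(embeddings, ["child1", "child2", "child3"])
print(scores)
```

In this toy case the outlier `child3` receives the lowest z, matching the "low z means unique/diverse" reading above.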

T - Temporal Coordinate

Range: 0 to 1 (normalized)
Meaning: Time-based ordering of messages
Earlier messages: Lower t values
Later messages: Higher t values

python

# With explicit timestamps
timestamps = {
    "msg_001": 1609459200.0,  # 2021-01-01 00:00:00
    "msg_002": 1609459260.0,  # 2021-01-01 00:01:00
}

coordinates = calc.compute_coordinates(tree, timestamps=timestamps)

# Without timestamps, uses global message index
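
The normalization and the index fallback can be sketched as follows, assuming simple min-max scaling of the timestamps; the scaling method and the helper name `temporal_coordinates` are assumptions, not the calculator's documented behavior.

```python
from typing import Dict, Optional, Sequence


def temporal_coordinates(
    message_ids: Sequence[str],
    timestamps: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Min-max scale timestamps into [0, 1]; fall back to list position."""
    if not message_ids:
        return {}
    if timestamps:
        raw = [float(timestamps[mid]) for mid in message_ids]
    else:
        raw = [float(index) for index in range(len(message_ids))]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    # With a single message (or identical timestamps) everything maps to 0.
    return {mid: ((value - lo) / span if span > 0 else 0.0) for mid, value in zip(message_ids, raw)}


ids = ["msg_001", "msg_002", "msg_003"]
stamps = {"msg_001": 1609459200.0, "msg_002": 1609459260.0, "msg_003": 1609459320.0}

print(temporal_coordinates(ids, stamps))
print(temporal_coordinates(ids))  # no timestamps: uses message index
```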

N_parts - Complexity Coordinate

Range: 0 to max_complexity (integer)
Meaning: Structural complexity of message
Components counted:
Paragraphs
Sentences
Code blocks
List items

python

simple_message = "Hello"  # n_parts = 1
complex_message = """
Paragraph 1.

Paragraph 2 with multiple sentences. Here's another.

def example():
    pass

"""  # n_parts = 6+ (paragraphs + sentences + code blocks)

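A rough sketch of how such a count could be computed with regular expressions, summing paragraphs, sentences, and fenced code blocks. The heuristics are assumptions (sentences are counted only when they end in `.`, `!` or `?`, list items are left out, and the minimum is 1), and `count_parts` is a hypothetical helper rather than the calculator's own logic.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def count_parts(text: str) -> int:
    """Count paragraphs + terminated sentences + fenced code blocks (minimum 1)."""
    code_blocks = CODE_BLOCK.findall(text)
    prose = CODE_BLOCK.sub("", text)  # keep code out of the prose counts
    paragraphs = [p for p in re.split(r"\n\s*\n", prose) if p.strip()]
    sentences = re.findall(r"[^.!?\s][^.!?]*[.!?]", prose)
    return max(1, len(paragraphs) + len(sentences) + len(code_blocks))


simple_message = "Hello"
complex_message = (
    "Paragraph 1.\n\n"
    "Paragraph 2 with multiple sentences. Here's another.\n\n"
    + FENCE + "python\ndef example():\n    pass\n" + FENCE + "\n"
)

print(count_parts(simple_message))
print(count_parts(complex_message))
```

With these heuristics the simple message scores 1 and the complex one scores 6 (2 prose paragraphs, 3 sentences, 1 code block), in line with the values quoted above.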
Performance Optimization

Caching

python

# Enable caching for better performance
calc = DLMCoordinateCalculator(use_cache=True)

# Cached operations:
# - Embeddings
# - Similarity calculations
# - Intermediate coordinate calculations

Batch Processing

python

# Process multiple trees efficiently
trees = [tree1, tree2, tree3]

all_coordinates = {}
for i, tree in enumerate(trees):
    coords = calc.compute_coordinates(tree)
    all_coordinates[f"tree_{i}"] = coords

Memory Management

python

# Clear cache when needed
calc.clear_cache()

# Use smaller batch sizes for large trees
calc = DLMCoordinateCalculator(batch_size=16)

Type Safety

All classes use Pydantic for runtime validation:

python

from typing import Dict

from pydantic import ValidationError

try:
    # This will raise validation error
    bad_coord = DLMCoordinate(x="not a number")
except ValidationError as e:
    print(f"Invalid coordinate: {e}")

# Type hints are fully supported
coordinates: Dict[str, DLMCoordinate] = calc.compute_coordinates(tree)

API Reference

DLMCoordinate

Fields:
- `x: float` - Depth coordinate
- `y: float` - Sibling order coordinate
- `z: float` - Homogeneity coordinate
- `t: float` - Temporal coordinate
- `n_parts: int` - Complexity measure
- `depth_level: int` - Integer depth
- `sibling_index: int` - Index among siblings
- `sibling_count: int` - Total sibling count
- `homogeneity_score: float` - Raw homogeneity
- `confidence: float` - Calculation confidence
- `parent: Optional[str]` - Parent message ID
- `children: List[str]` - Child message IDs

Methods:
- `distance_to(other) -> float` - Full 5D distance
- `euclidean_distance_to(other) -> float` - 3D spatial distance
- `cosine_similarity_to(other) -> float` - Cosine similarity
- `to_dict() -> Dict` - Convert to dictionary
- `to_tensor() -> torch.Tensor` - Convert to tensor
- `to_numpy() -> np.ndarray` - Convert to numpy array
- `from_chain_coordinate(coord) -> DLMCoordinate` - Convert from legacy
- `to_chain_coordinate() -> ChainCoordinate` - Convert to legacy (deprecated)

DLMCoordinateCalculator

Constructor Parameters:
- `normalize_coordinates: bool = True` - Normalize to [0,1]
- `homogeneity_method: str = "similarity_based"` - Homogeneity calculation
- `use_cache: bool = True` - Enable caching
- `batch_size: int = 32` - Batch processing size
- `x_bounds, y_bounds, z_bounds: Tuple[float, float]` - Normalization bounds

Methods:
- `compute_coordinates(tree, embeddings=None, timestamps=None) -> Dict[str, DLMCoordinate]`
- `clear_cache()` - Clear internal caches

DLMCoordinateValidator

Static Methods:
- `validate_coordinates(coords) -> Tuple[bool, List[str]]` - Validate all coordinates
- `validate_relationships(coords) -> Dict` - Validate tree structure
- `validate_coordinate_values(coord) -> bool` - Validate single coordinate

Embedding System

IRCPEmbedder

Production-ready IRCP embedding provider with automatic caching and batch processing.

Features

Automatic Caching: Configurable TTL-based caching for improved performance
Efficient Batch Processing: Optimized batch embedding generation
Coordinate Prediction: IRCP-specific 4D coordinate prediction
Response Pattern Prediction: User response pattern analysis
Confidence Estimation: Prediction confidence scores
Fallback Mode: Graceful degradation when IRCP model unavailable

Basic Usage

python

from dlm.core.embeddings import IRCPEmbedder

# Create embedder with caching
embedder = IRCPEmbedder(
    cache_capacity=512,
    cache_ttl=3600,  # 1 hour
    batch_size=32,
    enable_caching=True
)

# Single embedding
embedding = embedder.generate_embeddings("Hello world")
# Returns: np.ndarray of shape (384,)

# Batch embeddings
embeddings = embedder.generate_embeddings(["Hi", "Hello", "Hey"])
# Returns: List[np.ndarray], each of shape (384,)

IRCP-Specific Features

python

# Predict IRCP coordinates
coords = embedder.predict_coordinates("Hello world")
# Returns: np.ndarray of shape (4,) - (x, y, z, t) coordinates

# Predict response patterns
patterns = embedder.predict_response_patterns("Hello world")
# Returns: np.ndarray of shape (384,)

# Estimate confidence
confidence = embedder.estimate_confidence("Hello world")
# Returns: float in [0, 1]

# Get all predictions at once (efficient)
results = embedder.predict_all("Hello world")
# Returns: dict with keys: embeddings, coordinates, response_patterns, confidence

Loading Trained Models

python

from pathlib import Path

# Load from checkpoint
embedder = IRCPEmbedder(
    model_path=Path("training/ircp/full_dataset/best_model.pt"),
    config_path=Path("training/ircp/full_dataset/inferred_config.json"),
    enable_caching=True
)

# Or create with custom config
embedder = IRCPEmbedder(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    coordinate_dim=4,
    hidden_dim=512,
    dropout=0.1,
    freeze_encoder=True
)

Caching Behavior

python

embedder = IRCPEmbedder(enable_caching=True)

# First call - cache miss
emb1 = embedder.generate_embeddings("test")

# Second call - cache hit (instant)
emb2 = embedder.generate_embeddings("test")

# Check cache statistics
stats = embedder.get_cache_stats()
print(f"Cache hits: {stats['hits']}, misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
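
For readers curious how a capacity-bounded, TTL-based cache with hit statistics can work, here is a minimal standalone sketch. It is not the `IRCPEmbedder` implementation: the class name `TTLCache`, the least-recently-used eviction policy, and the injectable clock are assumptions. Only the `hits`, `misses`, and `hit_rate` keys mirror the statistics shown above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable, Optional


class TTLCache:
    """Least-recently-used cache whose entries expire after `ttl` seconds."""

    def __init__(self, capacity: int = 512, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._store: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)  # drop the entry if it has expired
        self.misses += 1
        return None

    def put(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self.clock(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry

    def stats(self) -> Dict[str, float]:
        total = self.hits + self.misses
        return {"hits": self.hits, "misses": self.misses,
                "hit_rate": self.hits / total if total else 0.0}


now = [0.0]  # fake clock so expiry can be demonstrated without sleeping
cache = TTLCache(capacity=2, ttl=10.0, clock=lambda: now[0])

first = cache.get("test")        # miss: nothing stored yet
cache.put("test", [0.1, 0.2])
second = cache.get("test")       # hit: stored and still fresh
print(cache.stats())
```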

Batch Processing

python

# Efficient batch processing
texts = ["Message 1", "Message 2", "Message 3", ...]
embeddings = embedder.generate_embeddings(texts)

# Batch IRCP predictions
results = embedder.predict_all(texts)
# results["coordinates"] is a list of 4D arrays
# results["confidence"] is a list of floats
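
The batching idea itself is simple: split the inputs into fixed-size chunks, embed each chunk in one call, and concatenate the results in order. The sketch below shows this with a stand-in embedding function; `batched`, `embed_in_batches`, and `fake_embed` are hypothetical names and do not reflect how `IRCPEmbedder` batches internally.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed texts chunk by chunk, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in model: one-dimensional 'embedding' equal to the text length."""
    return [[float(len(text))] for text in batch]


texts = ["Hi", "Hello", "Hey", "Howdy", "Yo"]
print([len(chunk) for chunk in batched(texts, 2)])
print(embed_in_batches(texts, fake_embed, batch_size=2))
```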

Migration from dlm.engine.ircp_embedder

Old way (deprecated):

python

from dlm.engine.ircp_embedder import IRCPEmbeddingEngine

engine = IRCPEmbeddingEngine(
    model_path="path/to/model.pt",
    cache_embeddings=True
)
emb = engine.generate_embedding("text", message_id="msg_001")

New way (recommended):

python

from dlm.core.embeddings import IRCPEmbedder

embedder = IRCPEmbedder(
    model_path="path/to/model.pt",
    enable_caching=True
)
emb = embedder.generate_embeddings("text")

Performance Tips

1. Enable Caching: For repeated embeddings, caching provides ~100x speedup
2. Batch Processing: Process multiple texts at once for 3-5x speedup
3. Cache Tuning: Adjust `cache_capacity` based on your working set size
4. Device Selection: Use `device="cuda"` for GPU acceleration if available

python

# Optimized for production
embedder = IRCPEmbedder(
    model_path="models/best_model.pt",
    enable_caching=True,
    cache_capacity=1024,  # Larger cache for production
    batch_size=64,  # Larger batches for throughput
    device="cuda",  # GPU if available
)

IRCP Theory Modules

Advanced IRCP components are available in `dlm.core.ircp`:

python

from dlm.core.ircp import (
    InverseAttentionMechanism,
    MeasurePreservingTransform,
    RingTopology,
    IRCP_AVAILABLE,  # Flag indicating if IRCP package is loaded
)

# Check IRCP availability
if IRCP_AVAILABLE:
    # Use IRCP-specific features
    attention = InverseAttentionMechanism(hidden_dim=384, num_heads=8)
else:
    # Fallback or warning
    print("IRCP package not available")
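
An availability flag like `IRCP_AVAILABLE` is commonly defined by probing for the optional package at import time. The sketch below shows that general pattern using the standard library; the package name `ircp` and the helper `require_ircp` are hypothetical, and `dlm.core.ircp` may define its flag differently.

```python
import importlib.util

# "ircp" is a hypothetical top-level package name used only for illustration.
IRCP_AVAILABLE = importlib.util.find_spec("ircp") is not None


def require_ircp(feature: str) -> None:
    """Raise a clear error when an IRCP-only feature is used without the package."""
    if not IRCP_AVAILABLE:
        raise RuntimeError(f"{feature} needs the optional IRCP package, which is not installed")


try:
    require_ircp("InverseAttentionMechanism")
    print("IRCP features enabled")
except RuntimeError as exc:
    print(f"Falling back: {exc}")
```

Probing with `find_spec` avoids importing the package just to test for it, so the check stays cheap even when the optional dependency is heavy.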

Testing

Run the test suite:

bash

# Run all coordinate tests
pytest packages/dlm/core/tests/test_coordinates.py -v

# Run all embedding tests
pytest packages/dlm/core/tests/test_embeddings.py -v

# Run specific test class
pytest packages/dlm/core/tests/test_coordinates.py::TestDLMCoordinate -v

# Run all core tests
pytest packages/dlm/core/tests/ -v

# Run with coverage
pytest packages/dlm/core/tests/ --cov=dlm.core

License

Part of the Computational Choreography DLM package.

Source Anchor

Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/packages/dlm/core/README.md
