# DLM Configuration Guide
Complete guide to configuring the DLM system using the unified `DLMConfig`.
## Quick Start
from dlm.config import DLMConfig
# Use default configuration
config = DLMConfig.create_default()
# Or use a preset
config = DLMConfig.create_production()
config = DLMConfig.create_development()
config = DLMConfig.create_performance_optimized()
config = DLMConfig.create_quality_optimized()
## Configuration Sections
### 1. Tokens (`tokens`)
Controls token limits and truncation.
config.tokens.total_max_tokens = 16000 # Total token budget
config.tokens.max_tokens_per_text = 8192 # Per-text limit
config.tokens.truncation_buffer = 100 # Safety buffer
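The guide does not spell out how these three limits interact. One plausible reading, sketched below, is that the buffer is subtracted from both the per-text limit and the total budget before truncating. The helper name `fit_to_budget` and the list-of-token-lists input are assumptions made for illustration, not part of the DLM API.

```python
from typing import List


def fit_to_budget(
    texts: List[List[str]],
    total_max_tokens: int = 16000,
    max_tokens_per_text: int = 8192,
    truncation_buffer: int = 100,
) -> List[List[str]]:
    """Truncate tokenized texts so each respects the per-text cap and all fit the total budget.

    Assumption: the safety buffer is reserved out of both limits.
    """
    per_text_cap = max(max_tokens_per_text - truncation_buffer, 0)
    remaining = max(total_max_tokens - truncation_buffer, 0)
    result: List[List[str]] = []
    for tokens in texts:
        allowed = min(per_text_cap, remaining)  # whichever limit is tighter
        kept = tokens[:allowed]
        result.append(kept)
        remaining -= len(kept)
    return result
```

Under this reading, later texts receive whatever is left of the total budget, so ordering inputs by importance matters.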
### 2. Coordinates (`coordinates`)
Coordinate system calculation settings.
# Normalization
config.coordinates.normalize_coordinates = True
config.coordinates.x_bounds = (0.0, 1.0)
config.coordinates.y_bounds = (0.0, 1.0)
config.coordinates.z_bounds = (0.0, 1.0)
# Homogeneity calculation
config.coordinates.homogeneity_method = "similarity_based" # or "variance_based"
config.coordinates.homogeneity_threshold = 0.5
# Weighting factors
config.coordinates.embedding_weight = 0.3
config.coordinates.temporal_weight = 0.2
config.coordinates.structural_weight = 0.5
# Performance
config.coordinates.use_cache = True
config.coordinates.batch_size = 32
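The default weighting factors above sum to 1.0 (0.3 + 0.2 + 0.5), which suggests a convex combination of component scores. The sketch below illustrates that reading together with clamping a value into its configured bounds. The function names and the requirement that weights sum to 1.0 are assumptions, not documented DLM behavior.

```python
import math
from typing import Tuple


def combine_scores(
    embedding_score: float,
    temporal_score: float,
    structural_score: float,
    weights: Tuple[float, float, float] = (0.3, 0.2, 0.5),
) -> float:
    """Weighted sum of three component scores (assumes weights sum to 1.0)."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {sum(weights)}")
    w_emb, w_tmp, w_str = weights
    return w_emb * embedding_score + w_tmp * temporal_score + w_str * structural_score


def clamp_to_bounds(value: float, bounds: Tuple[float, float] = (0.0, 1.0)) -> float:
    """Clip a coordinate into the inclusive [low, high] range."""
    low, high = bounds
    return min(max(value, low), high)
```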
### 3. IRCP (`ircp`)
Inverse-Ring Context Propagation parameters.
# Forward ring
config.ircp.alpha = 1.0
config.ircp.beta = 1.0
config.ircp.gamma = 1.0
# Inverse ring
config.ircp.alpha_prime = 1.0
config.ircp.beta_prime = 1.0
config.ircp.gamma_prime = 1.0
# Propagation
config.ircp.default_learning_rate = 0.1
config.ircp.convergence_threshold = 1e-4
config.ircp.max_propagation_steps = 10
config.ircp.apply_conservation = True
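To show how the three propagation settings typically interact, here is a generic iterative-update loop. It is not the IRCP algorithm itself: the `update` callback, the use of `learning_rate` as a step size, and the max-change stopping rule are all assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple


def propagate(
    state: List[float],
    update: Callable[[List[float]], List[float]],
    learning_rate: float = 0.1,
    convergence_threshold: float = 1e-4,
    max_propagation_steps: int = 10,
) -> Tuple[List[float], int, bool]:
    """Step `state` toward `update(state)` until the largest change falls below the threshold.

    Returns (final_state, steps_taken, converged).
    """
    for step in range(1, max_propagation_steps + 1):
        target = update(state)
        new_state = [s + learning_rate * (t - s) for s, t in zip(state, target)]
        max_change = max((abs(n - s) for n, s in zip(new_state, state)), default=0.0)
        state = new_state
        if max_change < convergence_threshold:
            return state, step, True
    # Step budget exhausted before the change dropped below the threshold.
    return state, max_propagation_steps, False
```

With the defaults, a small learning rate and only 10 steps will often hit the step limit before the threshold, which is why the two settings are usually tuned together.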
### 4. Embedding (`embedding`)
Embedding generation and caching.
# Model
config.embedding.model_name = "sentence-transformers/all-MiniLM-L6-v2"
config.embedding.embedding_dim = 384
config.embedding.device = "cpu" # or "cuda", "mps"
# Caching
config.embedding.enable_caching = True
config.embedding.cache_capacity = 512
config.embedding.cache_ttl = 3600.0 # seconds
# Batch processing
config.embedding.batch_size = 32
# IRCP-specific
config.embedding.coordinate_dim = 4
config.embedding.hidden_dim = 512
config.embedding.dropout = 0.1
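The `cache_capacity` and `cache_ttl` settings describe a bounded cache whose entries expire. A minimal sketch of such a cache follows, assuming least-recently-used eviction. The guide does not state the eviction policy, and `TTLCache` is a hypothetical name.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """LRU cache with a maximum size and a per-entry time-to-live (in seconds)."""

    def __init__(self, capacity: int = 512, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl:
            del self._items[key]  # entry has expired
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._items)
```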
### 5. Model (`model`)
IRCP model architecture.
config.model.name = "sentence_transformer_icp"
config.model.hidden_dim = 512
config.model.num_layers = 6
config.model.num_heads = 8
config.model.dropout = 0.1
config.model.activation = "relu"
config.model.use_layer_norm = True
### 6. Training (`training`)
Training hyperparameters.
# Optimization
config.training.learning_rate = 1e-4
config.training.batch_size = 32
config.training.epochs = 100
config.training.optimizer = "adamw"
config.training.weight_decay = 0.01
# Loss weights
config.training.coord_loss_weight = 1.0
config.training.pattern_loss_weight = 0.5
config.training.conservation_loss_weight = 0.1
# Checkpointing
config.training.save_checkpoints = True
config.training.checkpoint_frequency = 10
config.training.early_stopping_patience = 15
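The loss weights and the early-stopping patience can be read as follows: the three loss terms are combined as a weighted sum, and training halts after a run of epochs without improvement. The sketch below assumes exactly that. `total_loss` and `EarlyStopping` are illustrative names, not DLM classes.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without a new best validation loss."""

    def __init__(self, patience: int = 15) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def total_loss(
    coord_loss: float,
    pattern_loss: float,
    conservation_loss: float,
    coord_loss_weight: float = 1.0,
    pattern_loss_weight: float = 0.5,
    conservation_loss_weight: float = 0.1,
) -> float:
    """Weighted sum of the three loss terms (assumed combination rule)."""
    return (
        coord_loss_weight * coord_loss
        + pattern_loss_weight * pattern_loss
        + conservation_loss_weight * conservation_loss
    )
```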
### 7. Context Archival (`archival`)
Context archival system settings.
config.archival.enabled = True
config.archival.max_active_chains = 15
config.archival.relevance_threshold = 0.6
config.archival.restore_threshold = 0.7
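Having a restore threshold higher than the relevance threshold suggests hysteresis: a chain is archived when its relevance drops below 0.6 but only restored once it climbs back to 0.7, which avoids flapping near a single cutoff. That interpretation is an assumption, and `update_chain_state` is a hypothetical helper used only to illustrate it.

```python
def update_chain_state(
    is_active: bool,
    relevance: float,
    relevance_threshold: float = 0.6,
    restore_threshold: float = 0.7,
) -> bool:
    """Return whether a chain should be active after seeing a new relevance score.

    Active chains stay active while relevance >= relevance_threshold.
    Archived chains come back only when relevance >= restore_threshold.
    """
    if is_active:
        return relevance >= relevance_threshold
    return relevance >= restore_threshold
```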
### 8. Database (`database`)
Data loading settings.
config.database.min_messages = 5
config.database.max_conversations = 1000 # or None for all
config.database.batch_size = 1000
config.database.cache_embeddings = True
config.database.max_workers = 4
### 9. Evaluation (`evaluation`)
Evaluation and metrics.
config.evaluation.metrics = [
    "pattern_consistency",
    "conservation_loss",
    "response_quality",
    "coordinate_accuracy",
]
config.evaluation.visualization = True
config.evaluation.save_plots = True
### 10. Resources (`resources`)
Resource allocation.
config.resources.device = "auto" # "auto", "cpu", "cuda", "mps"
config.resources.mixed_precision = False
config.resources.num_workers = 4
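The guide does not say how `"auto"` is resolved. If the backend is PyTorch (an assumption, suggested only by the `cuda` and `mps` device names), a resolution order of CUDA, then MPS, then CPU could look like the sketch below. `resolve_device` is a hypothetical helper.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map "auto" to a concrete device name; pass explicit choices through unchanged."""
    if requested != "auto":
        return requested
    try:
        import torch  # optional dependency; fall back to CPU when it is absent
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # not present in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```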
## Loading and Saving
### From File
# Load from YAML
config = DLMConfig.from_file("config.yaml")
# Load from JSON
config = DLMConfig.from_file("config.json")
# Save to file
config.to_file("my_config.yaml")
config.to_file("my_config.json")
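Because `from_file` accepts both `.yaml` and `.json` paths, it presumably picks a parser from the file extension. The sketch below shows that kind of dispatch for plain dictionaries. The function names are hypothetical, and the YAML branch assumes PyYAML is installed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_config_text(text: str, suffix: str) -> Dict[str, Any]:
    """Parse config text as JSON or YAML depending on the file suffix."""
    suffix = suffix.lower()
    if suffix == ".json":
        data = json.loads(text)
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML, imported lazily so JSON-only use needs no extra dependency
        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported config format: {suffix!r}")
    if not isinstance(data, dict):
        raise ValueError("top-level config must be a mapping")
    return data


def load_config_dict(path: str) -> Dict[str, Any]:
    """Read a config file and return its contents as a dictionary."""
    file_path = Path(path)
    return parse_config_text(file_path.read_text(encoding="utf-8"), file_path.suffix)
```

A dictionary loaded this way could then be handed to `DLMConfig.from_dict`.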
### From Environment Variables
# Set environment variables:
# export DLM_TOKENS_TOTAL_MAX_TOKENS=20000
# export DLM_EMBEDDING_CACHE_CAPACITY=1024
# export DLM_TRAINING_LEARNING_RATE=0.001
config = DLMConfig.from_env()
Environment variable format: `DLM_<SECTION>_<PARAMETER>`
Examples:
- `DLM_TOKENS_TOTAL_MAX_TOKENS=16000`
- `DLM_EMBEDDING_ENABLE_CACHING=true`
- `DLM_TRAINING_BATCH_SIZE=64`
- `DLM_COORDINATES_NORMALIZE_COORDINATES=false`
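Every section name in this guide is a single word, so a `DLM_<SECTION>_<PARAMETER>` name can be split at the first underscore after the prefix. The sketch below parses such variables into a nested dictionary. The type-coercion rules (booleans, then integers, then floats, otherwise strings) are an assumption about what `from_env` does, and `parse_env` is a hypothetical helper.

```python
import os
from typing import Any, Dict, Mapping, Optional

PREFIX = "DLM_"


def coerce(raw: str) -> Any:
    """Convert an environment string to bool, int, or float when it looks like one."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, Any]]:
    """Collect DLM_<SECTION>_<PARAMETER> variables into {section: {parameter: value}}."""
    environ = os.environ if environ is None else environ
    overrides: Dict[str, Dict[str, Any]] = {}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Split at the first underscore: section names contain none, parameters may.
        section, _, parameter = name[len(PREFIX):].partition("_")
        if not section or not parameter:
            continue
        overrides.setdefault(section.lower(), {})[parameter.lower()] = coerce(raw)
    return overrides
```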
### From Dictionary
config_dict = {
    "tokens": {"total_max_tokens": 20000},
    "embedding": {"cache_capacity": 1024},
    "training": {"learning_rate": 0.001},
    "verbose": True,
}
config = DLMConfig.from_dict(config_dict)
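The dictionary above is partial: it names only the values to change. Presumably the remaining settings keep their defaults, which amounts to a recursive merge of the overrides into a default dictionary. The following sketch shows that merge on plain dictionaries. `deep_merge` is a hypothetical helper, not a documented DLM function.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` applied, merging nested sections key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge within the section
        else:
            merged[key] = copy.deepcopy(value)  # replace scalars and new keys outright
    return merged
```

Merging per key rather than replacing whole sections means that overriding `tokens.total_max_tokens` does not reset `tokens.truncation_buffer`.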
## Preset Configurations
### Development
Fast iteration with minimal resources:
config = DLMConfig.create_development()
# - 10 conversations max
# - 5 epochs
# - Small batch size (8)
# - DEBUG logging
# - Verbose output
### Performance Optimized
Speed over quality:
config = DLMConfig.create_performance_optimized()
# - Large batch sizes (64)
# - Fewer propagation steps (5)
# - Large caches (1024)
# - Mixed precision enabled
# - 8 workers
### Quality Optimized
Quality over speed:
config = DLMConfig.create_quality_optimized()
# - 200 epochs
# - Small batch size (16)
# - More propagation steps (15)
# - Detailed analysis enabled
# - Higher convergence threshold
### Production
Balanced for production use:
config = DLMConfig.create_production()
# - All conversations
# - 100 epochs
# - Checkpoints enabled
# - Large cache (2048)
# - Logging to file
# - Auto device selection
### Coordinate Focus
Optimized for coordinate accuracy:
config = DLMConfig.create_coordinate_focus()
# - Advanced similarity enabled
# - Normalized coordinates
# - Higher coordinate loss weight (1.5)
# - Additional coordinate metrics
### Conservation Focus
Emphasizes conservation laws:
config = DLMConfig.create_conservation_focus()
# - Conservation enabled
# - Higher conservation loss weight (0.4)
# - Balanced with coordinates
## Custom Configuration
### Creating Custom Presets
def create_my_preset():
    config = DLMConfig.create_default()
    # Customize
    config.training.learning_rate = 0.0005
    config.training.batch_size = 48
    config.embedding.cache_capacity = 2048
    config.coordinates.batch_size = 64
    # Set experiment name
    config.experiment_name = "my_experiment"
    return config
config = create_my_preset()
### Modifying Existing Presets
# Start from preset
config = DLMConfig.create_production()
# Override specific settings
config.training.learning_rate = 0.0005
config.embedding.device = "cuda"
config.database.max_conversations = 5000
# Save for reuse
config.to_file("custom_production.yaml")
## Configuration File Examples
### YAML Example
tokens:
  total_max_tokens: 20000
  max_tokens_per_text: 10000
embedding:
  model_name: "sentence-transformers/all-MiniLM-L6-v2"
  cache_capacity: 1024
  enable_caching: true
  device: "cuda"
training:
  learning_rate: 0.0001
  batch_size: 32
  epochs: 100
  optimizer: "adamw"
coordinates:
  normalize_coordinates: true
  homogeneity_method: "similarity_based"
  batch_size: 64
verbose: true
experiment_name: "my_experiment"
### JSON Example
{
  "tokens": {
    "total_max_tokens": 20000
  },
  "embedding": {
    "cache_capacity": 1024,
    "device": "cuda"
  },
  "training": {
    "learning_rate": 0.0001,
    "batch_size": 32
  },
  "verbose": true
}
## Best Practices
### 1. Start with a Preset
Always start with a preset that matches your use case:
# Development
config = DLMConfig.create_development()
# Production
config = DLMConfig.create_production()
# Then customize as needed
config.training.learning_rate = 0.0005
### 2. Use Configuration Files
Store configurations in version control:
# In code
config = DLMConfig.from_file("configs/production.yaml")
# Save experiments
config.experiment_name = "experiment_001"
config.to_file(f"configs/experiments/{config.experiment_name}.yaml")
### 3. Environment Variables for Deployment
Use environment variables for deployment-specific settings:
# .env file
DLM_EMBEDDING_DEVICE=cuda
DLM_DATABASE_MAX_CONVERSATIONS=10000
DLM_TRAINING_BATCH_SIZE=64
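# In code
config = DLMConfig.from_file("configs/base.yaml")
# Caution: this rebinds `config` to the result of from_env(). The values loaded
# from base.yaml carry over only if from_env() itself layers on top of that file.
config = DLMConfig.from_env() # Override with env vars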
### 4. Validation
Always validate your configuration:
config = DLMConfig.from_file("config.yaml")
# Check critical settings
assert config.training.learning_rate > 0
assert config.embedding.cache_capacity > 0
assert config.resources.device in ["auto", "cpu", "cuda", "mps"]
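Bare `assert` statements are skipped when Python runs with the `-O` flag, and they stop at the first failure. A validator that collects every problem is a sturdier alternative. The sketch below runs against a stand-in object with the same attribute layout as the examples in this guide. `validate` is a hypothetical helper, not a documented method of `DLMConfig`.

```python
from types import SimpleNamespace
from typing import List

VALID_DEVICES = ("auto", "cpu", "cuda", "mps")


def validate(config) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems: List[str] = []
    if config.training.learning_rate <= 0:
        problems.append("training.learning_rate must be > 0")
    if config.training.batch_size <= 0:
        problems.append("training.batch_size must be > 0")
    if config.embedding.cache_capacity <= 0:
        problems.append("embedding.cache_capacity must be > 0")
    if config.resources.device not in VALID_DEVICES:
        problems.append(f"resources.device must be one of {VALID_DEVICES}")
    return problems


# Stand-in for a loaded DLMConfig, used so the sketch runs without the dlm package.
example = SimpleNamespace(
    training=SimpleNamespace(learning_rate=1e-4, batch_size=32),
    embedding=SimpleNamespace(cache_capacity=512),
    resources=SimpleNamespace(device="auto"),
)
```

Calling `validate(example)` returns an empty list. A non-empty result can be logged in full or raised as a single `ValueError`.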
## Migration from Old Config
### Old Way (Deprecated)
from dlm.response.config import ResponseConfig # DEPRECATED
config = ResponseConfig.create_default()
config.tokens.total_max_tokens = 20000
### New Way (Recommended)
from dlm.config import DLMConfig
config = DLMConfig.create_default()
config.tokens.total_max_tokens = 20000
### Differences
1. **Unified**: All settings in one config (DLM + IRCP + TPO)
2. **More Sections**: Added embedding, model, database, evaluation, resources
3. **Presets**: More specialized presets available
4. **Environment Variables**: Built-in support
5. **File I/O**: YAML and JSON support
## Troubleshooting
### Issue: Config file not found
import logging

logger = logging.getLogger(__name__)

try:
    config = DLMConfig.from_file("config.yaml")
except FileNotFoundError:
    logger.warning("Config file not found, using defaults")
    config = DLMConfig.create_default()
### Issue: Invalid configuration values
Check that:
- Learning rate > 0
- Batch sizes > 0
- Cache capacity > 0
- Device is valid ("auto", "cpu", "cuda", "mps")
- Loss weights sum to ~1.0
### Issue: Out of memory
Reduce:
- `training.batch_size`
- `embedding.cache_capacity`
- `database.max_workers`
Or enable:
- `resources.mixed_precision = True`
### Issue: Slow training
Increase:
- `training.batch_size`
- `embedding.batch_size`
- `coordinates.batch_size`
- `database.max_workers`
Enable:
- `resources.mixed_precision = True`
- `embedding.enable_caching = True`
## Related Documentation
- [DLM Core README](core/README.md) - Core module documentation
- [Phase 2.3: Configuration Consolidation](../../PHASE_2_3_CONFIG.md) - Implementation plan
- [Integration Plan](../../INTEGRATION_PLAN.md) - Overall project plan
## License
Part of the Computational Choreography DLM package.