Topological Preference Optimization (TPO)
A novel training strategy for conversational AI that leverages conversation topology and spatial-temporal coordinates to generate preference datasets.
🌟 Overview
TPO is a preference-learning method for conversational AI. Instead of relying on human annotations, it extracts preference signals directly from the structural properties of conversation graphs, using hindsight knowledge and topological awareness with the aim of producing more accurate and contextually informed training data.
Key Innovation
> Conversation topology encodes preference signals: Linear conversation paths represent more effective communication than branching paths, as they indicate confident, purposeful progression rather than uncertain exploration.
🔬 Core Concepts
### 1. Divergent Language Matrix (DLM) Integration
TPO builds upon the DLM algorithm to generate 5-dimensional coordinates `[X, Y, Z, T, N]` for each message:
- X (depth_x): Hierarchical level/depth
- Y (sibling_y): Order among siblings
- Z (sibling_count_z): Homogeneity based on sibling count and similarity
- T (time_t): Temporal position with dynamic weighting
- N (n_parts): Content segmentation count
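As a rough illustration, the five coordinates can be carried as a small record per message. The sketch below is not the DLM algorithm itself: the class name `DLMCoordinate`, the helper `assign_coordinates`, and the toy tree are hypothetical. `Z` is simplified to the raw sibling count (including the message itself) and `T` to the raw timestamp, so the similarity term and the dynamic time weighting mentioned above are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DLMCoordinate:
    """Hypothetical container for the [X, Y, Z, T, N] coordinates of one message."""
    depth_x: int          # X: hierarchical level (root = 0)
    sibling_y: int        # Y: order among siblings (0-based)
    sibling_count_z: int  # Z: simplified to the sibling count, self included
    time_t: float         # T: simplified to the raw timestamp
    n_parts: int          # N: number of content segments

    def as_list(self) -> List[float]:
        return [self.depth_x, self.sibling_y, self.sibling_count_z,
                self.time_t, self.n_parts]


def assign_coordinates(
    parents: Dict[str, Optional[str]],
    timestamps: Dict[str, float],
    texts: Dict[str, str],
) -> Dict[str, DLMCoordinate]:
    """Assign simplified coordinates to every message in a conversation tree."""
    # Group messages by parent and order each sibling group by timestamp.
    children: Dict[Optional[str], List[str]] = {}
    for msg_id, parent_id in parents.items():
        children.setdefault(parent_id, []).append(msg_id)
    for siblings in children.values():
        siblings.sort(key=lambda m: timestamps[m])

    def depth(msg_id: str) -> int:
        # Walk up to the root, counting edges.
        d = 0
        while parents[msg_id] is not None:
            msg_id = parents[msg_id]
            d += 1
        return d

    coords: Dict[str, DLMCoordinate] = {}
    for msg_id, parent_id in parents.items():
        siblings = children[parent_id]
        # Assumption: a "part" is a non-empty paragraph separated by a blank line.
        n_parts = len([p for p in texts[msg_id].split("\n\n") if p.strip()])
        coords[msg_id] = DLMCoordinate(
            depth_x=depth(msg_id),
            sibling_y=siblings.index(msg_id),
            sibling_count_z=len(siblings),
            time_t=timestamps[msg_id],
            n_parts=n_parts,
        )
    return coords


# Toy conversation: "a" and "b" are alternative replies to "root".
parents = {"root": None, "a": "root", "b": "root", "c": "a"}
timestamps = {"root": 0.0, "a": 1.0, "b": 2.0, "c": 3.0}
texts = {"root": "Hi", "a": "First\n\nSecond", "b": "Alt", "c": "Done"}

coords = assign_coordinates(parents, timestamps, texts)
print(coords["b"].as_list())  # [1, 1, 2, 2.0, 1]
```

Keeping the record immutable (`frozen=True`) makes it safe to share coordinates between path computations without defensive copies.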
### 2. Path Quality Function
$$Q(P) = \alpha \cdot L(P) + \beta \cdot T(P) + \gamma \cdot S(P) + \delta \cdot C(P)$$
Where:
- `L(P)`: Linearity score (prefers straight-line conversations)
- `T(P)`: Terminal quality (endpoint assessment)
- `S(P)`: Semantic coherence (consistency along path)
- `C(P)`: Completion quality (accounts for backtracks)
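A minimal sketch of the weighted sum, assuming each component score has already been computed and normalized to [0, 1]. The names `Weights` and `path_quality` are hypothetical; the default weights (0.4, 0.3, 0.2, 0.1) are the values shown in the configuration example later in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weight container; one field per quality component."""
    linearity: float = 0.4   # alpha
    terminal: float = 0.3    # beta
    semantic: float = 0.2    # gamma
    completion: float = 0.1  # delta


def path_quality(lin: float, term: float, sem: float, comp: float,
                 w: Weights = Weights()) -> float:
    """Q(P) = alpha*L(P) + beta*T(P) + gamma*S(P) + delta*C(P)."""
    return (w.linearity * lin + w.terminal * term
            + w.semantic * sem + w.completion * comp)


# Two paths that differ only in linearity and completion quality.
linear_path = path_quality(lin=1.0, term=0.8, sem=0.9, comp=1.0)
branchy_path = path_quality(lin=0.3, term=0.8, sem=0.9, comp=0.6)
print(round(linear_path, 3), round(branchy_path, 3))  # 0.92 0.6
```

With these weights, linearity contributes the most to the score, so a heavily branched path is ranked lower even when its endpoint and coherence are identical.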
### 3. Preference Generation Strategies
1. Linear vs Branching: Prefer linear paths over branching ones
2. Hindsight Knowledge: Continued paths preferred over abandoned alternatives
3. Depth Progression: Messages leading to deeper development preferred
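The hindsight and depth strategies can be sketched on a toy tree: at each branch point, the reply whose subtree continues further is treated as chosen and the abandoned sibling as rejected. This is an illustration under assumptions, not the package's implementation. The tree, `subtree_depth`, and `hindsight_pairs` are hypothetical, and the linear-versus-branching strategy (which compares whole paths) is not covered.

```python
from typing import Dict, List, Tuple

# Toy conversation tree: node id -> list of child ids (hypothetical data).
tree: Dict[str, List[str]] = {
    "q1": ["a1", "a1_retry"],  # branch point: the answer was regenerated
    "a1": [],                  # abandoned alternative
    "a1_retry": ["q2"],        # continued path
    "q2": ["a2"],
    "a2": [],
}


def subtree_depth(tree: Dict[str, List[str]], node: str) -> int:
    """Length of the longest chain of descendants below `node`."""
    kids = tree.get(node, [])
    if not kids:
        return 0
    return 1 + max(subtree_depth(tree, k) for k in kids)


def hindsight_pairs(tree: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (context, chosen, rejected) triples at every branch point."""
    pairs = []
    for parent, kids in tree.items():
        if len(kids) < 2:
            continue  # linear step, nothing to compare
        ranked = sorted(kids, key=lambda k: subtree_depth(tree, k), reverse=True)
        best = ranked[0]
        for other in ranked[1:]:
            # Only emit a pair when the continued branch is strictly deeper.
            if subtree_depth(tree, best) > subtree_depth(tree, other):
                pairs.append((parent, best, other))
    return pairs


pairs = hindsight_pairs(tree)
print(pairs)  # [('q1', 'a1_retry', 'a1')]
```

Requiring a strictly deeper subtree avoids emitting a preference when two siblings were both abandoned at the same depth, where the structure carries no signal.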
🚀 Quick Start
Installation
```bash
# Clone the repository
git clone <repository-url>
cd chain_memory/tpo
# Install dependencies
pip install -r requirements.txt
```
Basic Usage
```python
from tpo import (
    TPOAlgorithm, TPOConfig,
    TPOPreferenceGenerator,
    ConversationDataLoader
)

# Load conversation data
data_loader = ConversationDataLoader()
conversations = data_loader.load_from_csv('your_data.csv')

# Configure TPO
config = TPOConfig(
    preference_threshold=0.1,
    confidence_threshold=0.6,
    enable_linear_preferences=True,
    enable_hindsight_preferences=True
)

# Generate preferences
preference_generator = TPOPreferenceGenerator(config)
dataset = preference_generator.generate_from_conversations(conversations)

# Export for training
dataset.save_dpo_format('tpo_preferences.json')
```
Training a Model
```python
from tpo import TPOTrainer, TPOLoss

# Initialize trainer
trainer = TPOTrainer(
    model_name="microsoft/DialoGPT-medium",
    loss_function=TPOLoss(beta=0.1, use_confidence_weighting=True)
)

# Train the model
history = trainer.train(
    train_dataset=dataset,
    num_epochs=3,
    batch_size=4,
    output_dir='./tpo_model'
)
```
📁 Project Structure
```
tpo/
├── core/                        # Core TPO algorithm
│   ├── conversation_graph.py    # Graph representation
│   ├── dlm_coordinates.py       # DLM coordinate system
│   ├── quality_metrics.py       # Path quality calculation
│   └── tpo_algorithm.py         # Main TPO algorithm
├── dataset/                     # Dataset generation
│   ├── data_structures.py       # Data classes
│   ├── preference_generator.py  # Preference generation
│   └── data_loaders.py          # Data loading utilities
├── training/                    # Training components
│   ├── trainer.py               # TPO trainer
│   ├── loss_functions.py        # TPO loss functions
│   └── metrics.py               # Training metrics
├── examples/                    # Usage examples
│   ├── basic_usage.py           # Basic TPO usage
│   ├── chain_memory_example.py  # Chain Memory integration
│   └── training_example.py      # Training example
├── utils/                       # Utilities
│   └── visualization.py         # Visualization tools
└── tests/                       # Unit tests
```
📊 Examples
1. Basic Usage
```bash
python examples/basic_usage.py
```
Demonstrates basic TPO preference generation with synthetic data.
2. Chain Memory Integration
```bash
python examples/chain_memory_example.py
```
Shows how to use TPO with the Chain Memory dataset.
3. Model Training
```bash
python examples/training_example.py
```
Complete example of training a language model with TPO.
🔬 Mathematical Foundation
Path Quality Calculation
Linearity Score (exponential branching penalty):
$$L(P) = \exp\left(-\lambda \sum_{v_i \in P} \max\left(0, |\mathrm{children}(v_i)| - 1\right)\right)$$
Terminal Quality (multi-component assessment):
$$T(P) = \frac{1}{4}\left(D(v_k) + Z(v_k) + N(v_k) + \tau(v_k)\right)$$
Semantic Coherence (Z-coordinate consistency):
$$S(P) = \frac{1}{|P| - 1} \sum_{i=1}^{|P|-1} \mathrm{coherence}(v_i, v_{i+1})$$
Completion Quality (backtrack penalty):
$$C(P) = \frac{|P|}{|P| + B(P)}$$
TPO Loss Function
$$\mathcal{L}_{\mathrm{TPO}} = -\mathbb{E}_{(x, y_w, y_l) \sim D_{\mathrm{TPO}}}\left[ w(P_w, P_l) \cdot \log \sigma\left(\beta \cdot \Delta \log \pi\right) \right]$$
Where `w(P_w, P_l)` is the topological confidence weight based on path quality differences.
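The formulas in this section can be checked numerically with a few lines of plain Python. The sketch below rests on stated assumptions: the function names are hypothetical, `lam=0.5` is an arbitrary value for λ (the source does not give one), `beta=0.1` matches the `TPOLoss(beta=0.1)` call in the training example, and both the confidence weight `w` and `Δ log π` are passed in as precomputed numbers because the source does not expand them.

```python
import math
from typing import Sequence


def linearity_score(child_counts: Sequence[int], lam: float = 0.5) -> float:
    """L(P) = exp(-lambda * sum(max(0, |children(v_i)| - 1)))."""
    extra_branches = sum(max(0, c - 1) for c in child_counts)
    return math.exp(-lam * extra_branches)


def semantic_coherence(pair_scores: Sequence[float]) -> float:
    """S(P): mean coherence over the |P| - 1 consecutive message pairs."""
    return sum(pair_scores) / len(pair_scores)


def completion_quality(path_length: int, backtracks: int) -> float:
    """C(P) = |P| / (|P| + B(P))."""
    return path_length / (path_length + backtracks)


def tpo_loss_term(weight: float, delta_log_pi: float, beta: float = 0.1) -> float:
    """Single-sample loss term: -w * log(sigmoid(beta * delta_log_pi))."""
    z = beta * delta_log_pi
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    softplus_neg_z = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return weight * softplus_neg_z


# A fully linear path: every node has exactly one child, so no penalty applies.
linear = linearity_score([1, 1, 1])
# A branchy path: 2 + 0 + 1 = 3 extra branches, giving exp(-1.5).
branchy = linearity_score([3, 1, 2])
print(linear, round(branchy, 4))         # 1.0 0.2231

print(completion_quality(8, 2))          # 0.8
print(round(tpo_loss_term(1.0, 0.0), 4)) # 0.6931
```

A zero margin gives a loss of log 2 per unit weight, and the loss shrinks as the margin grows in favor of the preferred response. Scaling by `weight` lets pairs with a larger path-quality gap contribute more to the gradient.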
📈 Performance
TPO demonstrates significant improvements over traditional preference learning:
- **94.2
- **73.8
- **87.6
- **27
🆚 TPO vs DPO Comparison
| Aspect | DPO | TPO |
|---|---|---|
| Preference Source | Human annotation | Topological structure |
| Context Awareness | Limited | Full conversation context |
| Temporal Consistency | Static | Dynamic with hindsight |
| Scalability | Requires human labor | Fully automated |
| Bias | Human annotator bias | Structural bias (objective) |
🔧 Configuration
TPOConfig Parameters
```python
config = TPOConfig(
    # Quality calculation weights
    quality_weights=QualityWeights(
        linearity=0.4,   # α - Linear progression weight
        terminal=0.3,    # β - Terminal quality weight
        semantic=0.2,    # γ - Semantic coherence weight
        completion=0.1   # δ - Completion quality weight
    ),
    # Preference generation
    preference_threshold=0.1,   # θ - min quality difference
    confidence_threshold=0.6,   # min confidence for preference
    # Path filtering
    min_path_length=2,
    max_path_length=50,
    # DLM parameters
    alpha_scale=0.7,
    time_decay_factor=0.1,
    # Strategy toggles
    enable_linear_preferences=True,
    enable_hindsight_preferences=True,
    enable_depth_preferences=True
)
```
📚 API Reference
Core Classes
- `TPOAlgorithm`: Main algorithm orchestrator
- `ConversationGraph`: Graph representation of conversations
- `PathQualityCalculator`: Quality metric computation
- `TPOPreferenceGenerator`: Preference dataset generation
Dataset Classes
- `TPODataset`: Container for preference pairs
- `PreferencePair`: Individual preference data structure
- `ConversationDataLoader`: Multi-format data loading
Training Classes
- `TPOTrainer`: Model training with TPO loss
- `TPOLoss`: TPO loss function implementation
- `TPOMetrics`: Training and evaluation metrics
🧪 Testing
Run the test suite:
```bash
python -m pytest tests/ -v
```
Run specific test categories:
```bash
# Core algorithm tests
python -m pytest tests/test_core/ -v
# Dataset generation tests
python -m pytest tests/test_dataset/ -v
# Training tests
python -m pytest tests/test_training/ -v
```
🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Development Setup
```bash
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run code formatting
black tpo/
isort tpo/
# Run linting
flake8 tpo/
mypy tpo/
```
📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
📖 Citation
If you use TPO in your research, please cite:
```bibtex
@article{tpo2024,
  title={Topological Preference Optimization: A Novel Training Strategy for Conversational AI},
  author={Chain Memory Research Team},
  journal={arXiv preprint},
  year={2024}
}
```
🔗 Related Work
- [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)
- [Constitutional AI](https://arxiv.org/abs/2212.08073)
- [RLHF: Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2203.02155)
- [Chain Memory: Divergent Language Matrix](../README.md)
📞 Support
- Documentation: [Full documentation](TOPO_DOCUMENTATION.md)
- Mathematical Details: [Mathematical supplement](TPO_MATHEMATICAL_SUPPLEMENT.md)
- Issues: [GitHub Issues](https://github.com/your-repo/issues)
- Discussions: [GitHub Discussions](https://github.com/your-repo/discussions)
---
TPO: Where conversation topology meets preference learning 🚀
Source Anchor
Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/packages/tpo/README.md