TPO Mathematical Supplement: Detailed Formulations and Proofs
**Definition 1.1** (Conversation Graph): A conversation graph $G = (V, E, \mathbf{C}, \mathbf{M})$ where: - $V = \{v_1, v_2, ..., v_n\}$ is the set of message nodes - $E \subseteq V \times V$ is the set of directed edges representing reply relationships - $\mathbf{C}: V \rightarrow \mathbb{R}^5$ maps each node to its DLM coordinates - $\mathbf{M}: V \rightarrow \Sigma^*$ maps each node to its message content
Full Public Reader
TPO Mathematical Supplement: Detailed Formulations and Proofs
1. Extended Mathematical Framework
1.1 Conversation Graph Formalization
Definition 1.1 (Conversation Graph): A conversation graph $G = (V, E, \mathbf{C}, \mathbf{M})$ where:
- $V = \{v_1, v_2, ..., v_n\}$ is the set of message nodes
- $E \subseteq V \times V$ is the set of directed edges representing reply relationships
- $\mathbf{C}: V \rightarrow \mathbb{R}^5$ maps each node to its DLM coordinates
- $\mathbf{M}: V \rightarrow \Sigma^*$ maps each node to its message content
Definition 1.2 (Path): A path $P = \langle v_{i_1}, v_{i_2}, ..., v_{i_k} \rangle$ is a sequence of nodes where $(v_{i_j}, v_{i_{j+1}}) \in E$ for all $j \in [1, k-1]$.
Definition 1.3 (Root-to-Leaf Path): A path $P$ is root-to-leaf if $\text{in-degree}(v_{i_1}) = 0$ and $\text{out-degree}(v_{i_k}) = 0$.
1.2 DLM Coordinate System Integration
Where each coordinate has specific semantic meaning:
1.3 Advanced Path Quality Metrics
1.3.1 Linearity Score with Branching Penalty
Where $\lambda > 0$ is the branching penalty parameter.
1.3.2 Semantic Flow Coherence
This measures how smoothly semantic coherence changes along the path.
1.3.3 Temporal Consistency
1.3.4 Structural Progression
This penalizes erratic changes in structural complexity.
1.4 Comprehensive Path Quality Function
Where:
- $f_1(P) = L(P)$ (Linearity)
- $f_2(P) = T(P)$ (Terminal quality)
- $f_3(P) = S(P)$ (Semantic coherence)
- $f_4(P) = \tau(P) \cdot N(P)$ (Temporal-structural consistency)
- $\sum_{i=1}^{4} w_i = 1$ and $w_i > 0$
2. Preference Generation Algorithms
2.1 Linear vs Branching Preference Generation
Algorithm 2.1: Linear-Branching Preference Generation
Input: Paths P_linear, P_branching, threshold θ
Output: Preference set Π
Π ← ∅
for each P_l ∈ P_linear do
for each P_b ∈ P_branching do
if |depth(P_l) - depth(P_b)| ≤ 2 then
if Q(P_l) - Q(P_b) > θ then
π ← (terminal(P_l), terminal(P_b), "linear_preferred")
Π ← Π ∪ {π}
return Π2.2 Hindsight Knowledge Preference Generation
Algorithm 2.2: Hindsight Preference Generation
Input: Graph G, completed paths P_completed
Output: Preference set Π_hindsight
Π_hindsight ← ∅
for each P ∈ P_completed do
for each backtrack point b ∈ backtrack_points(P) do
v_chosen ← next_node_in_path(P, b)
V_rejected ← alternative_children(b) \ {v_chosen}
for each v_r ∈ V_rejected do
confidence ← calculate_hindsight_confidence(P, b)
π ← (v_chosen, v_r, "hindsight_knowledge", confidence)
Π_hindsight ← Π_hindsight ∪ {π}
return Π_hindsight2.3 Confidence Calculation
Where:
- $\sigma$ is the sigmoid function
- $\beta, \gamma$ are scaling parameters
- $\text{remaining\_depth}(P, b)$ is the path length after backtrack point $b$
3. Theoretical Analysis
3.1 Consistency Properties
Theorem 3.1 (Local Consistency): Within a single conversation graph $G$, TPO preferences satisfy local transitivity.
Proof:
Let $v_i, v_j, v_k$ be three messages with path contexts $P_i, P_j, P_k$ respectively.
If $Q(P_i) > Q(P_j) > Q(P_k)$ and all paths share the same conversation root, then:
1. TPO generates $v_i \succ v_j$ with confidence $c_{ij} = \sigma(Q(P_i) - Q(P_j))$
2. TPO generates $v_j \succ v_k$ with confidence $c_{jk} = \sigma(Q(P_j) - Q(P_k))$
3. TPO generates $v_i \succ v_k$ with confidence $c_{ik} = \sigma(Q(P_i) - Q(P_k))$
Since $Q(P_i) - Q(P_k) = (Q(P_i) - Q(P_j)) + (Q(P_j) - Q(P_k))$, we have $c_{ik} \geq \max(c_{ij}, c_{jk})$, ensuring consistency. □
Theorem 3.2 (Convergence): TPO-trained models converge to policies that maximize expected path quality.
Where $\Delta \log \pi = \log \frac{\pi_\theta(y_w|x)}{\pi_{\text{ref}}(y_w|x)} - \log \frac{\pi_\theta(y_l|x)}{\pi_{\text{ref}}(y_l|x)}$
Since $w(P_w, P_l) = \text{Confidence}(Q(P_w) - Q(P_l))$ is proportional to the quality difference, the gradient is weighted by the reliability of the preference signal, leading to convergence toward high-quality path generation. □
3.2 Optimality Conditions
Theorem 3.3 (Pareto Optimality): Under the linear progression assumption, TPO generates Pareto-optimal preference datasets.
Proof:
A preference dataset is Pareto-optimal if no other dataset can improve one quality metric without degrading another.
TPO optimizes four metrics simultaneously:
1. Linearity preference accuracy
2. Hindsight knowledge utilization
3. Semantic coherence preservation
4. Temporal consistency maintenance
The weighted product form of $Q(P)$ ensures that improvements in any single metric cannot be achieved without considering trade-offs with others, satisfying the Pareto optimality condition. □
3.3 Information-Theoretic Analysis
Theorem 3.4 (Information Content): TPO preferences contain more information than random preferences.
For TPO preferences: $P(y=1|q_1 > q_2) = \sigma(q_1 - q_2)$
For random preferences: $P(y=1|q_1 > q_2) = 0.5$
Since $\sigma(q_1 - q_2) \neq 0.5$ when $q_1 \neq q_2$, we have $I_{\text{TPO}}(Y; Q) > I_{\text{random}}(Y; Q) = 0$. □
4. Computational Complexity
### 4.1 Graph Construction
- Time Complexity: $O(n + m)$ where $n = |V|$ and $m = |E|$
- Space Complexity: $O(n + m)$
### 4.2 Path Enumeration
- Time Complexity: $O(n \cdot 2^n)$ in worst case (exponential in branching factor)
- Space Complexity: $O(n \cdot p)$ where $p$ is the number of paths
### 4.3 Preference Generation
- Time Complexity: $O(p^2)$ for pairwise comparisons
- Space Complexity: $O(p^2)$ for storing preferences
4.4 Optimization Strategies
Approximation Algorithm: For large graphs, use sampling:
1. Sample $k$ representative paths from each category
2. Generate preferences only between sampled paths
3. Time Complexity: $O(k^2)$ where $k \ll p$
Pruning Strategy: Eliminate low-quality paths early:
1. Calculate approximate quality scores
2. Prune paths below quality threshold
3. Reduction: Up to 70
5. Parameter Sensitivity Analysis
5.1 Weight Parameter Effects
The quality function weights $w_1, w_2, w_3, w_4$ affect preference generation:
5.2 Threshold Parameter $\theta$
The preference threshold affects the number of generated preferences:
5.3 Branching Penalty $\lambda$
The linearity score parameter $\lambda$ controls the strength of linear preference:
6. Experimental Validation Framework
6.1 Synthetic Data Generation
Algorithm 6.1: Synthetic Conversation Generation
Input: Parameters (depth_max, branch_prob, quality_noise)
Output: Synthetic conversation graph G_synth
G_synth ← empty graph
root ← create_root_node()
queue ← [root]
while queue not empty do
current ← queue.pop()
if depth(current) < depth_max then
if random() < branch_prob then
children ← create_multiple_children(current)
else
children ← [create_single_child(current)]
for child in children do
assign_dlm_coordinates(child)
add_quality_noise(child, quality_noise)
queue.append(child)
return G_synth6.2 Evaluation Metrics
7. Implementation Optimizations
7.1 Efficient Path Storage
Use compressed path representation:
class CompressedPath:
def __init__(self, nodes, quality_cache):
self.nodes = nodes
self.quality = quality_cache.get(tuple(nodes))
if self.quality is None:
self.quality = self.calculate_quality()
quality_cache[tuple(nodes)] = self.quality7.2 Parallel Processing
Distribute path analysis across multiple cores:
def parallel_path_analysis(paths, num_workers=4):
chunks = np.array_split(paths, num_workers)
with multiprocessing.Pool(num_workers) as pool:
results = pool.map(analyze_path_chunk, chunks)
return np.concatenate(results)7.3 Memory-Efficient Preference Generation
Use generators to avoid storing all preferences in memory:
def preference_generator(paths):
for i, path1 in enumerate(paths):
for path2 in paths[i+1:]:
if should_generate_preference(path1, path2):
yield create_preference(path1, path2)8. Future Extensions
8.1 Multi-Modal TPO
Where $\mathbf{v}_i$ and $\mathbf{a}_i$ are visual and audio feature vectors.
8.2 Dynamic Weight Learning
8.3 Hierarchical TPO
Apply TPO at multiple conversation scales:
- Sentence-level preferences
- Paragraph-level preferences
- Document-level preferences
---
This mathematical supplement provides the rigorous theoretical foundation for Topological Preference Optimization, including detailed proofs, complexity analysis, and implementation guidelines.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/docs/documentation/docs/TPO_MATHEMATICAL_SUPPLEMENT.md
Detected Structure
Method · Evaluation · References · Math