DLM
Divergent Language Matrix (DLM) is designed to generate a lower-dimensional representation of complex, hierarchical text data, such as conversations. The algorithm preserves both semantic and structural relationships within the data, allowing for more efficient analysis and visualization.
Full Public Reader
Divergent Language Matrix (DLM) is designed to generate a lower-dimensional representation of complex, hierarchical text data, such as conversations. The algorithm preserves both semantic and structural relationships within the data, allowing for more efficient analysis and visualization.
In the Divergent Language Matrix (DLM) framework, a conversation tree is formulated as a directed, acyclic graph where each node corresponds to a message in the conversation. Each message `t_i` is mathematically defined by a triplet `(d_i, s_i, c_i)`, such that:
* x_coord (Depth): Represents the hierarchical level of a message. If a message is a direct reply to another message, it will be one level deeper (e.g., the original message is at depth 0, a reply to it is at depth 1, a reply to that reply is at depth 2, and so on).
* y_coord (Order among siblings): Represents the order in which a message appears among its siblings. This is relevant when there are multiple replies (siblings) to a single message. It provides a sense of the sequence of the conversation.
* z_coord (Homogeneity based on sibling count and similarity score): This is the most direct measure of homogeneity in the provided method.It serves as an essential indicator for both the structural and semantic relationships among messages at the same hierarchical level. The `z_coord` value is calculated differently depending on whether similarity scores are included.
Isolated Messages (Zero Siblings)
Messages that are the only replies to their parent are assigned a `z_coord` of 0. These messages are unique in their respective contexts and do not share the depth level with any other message. Thus, they do not need differentiation based on sibling relationships or similarity scores.
Messages with Siblings
Messages that are part of a set of sibling messages have their `z_coord` calculated using one of two methods:
##### Without Considering Similarity Scores
If similarity scores are not considered, the `z_coord` for each sibling message is calculated as:
z_coord = -0.5 * (total_number_of_siblings - 1)Here, all sibling messages will have the same `z_coord`, indicating that they belong to the same homogeneous group at that hierarchical level.
##### Considering Similarity Scores
If similarity scores are available, then the `z_coord` is calculated as:
z_coord = (-0.5 + avg_similarity * 0.5) * (total_number_of_siblings - 1)In this formula, `avg_similarity` is the average similarity score among all sibling messages. The resulting `z_coord` will weight the homogeneity both by structural position and semantic similarity.
Example
1. Isolated message: If a message has no siblings, its `z_coord = 0`.
2. Multiple siblings without similarity scores: For 3 siblings, each would have `z_coord = -0.5 * (3 - 1) = -1`.
3. Multiple siblings with similarity scores: For 3 siblings with an average similarity score of 0.8, each would have `z_coord = (-0.5 + 0.8 0.5) (3 - 1) = 0.3 * 2 = 0.6`.
3. Hierarchical Spatial-Temporal Coordinate Assignment
The assignment of hierarchical spatial-temporal coordinates is a cornerstone in the DLM framework, bridging the gap between high-dimensional textual embeddings and the structured representation of a conversation. It assigns each segment a four-dimensional coordinate `(x, y, z, t)`, encoding both its place in the conversational hierarchy and its chronological order.
3.1. The Framework for Coordinate Assignment
- The Coordinate Tuple: Every segment `P_{i,j,k}` within a given conversation `C_i` is mapped to a unique coordinate tuple `(x_{i,j,k}, y_{i,j,k}, z_{i,j,k}, t_{i,j,k})`.
- Rooted in Message Metadata: The values of `x, y, z` are computed as functions `f(d_i, s_i, c_i)`, where `d_i, s_i, c_i` are as previously defined.
- Chronological Timestamp: `t_{i,j,k}` is defined by the temporal metadata associated with the message, normalized to a suitable scale for analysis.
3.2. Spatial Coordinate Calculations
- X-Axis (Thread Depth): `x_{i,j,k}` is directly proportional to `d_i`, representing the depth of the message in the conversation tree. It captures the level of nesting for each message.
`x_{i,j,k} = f_x(d_i)`
- Y-Axis (Sibling Order): `y_{i,j,k}` is a function of `s_i`, signifying the message's ordinal position among siblings.
`y_{i,j,k} = f_y(s_i)`
- Z-Axis (Sibling Density): `z_{i,j,k}` encapsulates the density of sibling messages at a given depth, calculated as a function of `c_i`.
`z_{i,j,k} = f_z(c_i)`
These functions `f_x, f_y, f_z` can be linear or nonlinear mappings based on the specific requirements of the analysis.
3.3. Temporal Coordinate Calculations
The temporal coordinate, denoted as `t_coord`, integrates both the message's temporal weight and its normalized depth in the conversation hierarchy. This offers a nuanced perspective on the timing of each message, factoring in its temporal context as well as its place in the conversation structure.
- Mathematical Representation:
The formula for `t_coord` is expressed as:
t_coord = dynamic_alpha_scale * temporal_weights[i] + (1 - dynamic_alpha_scale) * normalized_depthComponents:
##### 1. `dynamic_alpha_scale`
This is a dynamic scalar that helps balance the contribution of `temporal_weights[i]` and `normalized_depth`. It is computed dynamically, depending on variables like the type of message and the root of the sub-thread where the message resides.
- How It Varies:
- The scale is closer to 1 for messages that should be more sensitive to time.
- It moves closer to 0 for messages where hierarchical positioning is more critical.
- Computation:
if callable(alpha_scale):
dynamic_alpha_scale = alpha_scale(sub_thread_root, message_type)
else:
dynamic_alpha_scale = alpha_scale##### 2. `temporal_weights[i]`
This signifies the temporal importance of the message at index `i` in the conversation.
- Components:
- `TDF` (Time Decay Factor): Determines how much older messages should be "penalized" in the weight calculation.
- `timestamp[i]`: The actual timestamp of the message.
- Computation:
temporal_weights[i] = TDF * timestamp[i]##### 3. `normalized_depth`
This is the depth of a message in the hierarchical structure, normalized so it remains consistent across sub-threads of varying sizes.
- Computation:
normalized_depth = depth_of_message / max_depth_in_conversationNotes:
1. Dynamic Alpha Scaling:
- The dynamic nature of the `alpha_scale` allows for flexibility in adjusting the `t_coord` according to the contextual specifics of each message.
2. Temporal Weight Calculation:
- The Time Decay Factor (`TDF`) can be customized to match the requirements of the analysis. For instance, it can be computed based on the average time interval between messages in the thread to which the message belongs.
3. Depth Normalization:
- The normalization of depth is crucial to avoid biases in larger or more intricate conversation trees. It allows the model to account for the relative importance of a message's position within its specific context.
By utilizing these variables and calculations, `t_coord` becomes a multifaceted metric, rich in information about each message's temporal and hierarchical importance in the conversation.
3.4. Final Coordinate Assignment
After calculating these coordinates, each segment `P_{i,j,k}` in conversation `C_i` will have a unique 4D coordinate `(x_{i,j,k}, y_{i,j,k}, z_{i,j,k}, t_{i,j,k})`. These coordinates serve as a comprehensive representation of each segment's position in both the conversational hierarchy and the temporal sequence.
4. Dynamic Message Ordering (DMO)
The Dynamic Message Ordering (DMO) system utilizes a Hierarchical Spatial-Temporal Coordinate Assignment methodology to arrange messages in a conversation space. In essence, the DMO aims to spatially organize messages in such a way that:
- Similar messages are closer in this space.
- The spatial relationship of messages reflects the temporal relationship among them.
- The hierarchical structure is reflected in the spatial coordinates.
4.1. Spacing Calculation (Method: `calculate_spacing`)
In this part, the spacing `S` between siblings based on similarity scores is of utmost importance. Let's redefine the mathematical formulation with more specificity:
- Variable Definitions:
- `n = Number of children, n = | children_ids |`
- `avg_similarity = Average of normalized similarity scores, (Sum(s_i) from i=1 to n) / n`
- `s_i = Individual normalized similarity scores`
- Mathematical Representation:
`S(n, avg_similarity, method) = { 0 if n <= 1; -0.5 x (n - 1) if method = "spacing"; (-0.5 + avg_similarity) x (n - 1) if method = "both" }`
4.2. Temporal Weights Calculation (Method: `calculate_temporal_weights`)
The matrix of temporal weights is calculated as follows:
- Variable Definitions:
- `t = Vector of timestamps, t = [t_1, t_2, ..., t_n]`
- `Delta T = Matrix of pairwise time differences, Delta T_{ij} = | t_i - t_j |`
- Mathematical Representation:
`W_{ij} = f(Delta T_{ij})`
where `f(x)` is a decay function, applied element-wise.
4.3. Time Coordinate Calculation (Method: `calculate_time_coordinate`)
In this part, the focus is to determine a singular time coordinate `T` for a message. We base it on its relationship with its siblings:
- Variable Definitions:
- `t_message = Timestamp of the current message`
- `t_sibling_i = Timestamps of siblings`
- `Delta t_i = t_sibling_i - t_message`
- Mathematical Representation:
`T(t_message, Delta t) = g(time_diff)`
where `g(x)` is another decay function, and `time_diff` is the time difference between the message and a root message.
4.4. Time Decay Factor (Method: `time_decay_factor`)
The time decay factor `D` will amalgamate the impacts of both individual message time and sibling relations:
- Variable Definitions:
- `avg(Delta t) = Average time differences between a message and its siblings`
- Mathematical Representation:
`D = g(time_diff) x avg(Delta t)`
Promotion Decision
Promote into a technical note or architecture paper with implementation anchors.
Source Anchor
Comp-Core/backend/cc-trajectory/legacy/cc-tpo-original/cc-tpo/docs/architecture/docs/DLM.md
Detected Structure
Method · Architecture