Cognitive Metrics Specification
```python def divergence_rate(embeddings: list[np.ndarray], window: int = 5) -> float: """ Compute average cosine distance between consecutive prompt embeddings over a sliding window.
Full Public Reader
Cognitive Metrics Specification
Technical definitions for extracting cognitive analytics from AI interaction data.
Data Sources
### Primary (Mohamed - North Star)
- `claude_prompts` table in Supabase (112K+ turns)
- `memory_turns` table (332K rows, 768-dim embeddings via text-embedding-3-small)
- prompt-logger JSONL archives
- RAG++ vector search for semantic clustering
### Secondary (Future Users)
- ChatGPT export (JSON: `conversations[].mapping[].message`)
- Claude.ai export (when available)
- Gemini export (JSON format TBD)
- Any OpenAI-compatible API logs
Metric Definitions
1. Divergence Rate (DR)
Definition: Semantic distance between consecutive prompts within and across sessions.
def divergence_rate(embeddings: list[np.ndarray], window: int = 5) -> float:
"""
Compute average cosine distance between consecutive prompt embeddings
over a sliding window.
High DR = topic-jumping polymath
Low DR = focused specialist
"""
distances = []
for i in range(len(embeddings) - 1):
d = 1 - cosine_similarity(embeddings[i], embeddings[i+1])
distances.append(d)
# Smooth with rolling window
smoothed = rolling_mean(distances, window)
return {
"mean": np.mean(smoothed),
"std": np.std(smoothed),
"trend": linear_regression_slope(smoothed), # positive = expanding, negative = narrowing
"max_jump": np.max(distances),
"time_series": smoothed
}Visualization: Line chart over time. Color gradient from blue (focused) to red (divergent). Overlay session boundaries.
Extraction query (Supabase):
SELECT id, prompt, embedding, created_at, session_id
FROM memory_turns
WHERE source_type = 'claude_prompt'
AND embedding IS NOT NULL
ORDER BY created_at ASC;2. Thought Diet (TD)
Definition: Classification of each prompt into cognitive categories, then computing the distribution.
CATEGORIES = {
"building": "Generative prompts - creating new things, writing code, designing systems",
"fixing": "Debugging, error resolution, troubleshooting",
"learning": "Questions, explanations, 'how does X work'",
"directing": "Architecture decisions, system design, strategic choices",
"creating": "Novel ideation, brainstorming, creative work",
"operating": "DevOps, deployment, configuration, maintenance",
"analyzing": "Data analysis, investigation, research"
}
def thought_diet(prompts: list[str], classifier) -> dict:
"""
Classify each prompt and compute the distribution.
classifier: LLM-based or embedding-based classifier
Returns percentage breakdown + temporal evolution.
"""
classifications = [classifier(p) for p in prompts]
distribution = Counter(classifications)
total = sum(distribution.values())
return {
category: count / total
for category, count in distribution.items()
}Classifier approach: Use RAG++ embeddings + few-shot classification. Cluster prompts by embedding similarity, then label clusters. Cheaper than per-prompt LLM classification.
Visualization: Stacked area chart over time (shows how diet evolves). Pie/donut for overall snapshot.
3. Depth of Death (DoD)
Definition: When a topic is abandoned, how deep was the exploration? Measured by conversation depth within a topic cluster.
def depth_of_death(topic_clusters: list[TopicCluster]) -> dict:
"""
For each topic cluster:
1. Count total messages in the cluster
2. Measure semantic progression (did later messages go deeper?)
3. Detect abandonment (no return within N days)
4. Score: deep abandonment vs. shallow abandonment
"""
results = {}
for cluster in topic_clusters:
depth = len(cluster.messages)
progression = semantic_depth_score(cluster.messages) # embedding drift from first to last
returned = cluster.was_revisited_within(days=7)
results[cluster.topic] = {
"depth": depth,
"progression": progression,
"abandoned": not returned,
"depth_at_death": depth if not returned else None
}
return resultsVisualization: Scatter plot. X-axis = topic breadth (number of unique topics touched). Y-axis = average depth per topic. Each dot is a domain. Size = total interactions in that domain.
4. Cadence Fingerprint (CF)
Definition: Temporal engagement patterns that reveal work style.
def cadence_fingerprint(timestamps: list[datetime]) -> dict:
"""
Extract temporal patterns from interaction timestamps.
"""
# Inter-message intervals
intervals = [timestamps[i+1] - timestamps[i] for i in range(len(timestamps)-1)]
# Session detection (gap > 30 min = new session)
sessions = detect_sessions(timestamps, gap_threshold=timedelta(minutes=30))
return {
"avg_session_length": mean([s.duration for s in sessions]),
"session_length_std": std([s.duration for s in sessions]),
"messages_per_session": mean([s.message_count for s in sessions]),
"daily_pattern": hourly_distribution(timestamps), # 24-bin histogram
"weekly_pattern": daily_distribution(timestamps), # 7-bin histogram
"burst_score": burst_detection(intervals), # high = bursty, low = steady
"streak_max": max_consecutive_days(timestamps),
"return_interval": mean_time_between_sessions(sessions),
"total_active_days": len(set(t.date() for t in timestamps)),
}Visualization: Heatmap (GitHub contribution style). Hours on Y-axis, days on X-axis. Intensity = message count.
5. Prompt Entropy (PE)
Definition: Predictability of the next prompt given the previous N prompts. Measured via embedding-space prediction error.
def prompt_entropy(embeddings: list[np.ndarray], window: int = 5) -> float:
"""
Train a simple predictor (linear or MLP) on sliding windows of embeddings.
The prediction error IS the entropy measure.
High entropy = unpredictable thinker (out-of-distribution)
Low entropy = formulaic patterns
"""
X = [embeddings[i:i+window] for i in range(len(embeddings) - window)]
y = [embeddings[i+window] for i in range(len(embeddings) - window)]
predictor = train_predictor(X, y)
errors = [prediction_error(predictor, x, y_true) for x, y_true in zip(X, y)]
return {
"mean_entropy": np.mean(errors),
"entropy_trend": linear_regression_slope(errors),
"high_entropy_moments": find_peaks(errors), # most surprising transitions
"low_entropy_stretches": find_valleys(errors), # most predictable stretches
}Visualization: Entropy timeline with highlighted peaks ("most surprising thought transitions") and valleys ("most predictable stretches").
6. Recovery Topology (RT)
Definition: When a conversation hits a failure/error/dead-end, what's the shape of the recovery?
RECOVERY_TYPES = {
"repeat": "Same prompt again, maybe rephrased",
"pivot": "Immediately switch approach",
"zoom_out": "Reframe the problem at a higher level",
"cross_pollinate": "Go to a different domain, come back with insight",
"decompose": "Break the failed task into smaller pieces",
"escalate": "Ask for help or change tools",
"abandon": "Drop the topic entirely"
}
def recovery_topology(sessions: list[Session]) -> dict:
"""
1. Detect failure points (error messages, "that didn't work", retries)
2. Classify the recovery pattern
3. Measure recovery success rate per pattern
"""
failures = detect_failures(sessions) # heuristic + LLM classification
recoveries = [classify_recovery(f) for f in failures]
return {
"distribution": Counter(recoveries),
"success_rate_per_type": {
rtype: success_rate(failures, rtype)
for rtype in RECOVERY_TYPES
},
"avg_recovery_length": mean([f.messages_to_resolve for f in failures]),
"fastest_recovery": min(failures, key=lambda f: f.messages_to_resolve),
}Visualization: Sankey diagram. Left = failure type. Middle = recovery pattern. Right = outcome (resolved/abandoned).
Composite Scores
Cognitive Complexity Index (CCI)
CCI = (
0.20 * normalize(divergence_rate) +
0.15 * normalize(thought_diet_entropy) + # more diverse diet = higher
0.20 * normalize(mean_depth) +
0.15 * normalize(prompt_entropy) +
0.15 * normalize(recovery_success_rate) +
0.15 * normalize(total_interactions)
)### Domain Bridge Score (DBS)
Number of unique cross-domain transitions weighted by semantic distance. Measures how often and how far someone jumps between fields.
### Learning Velocity (LV)
For each domain: time from first interaction to demonstrating advanced usage patterns. Faster = higher velocity.
Privacy Extraction Protocol
All metrics can be computed from:
1. Embeddings (768-dim vectors, no text needed)
2. Timestamps
3. Session boundaries
4. Domain labels (auto-generated from embedding clusters)
5. Failure/success signals (auto-detected)
No raw text needs to leave the user's machine. The metric vectors alone power the dashboard, the comparisons, and the marketplace matching.
Promotion Decision
Attach run IDs, datasets, metrics, and reproduction commands.
Source Anchor
cognitive-hire/docs/metrics-spec.md
Detected Structure
Method · Evaluation · Architecture