# Privacy Architecture
## Principle
Raw thought stays local. Only the shape of thought travels.
## 4-Layer Model
### Layer 0: Local Extraction
Runs entirely on the user's machine. No network calls.
Input: Raw AI conversation exports (ChatGPT JSON, Claude history, Gemini data)
Processing:
1. Parse conversations into individual prompts with timestamps
2. Strip all named entities (NER pass): names, companies, URLs, emails, phone numbers, addresses
3. Strip code snippets containing credentials, API keys, file paths with usernames
4. Generate embeddings locally (or via privacy-preserving API with no logging)
5. Compute all 6 cognitive metrics locally
6. Cluster prompts into domain topics using embedding similarity
7. Label clusters with generic domain tags (not project-specific names)
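
The scrubbing in steps 2 and 3 can be sketched with a few regular expressions. This is a minimal illustration, not the project's implementation: the pattern list, the placeholder labels, and the `scrub` function are all hypothetical, and names, companies, and street addresses would still need a real NER model on top of this.

```python
import re

# Hypothetical patterns for illustration only. They are deliberately crude:
# the phone pattern, for example, will also redact other long digit runs.
PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("SECRET", re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+")),
    ("PATH", re.compile(r"(?:/home/|/Users/|C:\\Users\\)[^\s/\\]+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def scrub(text: str) -> str:
    """Replace each match with a bracketed placeholder, in pattern order."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Calling `scrub("open /home/alice/notes.txt")` keeps the file name but drops the username segment. Over-redaction is the safer failure mode here, since only aggregate metrics are computed from the scrubbed text.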
Output:
- Metric vectors (6 metrics, each a time series)
- Domain topology graph (nodes = generic domain labels, edges = transition frequency)
- Embedding centroids per domain (not individual prompt embeddings)
- Session metadata (timestamps, durations, no content)
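
As a rough sketch of how two of these outputs could be derived, the functions below compute one centroid per domain and count transitions between consecutive prompts. The function names are hypothetical, and skipping self-transitions is an assumption; the source does not say whether repeated domains count as edges.

```python
from collections import Counter


def domain_centroids(embeddings, labels):
    """Mean embedding per domain label; per-prompt vectors are not returned."""
    sums, counts = {}, Counter()
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def transition_graph(labels):
    """Count domain changes between consecutive prompts (assumed: no self-loops)."""
    edges = Counter()
    for src, dst in zip(labels, labels[1:]):
        if src != dst:
            edges[(src, dst)] += 1
    return edges
```

Only the returned centroids and edge counts would be candidates for upload; the `embeddings` list itself stays on the machine.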
What NEVER leaves the machine:
- Raw prompt text
- AI response text
- Individual prompt embeddings (only centroids)
- Any personally identifiable information
- Project names, company names, people names
- Code, credentials, file paths
### Layer 1: Encrypted Transit
The metric vectors and topology graph are encrypted on the user's machine (AES-256-GCM, user-held key) before upload.
Server stores: encrypted blobs only. It cannot compute on them without the user's key.
For matching/search: the user authorizes computation by providing a derived key that allows specific operations (comparison, ranking) without decrypting the full profile.
Alternative: compute metrics locally, upload only the final scores (not vectors). Simpler, less functionality, maximum privacy.
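
The client-side encryption described for this layer might look like the sketch below. It assumes the third-party `cryptography` package; the function names, the nonce-prefixed blob layout, and the use of a user ID as associated data are illustrative choices, not part of the source design. The derived-key mechanism for matching is not covered here.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_profile(profile: dict, key: bytes, user_id: str) -> bytes:
    """Encrypt a profile with AES-256-GCM; returns nonce || ciphertext || tag."""
    if len(key) != 32:
        raise ValueError("AES-256 requires a 32-byte key")
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat under the same key
    plaintext = json.dumps(profile, sort_keys=True).encode("utf-8")
    # user_id is bound as associated data (assumption): authenticated, not encrypted
    return nonce + AESGCM(key).encrypt(nonce, plaintext, user_id.encode("utf-8"))


def decrypt_profile(blob: bytes, key: bytes, user_id: str) -> dict:
    """Reverse encrypt_profile; raises InvalidTag on a wrong key or tampering."""
    nonce, ciphertext = blob[:12], blob[12:]
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, user_id.encode("utf-8"))
    return json.loads(plaintext)
```

The key would be generated and kept on the user's device, for example with `AESGCM.generate_key(bit_length=256)`, so the server only ever sees the returned blob.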
### Layer 2: Consent-Gated Twin
The cognitive twin is trained/configured locally. When a hiring manager requests a session:
1. Hiring manager sends session request via platform
2. User receives notification with: company name, role, specific questions (optional)
3. User approves/denies
4. If approved: user's local twin endpoint becomes available for N minutes
5. Session is logged (questions + twin responses) and visible to user
6. User can revoke access at any time
For always-on profiles (like Mohamed's demo):
- Twin runs on user's infrastructure (or platform-hosted with user's key)
- User sets access level: public (anyone), authenticated (logged-in companies), invite-only
- All sessions logged and visible to profile owner
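
For always-on profiles, the three access levels reduce to a small check. The sketch below is illustrative only: the names are hypothetical, and treating invite-only as a plain allowlist of accessor IDs is an assumption.

```python
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"
    AUTHENTICATED = "authenticated"
    INVITE_ONLY = "invite-only"


def may_open_session(level, accessor_id, logged_in, invited):
    """Decide whether an accessor may start a twin session."""
    if level is AccessLevel.PUBLIC:
        return True
    if level is AccessLevel.AUTHENTICATED:
        return logged_in
    # INVITE_ONLY: assumed to mean an explicit allowlist set by the profile owner
    return accessor_id in invited
```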
### Layer 3: Audit Trail
Every interaction with a user's cognitive profile is logged:
```json
{
  "timestamp": "2026-04-15T14:30:00Z",
  "accessor": "company:acme-corp",
  "accessor_role": "hiring_manager",
  "action": "twin_session",
  "duration_seconds": 1847,
  "questions_asked": 12,
  "domains_touched": ["distributed_systems", "ml_training", "architecture"],
  "user_notified": true
}
```

Users see a dashboard of who accessed what, when, and what domains were explored. Full transparency.
## Data Retention
- Raw exports: never stored on platform servers. Local only.
- Metric vectors: stored encrypted. User can delete at any time. Deletion is permanent and verified.
- Twin model weights: stored encrypted. User can delete at any time.
- Session logs: retained for 90 days, then auto-deleted unless user opts to keep.
- Audit trail: retained for 1 year for user's reference.
## Threat Model
| Threat | Mitigation |
|---|---|
| Platform breach exposes profiles | Encrypted at rest with user-held keys. Breach yields ciphertext only. |
| Hiring manager records twin session | Sessions are logged. Terms of service prohibit recording. Twin responses include watermarks. |
| Reverse-engineering raw prompts from metrics | Metrics are aggregate statistics (means, distributions, topology). Cannot reconstruct individual prompts from centroids. |
| User coerced to share more data | Consent gates are technical, not policy. The system cannot share what it doesn't have. |
| Discrimination via cognitive profiling | Metrics are domain-agnostic. No demographic data. No name, age, gender, location in the profile. Twin doesn't know these either. |
## Compliance Considerations
- GDPR: Right to deletion (full profile wipe), right to access (audit trail), data minimization (only metrics, not raw text), consent management (per-session approval)
- CCPA: Same deletion/access rights. No sale of personal data (metrics are derived, anonymized)
- SOC 2: Encryption at rest and transit, audit logging, access controls
- AI Act (EU): Transparency requirements for AI in hiring. The provenance panel satisfies "explainability" by showing which interaction clusters informed twin responses.
Source Anchor
cognitive-hire/docs/privacy-architecture.md