Grand Diomande Research · Full HTML Reader

Privacy Architecture

Agents That Account for Themselves architecture technical paper candidate score 34 .md

Principle

Raw thought stays local. Only the shape of thought travels.

4-Layer Model

### Layer 0: Local Extraction
Runs entirely on the user's machine. No network calls.

Input: Raw AI conversation exports (ChatGPT JSON, Claude history, Gemini data)

Processing:
1. Parse conversations into individual prompts with timestamps
2. Strip all named entities (NER pass): names, companies, URLs, emails, phone numbers, addresses
3. Strip code snippets containing credentials, API keys, file paths with usernames
4. Generate embeddings locally (or via privacy-preserving API with no logging)
5. Compute all 6 cognitive metrics locally
6. Cluster prompts into domain topics using embedding similarity
7. Label clusters with generic domain tags (not project-specific names)
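Steps 2 and 3 could be approximated with a pattern-based pass like the sketch below. This is a minimal illustration, not the project's implementation: the pattern list, the placeholder labels, and the `redact` function are all hypothetical, and regexes only cover structured identifiers (URLs, emails, phone numbers, user paths, key assignments). Names and companies would still need a real NER model.

```python
import re

# Hypothetical patterns (assumptions, not from the source document).
# Order matters: URLs are replaced first so their contents are not
# partially matched by the later patterns.
PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
    ("USER_PATH", re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\Users\\)[^\s/\\]+")),
    ("SECRET", re.compile(
        r"\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+",
        re.IGNORECASE,
    )),
]


def redact(text: str) -> str:
    """Replace each matched span with a generic placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact("open /home/alice/app.py")` replaces only the user-specific prefix and leaves the file name, which may or may not be strict enough for a given deployment.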

Output:
- Metric vectors (6 metrics, each a time series)
- Domain topology graph (nodes = generic domain labels, edges = transition frequency)
- Embedding centroids per domain (not individual prompt embeddings)
- Session metadata (timestamps, durations, no content)
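As a rough sketch of how two of these outputs might be derived, the code below computes one centroid per domain label and counts transitions between consecutive prompts. It assumes prompts are already time-ordered and labeled; the function names are hypothetical, and skipping self-transitions is an assumption the source does not specify.

```python
from collections import Counter


def centroids(embeddings, labels):
    """Mean embedding per domain label; per-prompt vectors are not returned."""
    counts = Counter(labels)
    sums = {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def transition_graph(labels):
    """Edge weights = how often one domain is followed by a different one.

    Assumption: consecutive prompts in the same domain are not counted.
    """
    return Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)
```

Only the returned dictionaries would be candidates for upload; the `embeddings` argument stays in local memory.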

What NEVER leaves the machine:
- Raw prompt text
- AI response text
- Individual prompt embeddings (only centroids)
- Any personally identifiable information
- Project names, company names, people names
- Code, credentials, file paths

### Layer 1: Encrypted Transit
The metric vectors and topology graph are encrypted on the user's machine (AES-256-GCM, user-held key) before upload.

Server stores: Encrypted blobs. The server cannot compute on them without the user's key.

For matching/search: User authorizes computation by providing a derived key that allows specific operations (comparison, ranking) without decrypting the full profile.

Alternative: compute metrics locally, upload only the final scores (not vectors). Simpler, less functionality, maximum privacy.
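The scores-only alternative could look something like the sketch below, which collapses each metric's time series into one number before anything is uploaded. The source does not say how final scores are computed; using a rounded mean is an assumption, and the metric names in the usage note are placeholders.

```python
from statistics import fmean


def final_scores(metric_series):
    """Collapse each metric's time series to a single rounded mean.

    Assumption: the "final score" is the mean; empty series are dropped.
    """
    return {
        name: round(fmean(values), 3)
        for name, values in metric_series.items()
        if values
    }
```

Calling `final_scores({"depth": [0.2, 0.4, 0.6]})` yields one value per metric, so the temporal shape of the series never leaves the machine.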

### Layer 2: Consent-Gated Twin

The cognitive twin is trained/configured locally. When a hiring manager requests a session:

1. Hiring manager sends session request via platform
2. User receives notification with: company name, role, specific questions (optional)
3. User approves/denies
4. If approved: user's local twin endpoint becomes available for N minutes
5. Session is logged (questions + twin responses) and visible to user
6. User can revoke access at any time
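The approval window in steps 4 and 6 can be modeled as a grant that expires after N minutes and can be revoked early. This is a sketch under assumptions: `SessionGrant` and `approve` are hypothetical names, and the clock is passed in explicitly so the behavior is easy to check.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionGrant:
    requester: str      # e.g. "company:acme-corp"
    expires_at: float   # absolute time, seconds since the epoch
    revoked: bool = False

    def is_active(self, now: float) -> bool:
        """The twin endpoint should answer only while this returns True."""
        return not self.revoked and now < self.expires_at


def approve(requester: str, minutes: int, now: Optional[float] = None) -> SessionGrant:
    """User approval opens access for a fixed window of `minutes`."""
    start = time.time() if now is None else now
    return SessionGrant(requester, start + minutes * 60)
```

Revocation here is just `grant.revoked = True`; a denied request never creates a grant at all.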

For always-on profiles (like Mohamed's demo):
- Twin runs on user's infrastructure (or platform-hosted with user's key)
- User sets access level: public (anyone), authenticated (logged-in companies), invite-only
- All sessions logged and visible to profile owner
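The three access levels map naturally onto a small gate function. The sketch below is illustrative only; `AccessLevel` and `may_open_session` are hypothetical names, and treating invite-only as a simple allow-list lookup is an assumption.

```python
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"
    AUTHENTICATED = "authenticated"
    INVITE_ONLY = "invite-only"


def may_open_session(level, accessor, is_logged_in, invited):
    """Decide whether `accessor` may start a session at the given level."""
    if level is AccessLevel.PUBLIC:
        return True
    if level is AccessLevel.AUTHENTICATED:
        return is_logged_in
    # INVITE_ONLY: assumption is that membership in the owner's list suffices.
    return accessor in invited
```

Whatever the gate decides, the session itself would still be logged and shown to the profile owner.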

### Layer 3: Audit Trail

Every interaction with a user's cognitive profile is logged:

{
  "timestamp": "2026-04-15T14:30:00Z",
  "accessor": "company:acme-corp",
  "accessor_role": "hiring_manager",
  "action": "twin_session",
  "duration_seconds": 1847,
  "questions_asked": 12,
  "domains_touched": ["distributed_systems", "ml_training", "architecture"],
  "user_notified": true
}

Users see a dashboard of who accessed what, when, and what domains were explored. Full transparency.
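A minimal sketch of producing such entries and rolling them up for the dashboard might look like this. The field names follow the example entry; the function names and the rollup shape are assumptions. A naive `when` value is interpreted as local time by `astimezone`, so callers should pass timezone-aware datetimes.

```python
from datetime import datetime, timezone


def audit_entry(accessor, accessor_role, action, duration_seconds,
                questions_asked, domains_touched, user_notified=True, when=None):
    """Build one audit record with the same fields as the example entry."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "accessor": accessor,
        "accessor_role": accessor_role,
        "action": action,
        "duration_seconds": duration_seconds,
        "questions_asked": questions_asked,
        "domains_touched": sorted(set(domains_touched)),
        "user_notified": user_notified,
    }


def domains_by_accessor(entries):
    """Dashboard-style rollup: which domains each accessor has explored."""
    seen = {}
    for entry in entries:
        seen.setdefault(entry["accessor"], set()).update(entry["domains_touched"])
    return seen
```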

Data Retention

- Raw exports: never stored on platform servers. Local only.
- Metric vectors: stored encrypted. User can delete at any time. Deletion is permanent and verified.
- Twin model weights: stored encrypted. User can delete at any time.
- Session logs: retained for 90 days, then auto-deleted unless user opts to keep.
- Audit trail: retained for 1 year for user's reference.
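The two time-based rules above can be expressed as a small expiry check. This is a sketch, assuming "1 year" means 365 days and that the opt-in to keep applies per record; `RETENTION` and `is_expired` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 year" is treated as 365 days.
RETENTION = {
    "session_log": timedelta(days=90),
    "audit_trail": timedelta(days=365),
}


def is_expired(kind, created_at, now, keep=False):
    """True when a record has outlived its retention window.

    `keep` models the user opting to retain a session log.
    """
    if keep:
        return False
    return now - created_at >= RETENTION[kind]
```

A cleanup job would call this per record and delete anything for which it returns `True`.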

Threat Model

| Threat | Mitigation |
|---|---|
| Platform breach exposes profiles | Encrypted at rest with user-held keys. Breach yields ciphertext only. |
| Hiring manager records twin session | Sessions are logged. Terms of service prohibit recording. Twin responses include watermarks. |
| Reverse-engineering raw prompts from metrics | Metrics are aggregate statistics (means, distributions, topology). Cannot reconstruct individual prompts from centroids. |
| User coerced to share more data | Consent gates are technical, not policy. The system cannot share what it doesn't have. |
| Discrimination via cognitive profiling | Metrics are domain-agnostic. No demographic data. No name, age, gender, location in the profile. Twin doesn't know these either. |

Compliance Considerations

- GDPR: Right to deletion (full profile wipe), right to access (audit trail), data minimization (only metrics, not raw text), consent management (per-session approval)
- CCPA: Same deletion/access rights. No sale of personal data (metrics are derived, anonymized)
- SOC 2: Encryption at rest and transit, audit logging, access controls
- AI Act (EU): Transparency requirements for AI in hiring. The provenance panel satisfies "explainability" by showing which interaction clusters informed twin responses.

Promotion Decision

Promote into a technical note or architecture paper with implementation anchors.

Source Anchor

cognitive-hire/docs/privacy-architecture.md
