proposal, experiment writeup candidate, score 30
discrawl Spec
- build a local-first Discord guild crawler
- mirror all guild data the configured bot can access
- store it in SQLite
- support fast text search, semantic search, and raw SQL
- support one-shot backfill and long-running live sync
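As a rough illustration of the text-search and raw-SQL goals, the sketch below indexes a toy table with SQLite's FTS5 extension through Python's standard `sqlite3` module. The table, column, and function names (`messages`, `messages_fts`, `body`, `build_index`, `search`) are assumptions made for this example, not the spec's schema, and the sketch assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical messages table and an FTS5 index over its text."""
    conn.executescript(
        """
        CREATE TABLE messages (
            id         INTEGER PRIMARY KEY,  -- Discord snowflake (assumed key)
            channel_id INTEGER NOT NULL,
            author     TEXT NOT NULL,
            body       TEXT NOT NULL
        );
        -- External-content FTS5 table: indexes messages.body without copying it.
        CREATE VIRTUAL TABLE messages_fts USING fts5(
            body, content='messages', content_rowid='id'
        );
        -- Keep the index in step with inserts (update/delete triggers omitted).
        CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
            INSERT INTO messages_fts(rowid, body) VALUES (new.id, new.body);
        END;
        """
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return (id, author, body) rows matching an FTS5 query, best match first."""
    return conn.execute(
        """
        SELECT m.id, m.author, m.body
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn)
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            (1, 10, "alice", "deploy failed on staging again"),
            (2, 10, "bob", "lunch anyone?"),
            (3, 11, "carol", "staging deploy is green now"),
        ],
    )
    print(search(conn, "deploy staging"))
    # Raw SQL stays available against the same database.
    print(conn.execute(
        "SELECT author, COUNT(*) FROM messages GROUP BY author"
    ).fetchall())
```

An FTS5 query such as `deploy staging` matches rows that contain both terms. Anything the full-text index cannot answer can go straight to SQL against the base table.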
This spec is intentionally detailed so an agent can keep shipping without re-asking foundational questions.
- one guild at a time
- all accessible text channels
- all accessible announcement channels
- all accessible forum channels and their posts
- all accessible public threads
- all accessible private threads
- archived thread coverage
- full message history
- current member snapshot
- FTS5 search
- optional OpenAI embeddings with local vector search
- raw SQL access
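The "local vector search" item above could take many forms. A minimal sketch, assuming embeddings are stored as float32 blobs in a hypothetical `message_embeddings` table and compared by a brute-force cosine scan, might look like the following. The spec does not say which search method is intended, and the three-dimensional vectors here are toy stand-ins for real embeddings, which are far larger.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(conn, query_vec, k=5):
    """Scan every stored vector and return the k most similar (id, score) pairs."""
    scored = [
        (cosine(query_vec, unpack(blob)), message_id)
        for message_id, blob in conn.execute(
            "SELECT message_id, vector FROM message_embeddings"
        )
    ]
    scored.sort(reverse=True)
    return [(message_id, score) for score, message_id in scored[:k]]


def make_demo_db():
    """Build an in-memory database with three toy embeddings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message_embeddings ("
        " message_id INTEGER PRIMARY KEY, vector BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO message_embeddings VALUES (?, ?)",
        [
            (1, pack([1.0, 0.0, 0.0])),
            (2, pack([0.0, 1.0, 0.0])),
            (3, pack([0.9, 0.1, 0.0])),
        ],
    )
    return conn


if __name__ == "__main__":
    print(nearest(make_demo_db(), [1.0, 0.0, 0.0], k=2))
```

A full scan is simple and keeps everything inside the one SQLite file, which may be adequate for a single guild. A larger corpus would likely need an index, which this sketch does not attempt.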
- personal-account DMs
- reactions as primary indexed entities
- attachment blob downloads by default
- cross-guild unified sync UX
- write-back or moderation actions
- config format: `TOML`
- config location: `[home-path]`
- DB location: `[home-path]`
- cache dir: `[home-path]`
- log dir: `[home-path]`
- token source: reuse Molty / existing OpenClaw Discord bot config
- guild model: one guild in CLI UX, multi-guild-ready schema
- search: hybrid, with FTS first and embeddings optional
- embedding provider: OpenAI
- API key source: `OPENAI_API_KEY` from shell env
- message retention: current canonical row + append-only event log
- member retention: current snapshot only
- files: metadata only in DB, fetch binaries later on demand
- reactions: not important for V1
- polls: flatten into text during normalization
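The message-retention decision (one current canonical row plus an append-only event log) can be sketched roughly as below. The table names, columns, the `record_event` helper, and the soft-delete flag are all assumptions for illustration; the spec only states the retention policy, not a schema.

```python
import json
import sqlite3

# Hypothetical schema: one mutable row per message, plus an immutable log.
SCHEMA = """
CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL,
    content    TEXT NOT NULL,
    edited_at  TEXT,
    deleted    INTEGER NOT NULL DEFAULT 0   -- soft delete (an assumption)
);
CREATE TABLE message_events (
    seq         INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id  INTEGER NOT NULL,
    kind        TEXT NOT NULL,              -- 'create', 'update', or 'delete'
    payload     TEXT NOT NULL,              -- raw event as JSON
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def record_event(conn: sqlite3.Connection, kind: str, msg: dict) -> None:
    """Append the event to the log, then bring the canonical row up to date."""
    with conn:  # both writes commit together or not at all
        conn.execute(
            "INSERT INTO message_events (message_id, kind, payload)"
            " VALUES (?, ?, ?)",
            (msg["id"], kind, json.dumps(msg)),
        )
        if kind == "delete":
            conn.execute(
                "UPDATE messages SET deleted = 1 WHERE id = ?", (msg["id"],)
            )
            return
        # Upsert: insert on first sight, overwrite the current state on edits.
        conn.execute(
            """
            INSERT INTO messages (id, channel_id, content, edited_at)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(id) DO UPDATE SET
                content = excluded.content,
                edited_at = excluded.edited_at
            """,
            (msg["id"], msg["channel_id"], msg["content"], msg.get("edited_at")),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_event(conn, "create", {"id": 1, "channel_id": 10, "content": "v1"})
    record_event(conn, "update", {"id": 1, "channel_id": 10, "content": "v2"})
    print(conn.execute("SELECT id, content FROM messages").fetchall())
    print(conn.execute("SELECT seq, kind FROM message_events").fetchall())
```

Under this arrangement, search and most queries read only `messages`, while `message_events` preserves the edit and delete history for anyone who needs it through raw SQL.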
Promotion decision
What has to happen next
Attach run IDs, datasets, metrics, and reproduction commands.