Mohamed Diomande

**Date:** 2026-02-18

**Host:** Mac4 (Apple M4, 16GB RAM, macOS 15.6)

**Ollama:** v0.16.2

**Task:** Routing classification (5-category: triage, coding, architecture, planning, ops)

- 10 routing classification prompts per model
- Temperature: 0, deterministic outputs
- Measured: tokens/sec (generation), RSS memory, routing accuracy
- Ground truth defined for each prompt
- API: `http://[ip]:11434/api/generate` (non-streaming)

| Model | Size | Accuracy | Gen Speed | RSS Memory | Passes Criteria? |
|-------|------|----------|-----------|------------|-------------------|
| **llama3.2:3b** | 2.0 GB | 7/10 (70%) | **71.3 tok/s** ✅ | ~2.2 GB ✅ | ❌ accuracy |
| **qwen3:4b** | 2.5 GB | 6/10 (60%) | 29.0 tok/s ❌ | ~3.3 GB ✅ | ❌ speed + accuracy |
| **gemma3:4b** | 3.3 GB | 8/10 (80%) | 44.3 tok/s ❌ | ~2.8 GB ✅ | ❌ speed + accuracy |
| qwen3:30b-a3b | 18 GB | n/a | n/a | n/a | ❌ too large (18GB > 16GB RAM) |

**llama3.2:3b**

- **Fastest by far:** the only model to exceed the 50 tok/s threshold
- Struggles with the triage vs ops distinction
- Outputs single words cleanly (no thinking overhead)

**qwen3:4b**

- **Critical flaw:** Built-in "thinking" mode generates 200-500 internal reasoning tokens before answering
- Most of the 500-token budget is consumed by `<think>...</think>` blocks
- The actual answer is often truncated or empty
- Not suitable for fast routing without disabling thinking (which Ollama doesn't fully support)
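The measurement loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script that produced the table. The `eval_count` and `eval_duration` fields (duration in nanoseconds) are what Ollama's non-streaming `/api/generate` response reports. Everything else is assumed for illustration: the helper names, the use of `num_predict` to express the 500-token budget, and the rule of taking the first known category word as the answer.

```python
import json
import re
import urllib.request
from typing import Optional

# The five routing labels named in the task description.
CATEGORIES = ("triage", "coding", "architecture", "planning", "ops")
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_payload(model: str, prompt: str, max_tokens: int = 500) -> dict:
    """Non-streaming, temperature-0 request body for /api/generate.

    Assumption: the 500-token budget is set through the num_predict option.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "num_predict": max_tokens},
    }


def generate(base_url: str, payload: dict) -> dict:
    """POST the payload to base_url (e.g. http://[ip]:11434) and decode JSON."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(response: dict) -> float:
    """Generation speed: eval_count tokens over eval_duration nanoseconds."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def extract_label(text: str) -> Optional[str]:
    """Strip complete <think> blocks, then return the first category word."""
    cleaned = THINK_BLOCK.sub("", text).lower()
    # A leftover opening tag means the reasoning ran past the token budget,
    # so no usable answer was produced.
    if "<think>" in cleaned:
        return None
    words = re.findall(r"[a-z]+", cleaned)
    return next((w for w in words if w in CATEGORIES), None)


def accuracy(predicted: list, expected: list) -> float:
    """Fraction of prompts whose predicted label matches the ground truth."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)
```

Treating an unterminated `<think>` block as "no answer" mirrors the truncation problem noted for qwen3:4b: a response that spends its whole budget reasoning scores as a miss rather than being parsed for stray category words.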

Promotion decision

What has to happen next

Attach run IDs, datasets, metrics, and reproduction commands.
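One possible shape for such an attachment is sketched below. The field names and the run ID are hypothetical, since the paper schema itself is not shown here. The date, host, Ollama version, and the llama3.2:3b figures are copied from the benchmark summary.

```python
import json


def make_run_record(run_id: str, model: str, correct: int, total: int,
                    tok_per_s: float) -> dict:
    """Bundle one model's result with the settings needed to rerun it.

    All keys are illustrative and not taken from any published schema.
    """
    return {
        "run_id": run_id,  # hypothetical identifier
        "date": "2026-02-18",
        "host": "Apple M4, 16GB RAM, macOS 15.6",
        "ollama_version": "0.16.2",
        "model": model,
        "options": {"temperature": 0},
        "metrics": {
            "accuracy": correct / total,
            "gen_tokens_per_s": tok_per_s,
        },
    }


# Example using the llama3.2:3b row: 7/10 correct at 71.3 tok/s.
record = make_run_record("example-run-001", "llama3.2:3b", 7, 10, 71.3)
print(json.dumps(record, indent=2))
```

Storing the raw counts alongside the derived accuracy would make it easier to recheck a result against the prompt set later.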
