Architecture
llmwiki has two overlapping structures:
- The Karpathy three-layer wiki (conceptual): raw/ → wiki/ → site/
- The eight-layer build (implementation): how responsibilities are distributed across Python modules, HTML templates, scripts, CI, etc.
This document covers both.
Layer 1: Karpathy's three-layer wiki
From the original LLM Wiki gist:
raw/     IMMUTABLE source documents
  ↓      (llmwiki converts .jsonl → .md here)
wiki/    LLM-MAINTAINED pages
  ↓      (your coding agent writes here via /wiki-ingest)
site/    GENERATED static HTML
         (llmwiki builds here via `llmwiki build`)
raw/ — immutable layer
Everything under raw/ is treated as source-of-truth. The converter writes to it; nothing else should. If a source is wrong, fix the converter, not the output.
The converter writes one markdown file per session under raw/sessions/<project>/<date>-<slug>.md. Each file has YAML frontmatter (project, started, model, tools_used, gitBranch, etc.) and a Conversation body rendered turn-by-turn.
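The path and frontmatter layout above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the helper names `session_path` and `render_frontmatter` are hypothetical, the ISO date format is an assumption, and the frontmatter writer handles only plain scalars and flat lists without any YAML escaping.

```python
from datetime import date
from pathlib import Path


def session_path(raw_root: Path, project: str, day: date, slug: str) -> Path:
    """Build raw/sessions/<project>/<date>-<slug>.md for one session.

    Assumes <date> is an ISO date (YYYY-MM-DD); the real format may differ.
    """
    return raw_root / "sessions" / project / f"{day.isoformat()}-{slug}.md"


def render_frontmatter(fields: dict) -> str:
    """Render a flat mapping as a YAML frontmatter block.

    Only scalars and flat lists are supported, and values are not escaped.
    """
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"{key}: [{', '.join(str(v) for v in value)}]")
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(session_path(Path("raw"), "demo", date(2025, 1, 2), "fix-bug"))
    print(render_frontmatter({"project": "demo", "tools_used": ["Read", "Bash"]}))
```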
wiki/ — LLM-maintained layer
Your coding agent owns this layer entirely. It writes via the Ingest Workflow in CLAUDE.md:
wiki/
├── index.md       catalog of all pages, updated on every ingest
├── log.md         append-only chronological record
├── overview.md    living synthesis across all sources
├── sources/       one summary page per raw source (kebab-case slug)
├── entities/      people, projects, tools (TitleCase.md)
├── concepts/      ideas, frameworks, patterns (TitleCase.md)
└── syntheses/     saved query answers (kebab-case slug)
Pages interlink via [[wikilinks]]. Contradictions are recorded, not silently overwritten. Pages compound over time — every new source makes the wiki richer.
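To make the interlinking concrete, here is a small sketch of how a tool such as a lint pass might pull [[wikilinks]] out of a page. The function name `extract_wikilinks` is hypothetical, and support for a `[[Target|label]]` form is an assumption; this document only shows the plain `[[Target]]` form.

```python
import re

# Matches [[Target]] and, as an assumption, [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_wikilinks(markdown_text: str) -> list[str]:
    """Return link targets in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(markdown_text):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


if __name__ == "__main__":
    page = "See [[Karpathy]] and [[LLM Wiki|the gist]], then [[Karpathy]] again."
    print(extract_wikilinks(page))
```

Comparing the returned targets against the files under wiki/ is one way to spot links that point at pages that do not exist yet.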
site/ — generated static layer
llmwiki build reads raw/ (and wiki/ if populated) and renders a complete static HTML site. Nothing here is hand-maintained. Safe to delete and regenerate any time.
Layer 2: The eight-layer build
Internally the code is organised into eight functional layers. Each layer has one clear responsibility, and each feature in docs/roadmap.md maps to exactly one layer.
┌──────────────────────────────────────────────────────┐
│  L7  CI / ops        .github/workflows/              │
│  L6  Adapters        llmwiki/adapters/               │
│  L5  Schema / docs   CLAUDE.md, AGENTS.md, docs/     │
│  L4  Distribution    setup.sh, .bat, .claude/        │
│  L3  Viewer          script.js in build.py           │
│  L2  Site            build.py (HTML + CSS)           │
│  L1  Wiki            CLAUDE.md workflows             │
│  L0  Raw             llmwiki/convert.py              │
└──────────────────────────────────────────────────────┘
L0 — Raw
Owner: llmwiki/convert.py
Reads .jsonl from the agent's session store (via an adapter), filters out noise records, runs redaction, normalises the output into markdown, and writes to raw/sessions/.
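The redaction step in that pipeline might look roughly like the sketch below. The patterns and placeholder strings are illustrative assumptions, not the rule set convert.py actually ships; the `sk-` key prefix in particular is just one common shape for API keys.

```python
import re

# Illustrative patterns only; the real rule set is not shown in this document.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b")


def redact(text: str, username: str) -> str:
    """Mask email addresses, key-like tokens, and the local username.

    Emails are masked first so that a username inside an address is not
    replaced separately and left half-redacted.
    """
    text = EMAIL.sub("<EMAIL>", text)
    text = API_KEY.sub("<API_KEY>", text)
    if username:
        text = re.sub(re.escape(username), "<USER>", text)
    return text


if __name__ == "__main__":
    sample = "/Users/alice/proj mail alice@example.com key sk-abcdefghijklmnop1234"
    print(redact(sample, "alice"))
```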
Key properties:
- Idempotent: mtime tracked in .llmwiki-state.json
- Privacy-first: username + API keys + tokens + emails redacted by default
- Live-session safe: skips files with a record younger than 60 minutes
- Agent-agnostic: delegates discovery to the adapter registry
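The idempotency and live-session checks above could be combined along these lines. This is a sketch under stated assumptions: the state file is taken to be a flat JSON map from source path to last-seen mtime, and the function names are hypothetical. Only the file name .llmwiki-state.json and the 60-minute window come from the list above.

```python
import json
from pathlib import Path

LIVE_WINDOW_SECONDS = 60 * 60  # the 60-minute window described above


def load_state(state_file: Path) -> dict:
    """Load the assumed {source path: mtime} map, or start empty."""
    if state_file.exists():
        return json.loads(state_file.read_text(encoding="utf-8"))
    return {}


def save_state(state_file: Path, state: dict) -> None:
    """Persist the map with sorted keys so the file is stable across runs."""
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")


def should_convert(path: str, current_mtime: float, newest_record_ts: float,
                   state: dict, now: float) -> bool:
    """Decide whether a session file needs (re)conversion.

    Skips sessions whose newest record is inside the live window, and
    skips files whose mtime matches the one recorded on the last run.
    """
    if now - newest_record_ts < LIVE_WINDOW_SECONDS:
        return False  # session may still be in progress
    return state.get(path) != current_mtime


if __name__ == "__main__":
    now = 1_700_000_000.0
    print(should_convert("a.jsonl", 100.0, now - 7200, {}, now))
```

Passing `now` and the mtime in as arguments keeps the decision a pure function, which makes it easy to test without touching the filesystem.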
L1 — Wiki
Owner: your coding agent, following CLAUDE.md / AGENTS.md
llmwiki does NOT write to wiki/ directly. The agent does, via slash commands (/wiki-ingest, /wiki-query, /wiki-lint) that execute the workflows in the schema file.
L2 — Site (HTML generator)
Owner: llmwiki/build.py
Converts every file under raw/sessions/ (and any hand-authored files under wiki/) into static HTML. Uses python-markdown, the only runtime dependency. Syntax highlighting runs in the browser via highlight.js loaded from a pinned jsdelivr CDN (v0.5, #73), so the build pipeline itself needs nothing beyond python-markdown and the standard library. Writes to site/.
Pages rendered (v0.9 surface):
- site/index.html: home with hero + 365-day activity heatmap + token-usage stat grid + recently-updated card + project grid with topic chips
- site/projects/index.html: project grid with freshness badges
- site/projects/<project>.html: per-project page with topics strip, 365-day heatmap (scoped), tool-calling bar chart, token timeline, main sessions + sub-agents
- site/sessions/index.html: sortable sessions table with filter bar
- site/sessions/<project>/<slug>.html: per-session transcript with tool chart + token card + full conversation
- site/models/index.html: sortable AI-model directory (v0.7, #55)
- site/models/<slug>.html: per-model info card + changelog timeline + pricing sparkline (v0.7, #56)
- site/vs/index.html: auto-generated vs-comparison index (v0.7, #58)
- site/vs/<a>-vs-<b>.html: side-by-side info table + benchmark chart + price delta
- site/changelog.html: CHANGELOG.md rendered as a first-class page (v0.4.2, #72)
- site/search-index.json: pre-built client-side search index
- site/sources/<project>/<slug>.md: copies of raw source for download
- Plus AI-consumable exports: llms.txt, llms-full.txt, graph.jsonld, sitemap.xml, rss.xml, per-page .txt + .json siblings
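As an illustration of the per-session part of that layout, the sketch below maps one session to its output files. The function name `session_outputs` is hypothetical, and placing the .txt and .json siblings next to the HTML file is an assumption; the list above only says they are per-page siblings.

```python
from pathlib import Path


def session_outputs(site_root: Path, project: str, slug: str) -> dict[str, Path]:
    """Map one session to the files the site layout above implies."""
    html = site_root / "sessions" / project / f"{slug}.html"
    return {
        "html": html,
        # Assumed location: siblings share the HTML file's directory and stem.
        "txt": html.with_suffix(".txt"),
        "json": html.with_suffix(".json"),
        "source": site_root / "sources" / project / f"{slug}.md",
    }


if __name__ == "__main__":
    for kind, path in session_outputs(Path("site"), "demo", "2025-01-02-fix-bug").items():
        print(f"{kind:7s} {path.as_posix()}")
```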
L3 — Viewer (browser JS)
Owner: script.js (a string constant inside build.py)
Everything that happens in the browser, in vanilla JS:
- Theme toggle with data-theme attribute + localStorage + system preference
- Reading progress bar (scroll-linked CSS)
- Copy-as-markdown + copy-code buttons (Clipboard API + document.execCommand fallback for HTTP)
- Auto-collapse of long tool-result sections into <details>
- Cmd+K command palette (fuzzy search over search-index.json)
- Keyboard shortcuts: /, g h, g p, g s, j, k, ?
- Sessions-table filter bar (project, model, date range, slug text)
Zero dependencies. No bundler. No framework. One file.
L4 — Distribution
Owner: the repo root + .claude-plugin/
How users install and run llmwiki:
- setup.sh / setup.bat: one-click install
- sync.sh / sync.bat: wrappers around python3 -m llmwiki sync
- build.sh / build.bat: wrappers around python3 -m llmwiki build
- serve.sh / serve.bat: wrappers around python3 -m llmwiki serve
- upgrade.sh / upgrade.bat: git pull + re-run setup
- .claude-plugin/plugin.json + marketplace.json: Claude Code plugin packaging
- .claude/commands/: 7 slash commands
- .claude/skills/: 5 auto-discoverable skills
- llmwiki/mcp/: MCP server stub
L5 — Schema / docs
Owner: root-level markdown + docs/
Tells humans and agents how the system works:
- CLAUDE.md: Claude Code schema with Ingest / Query / Lint workflows
- AGENTS.md: Codex / OpenCode / Gemini mirror of the same
- .kiro/steering/: always-loaded contribution / format / verification rules
- docs/framework.md: Open Source Framework v4.1 adapted for llmwiki
- docs/research.md: Phase 1.25 research report
- docs/feature-matrix.md: 161 features across 16 categories
- docs/roadmap.md: Phase × Layer × Item MoSCoW table
L6 — Adapters
Owner: llmwiki/adapters/
One file per agent. Each subclass of BaseAdapter does three things:
- Knows where the agent writes its session store
- Walks that store to discover .jsonl files
- Derives a friendly project slug from the path
Everything else (record parsing, filtering, redaction, rendering) is shared in convert.py.
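The three responsibilities can be sketched as an abstract base class plus one subclass. This is a hypothetical shape, not the real contract: the method names `session_store`, `discover`, and `project_slug`, the `ExampleAgentAdapter` class, and the slug rule are all assumptions made for illustration. See framework.md for the actual interface.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class BaseAdapter(ABC):
    """Hypothetical shape of the adapter contract; the real class may differ."""

    @abstractmethod
    def session_store(self) -> Path:
        """Return the directory where the agent writes its sessions."""

    def discover(self) -> Iterator[Path]:
        """Walk the session store and yield .jsonl files in a stable order."""
        root = self.session_store()
        if root.is_dir():
            yield from sorted(root.rglob("*.jsonl"))

    @abstractmethod
    def project_slug(self, session_file: Path) -> str:
        """Derive a friendly project slug from a session file path."""


class ExampleAgentAdapter(BaseAdapter):
    """Illustrative adapter for an imaginary agent."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def session_store(self) -> Path:
        return self._root

    def project_slug(self, session_file: Path) -> str:
        # Assumed rule: the first directory under the store names the project.
        first = session_file.relative_to(self._root).parts[0]
        return first.lower().replace("_", "-").replace(" ", "-")


if __name__ == "__main__":
    adapter = ExampleAgentAdapter(Path("/store"))
    print(adapter.project_slug(Path("/store/My_Project/abc.jsonl")))
```

Keeping `discover` in the base class means a new adapter only has to say where the store lives and how to name projects.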
L7 — CI / ops
Owner: .github/workflows/ + tests/
- ci.yml: lint + tests + build smoke on every push + PR (Python 3.9 and 3.12 matrix)
- gitleaks.yml: secret scan
- pages.yml: build + deploy to GitHub Pages on tag push (Phase 6.5 Self-Demo)
- tests/fixtures/<agent>/: synthetic fixtures
- tests/snapshots/<agent>/: expected markdown outputs
- tests/test_*.py: pytest unit + snapshot tests
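A snapshot test compares freshly converted markdown against the stored expected output. The helper below is a minimal sketch of that comparison using the standard library's difflib; the name `snapshot_diff` is hypothetical and the project's own tests may be structured differently.

```python
import difflib


def snapshot_diff(actual: str, expected: str) -> list[str]:
    """Return a unified diff between snapshot and output; empty means a match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="snapshot",
            tofile="actual",
            lineterm="",
        )
    )


if __name__ == "__main__":
    for line in snapshot_diff("# Title\nnew body\n", "# Title\nold body\n"):
        print(line)
```

In a pytest test, asserting that the returned list is empty gives a readable diff in the failure message when the converter's output drifts from the snapshot.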
Adding an adapter
See framework.md §5.25 Adapter Flow for the full contract. TL;DR: one new file at llmwiki/adapters/<agent>.py, one fixture, one snapshot test, one doc page, one README line, one CHANGELOG entry.
Design principles
- Stdlib first. Runtime dep: markdown only. Nothing else. Syntax highlighting runs client-side via a CDN-loaded highlight.js (v0.5, #73) so the build stays deterministic and offline-capable.
- Privacy by default. Redact everything sensitive before it hits disk.
- Idempotent everything. Re-running any command is safe and cheap.
- Localhost only. No network, no telemetry, no cloud. The user controls if/when to publish.
- One file per concern. build.py is one file, not a folder of templates. The whole HTML rendering lives there including CSS + JS.
- Agent-agnostic core. convert.py doesn't know which agent produced the .jsonl. Adapters translate.