
08 · Synthesize wiki pages with Ollama

Time
15 min (including Ollama install).
You'll need
Ollama running locally + at least 8 GB RAM for llama3.1:8b.
Result
Every new session's wiki/sources/<slug>.md is synthesized by a local LLM instead of the dummy backend (no API key, no bill).

Why this matters

The default synthesis.backend is "dummy" — fast, deterministic, but the page it produces is a skeleton. Good for tests, not for reading. When you want actual LLM-written summaries without a Claude / OpenAI API key, point the pipeline at a local Ollama model.
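For reference, the implicit default is equivalent to writing this out in sessions_config.json (a minimal sketch mirroring the config shape shown in step 3, assuming the dummy backend needs no extra keys):

{
  "synthesis": {
    "backend": "dummy"
  }
}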

Steps

1. Install Ollama

macOS:

brew install ollama
ollama serve &   # background daemon on 127.0.0.1:11434

Linux: curl -fsSL https://ollama.com/install.sh | sh, then systemctl enable --now ollama.
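Either way, confirm the daemon is up before continuing; the root endpoint of a running Ollama instance answers with a short status line:

curl http://127.0.0.1:11434   # should answer "Ollama is running"
ollama --version              # confirms the CLI is on your PATH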

2. Pull a model

ollama pull llama3.1:8b     # 4.7 GB, fits on 8 GB RAM
# or
ollama pull mistral:7b      # 4.1 GB
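ollama list shows what actually landed on disk, a quick sanity check that the pull completed and you have the tag you expect:

ollama list                 # the model name and size from the pull should appear here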

3. Configure sessions_config.json

Create or edit sessions_config.json at your repo root:

{
  "synthesis": {
    "backend": "ollama",
    "ollama": {
      "model": "llama3.1:8b",
      "base_url": "http://127.0.0.1:11434",
      "timeout": 60,
      "max_retries": 3
    }
  }
}

Defaults if omitted: llama3.1:8b at 127.0.0.1:11434, 60s timeout, 3 retries with exponential backoff.
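Since those defaults already match the llama3.1:8b setup from step 2, a minimal config can set only the backend; this sketch assumes the documented defaults kick in when the ollama block is omitted:

{
  "synthesis": {
    "backend": "ollama"
  }
}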

4. Check the backend

llmwiki synthesize --check

Expected:

Backend: OllamaSynthesizer
Available: True

If Available: False, Ollama isn't running — ollama serve &.
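If the check still fails while Ollama is clearly running, probe Ollama's /api/generate endpoint directly (the same endpoint the backend calls, per the 404 note under Troubleshooting); this assumes the llama3.1:8b model from step 2:

curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "ping", "stream": false}'
# a JSON reply with a "response" field means Ollama itself is healthy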

5. Estimate cost (dry-run math)

llmwiki synthesize --estimate

Ollama backend cost is $0 — the estimator still shows token counts so you can compare against an API run later:

Corpus:                785 sessions in raw/sessions/
Synthesized (history): 714 already in wiki/sources/
New since last run:    71

Prefix: 3,944 tok  Model: llama3.1:8b  (local, $0)

6. Run synthesize

llmwiki synthesize

Each new raw session becomes a wiki/sources/<project>/<YYYY-MM-DD>-<slug>.md. Expect 2–5 seconds per session on a modern MacBook.
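At 2–5 seconds per session, the 71 new sessions from the estimate above should finish in roughly 2.5–6 minutes. Wrapping the run in time gives you a real per-session figure for your own hardware:

time llmwiki synthesize     # divide total wall time by the number of new sessions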

7. Inspect the output

ls wiki/sources/<your-project>/ | head
cat wiki/sources/<your-project>/$(ls wiki/sources/<your-project>/ | head -1)

The frontmatter is the same as the dummy backend; the body is the LLM's prose (one-paragraph summary + key claims + connections).

Verify

llmwiki build && llmwiki serve --open

Browse to a session page; the body should be an actual summary, not the canned "Auto-synthesis — replace with actual quotes from the session" placeholder.
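To sweep the whole tree for leftovers, grep for the placeholder text; this assumes the wording above appears verbatim in dummy-generated pages:

grep -rl "Auto-synthesis" wiki/sources/<your-project>/   # lists pages still carrying the dummy placeholder (ideally none)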

Troubleshooting

OllamaUnavailableError: connection refused

Ollama isn't running. ollama serve & or check lsof -i :11434.

OllamaHTTPError: 404 /api/generate

Old Ollama version. ollama --version — upgrade to 0.1.31+.
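How you upgrade depends on how you installed Ollama; with Homebrew on macOS or the install script on Linux, these should bring you to a current release:

brew upgrade ollama                               # macOS (Homebrew)
curl -fsSL https://ollama.com/install.sh | sh     # Linux: re-running the installer upgrades in place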

Synthesize is slow (>30s per session)

Use a smaller model: ollama pull llama3.1:8b-instruct-q4_0 and set "model": "llama3.1:8b-instruct-q4_0" in the config. The q4_0 quantization is ~2.3 GB and ~3× faster.

Model hallucinates facts about the session

Local models have lower accuracy. Run llmwiki lint afterwards to catch the obvious hallucinations (frontmatter_validity, duplicate_detection).

For higher quality, switch to API mode — the Claude API backend is tracked under #315.
