<div style="background: #7C3AED; color: white; padding: 8px 16px; border-radius: 6px; font-weight: 600; margin-bottom: 24px;">API MODE — uses your Anthropic API key.</div>
# Mode A · API
Runs synthesis and query against the Anthropic API directly. Faster than Agent mode on large corpora (batch API plus prompt caching), costs money per token, and doesn't require Claude Code to be running.
## Status
Scaffolding has shipped: the prompt caching and batch primitives live in `llmwiki/cache.py`. The full backend lands in #315 · feat: Mode A claude-api synthesis backend with prompt caching. Until #315 merges, use the Ollama backend as a stand-in: Tutorial 08 — Synthesize with Ollama.
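To make those two primitives concrete, here is a minimal sketch using the public Anthropic Python SDK directly. Everything below (variable names, prompt layout, the placeholder transcripts) is illustrative, not llmwiki's actual internals, which live in `llmwiki/cache.py` and ship fully with #315.

```python
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

# The shared prefix every synthesis call sees. Marking it with cache_control
# writes it to the prompt cache once; later calls that reuse the identical
# prefix are billed at the much cheaper cache-read rate.
shared_prefix = "\n\n".join(
    open(p).read() for p in ("CLAUDE.md", "wiki/index.md", "wiki/overview.md")
)
cached_system = [
    {"type": "text", "text": shared_prefix, "cache_control": {"type": "ephemeral"}}
]

# One batch request per session. Note: requests inside a batch run
# concurrently, so cache hits there are best-effort rather than guaranteed.
transcripts = ["<session transcript 1>", "<session transcript 2>"]
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"session-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 2048,
                "system": cached_system,
                "messages": [{"role": "user", "content": t}],
            },
        }
        for i, t in enumerate(transcripts)
    ]
)
print(batch.id)  # poll client.messages.batches.retrieve(batch.id) until it completes
```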
## Setup (once #315 ships)
```bash
# In your repo root:
echo "ANTHROPIC_API_KEY=sk-ant-…" >> .env
```
```jsonc
// sessions_config.json
{
  "synthesis": {
    "backend": "claude-api",
    "claude_api": {
      "model": "claude-sonnet-4-6",
      "max_retries": 3
    }
  }
}
```
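For orientation, a hedged guess at how the `claude_api` block could map onto the Anthropic SDK; the real wiring ships with #315, so treat the field handling below (and the assumption that llmwiki loads `.env` into the environment) as illustrative.

```python
import json

import anthropic

with open("sessions_config.json") as f:
    api_cfg = json.load(f)["synthesis"]["claude_api"]

# The SDK reads ANTHROPIC_API_KEY from the environment (presumably populated
# from the .env line above) and retries transient failures via max_retries.
client = anthropic.Anthropic(max_retries=api_cfg["max_retries"])
model = api_cfg["model"]  # passed per request, e.g. client.messages.create(model=model, ...)
```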
## Daily flow
```bash
llmwiki synthesize --estimate   # cost preview
llmwiki synthesize              # batch run with prompt caching
llmwiki build && llmwiki serve --open
```
## Cost model
- The prompt prefix (`CLAUDE.md` + `wiki/index.md` + `wiki/overview.md`) is cached: one write (at a small premium over the base input rate) on the first call, then steeply discounted reads on every subsequent call.
- See the prompt caching reference for the token math.
- `llmwiki synthesize --estimate` gives you an incremental vs. full (forced) cost breakdown before you spend money; the sketch below walks through the underlying arithmetic.
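As a worked example of that math, here is a back-of-envelope estimator. The rates are assumptions (Sonnet-class list pricing at the time of writing: $3/$15 per million input/output tokens, cache writes at 1.25× and cache reads at 0.1× the input rate); check current pricing, and treat `--estimate` as the authoritative number.

```python
PRICE_IN, PRICE_OUT = 3.00, 15.00     # $ per million tokens (assumed list price)
CACHE_WRITE, CACHE_READ = 1.25, 0.10  # multipliers on the base input rate

def synthesis_cost(prefix_tok, session_in_tok, session_out_tok, n_sessions):
    write = prefix_tok * CACHE_WRITE * PRICE_IN / 1e6                 # one cache write
    reads = prefix_tok * CACHE_READ * PRICE_IN / 1e6 * (n_sessions - 1)
    fresh = session_in_tok * PRICE_IN / 1e6 * n_sessions              # uncached per-session input
    out = session_out_tok * PRICE_OUT / 1e6 * n_sessions
    return write + reads + fresh + out

# 120 sessions sharing a 40k-token prefix, ~6k input / 1.5k output tokens each:
print(f"${synthesis_cost(40_000, 6_000, 1_500, 120):.2f}")  # -> $6.44
# The Batches API applies its own discount on top; ignored here for simplicity.
```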
## When to pick this mode
- Large corpora (100+ sessions) where serial Agent-mode synthesis would take hours.
- Headless runs: cron, CI, or launchd/systemd scheduling of `llmwiki sync` (see the sketch after this list).
- Shared server (multiple developers syncing into one wiki): the API key belongs to the server, not to individual laptops.
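For the headless case, a sketch of a wrapper you could schedule; the command sequence mirrors the daily flow above, but the wrapper itself is illustrative and not part of llmwiki.

```python
#!/usr/bin/env python3
"""Nightly llmwiki run for cron/systemd/launchd. Illustrative, not shipped."""
import subprocess
import sys

STEPS = [
    ["llmwiki", "sync"],        # assumed to pull new sessions
    ["llmwiki", "synthesize"],  # Mode A batch run; needs ANTHROPIC_API_KEY in the env
    ["llmwiki", "build"],
]

def main() -> int:
    for cmd in STEPS:
        print("+", " ".join(cmd), flush=True)
        proc = subprocess.run(cmd)
        if proc.returncode != 0:
            return proc.returncode  # fail fast so the scheduler records the error
    return 0

if __name__ == "__main__":
    sys.exit(main())
```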
## See also
- Agent mode — if you already use Claude Code, try this first.
- Configuration reference — synthesis section
- Prompt caching reference