Humans write the original documents and make decisions. LLMs handle synthesis, updates, and consistency checks. Documentation is no longer a static artifact that rots the moment it’s written — it becomes a living knowledge base continuously maintained by LLMs.

An Age-Old Problem

Every software engineer has experienced this: you join a project, open the docs, and find the README still describing a module deleted six months ago, API docs with fields that don’t match the code at all, and architecture diagrams showing a design from two versions back. You ask a colleague, who says, “Don’t bother with the docs — just read the code.”

Documentation decay doesn’t happen because engineers don’t want to write docs. It happens because the cost of maintaining them is too high. Writing a design document might take two hours, but every subsequent code change demands circling back to keep it in sync — checking whether cross-references still hold, terminology is consistent, and edge cases are still accurate. This tedious bookkeeping wears people down, and sooner or later it is abandoned.

Karpathy’s Insight: Knowledge Compilation, Not Knowledge Retrieval

In 2025, Andrej Karpathy made an important observation about LLMs and knowledge management. His core claim is surprisingly simple:

Don’t make the LLM rediscover patterns from raw documents every time. Instead, have it build and maintain a structured knowledge base where knowledge accumulates continuously.

This is the LLM Wiki pattern. The key difference from traditional RAG (Retrieval-Augmented Generation) is that RAG forces the LLM to rebuild understanding from scratch with every query, while the Wiki pattern treats synthesis itself as a first-class artifact. Knowledge accumulates because cross-references are already established, contradictions already flagged, and relationships between concepts already mapped.

Karpathy outlined a three-layer architecture:

  • Raw Sources: Human-curated immutable documents — papers, articles, notes. The LLM never touches this layer.
  • Wiki: LLM-generated and maintained Markdown pages — summaries, entities, concepts, comparisons. The LLM fully owns the structure and content of this layer.
  • Schema: Configuration files that define wiki structure, naming conventions, and workflows, turning the LLM into a disciplined maintainer rather than a general-purpose chatbot.

And three core operations: Ingest (absorb new sources and update the wiki), Query (search and write valuable findings back to the wiki), and Lint (health checks — find contradictions, orphan pages, missing references).

He also proposed a profound division of labor:

The human’s job is to curate sources, guide analysis, and ask good questions. The LLM’s job is everything else.

That “everything else” happens to be exactly what kills human wikis — updating summaries, maintaining cross-references, checking consistency. This work is tedious drudgery for humans, but reliable and tireless routine for LLMs.

Bringing the LLM Wiki to Software Engineering

Karpathy’s LLM Wiki is domain-agnostic — he uses it to manage paper reading notes and technical research. But when we examine documentation management in software engineering, we find that the pattern fits naturally, though it requires domain-specific adaptation.

Why Software Documentation Is Especially Suited to This Pattern

Software project documentation has several unique characteristics:

  1. Documentation has a strict correspondence with code. API docs must describe endpoints that exist in the code; data model docs must have fields that match the schema. This correspondence is verifiable — meaning lint can not only check internal documentation consistency but also deeply audit how well docs match the code.

  2. Documentation has a clear lifecycle. A design spec evolves during implementation and stabilizes after completion; an Architecture Decision Record (ADR) doesn’t change once made. Software documentation follows predictable classification and change patterns.

  3. Version control is natural infrastructure. Nearly all software projects use Git, meaning every documentation change automatically has a complete diff history, and git log and git diff can serve as signal sources for automatically discovering changes.

  4. AI coding agents are natural consumers. In the age of AI coding, documentation serves not only human developers but also AI coding agents directly. A structured knowledge base lets AI agents understand design intent and constraints before modifying code.

From Three Layers to Three Layers: Domain Adaptation

We retained Karpathy’s three-layer architecture but adapted it for software engineering:

Layer 1: docs/raw/ — Raw Document Archive

Karpathy’s “Raw Sources” naturally organize by document type in software engineering: design specs, implementation plans, architecture decision records (ADRs), API docs, product requirement documents (PRDs), meeting notes… Each file has a date prefix and metadata frontmatter marking its source and classification.
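As an illustration, a raw document’s filename and frontmatter might look like the sketch below. The specific field names (`type`, `status`, `source`) and the date format are assumptions for this example, not the system’s mandated schema:

```markdown
<!-- docs/raw/2025-06-12-payment-service-design.md -->
---
title: Payment Service Design Spec
date: 2025-06-12
type: design-spec
source: engineering
status: implemented
---

Design spec body follows...
```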

A key design decision: raw is not absolutely immutable but evolves in a controlled manner. Karpathy’s raw sources (papers, articles) are truly immutable — a paper doesn’t change after publication. But software engineering specs and plans evolve during implementation. Rather than letting docs drift from reality, we provide a formal change channel with audit trails. Every change is tracked through both git history and audit logs, preserving the complete evolution.

Layer 2: docs/wiki/ — LLM-Maintained Current Knowledge Base

This layer faithfully inherits Karpathy’s design philosophy: the LLM fully owns this directory’s structure and content. Wiki pages have no date prefix because they represent “current state” rather than “a point-in-time record.” Each page traces back to raw documents through a sources field, establishing a traceable knowledge chain.

The relationship between wiki and raw is like that of a materialized view to its source tables in a database — the wiki is a synthesized projection of raw, always reflecting the latest integrated understanding.
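To make the traceability concrete, a wiki page might carry a sources list like the following sketch; the page name, raw paths, and field layout are invented for illustration:

```markdown
<!-- docs/wiki/payment-service.md -->
---
sources:
  - raw/2025-06-12-payment-service-design.md
  - raw/2025-07-03-retry-policy-adr.md
---
# Payment Service

Current design, synthesized from the sources above...
```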

Layer 3: docs/schema.md + docs/README.md — AI Entry Point

Schema defines the system’s conventions — directory structure, document format, classification system, operation contracts. README is the wiki’s navigation index. Together they form the entry point for AI coding agents: an agent reads these two files to decide which wiki pages to explore further.
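A minimal schema.md could read like this sketch — the exact conventions here are illustrative, not the open-source project’s actual file:

```markdown
# docs/schema.md (illustrative excerpt)

## Layout
- docs/raw/   dated source documents, append-only by default
- docs/wiki/  LLM-owned current-state pages, no date prefixes

## Naming
- raw:  YYYY-MM-DD-<slug>.md; frontmatter must set type and status
- wiki: <slug>.md; frontmatter must list sources

## Operations
- init, ingest, update, lint, query (entry points indexed in README)
```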

Beyond Ingest-Query-Lint: Five Operations

Karpathy defined three core operations. We extended these to five to accommodate software engineering workflows:

  • Init — One-click initialization of the documentation system in any project. Works from a cold start with no pre-existing documents needed.
  • Ingest — Inherits Karpathy’s design but adds auto-discovery: when run without parameters, it uses git diff to automatically find new document-like files on the current branch, reducing manual friction.
  • Update — An operation specific to software engineering. When a spec needs to evolve during implementation, update performs in-place modification with audit trails rather than creating new files. The default auto-discovery mode analyzes git commit history to find which raw documents may need updating.
  • Lint — Inherits Karpathy’s health checks but adds deep document-code auditing. Beyond checking wiki link validity and inter-page contradictions, it cross-references claims in documentation (API endpoints, data models, configuration items) against actual code.
  • Query — Inherits Karpathy’s query and write-back mechanism but adds Agent mode: before modifying a file, an AI coding agent can request design context for that file — design intent, constraints, historical decisions — output in a structured format for direct consumption.
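The document-code audit at the heart of lint can be sketched in a few lines: extract endpoint claims from doc text, extract route registrations from code, and report claims with no backing route. This is an illustrative sketch, not llm-docs’s implementation — the backtick-quoted endpoint convention and the `@app.get`-style decorator pattern are assumptions:

```python
import re

# Doc claims look like `GET /users`; code routes look like @app.get("/users").
# Both patterns are assumptions chosen for this sketch.
DOC_ENDPOINT = re.compile(r"`(GET|POST|PUT|DELETE)\s+(/[\w/{}-]*)`")
CODE_ROUTE = re.compile(r"@app\.(get|post|put|delete)\(\"(/[\w/{}-]*)\"\)")

def audit(doc_text: str, code_text: str) -> list[str]:
    """Return doc claims (METHOD path) that no route in the code backs."""
    claimed = {(m.upper(), p) for m, p in DOC_ENDPOINT.findall(doc_text)}
    actual = {(m.upper(), p) for m, p in CODE_ROUTE.findall(code_text)}
    return sorted(f"{m} {p}" for m, p in claimed - actual)

doc = "The API exposes `GET /users` and `POST /orders`."
code = '@app.get("/users")\ndef users(): ...'
print(audit(doc, code))  # → ['POST /orders']
```

A real audit would parse the code with an AST or the framework’s route table rather than regexes, but the shape of the check — doc claims minus code facts — is the same.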

Two Key Design Philosophies

Git as the signal source. We didn’t build an independent change tracking system. Instead, we use Git itself as the signal source for change discovery. git diff tells us what happened on a branch; git log tells us about code changes during implementation. This means documentation maintenance can become a “one-click sync” operation — run it after completing a feature, and the system automatically analyzes changes and suggests which documents need updating.
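The change-discovery step can be sketched by parsing git’s `--name-status` output for document-like files touched on the current branch. The `docs/` prefix, the `main` base branch, and the function names are assumptions for this sketch:

```python
import subprocess

def changed_docs(diff_output: str, doc_dir: str = "docs/") -> list[str]:
    """Pick added (A) or modified (M) Markdown files under doc_dir
    from `git diff --name-status` output; deletes and renames are skipped."""
    hits = []
    for line in diff_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:  # skips renames (R100\told\tnew) and blanks
            continue
        status, path = parts
        if status in ("A", "M") and path.startswith(doc_dir) and path.endswith(".md"):
            hits.append(path)
    return hits

def discover(base: str = "main") -> list[str]:
    """Ask git what changed on this branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--name-status", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_docs(out)

sample = "A\tdocs/raw/2025-08-01-retry-spec.md\nM\tsrc/retry.py\n"
print(changed_docs(sample))  # → ['docs/raw/2025-08-01-retry-spec.md']
```

Because the signal is plain git output, the same discovery logic works in a pre-merge hook, a CI job, or an interactive “one-click sync” run.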

Progressive enrichment. The system works from a cold start without importing all documents at once. Each time you use ingest to add a document, the wiki gains more knowledge; each time lint finds an inconsistency and it gets fixed, the system becomes more accurate. As usage accumulates, the documentation system grows from blank to a rich project knowledge base. This aligns perfectly with Karpathy’s vision of “knowledge accumulating continuously.”

A More Fundamental Shift

Back to the original question: why does documentation rot?

Because the marginal cost of maintaining documentation is always borne by humans — and human attention is a scarce resource. Karpathy’s insight is that LLMs change this equation. The tedious work that kills human wikis — updating cross-references, syncing summaries, checking consistency — is exactly what LLMs excel at and never tire of.

This isn’t about replacing humans with AI for writing documentation. Quite the opposite — humans still handle the most critical parts: deciding what to build, why to build it, and how to make trade-offs. These decisions, intentions, and judgments are the truly valuable content in documentation. The LLM’s job is to weave these scattered decisions into a coherent, continuously updated knowledge network.

When we apply this philosophy to software engineering, what we get isn’t just “documentation that doesn’t rot” but a living knowledge base that co-evolves with code — new developers can quickly understand the project’s design lineage through it, AI coding agents can obtain rich design context before writing code, and every code change can potentially trigger synchronized documentation updates.

Documentation finally has a chance to become what it was always meant to be: a team’s shared memory, not an outdated snapshot.


The software engineering documentation system described in this article is open source: llm-docs.