June 2026

I Let an AI Build My Research Library. Then It Invented a Law That Didn’t Exist.

How a weekend experiment based on a Karpathy idea grew into a knowledge pipeline that fact-checks itself — and what that taught me about using AI for serious work.

A self-maintaining research library compiled by an AI. Three weeks of compounding knowledge. And then, one day, a regulation that had never existed — confident, cross-linked, and present on seventeen pages.

Author Joel Seignior Published 13 June 2026 Read 7 min

A note before you start: the knowledge base described here is AI-compiled. So, in part, is this article. That’s the whole point — the interesting question was never “can an AI write this?” It obviously can. The interesting question was “how do I make what it writes trustworthy enough to stake my professional reputation on?” This is the answer I arrived at. Treat every specific claim as something a human (me) has governed, not something an AI has guaranteed.

It started as a Karpathy side-project.

01 — the original pattern

Andrej Karpathy described a simple, elegant idea: keep your knowledge as plain markdown files, and let an LLM do the grunt work of compiling, cross-referencing, filing and book-keeping. You curate the sources and ask the questions; the model summarises, links, and maintains.

The human’s job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM’s job is everything else.
Andrej Karpathy

I’m a construction and commercial lawyer. Tinkering with AI isn’t my job — it’s a side project. But you can’t practise in this profession right now and not watch AI arriving in it, and I wanted to actually understand the field rather than skim headlines about it. The trouble is it changes weekly: vendors pivot, valuations move, regulations shift, benchmarks get superseded. A self-maintaining research library was exactly the tool I wanted. So I built one, almost exactly to Karpathy’s pattern.

raw/

My sources. Articles, reports, case studies, regulation, product docs, my own meeting notes. The AI reads these; it never edits them.

wiki/

The compiled output. Concept pages, market profiles, playbooks, all interlinked. The AI writes these; I read them.

CLAUDE.md

The rules. A schema that tells the AI how the wiki works. The AI and I co-evolve it.

Drop a source in, say “ingest this,” and a single article would ripple out into five to fifteen interlinked pages. Ask a question, and good answers got filed back as new pages. The knowledge compounded. For a few weeks it felt like magic.

Then it tried to lie to me — confidently, and consistently.

02 — the fabrication

The magic broke the day I actually fact-checked it. The AI-generated synthesis notes I’d been feeding back in — tidy summary memos the model had written from other sources — were quietly seeding errors. A wrong launch date here. A job title that promoted a COO to CEO there. A headline figure no primary source actually supported. Small things. But the wiki had ingested them as fact, cross-linked them, and propagated them across pages. Each copy agreed with every other copy, which made the whole thing look authoritative.

Then came the one that rattled me. An audit found that a synthesis note had fabricated a regulation outright — a specific “technology-competence duty” with a commencement date, a regulator, and disciplinary consequences. None of it existed. The model had fused a few real things — a real money-laundering deadline, a real professional-conduct statement, an American rule with no local equivalent — into a plausible, dated, official-sounding obligation. It had propagated into seventeen pages, worked its way into material I was preparing to share, and survived three previous reviews — precisely because it was consistent everywhere it appeared. Nothing contradicted it, so nothing flagged it.

Consistency is not corroboration. Cross-page agreement is evidence of a common origin, not of truth. A fabrication copied into seventeen places never disagrees with itself.
from section 02 — what the fabricated regulation taught me

A knowledge base that compounds is also a knowledge base that compounds its errors. The thing Karpathy’s pattern gave me for free — effortless propagation — was the exact mechanism that turned one hallucination into seventeen.

The thing I was afraid of is the thing everyone’s afraid of.

03 — the reliability problem

Here’s what makes this more than a personal anecdote. In March 2026, Anthropic published one of the largest studies ever done on what people actually want — and fear — from AI.

81,000

People surveyed. The single biggest concern — ahead of job loss and privacy — was unreliability.

159

Countries represented. The occupational group that named unreliability most often: lawyers. Close to half flagged it.

That tracks. The entire value of a lawyer is that you can rely on what they tell you. An AI that is confidently wrong isn’t a productivity tool for that person — it’s a liability with a fast typing speed. The fear is rational, it’s widespread, and in my profession it’s nearly universal.

But here’s the move most of the hand-wringing misses. The reliability problem is not a reason to keep AI at arm’s length. It’s a specification for how to build with it. And — the counter-intuitive part — the most effective tool for fighting AI’s unreliability is AI itself. Two moves do most of the work.

Ground it in data you curated, not the open model. My system’s first rule is that the AI compiles only from a folder of sources I chose, and is forbidden from filling gaps with “general knowledge.” A model answering from a controlled corpus you assembled is a different, far more trustworthy animal than a model free-associating from everything it absorbed in training. The curation is the moat.

Build the verification with AI, too. Every check I’m about to describe — cross-referencing a figure across sources, confirming an executive’s title, testing whether a cited regulation actually exists — is run by the AI, on a leash, against rules I set. I didn’t hire a fact-checking team. I pointed the same technology that writes the draft at the job of attacking it.

So I stopped building a wiki and started building epistemics.

04 — the load-bearing pieces

The markdown-and-an-LLM skeleton is still Karpathy’s. But almost everything I’ve added since is about one question the original pattern doesn’t address: how do you know which claims to trust?

Confidence tags on every substantive claim.

Nothing is just “stated.” Each fact is tagged so you can read any page and instantly see how much weight it can bear. Right now, across around 100 pages, the honest tally is hundreds of flagged gaps and only a couple of pages a human has fully signed off. The system’s job is to make that visible, not to hide it.

A truth-check step before anything gets written.

When a new source comes in, its load-bearing claims — dates, dollar figures, named people and their titles, percentages, direct quotes — get verified first: grepped against existing sources, checked against the live web, executive titles confirmed against the company’s own site. Synthesis notes are guilty until proven innocent.

An existence check for anything that sounds official.

After the fabricated-regulation episode, any claim about a regulation, deadline, consultation, or “first/only” superlative has to be verified to exist in the outside world — on the regulator’s own site, not merely to be internally consistent. Consistency had already fooled me three times; I don’t let it vote anymore.

“Verify the discard.”

This one’s counter-intuitive and it’s saved me more than once: when the system rejects a claim as a probable error, that rejection is itself a claim, and it gets verified too. Scepticism has two failure modes — believing something false, and dismissing something true. I’ve caught both.

Registers that hold the system accountable.

A canonical value-register for volatile numbers (so the same figure can’t drift to a different value on a different page); a debunked-claims ledger enforced by a commit-time check that blocks them from re-entering; a source-trust register; and a health dashboard that tracks the ratio of verified-to-unverified claims over time, so “how trustworthy is this thing” is a trend line, not a vibe.

One more structural rule: every page splits what is (sourced, tagged) from what it means (synthesis). The model compiles the first. The judgement in the second is mine. And a human gate for anything that goes outside — machine-compiled pages stay marked draft until a human flips them to reviewed, and only reviewed pages are cleared for any use beyond my own desk.

A worked example, from this week.

05 — the discipline in action

While writing this, I had the system chase down a productivity statistic — a “+1.8%” figure I’d seen attributed to a particular AI research paper. The truth-check couldn’t find it in that paper. Under the old regime, I’d have shrugged and dropped the number.

Under “verify the discard,” dropping it wasn’t allowed without proof. So the system went and read the cited paper in full — and discovered two things. First, the figure genuinely wasn’t there; it belonged to a different paper by an overlapping author, which I then tracked down and added properly. Second — and this is the part I’d never have caught otherwise — while reading the paper, it found a separate, pre-existing error sitting in my own wiki: a statistic I’d recorded months ago as “4.5× more likely” when the source actually said “almost fourfold.” I’d misread a percentage (4.5%) as a multiple (4.5×). One had been quietly wrong for months.

Chasing one number I’d nearly discarded surfaced a different error entirely. That’s the whole philosophy in one episode: the discipline isn’t there to make the AI look good. It’s there to make the AI catch itself.

What it is now.

06 — the transferable lesson

It started as “an LLM that tidies my markdown.” It’s become something closer to a knowledge pipeline with a conscience — an ingestion process with verification gates, an audit trail, a propagation guard, and an explicit, visible epistemics. The AI still does the enormous grunt work Karpathy promised: reading, summarising, cross-linking, filing. I’d never go back to doing that by hand. But the value I’ve actually added on top isn’t more automation. It’s doubt, made systematic.

If there’s a transferable lesson for anyone pointing AI at serious work, it’s this: the hard part was never getting the machine to produce. It was getting it to tell me, reliably and legibly, which of the things it produced I’m allowed to believe. Build that layer first. The output is only as valuable as your ability to trust it — and trust, it turns out, is the part you still have to engineer yourself.

A note on this knowledge base

This knowledge base is a personal working tool, not a published reference. It is AI-compiled and human-governed; most pages are unreviewed drafts by design, and every claim carries a confidence tag for exactly that reason. The method is what I’m sharing here — not the contents as gospel. If you’re building something similar and want to compare notes, get in touch.

Anthropic survey figures from “What 81,000 People Want from AI,” March 2026; headline figures corroborated by independent press. The Karpathy framing draws on his public writing on flat-file knowledge systems.