Why RobotCheck updates its rules automatically
AI crawlers change faster than any documentation can keep up with. A website that is checked once and then left alone is soon being measured against outdated rules, and nobody notices.
The problem: rules go stale, websites don't
When RobotCheck.coffee launched, llms.txt was just a proposal on a handful of blogs. Today Anthropic, OpenAI and Google each have their own crawler policies, their own bot names and their own directives — and all three have updated their documentation multiple times since then.
The classic problem with analysis tools: the rules a website is checked against are already outdated by the time the check runs. The user sees a green tick and assumes everything is fine, yet the rule set may already be missing a new directive, one that decides whether ClaudeBot or GPTBot is blocked or allowed.
Checking a website against outdated AI rules is like navigating with a 2019 map: the world has moved on.
The solution: a living rule catalog
RobotCheck.coffee solves this with a fully automated update mechanism. Once a week, a job fetches the official sources (Anthropic, OpenAI, Google and llmstxt.org), extracts new or changed rules, compares them against the existing catalog, and writes a new version only when something has genuinely changed.
// How the pipeline works
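1. Schedule: every Monday, 06:00
2. Fetch: official sources from Anthropic, OpenAI, Google, llmstxt.org
3. Extract: new or changed rules as structured JSON
4. Diff: rules compared against the existing catalog
5. Commit: only on real changes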
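The extraction step hands its results over as structured JSON, but the post does not publish the schema. The sketch below therefore assumes a hypothetical shape: a JSON array of objects with id, source, severity and text fields, where the severity levels "error", "warning" and "info" are also assumptions. It parses that payload into typed records keyed by rule ID and rejects duplicate IDs, which would otherwise make the later comparison ambiguous.

```python
import json
from dataclasses import dataclass

# Hypothetical rule schema: field names and severity levels are assumptions,
# not RobotCheck's published format.
ALLOWED_SEVERITIES = {"error", "warning", "info"}


@dataclass(frozen=True)
class Rule:
    id: str        # stable ID, e.g. "robots-gptbot-disallow"
    source: str    # which provider page the rule was extracted from
    severity: str  # one of ALLOWED_SEVERITIES
    text: str      # human-readable description of the rule


def parse_rules(payload: str) -> dict[str, Rule]:
    """Parse extractor output (a JSON array) into a catalog keyed by rule ID."""
    catalog: dict[str, Rule] = {}
    for item in json.loads(payload):
        rule = Rule(
            id=item["id"],
            source=item["source"],
            severity=item["severity"],
            text=item["text"],
        )
        if rule.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity {rule.severity!r} for {rule.id}")
        if rule.id in catalog:
            # Two rules sharing one ID would make the diff ambiguous.
            raise ValueError(f"duplicate rule ID: {rule.id}")
        catalog[rule.id] = rule
    return catalog
```

Validating at this boundary means a malformed extraction fails loudly instead of silently producing a catalog that looks changed.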
The key point is step 5: no commit without a change. If the extracted rules match the existing catalog, nothing happens. The version number only increments when the rules differ: a new directive appears, an old one disappears, or a severity level changes.
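The "no commit without a change" decision can be sketched in a few lines. This assumes a hypothetical catalog held as a mapping from rule ID to a (severity, text) pair and a plain integer version counter. The post does not say how RobotCheck stores rules or numbers its versions.

```python
from typing import Optional

# Hypothetical catalog shape: rule ID -> (severity, text).
Catalog = dict[str, tuple[str, str]]


def next_version(current_version: int, old: Catalog, new: Catalog) -> Optional[int]:
    """Return the next version number, or None if nothing should be committed."""
    if old == new:
        # Same IDs with the same severity and text: no new version, no commit.
        return None
    # An added rule, a removed rule or a changed severity/text all make the
    # dicts unequal, and any of them bumps the version by exactly one.
    return current_version + 1
```

Because the sketch compares extracted rules rather than raw pages, a reworded provider page that yields the same rules produces no new version.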
// Why every rule has a stable ID
Every rule gets an immutable ID such as robots-gptbot-disallow or llmstxt-contact-url-present. This is what makes the diff reliable: if a rule's text changes, its ID stays the same, so the analyzer recognizes it: "I know this rule, but it looks different now." Without stable IDs, every run would produce a flood of seemingly new rules.
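Keyed by stable IDs, the comparison reduces to set operations. Here is a minimal sketch, assuming a hypothetical catalog that maps each rule ID to its text: IDs present only in the new catalog are added, IDs present only in the old one are removed, and shared IDs whose content differs are changed, which is exactly the "I know this rule, but it looks different now" case.

```python
from dataclasses import dataclass


@dataclass
class CatalogDiff:
    added: list[str]
    removed: list[str]
    changed: list[str]

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.changed)


def diff_catalogs(old: dict[str, str], new: dict[str, str]) -> CatalogDiff:
    """Compare two catalogs keyed by stable rule ID (values are rule texts)."""
    old_ids, new_ids = set(old), set(new)
    return CatalogDiff(
        added=sorted(new_ids - old_ids),
        removed=sorted(old_ids - new_ids),
        # Same ID, different content: a known rule that now looks different.
        changed=sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    )
```

Had the catalog been keyed by rule text instead, rewording robots-gptbot-disallow would show up as one removal plus one addition, which is the flood of seemingly new rules the stable IDs prevent. (robots-claudebot-disallow in the test data is a made-up ID for illustration.)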
Versions: what changed and when
Each new version of the rule catalog lands in a versioned CHANGELOG. This makes it possible to trace when a rule first appeared and which source triggered it.
This gives users confidence: if a score suddenly drops even though nothing changed on the website, the changelog shows which new rule triggered it — and since when it has been in effect.
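A changelog entry of this kind needs little more than the version, the release date, the affected rule IDs and the sources involved. The sketch below renders one entry as plain text. The layout, the function name changelog_entry and the change-kind labels are hypothetical and do not reproduce RobotCheck's actual CHANGELOG format.

```python
from datetime import date


def changelog_entry(version: int, released: date,
                    changes: dict[str, list[str]], sources: list[str]) -> str:
    """Render one CHANGELOG entry as plain text.

    `changes` maps a change kind ("added", "removed", "changed") to rule IDs.
    """
    total = sum(len(ids) for ids in changes.values())
    lines = [f"v{version} ({released.isoformat()}): {total} rule(s) changed"]
    # Fixed order keeps entries comparable from one version to the next.
    for kind in ("added", "removed", "changed"):
        for rule_id in changes.get(kind, []):
            lines.append(f"- {kind}: {rule_id}")
    lines.append(f"Sources: {', '.join(sources)}")
    return "\n".join(lines)
```

Listing rule IDs rather than rule texts lets a reader jump from a dropped score straight to the rule responsible, even after its wording has changed.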
Where to see rule versions
The public Changelog on RobotCheck.coffee shows every version with its date, the number of changed rules and a source reference. Every analysis report also shows, in its bottom-right corner, which rule version the check was run against, so a saved report remains traceable even three months later.
An additional benefit of the public changelog page: it shows how actively the standard is developing. Regular visitors get a sense of the direction AI-readability is heading — and when a re-analysis of their own website would be worthwhile.
What's coming next
Soon: Optional email notifications when a new rule version is released — with a direct link to re-analyze your saved websites.
Later: A public API for the rule catalog, so other tools can use the same living standard. Open source like the rest of RobotCheck.
☕ Support RobotCheck.coffee
The tool is free — running it isn't quite. If you find the work on the AI-readable web useful, a coffee helps keep the pipeline running.
Thank you — every contribution goes directly to API costs and hosting.