
Why RobotCheck updates its rules automatically

AI crawlers evolve faster than any documentation. A website checked once and then left alone is soon being tested against outdated rules — and nobody notices.

Andreas Reuter · May 2026 · 7 min read

The problem: rules go stale, websites don't

When RobotCheck.coffee launched, llms.txt was just a proposal on a handful of blogs. Today Anthropic, OpenAI and Google each have their own crawler policies, their own bot names and their own directives — and all three have updated their documentation multiple times since then.

The classic problem with analysis tools: the rules a website is checked against can already be outdated by the time the check runs. The user sees a green tick and assumes everything is fine, while the check may be missing a new directive that would block or allow ClaudeBot or GPTBot.

Checking a website against outdated AI rules is like navigating with a 2019 map: the world has moved on.

The solution: a living rule catalog

RobotCheck.coffee solves this with a fully automated update mechanism: once a week, a job fetches the official sources from AI providers — Anthropic, OpenAI, Google, llmstxt.org — extracts new or changed rules, compares them against the existing catalog, and only writes a new version when something has genuinely changed.

// How the pipeline works

Weekly Update Pipeline
01 Cron trigger: server cronjob, every Monday at 06:00
02 Fetch sources: Anthropic, OpenAI, Google, llmstxt.org
03 Claude API: rule extraction as structured JSON
04 Semantic diff: new rules compared against known rules
05 New version: rules.json + CHANGELOG, only on real changes
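Step 03 is where a malformed model response could slip into the catalog. The article does not describe the JSON format, so the following is a minimal sketch under assumptions: the field names (`id`, `severity`, `description`, `source`), the ID pattern and the function `parse_extracted_rules` are all hypothetical. The three severity levels are the ones that appear in the rule changelog (info, warning, critical).

```python
import json
import re
from dataclasses import dataclass
from typing import List

# Assumed severity levels, taken from the rule changelog.
SEVERITIES = {"info", "warning", "critical"}
# Assumed ID format: lowercase words joined by hyphens, e.g. "robots-gptbot-disallow".
RULE_ID = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


@dataclass(frozen=True)
class Rule:
    id: str
    severity: str
    description: str
    source: str


def parse_extracted_rules(raw: str) -> List[Rule]:
    """Parse the model's JSON output, rejecting malformed or duplicate rules."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of rules")
    rules: List[Rule] = []
    seen = set()
    for item in data:
        # A missing field raises KeyError, which also aborts the run.
        rule = Rule(
            id=item["id"],
            severity=item["severity"],
            description=item["description"],
            source=item["source"],
        )
        if not RULE_ID.fullmatch(rule.id):
            raise ValueError(f"malformed rule ID: {rule.id!r}")
        if rule.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {rule.severity!r}")
        if rule.id in seen:
            raise ValueError(f"duplicate rule ID: {rule.id!r}")
        seen.add(rule.id)
        rules.append(rule)
    return rules
```

Failing loudly is a deliberate choice in this sketch: a rejected response ends the run before anything is written, which fits the pipeline's principle of producing a new version only on genuine changes.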

The key point in step 5: no commit without a change. If the sources are identical, nothing happens. The version number only increments when rules differ — a new directive appears, an old one disappears, or a severity level changes.
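The "no commit without a change" rule comes down to a set comparison keyed by rule ID. A minimal sketch, assuming the catalog is a dict from rule ID to rule attributes and that each release bumps the minor version (as the sequence v0.2.0, v0.3.0, v0.4.0 suggests); `diff_catalogs` and `next_version` are hypothetical names, not RobotCheck's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed catalog shape: rule ID -> rule attributes (severity, text, ...).
Catalog = Dict[str, Dict[str, str]]


@dataclass
class CatalogDiff:
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    changed: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.changed)


def diff_catalogs(known: Catalog, fetched: Catalog) -> CatalogDiff:
    """Compare by rule ID: new IDs, vanished IDs, and known IDs whose content differs."""
    return CatalogDiff(
        added=sorted(fetched.keys() - known.keys()),
        removed=sorted(known.keys() - fetched.keys()),
        changed=sorted(
            rule_id
            for rule_id in known.keys() & fetched.keys()
            if known[rule_id] != fetched[rule_id]
        ),
    )


def next_version(current: str, diff: CatalogDiff) -> Optional[str]:
    """Return the next minor version, or None when nothing changed (write nothing)."""
    if diff.is_empty():
        return None
    major, minor, _patch = (int(part) for part in current.lstrip("v").split("."))
    return f"v{major}.{minor + 1}.0"
```

A severity change shows up in `changed`, because the attribute dict differs even though the ID is the same; identical sources yield an empty diff and `next_version` returns None.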

// Why every rule has a stable ID

Every rule gets an immutable ID like robots-gptbot-disallow or llmstxt-contact-url-present. This is what makes the diff reliable: if a rule's text changes, its ID stays the same, so the analyzer can tell that it is looking at a known rule whose wording has changed. Without stable IDs, every run would produce a flood of apparently new rules.
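To see why the IDs matter, compare two ways of matching a freshly extracted rule against the known catalog. This is a hypothetical sketch: the record shape and both function names are assumptions, the wording strings are placeholders, and only the ID format comes from the article.

```python
from typing import Dict, List

# Hypothetical record shape; the wording strings are placeholders.
Rule = Dict[str, str]


def match_by_text(known: List[Rule], fetched: List[Rule]) -> Dict[str, str]:
    """Naive matching without IDs: any reworded rule looks brand new."""
    known_texts = {rule["text"] for rule in known}
    return {
        rule["text"]: "unchanged" if rule["text"] in known_texts else "new"
        for rule in fetched
    }


def match_by_id(known: List[Rule], fetched: List[Rule]) -> Dict[str, str]:
    """Stable-ID matching: same ID with new wording is recognized as a change."""
    known_by_id = {rule["id"]: rule for rule in known}
    status: Dict[str, str] = {}
    for rule in fetched:
        previous = known_by_id.get(rule["id"])
        if previous is None:
            status[rule["id"]] = "new"
        elif previous["text"] != rule["text"]:
            status[rule["id"]] = "changed"
        else:
            status[rule["id"]] = "unchanged"
    return status


known = [{"id": "robots-gptbot-disallow", "text": "old wording"}]
fetched = [{"id": "robots-gptbot-disallow", "text": "new wording"}]

print(match_by_text(known, fetched))  # {'new wording': 'new'}
print(match_by_id(known, fetched))    # {'robots-gptbot-disallow': 'changed'}
```

With text matching, a simple rewording in the upstream docs would register as one new rule plus one silently vanished rule; with ID matching it is a single, traceable change.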

Versions: what changed and when

Each new version of the rule catalog lands in a versioned CHANGELOG. This makes it possible to trace when a rule first appeared — and which source triggered it:

data/CHANGELOG.md
RobotCheck.coffee — Rule Changelog
v0.4.0 · 2026-05-12 · llmstxt.org
  • +llmstxt-optional-section-present (info)
  • +llmstxt-api-docs-linked (warning)
  • ~llmstxt-description-length — min. 80 chars
v0.3.0 · 2026-04-28 · Anthropic Docs
  • +robots-claudebot-explicit-allow (warning)
  • ~robots-ai-crawlers-present — ClaudeBot → critical

This gives users confidence: if a score suddenly drops even though nothing changed on the website, the changelog shows which new or changed rule caused the drop, and since when it has been in effect.
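A changelog block like the ones in data/CHANGELOG.md can be generated directly from the diff. The sketch below is an assumption about how that rendering might look: the `ChangelogEntry` class is hypothetical, the `-` marker for removed rules is invented (the changelog only shows `+` and `~`), and a colon stands in for the dash before change notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangelogEntry:
    version: str
    date: str    # ISO date, e.g. "2026-04-28"
    source: str  # upstream source that triggered the change
    added: Dict[str, str] = field(default_factory=dict)    # rule ID -> severity
    changed: Dict[str, str] = field(default_factory=dict)  # rule ID -> short note
    removed: List[str] = field(default_factory=list)       # rule IDs (hypothetical "-" marker)

    def render(self) -> str:
        """Render one version block in the +/~ notation of data/CHANGELOG.md."""
        lines = [f"{self.version} · {self.date} · {self.source}"]
        lines += [f"  • +{rule_id} ({severity})" for rule_id, severity in self.added.items()]
        lines += [f"  • ~{rule_id}: {note}" for rule_id, note in self.changed.items()]
        lines += [f"  • -{rule_id}" for rule_id in self.removed]
        return "\n".join(lines)


entry = ChangelogEntry(
    version="v0.3.0",
    date="2026-04-28",
    source="Anthropic Docs",
    added={"robots-claudebot-explicit-allow": "warning"},
    changed={"robots-ai-crawlers-present": "ClaudeBot → critical"},
)
print(entry.render())
```

Printing `entry.render()` reproduces the v0.3.0 block of the changelog, apart from the colon before the change note.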

Where to see rule versions

The public Changelog on RobotCheck.coffee shows every version with date, number of changed rules and source reference. Every analysis report also shows in the bottom right which rule version the check was run against — so a saved report remains traceable three months later.
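Stamping each report with the catalog version is what keeps it traceable. A minimal sketch with hypothetical field names; the `needs_recheck` helper is an illustrative addition, not a described feature, showing how the stored version could flag saved reports that predate the current catalog.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AnalysisReport:
    url: str
    score: int
    failed_rules: List[str]  # IDs of rules the site did not pass
    rule_version: str        # catalog version the check ran against, e.g. "v0.4.0"
    checked_at: str          # ISO date of the analysis


def save_report(report: AnalysisReport) -> str:
    """Serialize the report together with its rule version."""
    return json.dumps(asdict(report), sort_keys=True)


def needs_recheck(report: AnalysisReport, current_version: str) -> bool:
    """A saved report is stale once the catalog has moved past its version."""
    return report.rule_version != current_version
```

Because the version travels inside the saved JSON, a report opened months later can still be matched to the exact changelog entry it was checked against.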

robotcheck.coffee/changelog
Rule Changelog
Current rule version: v0.4.0 · 11 rules active
v0.4.0 · 12 May 2026 · +2 new rules (llmstxt.org) · 1 change · Current
v0.3.0 · 28 Apr 2026 · +1 new rule (Anthropic ClaudeBot) · 1 change · Archive
v0.2.0 · 7 Apr 2026 · Initial base: 8 rules · 2× Google-Extended · Archive

An additional benefit of the public changelog page: it shows how actively the standard is developing. Regular visitors get a sense of the direction AI-readability is heading — and when a re-analysis of their own website would be worthwhile.

What's coming next

Soon: Optional email notifications when a new rule version is released — with a direct link to re-analyze your saved websites.

Later: A public API for the rule catalog, so other tools can use the same living standard. Open source like the rest of RobotCheck.

☕ Support RobotCheck.coffee

The tool is free — running it isn't quite. If you find the work on the AI-readable web useful, a coffee helps keep the pipeline running.

Thank you — every contribution goes directly to API costs and hosting.
