
The llms.txt File: How to Tell AI Crawlers What to Cite

The llms.txt file is the emerging standard for telling AI crawlers what your site wants cited. Here is what to put in it and why it matters.


Key Highlights

  • llms.txt is a plain text file at the root of your domain that tells AI crawlers which pages and sections you want cited.
  • It plays a role loosely comparable to robots.txt and sitemap.xml combined, but is designed specifically for large language model retrieval.
  • Adoption is climbing across major SaaS brands, and early movers may gain an edge while retrieval systems are still deciding how much weight to give the file.
  • The file is trivial to ship (usually a few dozen to a few hundred lines of markdown), and for content-rich sites the citation lift can be measurable within weeks.

What llms.txt Actually Is

llms.txt is a proposed standard for communicating with AI crawlers and retrieval systems. It lives at the root of your domain (yoursite.com/llms.txt) and contains a structured markdown document that points to the most important content on your site.
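To make the "root of your domain" convention concrete, here is a minimal Python sketch that derives the llms.txt location from any URL on a site. The helper name `llms_txt_url` is hypothetical, and defaulting to https when no scheme is given is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(site: str) -> str:
    """Return the root-level llms.txt URL for a site, given any URL on it."""
    # Assumption: bare hostnames such as "yoursite.com" are served over https.
    parts = urlsplit(site if "://" in site else f"https://{site}")
    # Keep scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


print(llms_txt_url("yoursite.com/blog/some-post"))  # https://yoursite.com/llms.txt
```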

Think of it as a curated table of contents written for AI agents. Instead of forcing a crawler to discover your best pages through link graphs and sitemaps, you hand it a prioritized list. "These are our docs. These are our case studies. These are our pricing pages. Here are our most authoritative blog posts."

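The format was proposed in 2024, and companies including Anthropic, Cloudflare, and Stripe, along with a growing list of other SaaS companies, now publish llms.txt files. As of mid 2026 it is not a hard standard: AI crawlers and retrieval tools that read the file treat it as a hint about what to prioritize, not as a binding directive, and support varies by vendor.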

Why Bother With Another Text File

Two reasons. First, AI crawlers are aggressive but indiscriminate. Without guidance they index everything, including stale pages, marketing fluff, and outdated documentation. The model then has to do the work of figuring out what is canonical.

Second, retrieval is increasingly snippet-based. When ChatGPT answers a question, it pulls short passages from a handful of pages. The pages that get pulled are the ones that signal "this is the canonical source for this topic." llms.txt is one way to send that signal explicitly.

Brands that ship a clean llms.txt tend to see a measurable shift in which pages get cited. The pages they want surfaced (case studies, deep dives, technical docs) start showing up at higher rates while the pages they do not want quoted (legal boilerplate, press releases from five years ago) recede.

The Format

llms.txt is markdown. The structure is loose but the convention has stabilized around the following shape.

# Company Name

> One sentence description of what the company does.

Brief paragraph (2-4 sentences) giving important context about the company, products, and positioning. This is the part that gets quoted most often when models summarize the brand.

## Docs

- [Getting Started](https://example.com/docs/getting-started): Quickstart guide for new users.
- [API Reference](https://example.com/docs/api): Complete API documentation.
- [Webhooks](https://example.com/docs/webhooks): Event subscription guide.

## Guides

- [Authentication](https://example.com/guides/auth): OAuth and API key setup.
- [Best Practices](https://example.com/guides/best-practices): Production hardening checklist.

## Examples

- [Sample Apps](https://example.com/examples): Reference implementations in 6 languages.

## Optional

- [Changelog](https://example.com/changelog): Release notes.
- [Blog](https://example.com/blog): Engineering and product writing.

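In the proposal, a section titled "Optional" has a special meaning: the links under it are secondary and can be skipped when a shorter context is needed, so treat it as the lower-priority tier. The other section names (Docs, Guides, Examples) are free-form, and listing the most important sections first is a convention rather than a rule, so do not rely on position alone to carry priority.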
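One way to see how a consumer might read this shape is a small parser. The sketch below is illustrative only, not a reference implementation: the names `LlmsTxt`, `parse_llms_txt`, and `core_urls` are hypothetical, and it assumes each entry sits on one line in the `- [Title](URL): note` form shown above.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://...): optional note" on a single line.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Section name -> list of (title, url, note) tuples, in file order.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1 title, blockquote summary, and H2 link sections."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return doc


def core_urls(doc: LlmsTxt) -> list:
    """Return every URL except those listed under the "Optional" section."""
    return [
        url
        for name, entries in doc.sections.items()
        if name.lower() != "optional"
        for _, url, _ in entries
    ]
```

Calling `core_urls(parse_llms_txt(text))` on the example file above would return the Docs, Guides, and Examples links and leave out the Changelog and Blog entries.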

What to Include and What to Skip

The temptation is to dump everything. Resist it. llms.txt works because it is curated.

Include                          | Skip
Product documentation            | Internal team pages
Pricing                          | Old press releases
Case studies                     | Author archive pages
Top performing blog posts        | Tag and category indexes
API reference                    | Login and account pages
Security and compliance pages    | Cart and checkout flows
FAQ pages                        | Duplicate content variants

A practical rule of thumb: if you would not link to it from your homepage navigation, do not include it in llms.txt. The file should represent your site at its best.
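Part of the Skip column can be automated as a first pass before the manual review. The sketch below is a rough filter under stated assumptions: the path segments in `SKIP_SEGMENTS` are hypothetical examples of how a site might name its tag, author, login, and checkout routes, and only the first path segment is checked. Your own URL scheme will differ.

```python
from urllib.parse import urlsplit

# Assumption: these first path segments mark pages from the "Skip" column.
SKIP_SEGMENTS = {"tag", "category", "author", "login", "account", "cart", "checkout"}


def should_skip(url: str) -> bool:
    """Return True when the URL's first path segment is on the skip list."""
    path = urlsplit(url).path.lower()
    first_segment = path.strip("/").split("/")[0]
    return first_segment in SKIP_SEGMENTS


def split_candidates(urls: list) -> tuple:
    """Split candidate URLs into (keep, skip) lists, preserving order."""
    keep = [u for u in urls if not should_skip(u)]
    skip = [u for u in urls if should_skip(u)]
    return keep, skip
```

A filter like this only removes the obvious noise. Choosing which of the remaining pages deserve a slot is still an editorial decision.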

llms-full.txt: The Expanded Variant

A complementary file, llms-full.txt, is gaining traction. Instead of just listing URLs, it inlines the full markdown content of your most important pages.

This is useful when you want to guarantee the model has access to the canonical text even if the JavaScript-heavy version of your site is hard to crawl. Anthropic publishes an llms-full.txt at claude.com that includes the full text of its docs.

The tradeoff is file size and freshness. A large llms-full.txt can hit hundreds of kilobytes and needs to be regenerated whenever the source content changes. For most brands, llms.txt alone is enough to start.
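The size and freshness tradeoff is easy to track in a build step. The sketch below assumes you already have each page as markdown. The function names and the "title, source line, body" layout are assumptions of this sketch, since no fixed layout for llms-full.txt is described here.

```python
import hashlib


def build_llms_full(pages: list) -> str:
    """Concatenate (title, url, markdown_body) tuples into one document."""
    parts = []
    for title, url, body in pages:
        # Assumed layout: an H1 per page, a source line, then the page body.
        parts.append(f"# {title}\n\nSource: {url}\n\n{body.strip()}\n")
    return "\n".join(parts)


def size_kb(text: str) -> float:
    """Size of the generated file in kilobytes, measured as UTF-8 bytes."""
    return len(text.encode("utf-8")) / 1024


def content_digest(text: str) -> str:
    """Stable fingerprint; a changed digest means the file must be republished."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Storing the digest from the last deploy and comparing it on each build is one simple way to notice that the source content has changed and the file needs regenerating.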

Common Mistakes

Three patterns we see when auditing client llms.txt files.

The kitchen sink file. Brands list every URL on the site. The result is no signal at all because everything is "important." Cut ruthlessly. A focused file with 30 well-chosen entries beats a 500-entry dump.

The stale file. llms.txt is checked into a repo somewhere and never updated. After a year the listed pages drift away from current reality. Schedule a quarterly review.

The hidden file. The file exists but is not linked from anywhere, so a crawler can only find it by guessing the path. Mention llms.txt in a comment in your robots.txt (there is no standard robots.txt field for it) and link it from your humans.txt if you have one.
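The first two mistakes can be caught with a small lint step in CI. The sketch below is a starting point under assumptions: the 50-entry ceiling and the 92-day review window are illustrative thresholds chosen for this example, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Matches "- [Title](URL): optional note" entries.
ENTRY_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text: str, max_entries: int = 50) -> list:
    """Return human-readable warnings for an llms.txt document."""
    entries = [
        m.groups()
        for line in text.splitlines()
        if (m := ENTRY_RE.match(line.strip()))
    ]
    warnings = []
    # Kitchen sink check: too many entries dilutes the signal.
    if len(entries) > max_entries:
        warnings.append(f"too many entries: {len(entries)} > {max_entries}")
    # The same URL listed twice adds noise without adding information.
    for url, count in Counter(url for _, url, _ in entries).items():
        if count > 1:
            warnings.append(f"duplicate URL: {url}")
    # Entries without a note give a reader no context about the page.
    for title, _, note in entries:
        if not note:
            warnings.append(f"missing description: {title}")
    return warnings


def is_stale(last_reviewed: date, today: date, max_age_days: int = 92) -> bool:
    """Stale file check: True when the last review is older than a quarter."""
    return (today - last_reviewed).days > max_age_days
```

The hidden file problem is not covered here because it depends on what your other files link to, not on the contents of llms.txt itself.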

Combining llms.txt With Schema and Sitemap

llms.txt does not replace structured data or sitemaps. It supplements them.

Sitemap.xml tells search engines what exists. Structured data (Schema.org, JSON-LD) tells search engines what each page is about in machine-readable form. llms.txt tells AI crawlers what you want them to prioritize.

A complete AEO stack ships all three. The brands that show up in AI answers most consistently are the ones that treat these layers as complementary rather than redundant.
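Because the layers are meant to agree with each other, one cheap consistency check is to confirm that every URL in llms.txt also appears in sitemap.xml. The sketch below assumes a standard single-file sitemap (not a sitemap index) and uses hypothetical function names. Treating a missing URL as a problem is this sketch's assumption, not a rule from either format.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Collect every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def llms_urls(llms_text: str) -> set:
    """Collect every markdown link target from an llms.txt document."""
    return set(re.findall(r"\]\((https?://[^)\s]+)\)", llms_text))


def missing_from_sitemap(llms_text: str, xml_text: str) -> list:
    """URLs promoted in llms.txt that the sitemap does not list."""
    return sorted(llms_urls(llms_text) - sitemap_urls(xml_text))
```

An empty result means the two files agree. Anything returned is worth a look, since it may be a typo in llms.txt or a page the sitemap generator is not picking up.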

How OnlyAEO Implements llms.txt for Clients

We treat llms.txt as part of every new client onboarding. The process is straightforward: audit existing content for citation worthiness, draft a curated llms.txt, validate against the major AI crawlers, ship to production, and monitor citation patterns for the first 60 days.
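The drafting step is easier to keep current when the file is generated from a curated list instead of edited by hand. The following is a minimal sketch of that idea, not a description of any particular tool: `render_llms_txt` and its input shape are hypothetical.

```python
def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render a curated structure into llms.txt markdown.

    sections maps a heading to a list of (title, url, note) tuples,
    in the order they should appear in the file.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in entries:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Keeping the curated list in version control next to this script means each audit changes data rather than markup, and the diff shows exactly which pages were added or dropped.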

The lift is consistent. Clients see meaningful shifts in which pages get cited, with the pages they want surfaced moving up and stale content moving down. The effect is most pronounced for documentation-heavy sites; we have seen citation rates on technical docs more than double within the first quarter.

Get your free AI visibility audit

We audit your content, draft a focused llms.txt, and track the citation shift over the first 90 days.


Frequently Asked Questions

Is llms.txt an official standard?
Not yet. It is a community proposal that has been widely adopted but is not formally codified by the IETF or W3C. Major AI crawlers respect it informally as of 2026. The lack of formal standardization is not a reason to skip it; the adoption curve is steep enough that early movers benefit.

How is llms.txt different from robots.txt?
robots.txt controls access: it tells crawlers which paths they may and may not fetch. llms.txt controls nothing; it is a curated list that highlights what you most want cited. They serve different functions and both belong on a modern site.

How often should we update llms.txt?
Quarterly is the floor. Whenever you ship significant new documentation, case studies, or pricing changes, update the file the same week. Stale entries are worse than missing entries.

Will llms.txt help if our site is small?
Yes. The smaller your site, the easier it is to curate a focused llms.txt that signals exactly which 10 to 20 pages matter most. Small brands benefit disproportionately because the signal-to-noise ratio is high.