What is llms.txt? The Complete Guide to AI Search Optimization

llms.txt is a Markdown file at your site root that points AI assistants to the pages worth reading. Learn the format, the difference from robots.txt and sitemap.xml, what crawlers actually do with it, and how to write one in five minutes.

Published: 2026-04-27

llms.txt is a Markdown file you place at the root of your domain (yoursite.com/llms.txt) that lists the pages an AI assistant should read to understand your site. Think of it as a curated reading list written in plain Markdown, so a language model can parse it without burning tokens on navigation, ads, and footer junk.

Jeremy Howard of Answer.AI proposed the format in September 2024. Adoption has moved fast. A 2026 survey of 300,000 domains found 10.13% had a live llms.txt, and tracker estimates put the total above 844,000 sites, including Anthropic, Stripe, Cloudflare, Vercel, and Mintlify. The pitch is simple: stop forcing models to scrape your full HTML when a 200-line Markdown file can give them the same answer faster.

llms.txt vs robots.txt vs sitemap.xml

Three files, three jobs. Reading them as competitors confuses what each one actually does.

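robots.txt is about exclusion. It tells crawlers (Googlebot, GPTBot, ClaudeBot, PerplexityBot) which paths they can and cannot fetch. Rules are matched by user-agent string, and compliance is voluntary: well-behaved crawlers honor the file, but nothing at the protocol level enforces it.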
sitemap.xml is about discovery. It is an XML inventory of every URL you want indexed, with last-modified dates and priority hints. Search crawlers use it to find pages they might otherwise miss.
llms.txt is about curation. It points an LLM at the small handful of pages that actually answer common questions, in a clean format the model can read without guessing.

The key difference is timing. robots.txt and sitemap.xml are read by automated crawlers on a schedule. llms.txt, when it is read at all, is typically pulled live by an AI assistant or retrieval pipeline that has been built to look for /llms.txt before answering a user. One file, one moment, one question.

The llms.txt format

The spec is short. Five elements, in this order:

A required H1 with the project or site name.
An optional blockquote with a one-sentence summary.
An optional body paragraph for context. No headings allowed in the body.
One or more H2 sections, each containing a Markdown list of links. Each link can have a colon and a short description.
An optional H2 section literally named "Optional", whose links can be skipped if the model needs a shorter context.

Here is a minimal template that follows the spec:

# Site name

> One-sentence summary of what the site is and who it serves.

Background paragraph if you need it. No headings here.

## Docs

- [Getting started](https://example.com/docs/start.md): Install and your first request.
- [API reference](https://example.com/docs/api.md): All endpoints and parameters.

## Optional

- [Changelog](https://example.com/changelog.md)

The spec also recommends serving Markdown versions of each linked page at the same path with .md appended, so /docs/start becomes /docs/start.md. That gives the model clean text to fetch instead of full HTML wrapped in menus and modals.
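The .md convention is mechanical enough to sketch in a few lines of Python. The function name markdown_companion is hypothetical, and the handling of URLs that end in a slash (appending index.html.md) is our reading of the proposal, so treat it as an assumption and check it against the spec before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_companion(url: str) -> str:
    """Return the URL where a Markdown copy of `url` would be expected.

    Hypothetical helper. Appends ".md" to the page path. For paths with
    no file name (ending in "/"), it appends "index.html.md", which is an
    assumption based on our reading of the llms.txt proposal.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith(".md"):
        new_path = path  # already points at a Markdown file
    elif path.endswith("/"):
        new_path = path + "index.html.md"
    else:
        new_path = path + ".md"
    # Scheme, host, query string, and fragment are passed through unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(markdown_companion("https://example.com/docs/start"))
```

A build step could run this over every link in llms.txt and confirm that each resulting URL actually serves Markdown.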

A complete example

Here is what a real llms.txt looks like for a site like Talos Tools:

# Talos Tools

> Free web developer toolkit: 77 tools, 24 generators, comparisons, learning roadmaps, and self-hosted app picks.

Talos Tools collects tools developers reach for daily, from JSON formatters to schema generators, plus comparison content and free interactive learning roadmaps.

## Tools

- [Robots.txt Generator](https://talos.tools/robots-txt-generator.md): Build robots.txt with crawler-specific rules.
- [Schema Generator](https://talos.tools/schema-generator.md): Generate JSON-LD for FAQPage, HowTo, Product, Article.
- [Meta Tag Generator](https://talos.tools/meta-tag-generator.md): SEO and Open Graph tags for any page.

## Generators

- [Gradient Generator](https://talos.tools/generators/gradient.md): CSS gradients with live preview.

## Roadmaps

- [Frontend Developer Roadmap](https://talos.tools/roadmaps/frontend-developer.md): Curated path with free resources.

## Optional

- [Blog](https://talos.tools/blog.md): Tutorials and tool reviews.

The H1 names the site. The blockquote gives a model an instant sense of what is here. Each H2 groups related URLs. The Optional section flags links the model can drop if context is tight.
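To make that structure concrete, here is a minimal sketch of how a tool might read such a file. It is not a reference parser: the names Link, LlmsTxt, and parse_llms_txt are hypothetical, and the sketch ignores the free-form body paragraph and anything that does not look like a Markdown link inside an H2 section.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://example.com/page.md): optional description"
LINK_RE = re.compile(
    r"^[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?::\s*(?P<desc>.*))?\s*$"
)


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    name: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote summary, and the H2 link sections."""
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.name:
            doc.name = line[2:].strip()
        elif line.startswith("> ") and current is None and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                description = (match["desc"] or "").strip()
                doc.sections[current].append(
                    Link(match["title"], match["url"], description)
                )
    return doc
```

Keeping sections in an ordinary dict preserves the order in which they appear in the file, so a caller can still tell that Optional came last.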

How AI crawlers actually use it

This is the part most articles skip. As of 2026, no major AI platform has officially confirmed it reads llms.txt as a special signal. Google's John Mueller said publicly that no AI system currently uses it. OpenAI, Anthropic, Microsoft, and Perplexity have stayed silent on the topic.

So why do so many companies publish one anyway?

Two reasons. First, AI assistants and RAG pipelines that browse the live web (Claude with web search, ChatGPT search, Perplexity, custom agents) can fetch /llms.txt as part of their retrieval step, even though no vendor has confirmed doing so by default. That is not the same thing as a Googlebot-style crawl. It is one assistant, one query, one fetch. The file gets used by the agent, not indexed by a search engine.

Second, your own AI tools can use it. If you build an internal documentation chatbot or hand a customer a custom GPT, pointing it at your llms.txt is the cleanest way to ground it in the right pages.
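For the internal-chatbot case, the grounding step might look roughly like the sketch below. Everything here is an assumption for illustration: select_links and build_context are hypothetical names, sections is a plain mapping from H2 section name to a list of URLs, and fetch stands in for whatever HTTP client or cache your pipeline already uses. The only behavior taken from the format itself is that links under Optional are the first to be dropped when the budget is tight.

```python
from typing import Callable


def select_links(sections: dict[str, list[str]], max_links: int) -> list[str]:
    """Pick up to `max_links` URLs, using Optional links only if room remains."""
    required = [
        url
        for name, urls in sections.items()
        if name != "Optional"
        for url in urls
    ]
    optional = sections.get("Optional", [])
    chosen = required[:max_links]
    remaining = max_links - len(chosen)
    if remaining > 0:
        chosen += optional[:remaining]
    return chosen


def build_context(
    sections: dict[str, list[str]],
    fetch: Callable[[str], str],
    max_links: int,
) -> str:
    """Fetch the selected pages and join them into one grounding string."""
    parts = [
        f"Source: {url}\n{fetch(url)}"
        for url in select_links(sections, max_links)
    ]
    return "\n\n---\n\n".join(parts)
```

Passing fetch in as a parameter keeps the sketch free of network code and makes it easy to test with canned pages.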

The training and indexing crawlers themselves (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) still read robots.txt, not llms.txt. They do not treat llms.txt as a ranking signal because there is no ranking system involved. Anyone selling you llms.txt as a guaranteed AI Overview boost is selling you a story, not a tactic.

llms-full.txt: the long-form variant

llms.txt is a navigation file. llms-full.txt is the actual content, concatenated into one Markdown blob.

Anthropic publishes both. Their llms.txt runs about 8,364 tokens of links. Their llms-full.txt is 481,349 tokens of full API documentation, sized to drop into a long-context model in one shot.

Use llms-full.txt when:

Your docs are stable and you want a model to load everything at once.
You have under roughly 500K tokens of meaningful content.
You are willing to regenerate the file on each release so it stays current.

Skip it when your content is large, fast-changing, or already well-served page by page.
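A release script for llms-full.txt could be as small as the sketch below. The layout (one H2 per page plus a Source line) is our own assumption, since we are not aware of a fixed layout for the file, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate, assuming about 4 characters per token."""
    return len(text) // 4


def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, markdown_body) pages into one Markdown blob."""
    chunks = [f"# {site_name}"]
    for title, url, body in pages:
        chunks.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(chunks) + "\n"


def fits_budget(text: str, max_tokens: int = 500_000) -> bool:
    """Check the blob against the rough 500K-token ceiling mentioned above."""
    return estimate_tokens(text) <= max_tokens
```

Running this on every release and failing the build when fits_budget returns False is one way to notice that the docs have outgrown the single-file approach.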

Common mistakes

Five things that quietly break llms.txt:

Wrong Content-Type. Serve as text/markdown or text/plain, not text/html. A model fetching a "Markdown" file with HTML wrapping will treat it like any other webpage.
Missing H1. The first line must be a single H1 naming the site or project. Skipping it breaks every parser that follows the spec.
Blocked in robots.txt. If your robots.txt disallows /llms.txt, retrieval bots cannot fetch it. Allow the path explicitly.
Linking to redirected URLs. Link to the final URL, not a 301. Every redirect costs a model an extra fetch.
No .md companions. The spec recommends serving Markdown versions of your linked pages. If you only serve HTML, models burn tokens parsing your nav and footer instead of reading your content.
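Three of these five mistakes can be caught offline from the response body and its Content-Type header. The sketch below checks only those three; the robots.txt and redirect checks need live requests and are left out. lint_llms_txt is a hypothetical name, and the link regex is deliberately loose.

```python
import re

ALLOWED_TYPES = {"text/markdown", "text/plain"}
# Loose match for the URL part of a Markdown link: "](https://...)"
LINK_URL_RE = re.compile(r"\]\((?P<url>[^)\s]+)\)")


def lint_llms_txt(body: str, content_type: str) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []

    # 1. Content-Type: ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")

    # 2. The first non-blank line should be an H1.
    lines = [line for line in body.splitlines() if line.strip()]
    first_line = lines[0].strip() if lines else ""
    if not first_line.startswith("# "):
        problems.append("first non-blank line is not an H1")

    # 3. Linked pages should point at Markdown companions.
    for match in LINK_URL_RE.finditer(body):
        url = match["url"]
        path = url.split("#")[0].split("?")[0]
        if not path.endswith(".md"):
            problems.append(f"link does not point at a .md file: {url}")

    return problems
```

In practice you would feed it the body and header from a real request to /llms.txt and print whatever it returns.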

How to write your own in five minutes

Three steps:

Pick the 5 to 20 pages a stranger would actually need. Not every page. Your homepage, top tutorials, key API references, pricing, contact, top comparisons.
Write a one-sentence blockquote that answers "what is this site." Keep it under 200 characters. The model will quote it back when summarizing your work.
Group the links into 2 to 5 H2 sections. Use plain section names: Docs, Guides, API, Tools. Skip clever headings.
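If your page list already lives in code or a CMS export, the three steps can be scripted. The sketch below is one possible shape, with a hypothetical function name and input structure; the 200-character check mirrors the advice above and is not a requirement of the spec.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render llms.txt from a site name, a summary, and grouped links.

    `sections` maps an H2 name to (title, url, description) tuples.
    An empty description is allowed and omits the trailing colon.
    """
    if len(summary) > 200:
        raise ValueError("summary is longer than 200 characters")

    lines = [f"# {name}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for title, url, description in links:
            entry = f"- [{title}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_llms_txt(
            "Demo",
            "A demo site.",
            {"Docs": [("Start", "https://example.com/start.md", "First steps.")]},
        )
    )
```

Because dicts keep insertion order, putting an "Optional" key last in sections keeps that section at the end of the file.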

If you want help with the surrounding crawl rules, our robots.txt generator builds a robots.txt that explicitly allows AI bots to reach your llms.txt. For structured data on the same pages, the schema generator outputs FAQPage, HowTo, and Article JSON-LD that AI Overviews and search engines parse alongside your llms.txt. And if you want OG and SEO tags wired up at the same time, the meta tag generator takes care of those.

Save the file as plain Markdown. Upload it to your web root. Confirm it loads at https://yoursite.com/llms.txt with content type text/markdown. That is the whole job.

FAQ

Is llms.txt a real standard?

It is a proposal, not an IETF or W3C standard. It has not been ratified, but adoption has grown to over 800,000 sites and several documentation platforms (Mintlify, GitBook, ReadMe) generate llms.txt automatically.

Do I need both llms.txt and robots.txt?

Yes. They do different jobs. robots.txt controls what crawlers can fetch. llms.txt highlights what an AI assistant should read once it is allowed in. They are not substitutes for each other.

Does Google read llms.txt?

No. Google's John Mueller stated that no AI system currently uses llms.txt as a ranking or crawling signal, and Google has explicitly said the file has no influence on search rankings. Publishing one will not move your Google position by itself.

How big should llms.txt be?

Keep it under a few thousand tokens. The whole point is to fit comfortably in a model's context window alongside the user's actual question. If you need to ship more, that is what llms-full.txt is for.

Where do I host it?

At the root of your domain: https://yoursite.com/llms.txt. The spec places the file at the root path and only optionally allows a subpath, so the root is the location any spec-following tool will check.

Can I block specific AI crawlers?

Yes, but you do that in robots.txt with User-agent lines (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot), not in llms.txt. llms.txt is a curation file, not an access control file.
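To see how such rules play out, you can test a robots.txt against specific user agents with Python's standard urllib.robotparser module. The rules below are an invented example (blocking CCBot and GPTBot while leaving /llms.txt open to everyone else), not a recommendation, and can_agent_fetch is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example rules only: block two crawlers entirely, let all other agents
# reach /llms.txt, and keep /admin/ private.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /llms.txt
Disallow: /admin/
"""


def can_agent_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "GPTBot", "ClaudeBot"):
        allowed = can_agent_fetch(ROBOTS_TXT, agent, "https://example.com/llms.txt")
        print(agent, "allowed" if allowed else "blocked")
```

The same check doubles as a guard against the "blocked in robots.txt" mistake: if a bot you want to reach /llms.txt comes back as blocked, the rules need adjusting.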

Where to go from here

llms.txt is a low-cost, low-risk addition. Even if Google and OpenAI never officially adopt it, retrieval-augmented assistants and your own AI tools can already use it, and the file is trivial to maintain. Publish one, link it from your sitemap, and move on.

For the rest of your AI search stack, browse the full tools catalog for SEO-adjacent utilities, or pick up a frontend developer roadmap if you are coming at this from the build-the-site side. More guides on AI search optimization are coming to the Talos Tools blog.

Last updated: April 2026.

Last updated: 2026-07-26