-- Living Mobile --: How to seed content into AI crawlers?

- llms.txt is a proposed file for pointing LLMs and AI crawlers at the content on your site that you most want them to read.

- It is a curated guide, not an access-control mechanism: opting crawlers in or out is still the job of robots.txt.

“llms.txt” is a proposed standard (introduced in September 2024 by Jeremy Howard) that lets site owners publish a machine-readable, curated guide to their most important content (docs, APIs, canonical pages, etc.) so that LLMs / AI crawlers can better understand what to ingest.
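To make the idea concrete, here is a minimal sketch of a generator for such a file. It assumes the Markdown layout described in the proposal (a title heading, a one-line summary as a blockquote, then sections of annotated links). The site name, URLs, and the `Link` / `Section` / `render_llms_txt` names are made up for illustration and are not part of any official tool.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: list[Section]) -> str:
    """Render a curated guide: title, blockquote summary, then link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and URLs, for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Developer documentation for the Example API.",
    [
        Section("Docs", [
            Link("Quickstart", "https://example.com/docs/quickstart.md", "first API call"),
            Link("API reference", "https://example.com/docs/api.md"),
        ]),
    ],
)
print(example)
```

The output would be saved as /llms.txt at the root of the site. Publishing it does not guarantee that any crawler will read it.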

Here are the findings so far:

• Very Low Adoption Among Top Sites

Scans of the top 1,000 websites show that only about 0.3% of them (roughly 3 sites) have an llms.txt file.

Some community directories list hundreds of domains using it, but many are smaller docs sites, startups, or developer platforms.
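A scan like this can be approximated in a few lines. The sketch below is not the method used by the scans cited above; it simply requests /llms.txt from each domain and counts responses that look like a real text file. The content-type check is an assumption, added because many sites answer unknown paths with an HTML page and a 200 status. The function names and the user-agent string are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

# A fetcher takes a URL and returns (HTTP status, content type).
Fetcher = Callable[[str], tuple[int, str]]


def fetch_with_urllib(url: str) -> tuple[int, str]:
    """Fetch a URL with the standard library; status 0 means 'unreachable'."""
    request = urllib.request.Request(url, headers={"User-Agent": "llms-txt-scan-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get_content_type()
    except urllib.error.HTTPError as err:
        return err.code, ""
    except OSError:  # DNS failures, refused connections, timeouts
        return 0, ""


def has_llms_txt(domain: str, fetch: Fetcher = fetch_with_urllib) -> bool:
    status, content_type = fetch(f"https://{domain}/llms.txt")
    # Assumption: a real llms.txt is served as plain text or Markdown,
    # so an HTML "soft 404" page is not counted.
    return status == 200 and content_type in ("text/plain", "text/markdown")


def adoption_rate(domains: Iterable[str], fetch: Fetcher = fetch_with_urllib) -> float:
    """Fraction of the given domains that serve an llms.txt file."""
    domains = list(domains)
    if not domains:
        return 0.0
    return sum(has_llms_txt(d, fetch) for d in domains) / len(domains)
```

Passing the fetcher as a parameter keeps the counting logic testable without network access. For example, `adoption_rate(["example.com"])` would perform a live request, while a stub fetcher can simulate a whole list of sites.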

• Major LLM Providers Do Not Officially Support It Yet

A point repeated across many sources: OpenAI, Anthropic, Google, Meta, etc. have not publicly committed to parsing or respecting llms.txt in their crawling / ingestion pipelines.

For example, John Mueller (from Google) has said he is not aware of any AI services using llms.txt.

• Some Early Adopters / Use Cases

A number of documentation sites, developer platforms, and SaaS/digital product companies have published llms.txt (and sometimes llms-full.txt) on their docs or marketing domains. Examples include Cloudflare, Anthropic (for its docs), and Mintlify.

Also, tools and plugins are emerging (for WordPress, SEO tools, GitBook) to help create llms.txt files.

• Unclear Real-World Impact So Far

There is little evidence that having llms.txt causes an LLM to pick up content more accurately, or that it improves traffic, retrieval, or citation by LLMs, because the major LLMs do not appear to check it. Server logs from sites with llms.txt also show that AI services do not seem to be requesting the file.
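You can run the same check on your own site. The sketch below counts requests for /llms.txt and /llms-full.txt per user agent, assuming the access log uses the common "combined" format of Apache and nginx. The regular expression, the function name, and the sample log lines (including the bot name) are illustrative assumptions, not real traffic.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request, status, size, referrer, and user-agent fields of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

WATCHED_PATHS = ("/llms.txt", "/llms-full.txt")


def count_llms_requests(log_lines: Iterable[str]) -> Counter:
    """Count hits on the watched paths, keyed by (path, user agent)."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines in an unexpected format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path in WATCHED_PATHS:
            hits[(path, match.group("agent"))] += 1
    return hits


# Made-up sample lines; "ExampleBot" is a hypothetical crawler.
sample_log = [
    '203.0.113.5 - - [13/Sep/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [13/Sep/2025:10:00:05 +0000] "GET /llms-full.txt HTTP/1.1" 200 90210 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [13/Sep/2025:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [13/Sep/2025:10:02:00 +0000] "GET /llms.txt?ref=1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]

for (path, agent), count in sorted(count_llms_requests(sample_log).items()):
    print(f"{count:>4}  {path}  {agent}")
```

In practice you would pass an open log file instead of `sample_log`. An empty result over a long period would be consistent with the observation above that AI services are not requesting the file.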

• Emerging Tools & Community Momentum

Although official adoption is lacking, community interest is growing: directories of implementations, write-ups, generators, documentation, and discussion.

Some sites also publish llms-full.txt (a more exhaustive content dump). In some documentation contexts it appears to get more parser / crawler traffic, or at least more visits, than llms.txt alone.

Saturday, September 13, 2025
