- llms.txt is a proposed way to point LLMs at a site's most important content; it does not control whether that content is fed into LLM training pipelines.
- It is not an opt-in/opt-out mechanism for AI crawlers; crawler access rules belong in robots.txt.
“llms.txt” is a proposed standard (introduced in September 2024 by Jeremy Howard) that lets site owners publish a machine-readable, curated guide to their most important content (docs, APIs, canonical pages, etc.) so that LLMs and AI crawlers can better decide what to ingest.
Here is what is known so far:
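To make "machine-readable, curated guide" concrete, here is a minimal sketch of what such a file might look like and how a consumer could read it. It assumes the layout described in the public proposal (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists). The sample content, the example.com URLs, and the `parse_llms_txt` helper are hypothetical, not part of any official tooling.

```python
import re

# Hypothetical sample file; layout assumed from the public proposal:
# H1 title, blockquote summary, H2 sections listing Markdown links.
SAMPLE = """\
# Example Project

> Short summary of what the project does.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): Install and first run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

# "- [title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)

def parse_llms_txt(text):
    """Split an llms.txt-style document into title, summary, and link sections."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                doc["sections"][current].append(
                    {"title": m["title"], "url": m["url"], "note": m["note"]}
                )
    return doc

doc = parse_llms_txt(SAMPLE)
for section, links in doc["sections"].items():
    print(section, [link["url"] for link in links])
```

Note that nothing in this structure expresses permissions: the file only lists and describes pages.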
• Very Low Adoption Among Top Sites
Scans of the top 1,000 websites show that only about 0.3% (roughly 3 sites) have an llms.txt file.
Some community directories list hundreds of domains using it, but many are smaller docs sites, startups, or developer platforms.
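A scan like the ones behind these adoption figures could be sketched roughly as follows. This is not the method any particular survey used; the "200, not HTML, starts with a Markdown H1" heuristic, the user-agent string, and all function names are assumptions made for illustration. The heuristic exists because many sites answer any unknown path with an HTML page and a 200 status.

```python
import urllib.error
import urllib.request

def looks_like_llms_txt(status, content_type, body):
    """Assumed heuristic: HTTP 200, not an HTML page, starts with a Markdown H1."""
    if status != 200:
        return False
    if "html" in (content_type or "").lower():
        return False
    return body.lstrip().startswith("# ")

def fetch_llms_txt(domain, timeout=10):
    """Return (status, content_type, body) for https://<domain>/llms.txt."""
    url = f"https://{domain}/llms.txt"
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-survey-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Read only the first 64 KiB; enough to inspect the header lines.
            body = resp.read(65536).decode("utf-8", errors="replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as exc:
        return exc.code, "", ""
    except OSError:
        # Covers URLError, timeouts, DNS and TLS failures.
        return None, "", ""

def adoption_rate(domains, fetch=fetch_llms_txt):
    """Fraction of domains whose /llms.txt passes the heuristic."""
    hits = [d for d in domains if looks_like_llms_txt(*fetch(d))]
    return len(hits) / len(domains) if domains else 0.0
```

Calling `adoption_rate(["example.com", "example.org"])` would perform live requests; passing a custom `fetch` lets the same logic run against cached responses.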
• Major LLM Providers Do Not Officially Support It Yet
A key point repeated in many sources: OpenAI, Anthropic, Google, Meta, and other major providers have not publicly committed to parsing or respecting llms.txt in their crawling or ingestion pipelines.
For example, John Mueller (from Google) has said he is not aware of any AI services using llms.txt. 
• Some Early Adopters / Use Cases
A number of documentation sites, developer platforms, and SaaS/digital product companies have published llms.txt (and sometimes llms-full.txt) on their docs or marketing domains. Examples include Cloudflare, Anthropic (for its docs), and Mintlify.
Also, tools and plugins are emerging (for WordPress, SEO tools, GitBook) to help create llms.txt files. 
• Unclear Real-World Impact So Far
There is little evidence that having llms.txt helps an LLM pick up content more accurately, or improves traffic, retrieval, or citation by LLMs, because the major LLMs do not appear to check it. Server logs from sites with llms.txt also suggest that AI services are not requesting the file.
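Site owners can check this claim against their own logs. The sketch below counts requests for /llms.txt and /llms-full.txt per user agent. It assumes the common "combined" access-log format used by default in Apache and nginx; the sample lines, IP addresses, and the "ExampleBot" agent are made up for illustration and are not real traffic.

```python
import re
from collections import Counter

# Assumes the "combined" access-log format; adjust if your server logs differently.
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
TARGETS = {"/llms.txt", "/llms-full.txt"}

def count_llms_requests(lines):
    """Count hits on the target paths, keyed by (path, user agent)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        path = m["path"].split("?", 1)[0]  # ignore any query string
        if path in TARGETS:
            counts[(path, m["agent"])] += 1
    return counts

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /llms-full.txt HTTP/1.1" 200 90210 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

for (path, agent), n in count_llms_requests(SAMPLE_LOG).items():
    print(f"{n:5d}  {path:16s}  {agent}")
```

On a real server the same function can be fed an open log file, for example `count_llms_requests(open("access.log"))`, where the file name is whatever your server uses.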
• Emerging Tools & Community Momentum
Although official adoption is lacking, community interest is growing: directories of implementations, write-ups, generators, documentation, and discussion. 
Some sites also publish llms-full.txt (a more exhaustive content dump), which in some documentation contexts appears to get more crawler traffic, or at least more visits, than llms.txt itself.