llms.txt Adoption in 2026: Only 5.86% of Top Sites Use It, But Early Data Reveals a Clear Signal

Originally published on The Searchless Journal

The debate around llms.txt has been stuck between two camps. One side argues it is the next robots.txt, a foundational infrastructure layer that every website needs. The other side dismisses it as speculative, pointing out that no major LLM provider has publicly committed to using it as a ranking or citation signal.

Fresh data from a rigorous crawl-based study suggests both camps are partially right and both are missing the point.

In a May 6, 2026 crawl of the Tranco Top 10,000 domains, researchers at Thunderbit found 586 valid llms.txt files. That is a 5.86% observed adoption rate. The companion llms-full.txt file was even rarer, at just 1.03%. By any measure, llms.txt is not mainstream.

But the data also shows that adoption is growing fast, implementation quality among adopters is surprisingly high, and the companies that have implemented it read like a who's who of the modern internet infrastructure stack.

The real story is not whether llms.txt is ubiquitous. It is who has adopted it, what their implementations look like, and what that signals about where the web is heading.

The Adoption Numbers, Measured Correctly

The most important methodological finding in the Thunderbit study is that status codes are a terrible proxy for llms.txt adoption.

The crawler observed 1,606 HTTP 200 responses for /llms.txt across the Top 10,000 domains. Only 586 passed validation. The remaining 1,020 were off-target redirects, generic HTML pages, empty bodies, or other invalid responses. A naive crawler counting every 200 response as adoption would overestimate the real number by a factor of 2.74.
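
The arithmetic behind these figures is easy to reproduce. The Python sketch below uses only the counts reported above; the variable names are illustrative and are not taken from Thunderbit's study.

```python
# Counts reported for the May 6, 2026 crawl of the Tranco Top 10,000.
domains_crawled = 10_000
http_200_responses = 1_606
valid_files = 586

# HTTP 200 responses that did not pass validation.
invalid_200s = http_200_responses - valid_files

# Adoption measured by validated files versus by status code alone.
adoption_rate = valid_files / domains_crawled
naive_rate = http_200_responses / domains_crawled

# How far a status-code-only count overshoots the validated count.
overcount_factor = http_200_responses / valid_files

# Share of HTTP 200 responses that failed validation.
failure_share = invalid_200s / http_200_responses

print(f"invalid 200s:     {invalid_200s}")
print(f"valid adoption:   {adoption_rate:.2%}")
print(f"naive adoption:   {naive_rate:.2%}")
print(f"overcount factor: {overcount_factor:.2f}x")
print(f"200s that failed: {failure_share:.2%}")
```

A status-code-only crawler would have reported roughly 16% adoption instead of 5.86%.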

This matters because earlier adoption estimates varied wildly depending on methodology. Rankability reported a 0.3% adoption rate across the top 1,000 websites in June 2025, using validation logic similar to Thunderbit's. By May 2026, Thunderbit found 75 valid llms.txt files in the Tranco Top 1,000, or 7.50%.

The two data points are not strictly comparable because the ranking sources, crawl timing, and validation details differ. But the direction is clear: adoption moved from negligible to measurable in under a year, especially among developer, SaaS, cloud, and documentation-heavy sites.

Snapshot	Sample	Valid Adoption
Rankability, June 2025	Top 1,000	0.3%
Thunderbit, May 2026	Tranco Top 1,000	7.50%
Thunderbit, May 2026	Tranco Top 10,000	5.86%

A separate study by AI Visibility, covering the top 1,905 websites in Q2 2026, found that 7.2% had at least one AI discovery file, a category that includes llms.txt and similar formats. That figure is directionally consistent with Thunderbit's results.

Who Adopted First

The early adopter list is revealing. Among the 586 domains with valid llms.txt files, Thunderbit identified major infrastructure and SaaS companies:

Cloudflare, Azure, GitHub, DigiCert, WordPress.org, Adobe, Dropbox, PayPal, Stripe, Salesforce, Slack, Zendesk, Okta, Datadog, and Cloudinary.

This is not a random sampling of the internet. It is a concentration of the companies that build developer tools, manage cloud infrastructure, process payments, and power the software stack that other businesses depend on. These are the sites that AI engines are most likely to crawl for authoritative technical documentation, API references, and product information.

Presenc AI's industry-level breakdown adds more context. Their research measured adoption across 15 industries and found that technology, developer tools, and SaaS sectors led adoption, while healthcare, retail, and traditional media lagged significantly. The pattern is consistent with many earlier web standards: infrastructure companies adopt first, content companies follow, and laggards adopt only when competitive pressure forces them.

Implementation Quality Is Higher Than Expected

Among the valid llms.txt files that Thunderbit found, the implementation quality suggests this is not just placeholder experimentation.

The median valid file was about 7.1 KB, and 61.77% of valid files were larger than 5 KB, indicating substantial content rather than a token one-liner. Six or more Markdown sections appeared in 70.82% of valid files, and 11 or more Markdown links appeared in 77.47%.
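
For readers who want to compare their own file against these thresholds, here is a rough Python sketch. It assumes that a "section" is any Markdown heading line and that a "link" is an inline `[text](url)` link; Thunderbit's exact counting rules are not given here, so treat the results as approximate. The function name `profile_llms_txt` and the sample content are hypothetical.

```python
import re

# Assumption: a section is a line starting with one to six '#' characters.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)
# Assumption: a link is an inline Markdown link of the form [text](url).
LINK_RE = re.compile(r"\[[^\]]+\]\([^)\s]+\)")


def profile_llms_txt(text: str) -> dict:
    """Return size in KB, heading count, and inline link count for a file body."""
    return {
        "size_kb": len(text.encode("utf-8")) / 1024,
        "sections": len(HEADING_RE.findall(text)),
        "links": len(LINK_RE.findall(text)),
    }


# Hypothetical file body, far smaller than the median file in the study.
sample = """# Example Project

> Hypothetical one-line summary of the site.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): setup steps
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

stats = profile_llms_txt(sample)
print(f"{stats['size_kb']:.2f} KB, {stats['sections']} sections, {stats['links']} links")
```

A file that clears 5 KB, six sections, and 11 links would sit with the majority of valid files in the study on each measure.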

In practical terms, the companies that have implemented llms.txt are not just dropping an empty file at their root. They are building structured, navigable documents that point AI systems toward their most important pages, documentation, APIs, policies, and product information.

This aligns with the original intent of the llms.txt proposal, introduced by Jeremy Howard in 2024. The format frames the file as a Markdown document that provides LLM-friendly information at inference time. The argument is straightforward: HTML pages include navigation, advertising, scripts, and other noise that makes them harder for language models to parse efficiently. A concise Markdown file can direct models to authoritative, current content without the overhead.

The Traffic Evidence: Causal or Correlative?

The most contentious question around llms.txt is whether it actually drives AI referral traffic. The evidence is mixed but instructive.

Search Engine Land published a 10-site analysis in January 2026 that tracked sites for 90 days before and 90 days after llms.txt implementation. Two sites saw AI traffic increases of 12.5% and 25%. The other eight saw no measurable improvement, including one that declined by 19.7%.

The key nuance: the two apparent success stories also launched new templates, rebuilt resource centers, added extractable comparison tables, earned press coverage, and published new FAQ-style content during the same period. In that framing, llms.txt documented stronger content and technical work. It did not appear to cause the growth independently.

OtterlyAI reached a more positive conclusion from a single-site observation. After adding both llms.txt and llms-full.txt, LLM referral sessions rose from 75 to 92 over comparable four-month periods, a 23% increase. But total referral traffic grew faster, from 160 to 290 sessions, meaning LLM session share actually fell from 47% to 32%.
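
The gap between absolute growth and share is worth working through. This short Python sketch uses only the session counts quoted above and assumes the 160 and 290 totals are the denominators behind the reported shares; the variable names are illustrative.

```python
# Session counts quoted from the OtterlyAI observation.
llm_before, llm_after = 75, 92
total_before, total_after = 160, 290

# Relative growth in LLM referral sessions and in total referral sessions.
llm_growth = (llm_after - llm_before) / llm_before
total_growth = (total_after - total_before) / total_before

# LLM sessions as a share of total referral sessions in each period.
share_before = llm_before / total_before
share_after = llm_after / total_after

print(f"LLM session growth:   {llm_growth:.0%}")
print(f"total session growth: {total_growth:.0%}")
print(f"LLM share before:     {share_before:.0%}")
print(f"LLM share after:      {share_after:.0%}")
```

LLM sessions grew by about 23% while total referral sessions grew by about 81%, which is why the LLM share dropped even as the absolute count rose.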

The honest read: llms.txt alone is unlikely to move AI traffic in a meaningful way. But as part of a broader AI-readiness strategy that includes structured content, clean technical signals, and authoritative documentation, it is a low-cost, low-risk addition that positions a site to benefit as LLM providers begin or expand their use of the format.

Why It Matters Even If No LLM Provider Has Committed Publicly

The skeptical argument against llms.txt is that no major LLM provider has publicly committed to using it as a ranking, crawling, or citation signal. This is factually correct as of May 2026.

But it misses two important dynamics.

First, LLM providers are not monolithic. Different teams within OpenAI, Google, Anthropic, and Perplexity experiment with different signals. A provider does not need to make a public commitment for llms.txt to be used as a supplementary context source during retrieval. The format is designed to help models find authoritative information efficiently. That is useful regardless of whether it is officially endorsed.

Second, adoption creates its own momentum. When Stripe, Salesforce, GitHub, and Cloudflare publish llms.txt files, they create a corpus of structured, high-quality AI-facing signals. That corpus becomes a training and evaluation resource. The more high-quality implementations exist, the more likely LLM providers are to build tooling that uses them.

The comparison to early sitemap.xml adoption is instructive. Google introduced the Sitemaps protocol in 2005, and Yahoo and Microsoft announced joint support for it in 2006. Once enough sites published sitemaps, supporting them became an obvious engineering decision for any crawler. llms.txt may be on a similar trajectory, compressed into a shorter timeline.

What Smart Brands Should Do Now

The data supports a specific set of actions, not a wait-and-see approach.

Implement llms.txt if you have structured content worth surfacing. The cost is minimal: a Markdown file at your site root that points to your most important pages, docs, APIs, and product information. The upside is positional: you join the early adopter corpus that LLM providers are most likely to crawl and use.
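
As a starting point, here is a minimal Python sketch that renders a file in the general shape the llms.txt proposal describes: an H1 title, a blockquote summary, and H2 sections containing Markdown link lists. The function name `render_llms_txt`, the site name, and every URL are hypothetical placeholders, and the proposal itself should be checked for details this sketch leaves out.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Build an llms.txt body from a title, a summary, and {heading: [(name, url, note)]}."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                # Notes are optional and follow the link after a colon.
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


# Hypothetical site content, used only for illustration.
body = render_llms_txt(
    title="Example Store",
    summary="Hypothetical storefront with public product docs and an API.",
    sections={
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md", "Setup in five minutes"),
            ("API reference", "https://example.com/docs/api.md", ""),
        ],
        "Policies": [
            ("Returns", "https://example.com/policies/returns.md", "Refund windows and conditions"),
        ],
    },
)
print(body)
```

The output would be saved as llms.txt at the site root, so that it is served from the /llms.txt path.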

Do not treat it as a standalone traffic driver. The Search Engine Land data, though limited to 10 sites, points in one direction: llms.txt without corresponding improvements to content quality, technical infrastructure, and third-party presence did not move the needle.

Validate your implementation. Thunderbit's finding that 63.51% of HTTP 200 responses for /llms.txt failed validation means many sites think they have implemented it correctly when they have not. Test your file against the specification, not just against whether your server returns a 200.
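
A minimal Python sketch of such a check follows. It assumes four heuristics: three drawn from the failure modes described in this article (off-target redirect, HTML page, empty body) and an H1 title check based on the llms.txt proposal. It is not Thunderbit's validation logic, and the function name `check_llms_txt` is hypothetical. The function takes an already-fetched response so it can run without network access.

```python
from urllib.parse import urlsplit


def check_llms_txt(status: int, final_url: str, content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means these heuristic checks passed."""
    problems = []
    if status != 200:
        problems.append("non-200 status")
    # After redirects, the response should still come from an /llms.txt path.
    if not urlsplit(final_url).path.endswith("/llms.txt"):
        problems.append("final URL is not /llms.txt")
    stripped = body.strip()
    if not stripped:
        problems.append("empty body")
        return problems
    head = stripped[:200].lower()
    if "text/html" in content_type.lower() or head.startswith(("<!doctype html", "<html")):
        problems.append("HTML page instead of Markdown")
    elif not stripped.splitlines()[0].startswith("# "):
        # Assumption based on the proposal: the file opens with an H1 title.
        problems.append("first line is not an H1 title")
    return problems


# Hypothetical responses: a valid file, a homepage served with a 200, and an empty body.
good = check_llms_txt(200, "https://example.com/llms.txt", "text/plain; charset=utf-8",
                      "# Example\n\n> Summary\n")
soft_404 = check_llms_txt(200, "https://example.com/", "text/html",
                          "<!DOCTYPE html><html><body>Home</body></html>")
empty = check_llms_txt(200, "https://example.com/llms.txt", "text/plain", "  \n")
print(good, soft_404, empty, sep="\n")
```

Passing these checks does not guarantee a well-formed file. It only rules out the most common ways a 200 response can be misleading.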

Track AI referral traffic separately. You cannot measure the impact of llms.txt or any other generative engine optimization (GEO) intervention if you cannot see AI-referred sessions in your analytics. Set up referrer-based or UTM-based segments for AI traffic before making changes.
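
One way to segment sessions is to classify the referrer hostname. The Python sketch below is a starting point under stated assumptions: the hostname list is illustrative and incomplete, the function name `classify_referrer` is hypothetical, and some AI-driven visits arrive with no referrer at all, so this approach will undercount.

```python
from urllib.parse import urlsplit

# Illustrative mapping of referrer hostnames to AI products (an assumption;
# verify against the referrers that actually appear in your own logs).
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
}


def classify_referrer(referrer: str) -> str:
    """Return the AI product label for a referrer URL, or 'other' if none matches."""
    host = (urlsplit(referrer).hostname or "").lower().removeprefix("www.")
    for domain, label in AI_REFERRER_HOSTS.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return label
    return "other"


for ref in ["https://chatgpt.com/", "https://www.perplexity.ai/search?q=x",
            "https://www.google.com/", ""]:
    print(repr(ref), "->", classify_referrer(ref))
```

Applying the classifier to session logs from both the before and after periods gives the kind of comparison used in the studies cited above.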

Watch the adoption trajectory, not the current rate. Among top-1,000 sites, valid adoption moved from 0.3% in June 2025 to 7.50% in May 2026. The two studies are not strictly comparable, but a shift of that size in under a year is meaningful. The companies implementing now are the same companies that AI engines are most likely to crawl for authoritative information. Being in that cohort matters even if the direct traffic impact is not yet measurable.

To see how your site performs across AI engines and whether your current signals are working, run a free AI visibility audit.

Sources

Thunderbit, "The Rise of llms.txt: How Websites Are Signaling to AI," crawl-based study of Tranco Top 10,000, May 6, 2026
Rankability, llms.txt adoption study, Top 1,000 websites, June 2025
AI Visibility, "AI Discovery File Adoption Research: Q2 2026," Top 1,905 websites
Presenc AI, "llms.txt Adoption by Industry 2026"
Search Engine Land, 10-site llms.txt before/after study, January 2026
OtterlyAI, "llms.txt and AI Visibility: Results from OtterlyAI's GEO Study"
Jeremy Howard, llms.txt proposal, 2024

FAQ

Is llms.txt worth implementing for a small business website?
Yes, if you have structured content like product documentation, service descriptions, or FAQ pages that would benefit from being surfaced in AI answers. The implementation cost is minimal, and the positional advantage of being an early adopter would grow if more LLM providers begin using the format.

How is llms.txt different from robots.txt?
robots.txt tells crawlers what they cannot access. llms.txt tells AI systems what they should prioritize when they do access your site. They serve complementary purposes: robots.txt is about access control, llms.txt is about content navigation and context.
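
The access-control side can be illustrated with Python's standard-library robots.txt parser. This is a small sketch with hypothetical rules, a hypothetical crawler name (ExampleBot), and placeholder URLs. llms.txt has no equivalent allow or deny semantics: it is a Markdown document for a model to read, not a rule set for a crawler to enforce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block one directory for all crawlers.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# robots.txt answers a yes/no question: may this agent fetch this URL?
blocked = robots.can_fetch("ExampleBot", "https://example.com/private/report")
allowed = robots.can_fetch("ExampleBot", "https://example.com/docs/")
print(blocked, allowed)
```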

Does Google use llms.txt?
Google has not publicly committed to using llms.txt as of May 2026. But the format is designed to help any LLM-based system find authoritative content efficiently, and adoption by major infrastructure companies creates pressure for providers to support it.

How do I validate my llms.txt file?
Ensure the file is valid Markdown, placed at your site root, accessible via a direct URL request, and not returning an HTML page, redirect, or empty response. Thunderbit's data shows that 63.51% of HTTP 200 responses for /llms.txt failed validation.
