The pitch sounds clean: write one strong piece, get cited across every AI engine. We believed a softer version of that for most of last year. Then we ran the overlap analysis, and the picture changed.
Across the 800-run baseline I keep referring to in these notes, plus a follow-up study of 412 client-facing queries we ran in early Q1 2026, the citation set on any given query overlapped across all four engines about 12% of the time. Twelve percent. Eighty-eight percent of the time, no single source was cited by all four engines. That overlap held even when we restricted to identical phrasing. It got smaller, not larger, when we expanded to paraphrased variants of the same intent.
This is a problem if your content strategy assumes that ranking well in one engine bleeds into the others. It mostly doesn't, in our testing. Let me describe what we saw, and where I think the differences come from.
The overlap structure
We coded each cited source by how many of the four engines surfaced it. The breakdown:
- All four engines citing the same source: 12%
- Three of four: 19%
- Two of four: 28%
- Engine-unique citations (only one engine surfaced it): 41%
The 41% engine-unique number is the one that kept us up at night. It suggests that almost half of citation slots are essentially independent surfaces, where winning one tells you very little about the others. The pieces that did show up across all four engines tended to share a few traits: they were on high-domain-authority publications, they directly answered the prompt's question in the first 150 words, and they had structured data that was schema-marked and already present in the server-rendered HTML (not injected by JS).
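For what "present in the server-rendered HTML" means in practice: the quickest check we know of is to fetch a page without executing any JavaScript and see whether the JSON-LD is already there. The sketch below is a rough heuristic, not how any engine actually crawls; it assumes the requests library is available, and the URL is a placeholder.

```python
# Minimal check: does the raw (server-rendered) HTML already contain
# JSON-LD structured data, or would it only appear after JS runs?
# Fetching with requests gets the HTML before any client-side scripts
# execute, which approximates what a non-rendering crawler would see.
import json
import re

import requests

JSONLD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def jsonld_in_static_html(url: str) -> list[dict]:
    """Return every JSON-LD block present in the un-rendered HTML."""
    html = requests.get(url, timeout=10).text
    blocks = []
    for raw in JSONLD_RE.findall(html):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # block is present but malformed; still worth flagging
    return blocks

if __name__ == "__main__":
    # Placeholder URL, not a real test target.
    found = jsonld_in_static_html("https://example.com/article")
    print(f"{len(found)} JSON-LD block(s) in the static HTML")
```

If the block only shows up after a headless browser renders the page, it falls into the "injected by JS" bucket that the cross-engine winners avoided.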
What the 12% looks like at the query level
To make the overlap concrete: in a typical query in our test set, we'd see Perplexity cite five sources, Google AIO cite three, ChatGPT cite four, and Gemini cite four. Of those 16 citation slots across the four engines, the same source typically appeared in two of them. Sometimes three. Almost never four. That single-shared-source-across-all is the 12%.
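For anyone who wants to reproduce the tally, here is a minimal sketch of the per-query bookkeeping. The citation sets are invented to mirror the slot counts above (5 + 3 + 4 + 4 = 16); in our actual runs, everything is resolved to a canonical URL before counting.

```python
from collections import Counter

# Hypothetical citation sets for one query, one per engine, already
# resolved to canonical sources. These labels are placeholders.
citations = {
    "perplexity": {"bloomberg/a", "reddit/thread-1", "vendor-docs", "blog-a", "blog-b"},
    "google_aio": {"bloomberg/a", "wikipedia/topic", "publisher-c"},
    "chatgpt":    {"bloomberg/a", "wired/b", "publisher-c", "blog-d"},
    "gemini":     {"youtube/channel-e", "bloomberg/a", "blog-f", "wikipedia/topic"},
}

# Count how many engines surfaced each source.
engine_counts = Counter(src for cited in citations.values() for src in cited)

# Bucket sources by how many engines agree on them (1..4).
buckets = Counter(engine_counts.values())
for n_engines in range(4, 0, -1):
    print(f"cited by {n_engines} engine(s): {buckets.get(n_engines, 0)} source(s)")

# A query counts toward the all-four overlap figure if at least one
# source was cited by every engine.
print("all-four overlap on this query:", 4 in engine_counts.values())
```

In this toy query, one source lands in all four engines and most of the rest are engine-unique, which is roughly the shape the 12%/41% split describes.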
When we looked at queries where overlap was high (the all-four-engine cases), the shared source was usually one of: a major publication (Bloomberg, Wired, TechCrunch tier), an official primary source (a government site, a standards body, a vendor's own documentation), or a Wikipedia article. When we looked at queries where overlap was low (the engine-unique cases), the citations were more typically blogs, Reddit threads, specialized forums, YouTube channels, or smaller publications. Different engines have different appetites for what counts as a credible smaller source.
Why engines diverge
A few hypotheses, in rough confidence order.
First, freshness windows differ. Perplexity re-queries the web in real time, which makes it the most volatile and the most recency-biased. Google AIO leans on its existing index, which is enormous but refreshes more slowly. ChatGPT with browsing enabled appears to blend what it learned before its training cutoff with live results, in a way that's hard to predict from the outside. Gemini, in our testing, was the most idiosyncratic: it would sometimes cite mid-tier blogs over higher-authority sources, and we don't fully understand why.
Second, source preference seems to vary. Perplexity cites Reddit and forums readily. Gemini cites YouTube transcripts more than the others. ChatGPT (web) leans toward established editorial brands. Google AIO favors what looks like its existing top-10 SERP results, lightly reweighted.
Third, prompt parsing differs. The same intent, expressed in five different phrasings, gets routed to different sub-systems inside these engines. We can't see the routing. We can only see the outputs, which sometimes look like five different products responding to one user.
The thing we were wrong about
For most of 2025 I'd been telling clients that if we landed a strong placement on, say, a Forbes contributor piece, it would "lift across engines." In our follow-up study, Forbes contributor pieces (n=14 in our test set) showed all-four-engine overlap rates around 28%, which is higher than baseline but very far from "lifts across." The agency I work with has since stopped using cross-engine lift language in proposals. It wasn't a lie when we said it; it was a claim we hadn't checked. There's a difference, but only one of those is acceptable.
What this implies for content strategy
If your goal is presence across all four engines, you probably need a portfolio approach, not a hero-piece approach. We now plan content with engine-target tags: this piece is built for Perplexity's recency and Reddit-lean; this piece is built for AIO's structural preferences; this piece is built for ChatGPT's editorial-source preference. Same topic, different optimal artifact.
That sounds expensive, and it is. It's also closer to how the actual citation surface behaves. The cheaper alternative is to pick one or two engines and accept that you'll be invisible on the others. Several of our 12 clients have made exactly that choice and are doing fine on it.
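If it helps to see the engine-target tagging made concrete, here is one possible shape for it. The field names and trait labels below are illustrative shorthand, not our production tooling and not anything the engines publish.

```python
from dataclasses import dataclass, field

# Hypothetical planning structure: one entry per (topic, target engine)
# pair, with the traits that piece is being built to satisfy.
@dataclass
class ContentPlan:
    topic: str
    target_engine: str  # e.g. "perplexity", "google_aio", "chatgpt", "gemini"
    traits: list[str] = field(default_factory=list)

portfolio = [
    ContentPlan("vendor-comparison", "perplexity",
                ["recent publish date", "active Reddit/forum discussion"]),
    ContentPlan("vendor-comparison", "google_aio",
                ["schema markup in static HTML", "answer in first 150 words"]),
    ContentPlan("vendor-comparison", "chatgpt",
                ["placement on an established editorial domain"]),
]

for plan in portfolio:
    print(plan.target_engine, "->", ", ".join(plan.traits))
```

Same topic three times over, which is exactly where the expense comes from.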
How overlap changes over time
The 12% overlap is a single-week snapshot. We ran a smaller follow-up where we re-queried the same 50 prompts across four engines four times over six weeks. The all-four overlap drifted between 9% and 16% week to week. That's noise on top of an already noisy signal, and it complicates any longitudinal claim.
What we noticed, qualitatively, is that the engine-unique citations (the 41% slice) were the most volatile. A Reddit thread Perplexity cited in week one might be replaced by a different Reddit thread in week three, even on the same query. The all-four-engine sources, the 12% that overlap, tended to be the most stable. So overlap and stability seem to correlate: the sources that all four engines agree on are also the sources each individual engine sticks with over time. We don't have a clean causal story for this. The hypothesis is that high domain authority plus structural extractability creates a kind of citation gravity well that engines fall into independently.
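To make the stability check concrete, here is a toy sketch of how week-to-week persistence can be measured per engine. The data shape and numbers are invented; the real follow-up covered 50 prompts and four pulls over six weeks.

```python
# weeks[w][engine] is the set of sources that engine cited for one
# query in week w. Two engines and two weeks shown, purely illustrative.
weeks = [
    {"perplexity": {"reddit/t1", "bloomberg/a"}, "gemini": {"bloomberg/a", "youtube/e"}},
    {"perplexity": {"reddit/t2", "bloomberg/a"}, "gemini": {"bloomberg/a", "blog/f"}},
]

def persistence(prev: set[str], curr: set[str]) -> float:
    """Share of last pull's citations that survived into this pull."""
    return len(prev & curr) / len(prev) if prev else 0.0

for engine in weeks[0]:
    for w in range(1, len(weeks)):
        rate = persistence(weeks[w - 1][engine], weeks[w][engine])
        print(f"{engine}, week {w} -> {w + 1}: {rate:.0%} of citations persisted")
```

Run against our real pulls, the sources with the highest persistence were overwhelmingly the same ones sitting in the all-four overlap bucket.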
The piece that did win all four engines
There was one piece in our test set that hit A or B tier on all four engines for at least three of the five reps. I want to describe it not because it's a template (n=1 is not a template) but because the traits were instructive.
The piece was a co-authored research write-up on a specialized B2B topic, published on a domain with high editorial authority, structured with a clear thesis in the first 100 words, supported by an embedded dataset table that was both visible HTML and schema-marked, with named author attribution that mapped to verified expert profiles. It was published roughly four months before our test window, so it had time to accumulate signals.
Could we reproduce that result intentionally? Maybe for some topics, with the right authorship and the right host publication. We're going to try. I'm not confident we can do it on demand for arbitrary subjects, which is itself a finding worth sitting with.
Small-n caveats
412 queries is enough to see a pattern. It's not enough to prove a hypothesis about why the pattern exists. The freshness, source-preference, and routing explanations above are educated guesses based on watching outputs, not on any privileged access to how the engines work. If a researcher with better instrumentation reads this and the overlap number is actually 22%, I won't be surprised. I'd be surprised if it's 60%.
Our query mix was also biased toward B2B-adjacent topics, because that's the work we do. Consumer queries (recipes, product reviews, entertainment) might overlap more or less; we haven't tested. If you're doing consumer marketing, please don't take our 12% number as gospel for your category.
What would actually change my mind about cross-engine lift? Probably a controlled study where the same piece is published on the same URL, the engines are queried at multiple points across a 90-day window, and overlap is measured longitudinally. We're scoping that. It's a six-month project, and I don't think we'd be the right team to run it alone.
If you've seen different overlap numbers in your own tracking, I'd be curious to hear them. Especially the high ones.
This field report was published by **westOeast**, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at westoeast.com.