The first time we tried to systematically disambiguate a client's entity references across their website, I expected it to be a polishing exercise. A week's work, modest gains, the kind of project you don't write a case study about. The data surprised me. In the same portfolio where we measured a 9-10% schema-attributable citation lift, the entity disambiguation work appears to have driven more citation tier shifts than the schema work did.
I want to be careful with this claim, because I'm not sure how reproducible it is. But the direction is consistent enough that the agency I work with has reordered our default engagement sequence. Entity work now happens before schema work in most engagements. Six months ago it was the other way around.
What "entity disambiguation" means in this context
Two pages on the same site referring to a CEO by three different name spellings. A product whose internal docs called it "Atlas," whose marketing site called it "the Atlas Platform," and whose support docs called it "Atlas Suite." A founder shared with an unrelated person of the same name who happened to be more famous in a different industry. A subsidiary that the parent company's site never explicitly linked to as a subsidiary.
These are not exotic problems. We see them in every audit. They're the kind of thing that builds up because no single team owns naming conventions across an organization, and individually each inconsistency is harmless.
The hypothesis behind disambiguation work is that AI engines, when parsing a page or pulling an answer, need to resolve "is this entity X or entity Y?" and the cost of that resolution is paid in confidence. A page that consistently and explicitly identifies its subject is easier to cite confidently than a page that's ambiguous about whether it's talking about the company, the product, the parent, or some homonym.
That's the hypothesis. Here's what the data showed.
The before/after
Across a subset of 8 clients where we did focused entity disambiguation work in Q4 2025 and held other variables roughly constant, we tracked citation tier on a stable set of 20 prompts per client (160 total prompts) for 4 weeks before the work and 8 weeks after.
The aggregate A+B tier rate moved from 21% to 29%, a relative lift of about 38%. That number is larger than the schema-attributable lift, but it comes from a smaller sample, which is exactly why I'm being careful about generalizing.
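For the record, the arithmetic behind those two numbers, with tier rates expressed as the fraction of tracked prompts that earned an A or B citation tier:

```python
def relative_lift(before_rate: float, after_rate: float) -> float:
    """Relative improvement of after_rate over before_rate."""
    return (after_rate - before_rate) / before_rate

# The aggregate rates reported above: 21% before the work, 29% after.
before, after = 0.21, 0.29
print(f"{relative_lift(before, after):.0%}")  # an 8-point absolute move, ~38% relative
```

Note that the same 8-point absolute move would read as a much smaller relative lift on a client starting from a higher baseline, which is one reason relative numbers flatter low-baseline portfolios.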
Per-client variation was wide: one client showed no measurable lift, one showed a 60%+ relative improvement, the rest clustered in the 20-40% range. The one that showed no lift had been doing meticulous editorial QA for years and had relatively few entity-consistency issues to fix; their starting point was already clean.
What "disambiguation work" actually looked like
Concretely, in this client subset:
- Standardized name spellings across all pages (CEO, founders, product names, locations).
- Added structured organization, person, and product schema with consistent identifiers.
- Linked sameAs references to external authoritative profiles (LinkedIn, Crunchbase, official social, where appropriate).
- Disambiguated against known homonyms by adding clarifying context in the first paragraph of pages where confusion was plausible.
- Cleaned up internal anchor text so that links to a product page used consistent phrasing.
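The schema and sameAs items above can be sketched as JSON-LD, here generated from Python for illustration. Every name and URL is a placeholder, not a real client, and the `parentOrganization` property is included because unlinked subsidiaries were one of the recurring problems:

```python
import json

# Illustrative Organization markup with consistent identifiers and sameAs links.
# All values below are invented placeholders.
org_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example SaaS Co",
    "url": "https://example.com",
    # External authoritative profiles, so engines can anchor the entity.
    "sameAs": [
        "https://www.linkedin.com/company/example-saas-co",
        "https://www.crunchbase.com/organization/example-saas-co",
    ],
    # Explicitly state the parent relationship instead of leaving it implied.
    "parentOrganization": {"@type": "Organization", "name": "Example Holdings"},
}
print(json.dumps(org_schema, indent=2))
```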
None of this was creative work. It was inventory and cleanup. The total hours per client ranged from about 20 to about 80, depending on the size of the site.
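The inventory half of the name-standardization step is mechanical enough to sketch. This is a minimal illustration, not our actual tooling, and the variant list uses the invented "Atlas" example from earlier rather than client data:

```python
import re
from collections import Counter

# Map each canonical entity name to every variant spelling seen in the wild.
VARIANTS = {
    "Atlas": ["Atlas Platform", "Atlas Suite", "Atlas"],
}

def count_variants(text: str, variants: dict[str, list[str]]) -> dict[str, Counter]:
    """Tally how often each variant of each entity appears in a page's text."""
    counts: dict[str, Counter] = {}
    for canonical, forms in variants.items():
        tally = Counter()
        remaining = text
        # Match longest forms first so "Atlas Suite" is not also counted as "Atlas".
        for form in sorted(forms, key=len, reverse=True):
            tally[form] = len(re.findall(re.escape(form), remaining))
            remaining = remaining.replace(form, "")
        counts[canonical] = tally
    return counts

page = "We launched Atlas Suite. The Atlas Platform docs cover Atlas."
print(count_variants(page, VARIANTS))
```

In practice you would run something like this over every page, then flag any page whose dominant form isn't the canonical one; the fixing itself is the editorial work described below.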
The thing I was wrong about
I'd assumed the biggest lift would come from sameAs and structured data. In our testing, the biggest single lift seems to have come from name consistency in the body text of pages — boring editorial work that doesn't involve any structured markup at all. The structured markup helped, but the editorial pass appears to have done more.
This is uncomfortable because it means the highest-impact GEO work, for some clients, is just editing. Not strategy, not technical implementation, not content generation. Editing. The agency I work with has had to adjust how we talk about this work because clients sometimes recoil from paying for "editing" the way they don't recoil from paying for "schema implementation." Same hours, different framing, similar lift.
Why I'm not fully confident yet
The 38% relative lift is from a small sample with non-randomized assignment. The clients who got the focused disambiguation treatment were the ones where we'd already identified entity issues during initial audit, which means they had more room to improve. A randomized study would give cleaner numbers.
The 8-week tracking window may also be too short to know whether the lift persists. Some of our schema lifts compressed over a longer window. Disambiguation might do the same.
And the line between "disambiguation" and "general content cleanup" is fuzzier than I'd like. Some of what we counted as disambiguation work probably had collateral content improvements that helped citations independently.
A concrete pattern: the "named expert" lift
One specific sub-pattern that showed up across multiple clients was about author and expert attribution. Pages that named the author with a verified profile (linked to a real LinkedIn, a real organization page, a real public bio) seemed to cite better than pages with no author or with vague "by the team" attribution.
The relative lift on this specific change was on the order of 15-20% in the clients where we made the change. It's a small intervention. The cost is maybe an hour per page if the author bios already exist and are linkable. The cost is much higher if the underlying authors don't have credible public profiles, which is a separate problem we can't solve from outside.
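For concreteness, here is what "named author with a verified profile" looks like as schema.org Person markup, again generated from Python. The person, title, and URLs are invented:

```python
import json

# Illustrative author-attribution markup using schema.org's Person type.
# Every value below is a placeholder, not a real author.
author_schema = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    "jobTitle": "Head of Product",
    "worksFor": {"@type": "Organization", "name": "Example SaaS Co"},
    # Links back to verifiable public profiles, the "verified" half of the pattern.
    "sameAs": [
        "https://www.linkedin.com/in/jane-example",
        "https://example.com/team/jane-example",
    ],
}
print(json.dumps(author_schema, indent=2))
```

The markup is the cheap part; as noted above, it only works if the `sameAs` targets are credible public profiles that already exist.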
For B2B SaaS clients, this often means committing to an authorship strategy: who on the team has earned the right to be cited, what does their public profile look like, and how do we make their work findable. Some of our clients have been excited about this. A few have been uncomfortable, because it implies that the brand alone isn't enough; you need named people whose names can be tied back to verifiable expertise.
What this implies for engagement sequencing
If you're scoping a GEO engagement, our updated default is to start with an entity audit before any schema work. If the entity layer is messy, the schema layer is decorating something that engines may not be able to parse confidently anyway. If the entity layer is clean, schema sits on top of it usefully.
This is not the order I would have recommended a year ago. Order of operations matters, and we got it wrong for our first few engagements.
The relationship between entity work and brand
There's a softer point underneath the technical work. Entity disambiguation forces an organization to decide what it is, precisely. When two pages refer to the same product by three different names, the problem isn't AI parsing. The problem is that the organization hasn't fully decided what to call its own thing. The disambiguation work is, in some ways, an excuse to have the conversation that should have happened during product naming and never quite did.
That makes some of this work uncomfortable for clients. Marketing teams don't always have the authority to rename a product. Engineering teams may have technical reasons for the legacy names. Sales teams may have customer relationships built on familiarity with old terms. Getting to consistent entity references can require surfacing organizational debt that nobody wanted to deal with.
We try to be honest with clients about this when we scope the work. "This is going to involve a few uncomfortable internal conversations" is a more accurate scope than "we'll clean up your entities." The first version sets expectations correctly. The second version sounds easier and ends up taking three times longer because nobody had warned the client about the political layer.
What we can't yet do
We can't predict which clients will get the biggest lift from disambiguation before doing the audit. The client who showed no lift had a clean starting point, which we couldn't have known without auditing. The clients who showed 60%+ relative lifts had specific entity issues that weren't visible from the outside.
We also can't promise the lift will hold over years. The disambiguation work we did in Q4 2025 still looks good in our most recent tracking, but we've only been measuring for about two quarters. The longer-run question is open.
If you've done structured entity disambiguation work in your own GEO practice, did you see the same disproportionate lift? Or are we looking at a portfolio effect that won't generalize?
This field report was published by **westOeast**, a B Corp certified marketing agency working on generative engine optimization for B2B SaaS. The methodology, framework, and data described here come from internal audits at westOeast across our client portfolio in 2025-2026. More field notes at westoeast.com.