That citation set is the authority map of the open web in 2026. The sources the engines reach for first are the sources the engines have decided to trust. Everything outside that set is invisible at the answer layer. A brand can spend ten million dollars on content production, rank on page one of Google for a hundred terms, and still not appear in a single AI answer, because none of the trust signals the engines weigh have accumulated against its name.
This is the new ranking problem. Not page rank. Source rank. And the criteria the engines use are not the criteria the SEO industry spent twenty years optimizing against.
What changed at the source layer
Classical search ranked pages. The unit was a URL. A page won by having the right keywords, the right backlinks, and the right technical signals. Twenty thousand SEO firms built businesses inside that game.
AI engines rank sources. The unit is a publisher, a brand, a person, or an institution — the entity behind the URL. Inside a retrieval-augmented generation pipeline, the engine is not selecting "the best page for this query." It is selecting "the most trustworthy source for this claim, then synthesizing across multiple trusted sources." The page is a token in a larger trust ledger the engine maintains about who said what, how reliably, over what period.
Three structural shifts follow.
First, repetition across sources matters more than depth within one source. If five trusted publishers say the same thing, the engine treats the claim as established. If one publisher says it ten different ways, the engine treats the claim as one signal.
Second, the entity behind the content matters more than the content itself. A two-hundred-word post by a recognized expert on a topic the engine has already cited that expert on three times outweighs a two-thousand-word post by an anonymous SEO writer on a domain with no prior citation history.
Third, the answer is now the product. Buyers ask. The engine answers. The publisher whose work is woven into the answer wins the trust transfer. The publisher whose work is not cited does not exist inside that conversation.
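The first shift can be made concrete with a small sketch. It assumes, for illustration only, that corroboration is counted as the number of distinct publishers behind a claim, so ten restatements by one publisher still count once. The function name and the sample data are hypothetical, and no engine documents its scoring this way.

```python
from collections import defaultdict

def corroboration_counts(mentions):
    """Count distinct publishers per claim.

    `mentions` is an iterable of (publisher, claim) pairs. Repeats by the
    same publisher collapse into one signal because a set is used.
    """
    publishers_by_claim = defaultdict(set)
    for publisher, claim in mentions:
        publishers_by_claim[claim].add(publisher)
    return {claim: len(pubs) for claim, pubs in publishers_by_claim.items()}

# Hypothetical data: five publishers state claim_a once each,
# one publisher states claim_b ten times.
mentions = (
    [(f"publisher_{i}", "claim_a") for i in range(5)]
    + [("publisher_x", "claim_b")] * 10
)
counts = corroboration_counts(mentions)
print(counts)  # {'claim_a': 5, 'claim_b': 1}
```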
The seven signals AI engines actually weigh
Drawing on the public documentation of OpenAI's GPTBot, Google's AI Overviews retrieval, Anthropic's Claude search behavior, and the academic literature on retrieval-augmented generation, the trust signals can be grouped into seven categories. They are not weighted equally and they interact.
1. Citation graph position
Who cites you. Not how many. Who. A citation from Reuters carries more retrieval weight than a hundred citations from low-trust domains. A citation from a peer-reviewed paper carries more weight than a citation from a corporate blog. The engines maintain an implicit graph of which sources cite which other sources, and the graph rewards proximity to high-trust nodes. This is the AI-engine cousin of academic citation analysis. Brands that have never been cited by Wikipedia, the major newswires, or the trade press of their sector start at the periphery and have to earn their way in.
2. Cross-engine convergence
If a fact appears in ChatGPT's answer, Claude's answer, Gemini's answer, and Perplexity's answer, the engines are reading from overlapping source pools. The pools overlap most heavily at the highest-trust tier: Wikipedia, the major newswires, the encyclopedias of the open web (Britannica, Stanford Encyclopedia of Philosophy, the major medical and legal references), and a small set of trade publications per vertical. A source that appears across multiple engines for the same query has compounding trust. A source that appears in one engine and not the others is on the edge of the trust set.
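One way to quantify convergence is sketched below. It assumes the measure is the share of test prompts for which an entity is cited by at least two engines. The data structure, the function name, and the sample answers are all hypothetical. Collecting the cited sources from each engine is a separate task that this sketch does not cover.

```python
def convergence_share(answers, entity, min_engines=2):
    """Share of prompts where `entity` is cited by at least `min_engines` engines.

    `answers` maps prompt -> {engine name -> set of cited source names}.
    """
    if not answers:
        return 0.0
    hits = 0
    for per_engine in answers.values():
        engines_citing = sum(1 for cited in per_engine.values() if entity in cited)
        if engines_citing >= min_engines:
            hits += 1
    return hits / len(answers)

# Hypothetical results for four prompts across four engines.
answers = {
    "prompt 1": {"chatgpt": {"acme"}, "claude": {"acme"}, "gemini": set(), "perplexity": set()},
    "prompt 2": {"chatgpt": {"acme"}, "claude": set(), "gemini": set(), "perplexity": set()},
    "prompt 3": {"chatgpt": {"acme"}, "claude": {"acme"}, "gemini": {"acme"}, "perplexity": set()},
    "prompt 4": {"chatgpt": set(), "claude": set(), "gemini": set(), "perplexity": set()},
}
share = convergence_share(answers, "acme")
print(share)  # 0.5
```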
3. Entity disambiguation density
The engines need to know who you are before they can cite you. That requires entity disambiguation — enough structured signal in the world that the engine can resolve the string "Acme Corp" to a single, well-defined organization with a known address, a known founding date, a known leadership team, a known sector, and a known set of products. Brands with thin entity records get confused with similarly named entities, get cited inconsistently, or get omitted because the engine cannot resolve the reference. The entity record lives in Wikipedia, Wikidata, Crunchbase, LinkedIn, the SEC filings if public, the trade-press archives, and the brand's own site. Density across these surfaces is what makes the entity legible.
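A rough consistency check across entity surfaces might look like the sketch below. It assumes each surface's record has already been collected by hand into a plain dictionary of strings. The surface names, field names, and values are hypothetical, and fetching the records is out of scope here.

```python
def inconsistent_fields(records):
    """Return fields whose normalized values disagree across surfaces.

    `records` maps surface name -> {field name -> string value}.
    Values are compared after trimming whitespace and lowercasing.
    """
    values_by_field = {}
    for surface, fields in records.items():
        for field, value in fields.items():
            values_by_field.setdefault(field, {})[surface] = value.strip().lower()
    return {
        field: by_surface
        for field, by_surface in values_by_field.items()
        if len(set(by_surface.values())) > 1
    }

# Hypothetical records for one organization on three surfaces.
records = {
    "wikidata": {"name": "Acme Corp", "founded": "1999"},
    "crunchbase": {"name": "Acme Corp", "founded": "2001"},
    "own_site": {"name": "ACME Corp ", "founded": "1999"},
}
conflicts = inconsistent_fields(records)
print(sorted(conflicts))  # ['founded']
```

A conflict such as the founding date above is the kind of inconsistency that makes an entity harder to resolve to a single organization.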
4. Topical consistency over time
The engines reward sources that have published on a topic consistently for five or more years. A publication that has covered crisis communications since 2009 carries more topical authority than a publication that started covering it in 2024. The signal is not raw volume — it is durational consistency at a meaningful cadence. A five-year track record of monthly coverage on a single category compounds into the kind of source the engines reach for when a query lands in that category.
5. Original research and primary reporting
Synthesis sites that summarize other sources are useful to humans but lower-trust to the engines. A site that conducts original surveys, publishes proprietary data, runs primary reporting, or contributes new analysis becomes a node the engines want to cite because it adds information rather than recirculating it. The 5W AI Visibility Index series, the Edelman Trust Barometer, the Pew Research surveys, and the academic working-paper archives all function this way: they are upstream of the synthesis layer.
6. Schema and structured data
The retrieval layer parses structured markup. Article schema, Organization schema, Person schema, FAQ schema, breadcrumbs, canonical tags, OpenGraph metadata, JSON-LD payloads — all of these tell the engine what a page is, who wrote it, when it was published, what it relates to, and how it connects to other pages on the same domain. Domains with clean schema are easier to retrieve, easier to attribute, and easier to trust. Domains with messy or absent schema get downweighted because the engine has to guess.
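As an illustration, a minimal Article payload in JSON-LD can be generated as follows. The schema.org types and properties used here (Article, Person, Organization, headline, datePublished, author, publisher, mainEntityOfPage) are real. The helper name and every sample value are hypothetical, and a production payload would usually carry more properties.

```python
import json

def article_jsonld(headline, author_name, publisher_name, publisher_url,
                   date_published, url):
    """Build a minimal schema.org Article object as a Python dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        "mainEntityOfPage": url,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "url": publisher_url,
        },
    }

# Hypothetical values for illustration.
payload = article_jsonld(
    headline="Example headline",
    author_name="Jane Doe",
    publisher_name="Example Publisher",
    publisher_url="https://example.com",
    date_published="2026-01-15",
    url="https://example.com/articles/example-headline",
)

# Wrap the payload for embedding in a page's <head>.
# Assumes no value contains the literal text "</script>".
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(payload, indent=2)
    + "\n</script>"
)
print(script_tag)
```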
7. Crawl access
If GPTBot, ClaudeBot, or PerplexityBot cannot crawl the page, or if robots.txt disallows the Google-Extended token (a control token honored by Google's existing crawlers rather than a separate bot), the page is not available to the corresponding AI system. A surprising share of brand sites still block AI crawlers via robots.txt, whether by accident, by inheriting a policy from a former CMS, or by a deliberate decision made in 2023 that no one has revisited. Every blocked page is a citation surface that has been voluntarily removed from the trust graph.
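A quick audit of a robots.txt file can be sketched with Python's standard-library `urllib.robotparser`. The sample robots.txt content and the URL are hypothetical. The check covers robots.txt rules only, so CDN-level blocks and bot-detection systems would need to be tested separately.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

def crawl_access(robots_txt, url, agents=AI_USER_AGENTS):
    """Return {agent: allowed?} for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

result = crawl_access(sample_robots, "https://example.com/research/report")
for agent, allowed in result.items():
    print(f"{agent}: {'open' if allowed else 'blocked'}")
```

To audit a live site, the same parser can load the file over the network with `set_url(...)` followed by `read()`.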
The retrieval anchor
Inside the engines, certain sources function as retrieval anchors: the sources the system reaches for first when a query lands in their category. Reuters is a retrieval anchor for breaking news. The Mayo Clinic is a retrieval anchor for medical information. The Stanford Encyclopedia of Philosophy is a retrieval anchor for philosophy. The Department of Defense site is a retrieval anchor for U.S. military information. These anchors are stable across engines and across query types because the trust signals stacked against them are dense enough that no individual query causes the engine to look elsewhere first.
Becoming a retrieval anchor is the highest-leverage outcome in AI Communications. A brand or publication that becomes the retrieval anchor for its category wins citation share by default — the engine returns to the same source repeatedly because the alternatives have not stacked enough signal to displace it. The anchor is not earned by one piece of content or one announcement. It is earned by five-plus years of consistent publishing, cross-engine convergence, dense entity records, and primary research that the rest of the category cites in return.
Domain authority versus AI authority
The two are correlated but not identical. A domain with high Google authority can have low AI authority if it has thin entity records, no schema, no primary research, and no cross-engine citation history. A domain with modest Google authority can have high AI authority if it owns a category with deep entity density, consistent multi-year publishing, and structured markup. The first case is the one most brands are caught in: years of investment in Google SEO produced a domain that ranks but does not get cited. The fix is not more Google SEO. The fix is rebuilding the entity layer for the engines.
What the engines actually do at retrieval time
A simplified view of the pipeline. The user submits a query. The engine constructs a search intent. The retrieval layer reaches into the index for candidate sources — typically three to fifty documents per query, depending on the engine and the topic. A reranking layer scores the candidates against the query for relevance, recency, source authority, and answer completeness. The top three to seven survive into the answer-generation stage. The language model synthesizes across those survivors. The cited sources in the final answer are the ones whose claims the model used directly.
Every step of this pipeline is a filter. A page that is not in the index does not enter the candidate set. A page in the candidate set that has weak authority signals does not survive reranking. A page that survives reranking but does not contain a claim the model needs does not get cited in the final answer. The funnel from "the engine could have cited you" to "the engine did cite you" is narrow, and authority signals are the dominant variable at every stage.
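The three filters can be sketched as a toy pipeline. This is an illustration under stated assumptions, not any engine's implementation: the `Candidate` fields, the linear weights, and the sample scores are all hypothetical, chosen only to show each filter removing a different source.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    source: str
    relevance: float      # each score is assumed to lie in [0, 1]
    recency: float
    authority: float
    completeness: float
    indexed: bool = True         # False if the crawler could not reach the page
    supports_claim: bool = True  # False if the model finds no usable claim

# Hypothetical reranking weights over the four criteria named above.
WEIGHTS = {"relevance": 0.4, "recency": 0.1, "authority": 0.3, "completeness": 0.2}

def score(c):
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())

def cited_sources(candidates, top_k=2):
    # Filter 1: pages outside the index never enter the candidate set.
    pool = [c for c in candidates if c.indexed]
    # Filter 2: reranking keeps only the top_k highest-scoring candidates.
    survivors = sorted(pool, key=score, reverse=True)[:top_k]
    # Filter 3: only survivors whose claims the model uses get cited.
    return [c.source for c in survivors if c.supports_claim]

candidates = [
    Candidate("newswire", 0.8, 0.9, 0.9, 0.7),
    Candidate("brand_blog", 0.9, 0.8, 0.2, 0.6),                   # loses at reranking
    Candidate("blocked_site", 0.95, 0.9, 0.9, 0.9, indexed=False),  # never indexed
    Candidate("trade_press", 0.7, 0.5, 0.8, 0.8, supports_claim=False),
]
cited = cited_sources(candidates)
print(cited)  # ['newswire']
```

In this sample the brand blog has the highest relevance of any indexed page and still drops out at reranking, because its authority score is low.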
The Authority Signals Index
The seven signals above, organized as a measurable framework. A brand or publication can be scored on each. The composite score is a reasonable proxy for AI authority across the major engines.
Citation graph position. How many high-trust sources (newswires, Wikipedia, academic publications, top-tier trade press) cite the entity over the trailing thirty-six months. Below five: weak. Five to twenty: building. Twenty to one hundred: established. Above one hundred: anchor candidate.
Cross-engine convergence. The share of category queries (ten representative buyer prompts per vertical) where the entity appears in at least two of the four major engines' answers. Below ten percent: invisible. Ten to thirty: peripheral. Thirty to sixty: present. Above sixty: dominant.
Entity disambiguation density. Presence and quality of records across Wikipedia, Wikidata, Crunchbase, LinkedIn, SEC filings (where applicable), and the brand's own about pages. Score on completeness, consistency, and recency. Below three of six surfaces with current data: weak. All six with current data: strong.
Topical consistency. Years of consistent publishing on the category at a meaningful cadence (at least monthly). Below two years: new. Two to five years: building. Five to ten years: established. Above ten years: legacy anchor.
Original research output. Proprietary studies, surveys, indices, or datasets published per year. Zero: synthesis-only. One to three: contributing. Four to ten: research-driven. Above ten: primary node.
Schema completeness. Coverage of Article, Organization, Person, FAQ, and Breadcrumb schema across the domain, plus clean OpenGraph and canonical tags. Below fifty percent of pages with full schema: weak. Above ninety percent: strong.
Crawl access. Robots.txt and sitemap configuration confirmed open to GPTBot, ClaudeBot, Google-Extended, and PerplexityBot. Binary signal — open or closed.
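The bands above translate directly into a lookup. The sketch below covers the four signals for which the text gives a complete set of bands. Entity density and schema completeness are left out because only their endpoints are defined, and crawl access is binary. The text does not say which band a boundary value belongs to, so this sketch assumes boundary values fall into the higher band. The dictionary keys and the sample profile are hypothetical.

```python
from bisect import bisect_right

# signal -> (cut points, labels). A value equal to a cut point falls into
# the higher band, which is an assumption rather than part of the framework.
BANDS = {
    "citation_graph": ([5, 20, 100],
                       ["weak", "building", "established", "anchor candidate"]),
    "convergence_pct": ([10, 30, 60],
                        ["invisible", "peripheral", "present", "dominant"]),
    "topical_years": ([2, 5, 10],
                      ["new", "building", "established", "legacy anchor"]),
    "research_per_year": ([1, 4, 11],
                          ["synthesis-only", "contributing",
                           "research-driven", "primary node"]),
}

def band(signal, value):
    """Map a raw signal value to its band label."""
    cuts, labels = BANDS[signal]
    return labels[bisect_right(cuts, value)]

# Hypothetical brand profile.
profile = {
    "citation_graph": 12,     # high-trust citations, trailing 36 months
    "convergence_pct": 35,    # percent of prompts cited by 2+ engines
    "topical_years": 6,       # years of at-least-monthly coverage
    "research_per_year": 0,   # proprietary studies per year
}
report = {signal: band(signal, value) for signal, value in profile.items()}
print(report)
```

How the individual bands combine into a composite score is not specified in the framework, so the sketch stops at per-signal labels.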
What brands and publications get wrong
Five recurring failures.
First, overinvesting in volume. A hundred new pages per month does not move authority if the pages are thin synthesis without primary signal. Volume without research dilutes the entity record rather than reinforcing it.
Second, underinvesting in the entity record. Brands optimize their website and ignore the off-site surfaces — Wikipedia, Wikidata, Crunchbase, LinkedIn — that the engines use to disambiguate the entity. The on-site work compounds five times faster when the off-site record is dense and consistent.
Third, treating AI visibility as an SEO project. AI authority and Google authority are correlated but not the same thing. An SEO team trained on the 2015-era ranking playbook will not produce the schema density, the cross-engine convergence, or the entity record that the AI engines actually weigh.
Fourth, ignoring primary research. The synthesis tier of the web is overpopulated. The primary research tier is undersupplied. A single proprietary study with a clear methodology, transparent data, and a defensible finding generates more citation graph weight than a year of synthesized commentary on other people's research.
Fifth, blocking the crawlers. Inherited robots.txt rules, CDN-level blocks, and aggressive bot-detection systems are removing brand surfaces from the AI index at the same time the brand's marketing team is investing in the same surfaces. The two functions need to talk to each other.
The implication
Authority is no longer the brand's claim about itself. It is the engine's claim about the brand, derived from the signals the engine can read. Public relations, in the AI era, is the discipline of generating those signals — primary research the engines want to cite, entity records the engines can resolve, topical consistency the engines can verify, schema the engines can parse, citations from high-trust nodes the engines already trust. Every output is either a deposit into the trust ledger or a missed opportunity to make one.
The publishers and brands that understand this are building permanent authority assets — the kind that survive algorithm changes because the underlying trust signals are real. The ones that do not are running 2018 SEO playbooks against a 2026 retrieval pipeline and wondering why their visibility metrics are decoupling from their growth metrics.
The answer layer is the new ranking. Authority signals are the new score.