GEO checklist: how to get cited by ChatGPT, Claude and Perplexity in 2026
Most 'AI SEO' guides recycle 2010s tactics with new vocabulary. This is what actually moves the dial in 2026 — split into things you control on your site, things that happen off your site, and how to measure what's working.
When someone asks ChatGPT, Claude, Perplexity, Bing Copilot or Google AI Overviews a question, the answer often cites two to five specific sources. Generative Engine Optimisation — GEO — is the work of being one of those sources.
It overlaps heavily with classical SEO. Both reward clear writing, sound structure, and machine-readable signals. But there are AI-specific moves that classic SEO doesn’t cover, and there are some old SEO moves that LLMs simply don’t care about. Here’s the pragmatic 2026 version, split into what’s on your site, what’s off it, and how to know if it’s working.
On-site: the boring fundamentals first
These are the things you control. None of them are exotic. Most sites still get them wrong.
robots.txt should explicitly allow AI crawlers
A surprising number of sites block ChatGPT, Claude or Perplexity without meaning to, usually because a plugin or CDN default was never reviewed. Each of the major LLM providers has its own user agent, and some have more than one (for example, one for training and one for live retrieval). Allow them deliberately:
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: Google-Extended
Allow: /
Block the ones you don’t want — but don’t block them by accident. A WordPress security plugin or a CDN’s “block bots” rule can quietly take you out of the citation pool.
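To check for accidental blocks, a minimal sketch using Python's standard-library robots.txt parser is shown here. The SAMPLE file and the example.com URL are made up for illustration; in practice you would paste in your own robots.txt as served in production.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the robots.txt example above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "anthropic-ai", "PerplexityBot", "Perplexity-User", "Google-Extended",
]


def blocked_crawlers(robots_txt, url="https://example.com/"):
    """Return the AI crawlers that this robots.txt stops from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt of the kind a "block bots" setting might produce.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE))  # ['GPTBot']
```

This only tests robots.txt rules. A CDN or firewall rule that rejects the request before robots.txt is ever read will not show up here, so it is worth pairing with the server-log check later in this piece.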
llms.txt — the cheat sheet for LLMs
llms.txt is a proposed standard (from Jeremy Howard) that puts a curated, plain-text summary of your site at /llms.txt. Think of it as the elevator pitch for an LLM that has thirty seconds to decide if you’re worth citing.
It’s not yet universally read, but the cost of having one is near-zero and the upside is real. Keep it short, factual, and link-rich. Cover: what you do, who you serve, your services with one-line descriptions, recent work, how to engage, prices if you’re brave enough.
Schema.org markup that actually means something
Most schema is decorative. The bits that matter for AI surfaces:
- Organization / LocalBusiness / ProfessionalService with @id, geo, address, priceRange, knowsAbout, and sameAs to your social profiles. The sameAs array is one of the strongest authority signals you can give.
- Person for the founder or principal author, linked from the Organization via founder and from articles via author. LLMs trust content with named, traceable authors more than anonymous content. This is E-E-A-T (Experience, Expertise, Authoritativeness, Trust) in schema form.
- Service for each service you offer, with serviceType and areaServed. Bonus: include Offer blocks with explicit prices when you can.
- FAQPage on pages where customers actually have questions. LLMs pull FAQ answers as direct citations more often than long-form prose.
- BlogPosting / Article on every published article, with datePublished, author, keywords and mainEntityOfPage.
- BreadcrumbList on every nested page so LLMs understand site structure.
Validate everything in Schema.org Validator and Google’s Rich Results Test. One broken JSON-LD block can poison the rest.
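Before reaching for the online validators, a quick local pass can catch the crudest failure: a JSON-LD block that is not valid JSON at all. The sketch below uses only the Python standard library. PAGE is a made-up snippet, and the check covers JSON syntax only, not whether the properties are valid Schema.org.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def broken_jsonld_blocks(html):
    """Return the 0-based positions of JSON-LD blocks that fail to parse as JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    broken = []
    for index, raw in enumerate(collector.blocks):
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            broken.append(index)
    return broken


# Hypothetical page: the second block has a stray comma.
PAGE = """
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization"}</script>
<script type="application/ld+json">{"@type": "FAQPage", "mainEntity": [,]}</script>
"""

print(broken_jsonld_blocks(PAGE))  # [1]
```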
Content that’s quotable, not just rankable
LLMs cite passages, not pages. Three rules for writing citation-friendly content:
- Lead with the specific claim. “Most Google Ads accounts waste 20-40% of spend on conversion tracking errors” is citable. “Conversion tracking is important for paid search performance” is not.
- Use named patterns. Give things names. “PMax cannibalisation,” “branded search bloat,” “GCLID drop-off” — LLMs latch onto labels because users search for them.
- Include numbers and methodology. Sources with concrete data (“from 50+ audits, ~25% of spend is leaking”) beat hand-wave sources every time. If you can say how you arrived at the number, even better.
Headings, structure, and the “extractable answer” pattern
LLMs love content where the question is in the heading and the first sentence answers it. Structure articles so any H2 + first paragraph stands alone as a complete, citable mini-answer.
Bad: H2 “Conversion tracking” → first paragraph “There are many things to think about with conversion tracking…”
Good: H2 “What’s the most common Google Ads tracking error?” → first paragraph “The most common error is firing the conversion tag on page load instead of form submit. This causes the algorithm to optimise for page views, not actual conversions.”
Sitemap, canonical URLs, and clean internal linking
This is old SEO hygiene that still matters. A live sitemap-index.xml. One canonical URL per page. Internal links between related content using descriptive anchor text. No orphan pages.
Off-site: the part most “GEO guides” skip
This is where GEO actually gets won or lost. LLMs cite sources the broader web cites. A site can have flawless schema and zero citations if no one else mentions it.
Structured directory listings
Some directories disproportionately feed LLM training and retrieval data. The high-impact ones in 2026:
- Clutch and DesignRush for agencies
- Google Business Profile + Bing Places (Bing feeds Copilot, ChatGPT search and DuckDuckGo)
- Crunchbase for company data
- Wikidata — an underrated free win. Create an entry for your company with structured properties: founded, HQ, services, sameAs links. LLMs read it.
Each directory listing should match your site’s Organization schema exactly — same legal name, same address, same VAT/registration number. Inconsistencies hurt.
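One way to keep listings honest is to hold a reference record and compare each directory against it. The sketch below is illustrative only: the field names, the company details and the listing data are all invented, and in practice you would copy the values in by hand from each directory.

```python
def normalise(value):
    """Lower-case and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(value.lower().split())


def listing_mismatches(reference, listings):
    """Return (listing, field) pairs where a listing disagrees with the reference."""
    mismatches = []
    for listing_name, fields in listings.items():
        for field, expected in reference.items():
            if normalise(fields.get(field, "")) != normalise(expected):
                mismatches.append((listing_name, field))
    return mismatches


# Hypothetical reference record, mirroring the site's Organization schema.
REFERENCE = {
    "legal_name": "Example Studio Ltd",
    "address": "1 Example Street, London",
}

# Hypothetical listing data.
LISTINGS = {
    "Clutch": {"legal_name": "Example Studio Ltd", "address": "1  Example Street, London"},
    "Crunchbase": {"legal_name": "Example Studio", "address": "1 Example Street, London"},
}

print(listing_mismatches(REFERENCE, LISTINGS))  # [('Crunchbase', 'legal_name')]
```

The normalisation is deliberately light. It forgives casing and stray spaces, but "Ltd" versus no "Ltd" is still flagged, because that is exactly the kind of inconsistency that hurts.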
Reddit, Stack Exchange, and forums you’d never think of
LLMs are trained on, and retrieve from, Reddit at disproportionate weight. A handful of helpful, non-spammy answers in the relevant subreddits — under a consistent username with your company in the bio — gets cited far more often than a polished blog post on your own site.
Pick three or four subreddits where your customers genuinely hang out. r/PPC, r/SEO, r/web_design, r/smallbusinessuk for our industry. Answer questions properly. Don’t pitch. Over six months you’ll start seeing your username (and sometimes your site) come up in LLM citations on related queries.
Editorial mentions and bylines
A guest post on Search Engine Journal, Smashing Magazine, or your industry’s equivalent is worth ten posts on your own site for citation purposes. Reason: LLMs already trust those domains, and your byline + author bio + link create a chain of association.
Aim for two or three a year. Quality over quantity.
Podcast appearances with written show notes
Audio isn’t directly indexed, but the show notes on the podcast’s site usually are — and they typically include your name, company, links, and the topics you covered. A 30-minute appearance often produces a higher citation lift than a 2,000-word article you wrote yourself.
Press, even small
A mention in The Drum, Campaign UK, Prolific London, or your local business journal is worth chasing. LLMs weight news domains heavily because their training pipelines use them as authoritative sources.
Monitoring: how to know it’s working
You can’t optimise for LLM citation if you don’t measure it. The pragmatic stack:
Manual baseline
Pick eight to ten queries you’d want to rank on. Examples for an agency:
- “best Google Ads agency in London for £2k/month budget”
- “who does PPC audits for UK businesses”
- “London web design studios that build with Astro”
Run them in ChatGPT (with web search on), Claude, Perplexity, Bing Copilot, and Google AI Overviews. Screenshot the results. Diff month-over-month. This sounds primitive but it’s the most honest signal you’ll get.
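If you record each month's results as "query -> domains cited", the month-over-month diff can be mechanical. This is a minimal sketch under that assumption. The snapshot structure and the domains in it are hypothetical, and the data entry is still manual.

```python
def diff_citations(previous, current):
    """Compare two snapshots of {query: set of cited domains}."""
    changes = {}
    for query in sorted(set(previous) | set(current)):
        before = previous.get(query, set())
        after = current.get(query, set())
        gained = after - before
        lost = before - after
        if gained or lost:
            changes[query] = {"gained": sorted(gained), "lost": sorted(lost)}
    return changes


# Hypothetical snapshots for one engine, transcribed from screenshots.
LAST_MONTH = {
    "who does PPC audits for UK businesses": {"competitor-a.example", "competitor-b.example"},
}
THIS_MONTH = {
    "who does PPC audits for UK businesses": {"competitor-a.example", "example.com"},
}

print(diff_citations(LAST_MONTH, THIS_MONTH))
# {'who does PPC audits for UK businesses': {'gained': ['example.com'], 'lost': ['competitor-b.example']}}
```

Keeping one snapshot per engine per month makes it easy to see whether a change is specific to one provider or shows up everywhere.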
Automated tools
Profound, AthenaHQ, and Otterly.ai track LLM citations across providers. Pick one if you want it on autopilot. Worth the £100-200/month if AI search is a meaningful channel.
GA4 referrer traffic
ChatGPT, Perplexity, and Copilot now send identifiable referrer traffic. Check Acquisition → Traffic acquisition for chatgpt.com, perplexity.ai, copilot.microsoft.com, google.com/search/ai. Volume here grew ~5x year-over-year for most B2B sites we monitor.
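If you export referrer URLs (from GA4 or your own logs), a small classifier can label the AI sources. The sketch below matches on hostname only, using the three hostnames named above. google.com/search/ai is left out because it is a path rather than a hostname and would need its own rule. The function name and labels are illustrative.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostnames from the paragraph above, mapped to a display label.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "copilot.microsoft.com": "Copilot",
}


def ai_source(referrer: str) -> Optional[str]:
    """Return the AI product a referrer URL belongs to, or None."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, label in AI_REFERRER_HOSTS.items():
        # Match the domain itself or any subdomain of it, never a lookalike.
        if host == domain or host.endswith("." + domain):
            return label
    return None


print(ai_source("https://www.perplexity.ai/search?q=ppc+audit"))  # Perplexity
print(ai_source("https://notchatgpt.com/"))  # None
```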
Server logs for crawler hits
If you have access to raw server logs, grep for the AI crawler user agents. Frequency of GPTBot, ClaudeBot, and PerplexityBot hits correlates loosely with how much fresh content these models are pulling from your site. A sudden drop is a flag.
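The grep can be as simple as a substring count per crawler. Here is a minimal sketch. The log lines are invented and simplified, and real user-agent strings vary, so treat the matching as approximate.

```python
from collections import Counter

# Crawler names from the paragraph above.
CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def crawler_hits(log_lines):
    """Count log lines that mention each AI crawler's user agent."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name in CRAWLERS:
            if name.lower() in lowered:
                hits[name] += 1
    return hits


# Hypothetical access-log lines in a combined-log-like format.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Mar/2026:10:05:00 +0000] "GET /articles HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.2 - - [01/Mar/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '203.0.113.5 - - [01/Mar/2026:11:00:00 +0000] "GET /services HTTP/1.1" 200 700 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

print(dict(crawler_hits(SAMPLE_LOG)))  # {'GPTBot': 2, 'ClaudeBot': 1}
```

Run it per day or per week and keep the counts. The trend matters more than any single number.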
What to skip
Some “GEO” advice circulating in 2026 that we haven’t seen actually move anything:
- Stuffing FAQ schema with twenty questions per page. Three to six well-answered questions beat twenty thin ones.
- Writing “for AI” with hidden text or alt-text dumps. LLMs don’t read hidden content any more than humans do, and Google penalises it.
- Buying AI-generated backlinks. Same as it ever was. LLMs are better at spotting low-quality networks than search engines, not worse.
- Obsessing over llms.txt formatting micro-details. Have one. Make it factual. Move on.
The thirty-day version
If you want a tight starter list:
- Audit robots.txt: explicitly allow GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, Google-Extended.
- Publish or update llms.txt.
- Upgrade your Organization schema to ProfessionalService or LocalBusiness with full address, geo, hours, sameAs.
- Add Person schema for your principal author, linked from articles.
- Validate everything in Schema.org Validator and Google Rich Results Test.
- Claim or update Google Business Profile, Bing Places, Clutch, Wikidata.
- Pick three subreddits relevant to your customers; answer one question a week, properly, no pitch.
- Publish two long, opinionated articles with named patterns and concrete numbers.
- Set a manual baseline across ChatGPT/Claude/Perplexity/Copilot/Google AI for ten queries.
- Recheck in thirty days.
That’s enough to know whether GEO is moving anything for you within a quarter.
A deep dive on llms.txt
We mentioned llms.txt briefly above. It’s worth a longer look, because it’s the one piece of GEO hygiene that’s near-zero cost and most sites still don’t have one.
The proposal, originally from Jeremy Howard, is simple: serve a plain-text file at /llms.txt (and optionally a fuller /llms-full.txt) that gives an LLM a curated, machine-friendly summary of your site. No HTML, no schema, no clever tricks: just a structured Markdown file an LLM can read in under a second to understand what you do.
The format itself is loose. The convention that’s settling in:
# Company name
> A one-line description of what you do, in plain English.
## About
- [About page](https://example.com/about): What the company does and who runs it.
- [Studio](https://example.com/studio): How we work.
## Services
- [Service 1](https://example.com/services/one): One-line description.
- [Service 2](https://example.com/services/two): One-line description.
## Recent work
- [Project name](https://example.com/work/project): What we shipped.
## Articles
- [Article title](https://example.com/articles/slug): What it covers.
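If your site content already lives in structured data (a CMS export, a list of routes), the file in that shape can be generated rather than hand-edited. The following is a rough sketch. The function name, the input structure and the example.com entries are assumptions made for illustration, not part of the llms.txt proposal.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 sections of links.

    `sections` maps a heading to a list of (title, url, note) tuples.
    """
    lines = ["# " + name, "", "> " + summary, ""]
    for heading, links in sections.items():
        lines.append("## " + heading)
        lines.append("")
        for title, url, note in links:
            lines.append("- [{}]({}): {}".format(title, url, note))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data.
SECTIONS = {
    "Services": [
        ("Service 1", "https://example.com/services/one", "One-line description."),
    ],
    "Articles": [
        ("Article title", "https://example.com/articles/slug", "What it covers."),
    ],
}

print(render_llms_txt("Company name", "A one-line description of what you do.", SECTIONS))
```

Generating the file from the same source as your navigation or sitemap is one way to keep it from drifting out of date.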
What good llms.txt files have in common:
- They’re short. Two hundred to six hundred lines is plenty. The longer ones get skimmed and the signal dilutes.
- They’re factual. No marketing copy. The LLM is reading this to decide if you’re a worthwhile source, not to be sold to.
- They link generously. Every entry points at a canonical URL the LLM can fetch for more depth.
- They include prices, if you have them. Concrete numbers anchor a citation. Vague ones don’t.
- They’re up to date. A dead link in /llms.txt is worse than no /llms.txt at all: it tells the model the site is unmaintained.
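The "up to date" point is easy to automate. The sketch below pulls every Markdown link out of an llms.txt body and offers a helper that tests one URL with a HEAD request. The helper names are hypothetical, some servers reject HEAD requests, and the network check is defined but not called here, so the example runs offline.

```python
import re
import urllib.request

# Matches Markdown links of the form [title](http...).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def llms_txt_links(text):
    """Return (title, url) pairs for every Markdown link in the file."""
    return LINK_PATTERN.findall(text)


def is_dead(url, timeout=10.0):
    """Return True if a HEAD request fails or returns an error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except OSError:  # covers HTTP errors, DNS failures and timeouts
        return True


# Hypothetical llms.txt body.
SAMPLE = """\
# Company name
> A one-line description of what you do.
## Services
- [Service 1](https://example.com/services/one): One-line description.
- [Service 2](https://example.com/services/two): One-line description.
"""

for title, url in llms_txt_links(SAMPLE):
    print(title, url)
```

To run the live check, loop over the extracted URLs and call is_dead on each, ideally from a scheduled job rather than by hand.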
A real one to look at: ours sits at /llms.txt. It’s not perfect — we update it every few weeks as services and work change — but it’s the shape we think holds up. Steal liberally.
What llms.txt doesn’t do: it’s not yet universally read by every LLM, and it doesn’t replace good schema, good content, or directory listings. Treat it as one piece of the puzzle, written in an afternoon, costing nothing to maintain.
The schema types AI search engines actually parse
Most schema markup is decorative. The bits that AI search engines (ChatGPT, Claude, Perplexity, Copilot, Google AI Overviews) actually parse and reuse in citations make up a much shorter list. Here’s what we see referenced in answers and what we see ignored.
Organization / ProfessionalService / LocalBusiness
The single highest-value block to get right. AI engines use it to establish entity identity — who you are, where you operate, how to contact you, what you’re known for. The fields that matter:
- @id: a stable identifier (we use https://nerdster.design/#organization). LLMs use this to deduplicate references across pages.
- name and legalName (if different).
- Full address with addressCountry. Don’t skip addressCountry; it disambiguates UK businesses from US namesakes.
- geo with latitude and longitude. Cheap, effective.
- priceRange: even ‘£££’ is better than nothing.
- knowsAbout: an array of the topics you’re authoritative on. Read by Perplexity and ChatGPT for topical relevance.
- sameAs: links to your LinkedIn, Crunchbase, X, Instagram, Clutch, Wikidata page if you have one. Single strongest authority signal in the block.
- hasCredential for certifications (Cyber Essentials, ISO 27001, etc).
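As a rough illustration of the fields above, the sketch below builds an Organization-style block as a Python dict and reports which of the listed fields are missing before it is serialised to JSON-LD. Every value is a placeholder (example.com, a made-up address and coordinates), and the RECOMMENDED list is just this section's checklist, not an official requirement.

```python
import json

# The fields this section calls out as mattering.
RECOMMENDED = ["@id", "name", "address", "geo", "priceRange", "knowsAbout", "sameAs"]


def missing_fields(block):
    """Return recommended fields that are absent or empty in the block."""
    missing = [field for field in RECOMMENDED if not block.get(field)]
    address = block.get("address")
    if isinstance(address, dict) and not address.get("addressCountry"):
        missing.append("address.addressCountry")
    return missing


# Placeholder data; note the address has no addressCountry.
organization = {
    "@context": "https://schema.org",
    "@type": "ProfessionalService",
    "@id": "https://example.com/#organization",
    "name": "Example Studio",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "London",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 51.5074, "longitude": -0.1278},
    "priceRange": "£££",
    "knowsAbout": ["Web design", "Google Ads"],
    "sameAs": ["https://www.linkedin.com/company/example-studio"],
}

print(missing_fields(organization))  # ['address.addressCountry']
print(json.dumps(organization, indent=2, ensure_ascii=False))
```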
Person for principals and authors
LLMs trust content with named, traceable authors more than anonymous content. Person schema on a /studio or /about page, linked from the Organization via founder, and from articles via author, builds the E-E-A-T chain. Include jobTitle, worksFor, and sameAs to a LinkedIn profile.
Service with Offer and explicit prices
For each service, a Service block with serviceType, areaServed, and — this is the underrated bit — an Offer block with explicit price and priceCurrency. AI engines lift these into ‘how much does X cost’ answers verbatim. We see it happen.
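A minimal sketch of that Service and Offer shape, built in Python and serialised to JSON-LD, might look like this. The helper name, the service details and the price are hypothetical. serviceType, areaServed, provider, offers, price and priceCurrency are standard Schema.org property names.

```python
import json


def service_block(name, service_type, area_served, price, currency, provider_id):
    """Build a Service block with one Offer carrying an explicit price."""
    return {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": name,
        "serviceType": service_type,
        "areaServed": area_served,
        # Point back at the Organization block by its stable @id.
        "provider": {"@id": provider_id},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


# Hypothetical service and price.
block = service_block(
    name="PPC audit",
    service_type="Google Ads audit",
    area_served="GB",
    price="1500.00",
    currency="GBP",
    provider_id="https://example.com/#organization",
)

print(json.dumps(block, indent=2))
```

Referencing the provider by @id rather than repeating the whole Organization keeps the two blocks tied to one entity.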
BlogPosting / Article
On every published article: datePublished, dateModified, author (linked to Person), keywords, mainEntityOfPage, headline, description, image. Skip none of these — each one is a hook for an extractor.
BreadcrumbList
On every nested page. Cheap, mechanical, helps LLMs reason about your site’s structure. We’ve seen Perplexity use breadcrumbs to phrase ‘this page is part of [section]’ citations.
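Because breadcrumbs are mechanical, they can be derived from the URL path. The sketch below makes a naive assumption that each path segment is a real page and that its label can be guessed from the slug. On a real site you would pass in proper page titles instead.

```python
from urllib.parse import urlparse


def breadcrumb_list(url):
    """Build a BreadcrumbList with one ListItem per path segment, plus Home."""
    parsed = urlparse(url)
    root = "{}://{}".format(parsed.scheme, parsed.netloc)
    segments = [segment for segment in parsed.path.split("/") if segment]

    items = [{"@type": "ListItem", "position": 1, "name": "Home", "item": root + "/"}]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append({
            "@type": "ListItem",
            "position": position,
            # Naive label: turn the slug into words. Assumed, not authoritative.
            "name": segment.replace("-", " ").capitalize(),
            "item": root + path,
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


crumbs = breadcrumb_list("https://example.com/services/google-ads")
print([item["name"] for item in crumbs["itemListElement"]])
# ['Home', 'Services', 'Google ads']
```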
FAQPage — but with restraint
LLMs do extract from FAQPage blocks, but quality matters more than quantity. Three to six well-answered questions beat twenty thin ones. The ‘questions’ should be things real users actually ask, not keyword stuffing dressed up as a question.
What we see ignored
A few schema types we put a lot of effort into in the 2010s that LLMs do not appear to weight in 2026:
- Review / AggregateRating: useful for Google rich snippets, not for AI citations.
- Event for one-off webinars, unless you also have proper press coverage.
- Recipe, HowTo: outside their narrow surfaces, irrelevant for B2B.
The validator pass: run everything through Schema.org Validator and Google’s Rich Results Test. One broken JSON-LD block can poison the parsing of the rest.
Structured FAQs the right way
FAQPage schema deserves its own subsection because it’s the most-abused schema type on the web and the one with the biggest gap between ‘doing it’ and ‘doing it well’.
Where to put FAQs
Not on a generic /faq page. On the specific service or product page they belong to. An LLM reading a question like ‘how much does a PPC audit cost’ is much more likely to land on /services/google-ads than a generic FAQ page — and the FAQ schema needs to be where the relevant context is.
How many
Three to six per page. Genuinely. If you have twenty real questions, you have two pages worth of content, not twenty FAQs.
What makes a good FAQ
Three rules.
- The question is something a user would actually type. Read your search-console queries. Read your inbox. The questions that arrive in your customers’ words (not the questions your marketing team wishes they were asking) are the ones AI engines find natural to lift.
- The first sentence answers the question completely. LLMs cite passages, not pages. If your answer is ‘There are several factors to consider…’ before you get to the point, no one will quote it. Lead with the answer; expand after.
- The answer is honest about edge cases. Hedged, accurate answers (‘usually 4 weeks, though enterprise builds can run to 12’) are more citable than absolute ones (‘always 4 weeks’). LLMs are trained to reward calibrated confidence.
A worked example
Bad:
Q: What are your prices? A: Our pricing varies depending on the scope of your project. Please get in touch for a quote.
Good:
Q: How much does a marketing site cost? A: A typical bespoke marketing site at Nerdster Design is sized to the project and launches in around four weeks. Larger jobs — apps, e-commerce, custom integrations — sit in a higher band. We always send a flat number before any work starts.
Same question. The second one is citable. We’ve watched ChatGPT quote almost that exact phrasing on ‘web design pricing London’ queries.
The schema itself
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How much does a marketing site cost?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A typical bespoke marketing site at Nerdster Design is sized to the project and launches in around four weeks. Larger jobs — apps, e-commerce, custom integrations — sit in a higher band."
      }
    }
  ]
}
</script>
Validate. Ship. Don’t stuff. Move on.
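If the questions and answers already live in your CMS, the FAQPage block can be generated from them, which keeps the schema text identical to the visible text. Here is a minimal sketch. The function names and the sample pairs are hypothetical, and the three-to-six check simply encodes the rule of thumb from this section.

```python
import json


def faq_page(pairs):
    """Build a FAQPage block from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def within_guideline(pairs, minimum=3, maximum=6):
    """True if the page has the three to six questions suggested above."""
    return minimum <= len(pairs) <= maximum


# Hypothetical FAQ content for one service page.
PAIRS = [
    ("How long does a PPC audit take?", "Usually one week, though large accounts can take two."),
    ("What access do you need?", "Read-only access to the ad account and analytics."),
    ("Do you fix the issues you find?", "Yes, as a separate scope after the audit report."),
]

print(within_guideline(PAIRS))  # True
print(json.dumps(faq_page(PAIRS), indent=2))
```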
If you’d like the same checklist run against your site — what’s already in place, what’s missing, what’s worth fixing first — we offer it as a flat scope. Email mail@nerdster.design with your URL and we’ll come back within one working day.