GEO checklist: how to get cited by ChatGPT, Claude and Perplexity in 2026

Most 'AI SEO' guides recycle 2010s tactics with new vocabulary. This is what actually moves the dial in 2026 — split into things you control on your site, things that happen off your site, and how to measure what's working.

By Deepanshu Sahni · 8 May 2026 · 11 min read

SEO GEO AI Search llms.txt Schema Generative Engine Optimisation

When someone asks ChatGPT, Claude, Perplexity, Bing Copilot or Google AI Overviews a question, the answer often cites two to five specific sources. Generative Engine Optimisation — GEO — is the work of being one of those sources.

It overlaps heavily with classical SEO. Both reward clear writing, sound structure, and machine-readable signals. But there are AI-specific moves that classic SEO doesn’t cover, and there are some old SEO moves that LLMs simply don’t care about. Here’s the pragmatic 2026 version, split into what’s on your site, what’s off it, and how to know if it’s working.

On-site: the boring fundamentals first

These are the things you control. None of them are exotic. Most sites still get them wrong.

`robots.txt` should explicitly allow AI crawlers

A surprising number of sites silently block ChatGPT, Claude or Perplexity because nobody ever reviewed the default configuration. Each of the major LLM providers has its own user agent, and some have more than one (for example, one for training and one for live retrieval). Allow them deliberately:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

Block the ones you don’t want — but don’t block them by accident. A WordPress security plugin or a CDN’s “block bots” rule can quietly take you out of the citation pool.
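
To catch an accidental block before it costs you citations, you can test a robots.txt body against the user agents listed above. The sketch below uses Python's standard `urllib.robotparser`; the `SAMPLE` file and the `https://example.com/` URL are made up for illustration, and the parser's matching rules may differ slightly from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the robots.txt example above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "anthropic-ai", "PerplexityBot", "Perplexity-User", "Google-Extended",
]


def blocked_ai_crawlers(robots_txt, url="https://example.com/"):
    """Return the AI crawlers that this robots.txt body disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical file: a blanket "block bots" rule with a single exception.
SAMPLE = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

for agent in blocked_ai_crawlers(SAMPLE):
    print(f"blocked: {agent}")
```

This only inspects robots.txt. A CDN or firewall rule that rejects bots at the network level will not show up here, so pair it with the server-log check later in the article.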

`llms.txt` — the cheat sheet for LLMs

llms.txt is a proposed standard (from Jeremy Howard) that puts a curated, plain-text summary of your site at /llms.txt. Think of it as the elevator pitch for an LLM that has thirty seconds to decide if you’re worth citing.

It’s not yet universally read, but the cost of having one is near-zero and the upside is real. Keep it short, factual, and link-rich. Cover: what you do, who you serve, your services with one-line descriptions, recent work, how to engage, prices if you’re brave enough.

Schema.org markup that actually means something

Most schema is decorative. The bits that matter for AI surfaces:

Organization / LocalBusiness / ProfessionalService with @id, geo, address, priceRange, knowsAbout, sameAs to your social profiles. The sameAs array is one of the strongest authority signals you can give.
Person for the founder or principal author, linked from the Organization via founder and from articles via author. LLMs trust content with named, traceable authors more than anonymous content. This is E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) in schema form.
Service for each service you offer, with serviceType and areaServed. Bonus: include Offer blocks with explicit prices when you can.
FAQPage on pages where customers actually have questions. LLMs pull FAQ answers as direct citations more often than long-form prose.
BlogPosting / Article on every published article, with datePublished, author, keywords and mainEntityOfPage.
BreadcrumbList on every nested page so LLMs understand site structure.

Validate everything in Schema.org Validator and Google’s Rich Results Test. One broken JSON-LD block can poison the rest.

Content that’s quotable, not just rankable

LLMs cite passages, not pages. Three rules for writing citation-friendly content:

Lead with the specific claim. “Most Google Ads accounts waste 20-40% of spend on conversion tracking errors” is citable. “Conversion tracking is important for paid search performance” is not.
Use named patterns. Give things names. “PMax cannibalisation,” “branded search bloat,” “GCLID drop-off” — LLMs latch onto labels because users search for them.
Include numbers and methodology. Sources with concrete data (“from 50+ audits, ~25% of spend is leaking”) beat hand-wave sources every time. If you can say how you arrived at the number, even better.

Headings, structure, and the “extractable answer” pattern

LLMs love content where the question is in the heading and the first sentence answers it. Structure articles so any H2 + first paragraph stands alone as a complete, citable mini-answer.

Bad: H2 “Conversion tracking” → first paragraph “There are many things to think about with conversion tracking…”

Good: H2 “What’s the most common Google Ads tracking error?” → first paragraph “The most common error is firing the conversion tag on page load instead of form submit. This causes the algorithm to optimise for page views, not actual conversions.”

Sitemap, canonical URLs, and clean internal linking

This is old SEO hygiene that still matters. A live sitemap-index.xml. One canonical URL per page. Internal links between related content using descriptive anchor text. No orphan pages.
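
Orphan pages are easy to detect once you have a map of internal links. A minimal sketch, assuming you already have a list of page paths and a `{source: [targets]}` mapping from a crawl; the paths below are hypothetical.

```python
def find_orphans(pages, links, entry_points=("/",)):
    """Return pages that no other page links to, ignoring entry points."""
    linked = set()
    for source, targets in links.items():
        # A page linking to itself does not make it reachable.
        linked.update(target for target in targets if target != source)
    return sorted(set(pages) - linked - set(entry_points))


# Hypothetical crawl output.
pages = ["/", "/services/google-ads", "/articles/geo-checklist", "/old-landing-page"]
links = {
    "/": ["/services/google-ads", "/articles/geo-checklist"],
    "/articles/geo-checklist": ["/services/google-ads", "/"],
    "/old-landing-page": ["/old-landing-page"],
}

print(find_orphans(pages, links))
```

Anything this returns should either get an internal link from a related page or be retired.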

Off-site: the part most “GEO guides” skip

This is where GEO actually gets won or lost. LLMs cite sources the broader web cites. A site can have flawless schema and zero citations if no one else mentions it.

Structured directory listings

Some directories disproportionately feed LLM training and retrieval data. The high-impact ones in 2026:

Clutch and DesignRush for agencies
Google Business Profile + Bing Places (Bing feeds Copilot, ChatGPT search and DuckDuckGo)
Crunchbase for company data
Wikidata — an underrated free win. Create an entry for your company with structured properties: founded, HQ, services, sameAs links. LLMs read it.

Each directory listing should match your site’s Organization schema exactly — same legal name, same address, same VAT/registration number. Inconsistencies hurt.
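
A consistency check like this is simple to script. The sketch below compares each listing against your reference record field by field. The field names, the sample records, and the choice to ignore case and whitespace differences are all assumptions for illustration; tighten the comparison if you want a strictly exact match.

```python
def normalise(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def listing_mismatches(reference, listings,
                       fields=("name", "address", "registration_number")):
    """Return (source, field, listed_value) for every field that differs."""
    problems = []
    for source, listing in listings.items():
        for field in fields:
            if normalise(listing.get(field, "")) != normalise(reference.get(field, "")):
                problems.append((source, field, listing.get(field)))
    return problems


# Hypothetical records: the reference mirrors your Organization schema.
reference = {
    "name": "Example Studio Ltd",
    "address": "1 Example Street, London",
    "registration_number": "00000000",
}
listings = {
    "clutch": dict(reference),
    "wikidata": {**reference, "name": "Example Studio"},
}

print(listing_mismatches(reference, listings))
```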

Reddit, Stack Exchange, and forums you’d never think of

LLMs give Reddit disproportionate weight, both in training data and in live retrieval. A handful of helpful, non-spammy answers in the relevant subreddits, posted under a consistent username with your company in the bio, gets cited far more often than a polished blog post on your own site.

Pick three or four subreddits where your customers genuinely hang out. r/PPC, r/SEO, r/web_design, r/smallbusinessuk for our industry. Answer questions properly. Don’t pitch. Over six months you’ll start seeing your username (and sometimes your site) come up in LLM citations on related queries.

Editorial mentions and bylines

A guest post on Search Engine Journal, Smashing Magazine, or your industry’s equivalent is worth ten posts on your own site for citation purposes. Reason: LLMs already trust those domains, and your byline + author bio + link create a chain of association.

Aim for two or three a year. Quality over quantity.

Podcast appearances with written show notes

Audio isn’t directly indexed, but the show notes on the podcast’s site usually are — and they typically include your name, company, links, and the topics you covered. A 30-minute appearance often produces a higher citation lift than a 2,000-word article you wrote yourself.

Press, even small

A mention in The Drum, Campaign UK, Prolific London, or your local business journal is worth chasing. LLMs weight news domains heavily because their training pipelines use them as authoritative sources.

Monitoring: how to know it’s working

You can’t optimise for LLM citation if you don’t measure it. The pragmatic stack:

Manual baseline

Pick eight to ten queries you’d want to be cited for. Examples for an agency:

“best Google Ads agency in London for £2k/month budget”
“who does PPC audits for UK businesses”
“London web design studios that build with Astro”

Run them in ChatGPT (with web search on), Claude, Perplexity, Bing Copilot, and Google AI Overviews. Screenshot the results. Diff month-over-month. This sounds primitive but it’s the most honest signal you’ll get.
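
If you record the cited domains alongside the screenshots, the month-over-month diff can be scripted. A minimal sketch, assuming each snapshot is a hand-maintained `{query: set_of_cited_domains}` dictionary; the domains below are placeholders.

```python
def diff_citations(previous, current):
    """Compare two {query: set_of_cited_domains} snapshots."""
    changes = {}
    for query in sorted(set(previous) | set(current)):
        before = previous.get(query, set())
        after = current.get(query, set())
        gained, lost = after - before, before - after
        if gained or lost:
            changes[query] = {"gained": sorted(gained), "lost": sorted(lost)}
    return changes


# Hypothetical snapshots for one query in one assistant.
april = {"who does PPC audits for UK businesses": {"competitor-a.example", "competitor-b.example"}}
may = {"who does PPC audits for UK businesses": {"competitor-a.example", "example.com"}}

for query, change in diff_citations(april, may).items():
    print(query, change)
```

Keep one snapshot per assistant, since the same query often cites different sources in each.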

Automated tools

Profound, AthenaHQ, and Otterly.ai track LLM citations across providers. Pick one if you want it on autopilot. Worth the £100-200/month if AI search is a meaningful channel.

GA4 referrer traffic

ChatGPT, Perplexity, and Copilot now send identifiable referrer traffic. Check Acquisition → Traffic acquisition for chatgpt.com, perplexity.ai, copilot.microsoft.com, google.com/search/ai. Volume here grew ~5x year-over-year for most B2B sites we monitor.
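
If you export referrer data, a small classifier keeps the grouping consistent from month to month. This sketch covers only the three hostnames named above and assumes the export gives you `(referrer_url, sessions)` rows; that row shape is an assumption, not a GA4 format.

```python
from collections import Counter
from urllib.parse import urlparse

# Hostnames from the paragraph above, mapped to a display label.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "copilot.microsoft.com": "Copilot",
}


def ai_source(referrer):
    """Return the AI assistant a referrer URL belongs to, or None."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, label in AI_REFERRERS.items():
        if host == domain or host.endswith("." + domain):
            return label
    return None


def sessions_by_ai_source(rows):
    """Sum sessions per assistant from (referrer_url, sessions) rows."""
    totals = Counter()
    for referrer, sessions in rows:
        label = ai_source(referrer)
        if label:
            totals[label] += sessions
    return totals


# Hypothetical export rows.
rows = [
    ("https://chatgpt.com/", 40),
    ("https://www.perplexity.ai/search?q=ppc+audit", 12),
    ("https://www.google.com/", 300),
    ("https://chatgpt.com/c/abc", 5),
]
print(sessions_by_ai_source(rows))
```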

Server logs for crawler hits

If you have access to raw server logs, grep for the AI crawler user agents. Frequency of GPTBot, ClaudeBot, and PerplexityBot hits correlates loosely with how much fresh content these models are pulling from your site. A sudden drop is a flag.
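
When plain grep gets unwieldy, a short script can count hits per day and per crawler so a sudden drop stands out. This sketch assumes the common "combined" access-log format, where the user agent is the last quoted field. The sample lines and their user-agent strings are simplified illustrations, not real log data.

```python
import re
from collections import Counter

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Captures the date from [08/May/2026:10:12:01 +0000] and the last quoted field.
LINE_RE = re.compile(r'\[(?P<day>[^:\]]+):[^\]]*\].*"(?P<ua>[^"]*)"\s*$')


def count_ai_hits(lines):
    """Count hits per (day, crawler) for the AI user agents listed above."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        user_agent = match.group("ua").lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                hits[(match.group("day"), agent)] += 1
    return hits


# Illustrative log lines only.
SAMPLE_LOG = [
    '203.0.113.5 - - [08/May/2026:10:12:01 +0000] "GET /articles/geo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [08/May/2026:10:12:09 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [08/May/2026:11:00:00 +0000] "GET / HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.44 - - [08/May/2026:11:05:00 +0000] "GET / HTTP/1.1" 200 7300 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
]

for (day, agent), count in sorted(count_ai_hits(SAMPLE_LOG).items()):
    print(day, agent, count)
```

In practice you would pass an open log file to `count_ai_hits` instead of the sample list.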

What to skip

Some “GEO” advice circulating in 2026 that we haven’t seen actually move anything:

Stuffing FAQ schema with twenty questions per page. Three to six well-answered questions beat twenty thin ones.
Writing “for AI” with hidden text or alt-text dumps. LLMs don’t read hidden content any more than humans do, and Google penalises it.
Buying AI-generated backlinks. Same as it ever was. LLMs are better at spotting low-quality networks than search engines, not worse.
Obsessing over llms.txt formatting micro-details. Have one. Make it factual. Move on.

The thirty-day version

If you want a tight starter list:

Audit robots.txt — explicitly allow GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, Google-Extended.
Publish or update llms.txt.
Upgrade your Organization schema to ProfessionalService or LocalBusiness with full address, geo, hours, sameAs.
Add Person schema for your principal author, linked from articles.
Validate everything in Schema.org Validator and Google Rich Results Test.
Claim or update Google Business Profile, Bing Places, Clutch, Wikidata.
Pick three subreddits relevant to your customers; answer one question a week, properly, no pitch.
Publish two long, opinionated articles with named patterns and concrete numbers.
Set a manual baseline across ChatGPT/Claude/Perplexity/Copilot/Google AI for ten queries.
Recheck in thirty days.

That’s enough to know whether GEO is moving anything for you within a quarter.

A deep dive on `llms.txt`

We mentioned llms.txt briefly above. It’s worth a longer look, because it’s the one piece of GEO hygiene that costs almost nothing, yet most sites still don’t have one.

The proposal, originally from Jeremy Howard, is simple: serve a plain-text file at /llms.txt (and optionally a fuller /llms-full.txt) that gives an LLM a curated, machine-friendly summary of your site. No HTML, no schema, no clever tricks: just a structured Markdown file an LLM can read in under a second and use to understand what you do.

The format itself is loose. The convention that’s settling in:

# Company name

> A one-line description of what you do, in plain English.

## About
- [About page](https://example.com/about): What the company does and who runs it.
- [Studio](https://example.com/studio): How we work.

## Services
- [Service 1](https://example.com/services/one): One-line description.
- [Service 2](https://example.com/services/two): One-line description.

## Recent work
- [Project name](https://example.com/work/project): What we shipped.

## Articles
- [Article title](https://example.com/articles/slug): What it covers.

What good llms.txt files have in common:

They’re short. Two hundred to six hundred lines is plenty. The longer ones get skimmed and the signal dilutes.
They’re factual. No marketing copy. The LLM is reading this to decide if you’re a worthwhile source, not to be sold to.
They link generously. Every entry points at a canonical URL the LLM can fetch for more depth.
They include prices, if you have them. Concrete numbers anchor a citation. Vague ones don’t.
They’re up to date. A dead link in /llms.txt is worse than no /llms.txt at all — it tells the model the site is unmaintained.
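
A few of these properties can be checked mechanically. The sketch below lints a file written in the loose convention shown earlier (an H1 first, then `- [title](url): description` entries). The specific checks and the sample file are assumptions for illustration, not part of the llms.txt proposal.

```python
import re

# Matches "- [title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def lint_llms_txt(text):
    """Return a list of human-readable issues found in an llms.txt body."""
    issues = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        issues.append("first line should be an H1 with the site or company name")
    for number, line in enumerate(lines, start=1):
        if not line.startswith("- "):
            continue
        match = LINK_RE.match(line)
        if not match:
            issues.append(f"line {number}: list item is not a [title](url) link")
        elif not (match.group("desc") or "").strip():
            issues.append(f"line {number}: link has no description")
    return issues


def extract_links(text):
    """Return every linked URL, ready to feed into a dead-link checker."""
    matches = (LINK_RE.match(line) for line in text.splitlines())
    return [match.group("url") for match in matches if match]


# Hypothetical file with two deliberate problems.
SAMPLE = """\
# Example Studio

> Web design and paid search for UK businesses.

## Services
- [PPC audits](https://example.com/services/ppc-audit): Fixed-scope Google Ads audit.
- [Web design](https://example.com/services/web-design)
- Call us for details
"""

for issue in lint_llms_txt(SAMPLE):
    print(issue)
```

Checking that each URL from `extract_links` still resolves is left out here because it needs network access.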

A real one to look at: ours sits at /llms.txt. It’s not perfect — we update it every few weeks as services and work change — but it’s the shape we think holds up. Steal liberally.

What llms.txt doesn’t do: it’s not yet universally read by every LLM, and it doesn’t replace good schema, good content, or directory listings. Treat it as one piece of the puzzle, written in an afternoon, costing nothing to maintain.

The schema types AI search engines actually parse

Most schema markup is decorative. The set of types that AI search engines (ChatGPT, Claude, Perplexity, Copilot, Google AI Overviews) actually parse and reuse in citations is a much shorter list. Here’s what we see referenced in answers and what we see ignored.

`Organization` / `ProfessionalService` / `LocalBusiness`

The single highest-value block to get right. AI engines use it to establish entity identity — who you are, where you operate, how to contact you, what you’re known for. The fields that matter:

@id — a stable identifier (we use https://nerdster.design/#organization). LLMs use this to deduplicate references across pages.
name and legalName (if different).
Full address with addressCountry. Don’t skip addressCountry; it disambiguates UK businesses from US namesakes.
geo with latitude and longitude. Cheap, effective.
priceRange — even ’£££’ is better than nothing.
knowsAbout — an array of the topics you’re authoritative on. Read by Perplexity and ChatGPT for topical relevance.
sameAs — links to your LinkedIn, Crunchbase, X, Instagram, Clutch, Wikidata page if you have one. Single strongest authority signal in the block.
hasCredential for certifications (Cyber Essentials, ISO 27001, etc).
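
Put together, the block might look something like the sketch below, built as a Python dictionary and serialised to a JSON-LD script tag. Every value is a placeholder (example.com, the address, the coordinates, the profile URLs), and `hasCredential` is left out for brevity.

```python
import json

# All values are placeholders; swap in your own details.
organization = {
    "@context": "https://schema.org",
    "@type": "ProfessionalService",
    "@id": "https://example.com/#organization",
    "name": "Example Studio",
    "legalName": "Example Studio Ltd",
    "url": "https://example.com/",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "London",
        "postalCode": "EC1A 1AA",
        "addressCountry": "GB",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 51.5074, "longitude": -0.1278},
    "priceRange": "£££",
    "knowsAbout": ["Google Ads", "Web design"],
    "sameAs": [
        "https://www.linkedin.com/company/example-studio",
        "https://www.crunchbase.com/organization/example-studio",
    ],
}


def jsonld_script(data):
    """Wrap a dictionary in a JSON-LD script tag."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(jsonld_script(organization))
```

Generating the block from one dictionary also makes it easier to keep the same `@id` on every page.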

`Person` for principals and authors

LLMs trust content with named, traceable authors more than anonymous content. Person schema on a /studio or /about page, linked from the Organization via founder, and from articles via author, builds the E-E-A-T chain. Include jobTitle, worksFor, and sameAs to a LinkedIn profile.

`Service` with `Offer` and explicit prices

For each service, a Service block with serviceType, areaServed, and — this is the underrated bit — an Offer block with explicit price and priceCurrency. AI engines lift these into ‘how much does X cost’ answers verbatim. We see it happen.
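
As a rough illustration, a Service block with a priced Offer can be generated per service. The service name, the price, and the `provider_id` below are invented; the property names (`serviceType`, `areaServed`, `provider`, `offers`, `price`, `priceCurrency`) are standard schema.org vocabulary.

```python
import json


def service_jsonld(name, service_type, area_served, price, currency, provider_id):
    """Build a schema.org Service with one explicitly priced Offer."""
    return {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": name,
        "serviceType": service_type,
        "areaServed": area_served,
        # Points back at the Organization block via its stable @id.
        "provider": {"@id": provider_id},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }


# Hypothetical service and price.
audit = service_jsonld(
    name="Google Ads audit",
    service_type="PPC audit",
    area_served="GB",
    price=1500,
    currency="GBP",
    provider_id="https://example.com/#organization",
)
print(json.dumps(audit, indent=2))
```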

`BlogPosting` / `Article`

On every published article: datePublished, dateModified, author (linked to Person), keywords, mainEntityOfPage, headline, description, image. Skip none of these — each one is a hook for an extractor.
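
Since the advice is to skip none of these, a build-time check can flag articles whose markup is missing a field. A small sketch; the sample posting is illustrative and its URLs are placeholders.

```python
# Fields listed in the paragraph above.
REQUIRED_FIELDS = (
    "headline", "description", "image", "datePublished",
    "dateModified", "author", "keywords", "mainEntityOfPage",
)


def missing_fields(posting):
    """Return the required BlogPosting fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not posting.get(field)]


# Illustrative posting with three fields left out.
posting = {
    "@type": "BlogPosting",
    "headline": "GEO checklist: how to get cited by ChatGPT, Claude and Perplexity in 2026",
    "datePublished": "2026-05-08",
    "author": {"@id": "https://example.com/#founder"},
    "keywords": ["GEO", "llms.txt", "Schema"],
    "mainEntityOfPage": "https://example.com/articles/geo-checklist",
}

print(missing_fields(posting))
```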

`BreadcrumbList`

On every nested page. Cheap, mechanical, helps LLMs reason about your site’s structure. We’ve seen Perplexity use breadcrumbs to phrase ‘this page is part of [section]’ citations.

`FAQPage` — but with restraint

LLMs do extract from FAQPage blocks, but quality matters more than quantity. Three to six well-answered questions beat twenty thin ones. The ‘questions’ should be things real users actually ask, not keyword stuffing dressed up as a question.

What we see ignored

A few schema types we put a lot of effort into in the 2010s that LLMs do not appear to weight in 2026:

Review / AggregateRating — useful for Google rich snippets, not for AI citations.
Event for one-off webinars, unless you also have proper press coverage.
Recipe, HowTo — outside their narrow surfaces, irrelevant for B2B.

The validator pass: run everything through Schema.org Validator and Google’s Rich Results Test. One broken JSON-LD block can poison the parsing of the rest.
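
Before pasting pages into the validators, a local pre-check can catch blocks that are not even valid JSON. The sketch below uses only the Python standard library. It checks syntax only, knows nothing about schema.org vocabulary, and does not replace either validator; the sample page is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # a list while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def check_jsonld(html):
    """Return (block_index, error_message_or_None) for each JSON-LD block."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for index, raw in enumerate(extractor.blocks):
        try:
            json.loads(raw)
            results.append((index, None))
        except json.JSONDecodeError as exc:
            results.append((index, exc.msg))
    return results


# Hypothetical page: the second block has a trailing comma.
PAGE = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
<script>console.log("ignored");</script>
</head></html>
"""

for index, error in check_jsonld(PAGE):
    print(index, "ok" if error is None else f"broken: {error}")
```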

Structured FAQs the right way

FAQPage schema deserves its own subsection because it’s the most-abused schema type on the web and the one with the biggest gap between ‘doing it’ and ‘doing it well’.

Where to put FAQs

Not on a generic /faq page. On the specific service or product page they belong to. An LLM answering a question like ‘how much does a PPC audit cost’ is much more likely to land on /services/google-ads than on a generic FAQ page, and the FAQ schema needs to be where the relevant context is.

How many

Three to six per page. Genuinely. If you have twenty real questions, you have two pages worth of content, not twenty FAQs.

What makes a good FAQ

Three rules.

The question is something a user would actually type. Read your search-console queries. Read your inbox. The questions that arrive in your customers’ words — not the questions your marketing team wishes they were asking — are the ones AI engines find natural to lift.

The first sentence answers the question completely. LLMs cite passages, not pages. If your answer is ‘There are several factors to consider…’ before you get to the point, no one will quote it. Lead with the answer; expand after.

The answer is honest about edge cases. Hedged, accurate answers (‘usually 4 weeks, though enterprise builds can run to 12’) are more citable than absolute ones (‘always 4 weeks’). LLMs are trained to reward calibrated confidence.

A worked example

Bad:

Q: What are your prices? A: Our pricing varies depending on the scope of your project. Please get in touch for a quote.

Good:

Q: How much does a marketing site cost? A: A typical bespoke marketing site at Nerdster Design is sized to the project and launches in around four weeks. Larger jobs — apps, e-commerce, custom integrations — sit in a higher band. We always send a flat number before any work starts.

Same underlying question. The second one is citable. We’ve watched ChatGPT quote almost that exact phrasing on ‘web design pricing London’ queries.

The schema itself

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How much does a marketing site cost?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A typical bespoke marketing site at Nerdster Design is sized to the project and launches in around four weeks. Larger jobs — apps, e-commerce, custom integrations — sit in a higher band."
      }
    }
  ]
}
</script>

Validate. Ship. Don’t stuff. Move on.
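
If your FAQs live in a CMS or a data file, the block can be generated rather than hand-written. A minimal sketch: the six-question cap mirrors the guidance in this article and is not a schema.org rule, and the sample question and answer are shortened placeholders.

```python
import json

MAX_QUESTIONS = 6  # editorial limit from this article, not a schema.org rule


def faq_jsonld(pairs):
    """Build a FAQPage JSON-LD script tag from (question, answer) pairs."""
    if len(pairs) > MAX_QUESTIONS:
        raise ValueError(
            f"{len(pairs)} questions; split the page or trim to {MAX_QUESTIONS}"
        )
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the script tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder content.
pairs = [
    (
        "How much does a marketing site cost?",
        "A typical bespoke marketing site is sized to the project and launches in around four weeks.",
    ),
]
print(faq_jsonld(pairs))
```

Raising an error on a seventh question is a deliberate choice here: it forces the "two pages, not twenty FAQs" decision at build time.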

If you’d like the same checklist run against your site — what’s already in place, what’s missing, what’s worth fixing first — we offer it as a flat scope. Email mail@nerdster.design with your URL and we’ll come back within one working day.
