most AI search engines — Perplexity, ChatGPT search, Gemini — are pulling from the same web. the difference is how they decide what to cite.
it's not just backlinks anymore. it's structure, clarity, and whether a machine can actually parse what your page is about in under 3 seconds.
use semantic HTML, not div soup
AI crawlers don't guess at your content hierarchy. they read h1, h2, h3, article, section, and main tags.
if your site is built on divs with class names like "wrapper" and "container-inner", you have a problem. the crawler sees noise. it can't determine what the primary content is, what the headings are, or whether this page is a listicle or a how-to.
fix: audit your HTML. every page should have 1 h1, logical h2-h3 structure, and a main tag wrapping the article body.
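a rough sketch of what that audit could look like, using only Python's standard library. the function name audit_html and the sample pages are made up for illustration, and the "no skipped heading levels" rule is one assumed reading of "logical h2-h3 structure", not an official crawler rule.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Collects heading levels and notes whether a <main> tag is present."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []  # e.g. [1, 2, 3, 2]
        self.has_main = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.heading_levels.append(int(tag[1]))
        elif tag == "main":
            self.has_main = True


def audit_html(html):
    """Return a list of problems found; an empty list means none were found."""
    parser = StructureAudit()
    parser.feed(html)
    problems = []

    h1_count = parser.heading_levels.count(1)
    if h1_count != 1:
        problems.append(f"expected 1 h1, found {h1_count}")

    if not parser.has_main:
        problems.append("no main tag")

    # Assumption: "logical" means a heading never skips a level going down
    # (h1 -> h3 is flagged, h3 -> h2 is fine).
    levels = parser.heading_levels
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading jumps from h{prev} to h{cur}")

    return problems


# Hypothetical sample pages.
good = "<main><article><h1>Title</h1><h2>Part</h2><h3>Detail</h3></article></main>"
bad = '<div class="wrapper"><h1>A</h1><h1>B</h1><h3>C</h3></div>'

print(audit_html(good))
print(audit_html(bad))
```

this only checks tag structure. it says nothing about whether the headings themselves are any good.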
write for direct answer extraction
AI search works by pulling a paragraph or sentence that directly answers a query. if your content never states a direct answer and just talks around it, it won't get cited.
bad: "there are many factors to consider when thinking about site structure for search engines"
good: "to rank in AI search, your pages need structured data, semantic HTML, and content that directly answers a single question per page"
1 page = 1 question = 1 direct answer in the first 100 words.
add schema markup
JSON-LD schema is the clearest signal you can send to an AI crawler. the types worth adding: Article, HowTo, FAQPage, BreadcrumbList.
Perplexity cites sources with clear metadata. if your page has no author, no publish date, and no Article schema, it looks like noise compared to a site that does.
at minimum, every post needs:
Article schema with author, datePublished, headline
BreadcrumbList for site context
FAQPage if you have a Q&A section
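a minimal sketch of the Article part, generated with Python's json module. the helper names (article_jsonld, script_tag) and the sample headline, author, and date are placeholders, not values from any real site. the property names come from schema.org, and the script type is the standard one for JSON-LD.

```python
import json


def article_jsonld(headline, author_name, date_published):
    """Build a minimal schema.org Article object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2025-01-15"
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld):
    """Wrap a JSON-LD string in the script element that goes in the page head."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values for illustration only.
print(script_tag(article_jsonld("How to rank in AI search", "Jane Doe", "2025-01-15")))
```

BreadcrumbList and FAQPage follow the same pattern with their own "@type" and properties. most CMS setups can emit this from post metadata, so hand-writing it per page is rarely needed.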
internal linking is a context map
AI crawlers don't just read your page in isolation. they follow internal links to understand what your site is actually about.
if your blog posts link to nothing, the crawler treats them as orphan content. no context, no cluster signal, no citation.
every post should link to at least 2-3 related posts on the same topic. build topical clusters, not disconnected articles.
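one way to check this, sketched in Python. it assumes you already have a map of which post links to which (from a crawl or a CMS export); the link_report name and the sample URLs are hypothetical. the inbound check is an extra beyond the rule above: it flags posts that nothing else on the site links to.

```python
def link_report(links, minimum=2):
    """Flag weakly connected posts.

    links maps each post URL to the set of internal post URLs it links to.
    Returns (under_linked, no_inbound), both sorted lists of URLs.
    """
    # Count how many other posts link to each known post.
    inbound = {url: 0 for url in links}
    for source, targets in links.items():
        for target in targets:
            if target in inbound and target != source:
                inbound[target] += 1

    # Posts with fewer outgoing links than the minimum (self-links ignored).
    under_linked = sorted(
        url for url, targets in links.items() if len(targets - {url}) < minimum
    )
    # Posts that no other post links to.
    no_inbound = sorted(url for url, count in inbound.items() if count == 0)
    return under_linked, no_inbound


# Hypothetical site with four posts.
site = {
    "/semantic-html": {"/schema-markup", "/page-speed"},
    "/schema-markup": {"/semantic-html"},
    "/page-speed": set(),
    "/about-ai-search": set(),
}

under_linked, no_inbound = link_report(site)
print("too few outgoing links:", under_linked)
print("nothing links here:", no_inbound)
```

the minimum of 2 matches the low end of the "2-3 related posts" rule. raise it if your clusters are bigger.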
page speed is a filter, not a bonus
if your page loads in over 3 seconds, some AI crawlers deprioritise it. it's not that they penalise slow pages; they just move on to the next source faster.
Core Web Vitals matter here. LCP under 2.5s, CLS under 0.1. if your site is on shared hosting running 14 plugins, you likely have a problem.
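if you already have LCP and CLS numbers from a lab or field tool, a tiny check like this can flag pages that miss the targets above. the function name and the sample measurements are made up, and treating a value exactly at the limit as a pass is an assumption.

```python
LCP_LIMIT_SECONDS = 2.5
CLS_LIMIT = 0.1


def vitals_problems(lcp_seconds, cls):
    """Return a list of Core Web Vitals targets this page misses."""
    problems = []
    # Assumption: values exactly at the limit count as passing.
    if lcp_seconds > LCP_LIMIT_SECONDS:
        problems.append(f"LCP {lcp_seconds}s is over {LCP_LIMIT_SECONDS}s")
    if cls > CLS_LIMIT:
        problems.append(f"CLS {cls} is over {CLS_LIMIT}")
    return problems


# Hypothetical measurements for two pages.
print(vitals_problems(1.8, 0.05))
print(vitals_problems(4.2, 0.31))
```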

