OAI-SearchBot and Robots.txt: The Visibility Baseline
The five-minute audit that decides whether ChatGPT can see your products at all. Why OAI-SearchBot is not GPTBot, the Cloudflare default-block trap, the JSON-LD quality dial, and how to verify visibility before spending a dollar on OpenAI Ads.
Chapter 1: The Five-Minute Audit
Open your site's robots.txt file. Search for the string OAI-SearchBot. If it is not present, and no wildcard (User-agent: *) group disallows your product pages, OAI-SearchBot is allowed to crawl them by default. That is the right state. If OAI-SearchBot is present under a Disallow directive, your products are structurally invisible to every ChatGPT recommendation, including the paid ones you are about to bid on. Brands have spent six figures on OpenAI Ads while the bot that delivers the recommendation was blocked by a file that takes ten seconds to fix.
This is the most common visibility failure on the channel, and it does not show up in any analytics tool. CPC numbers look normal. Spend draws down. CAC reads high. The cause is a one-line config decision made years ago to block AI crawlers by default. The validator below evaluates your live file in place. Paste, read the per-crawler verdict, fix the one that matters.
Robots.txt visibility check
Paste your live robots.txt. The validator evaluates each AI crawler against your file. The default content shows the most common DTC failure: OAI-SearchBot explicitly disallowed.
OAI-SearchBot
BLOCKED. Indexes pages for ChatGPT retrieval and product recommendations. Block this and ChatGPT cannot recommend you.
User-agent: OAI-SearchBot → Disallow: /
GPTBot
BLOCKED. Collects training data for OpenAI foundation models. Most DTC brands prefer to block this (separate from SearchBot).
User-agent: GPTBot → Disallow: /
ClaudeBot
ALLOWED. Anthropic crawler. Allow if you want surfacing in Claude's product recommendations.
User-agent: *. No rule matches /. Default allow.
PerplexityBot
ALLOWED. Perplexity crawler. Allow if you want surfacing in Perplexity answers.
User-agent: *. No rule matches /. Default allow.
Googlebot
ALLOWED. Google's primary crawler. Reference point; almost certainly already allowed.
User-agent: *. No rule matches /. Default allow.
Critical: OAI-SearchBot is blocked.
Your products are structurally invisible to ChatGPT recommendations, including paid placements. Remove the Disallow under User-agent: OAI-SearchBot before spending on the channel.
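The per-crawler verdicts above can be approximated with Python's standard-library urllib.robotparser. This is a minimal sketch, not the validator's actual logic: the sample file, the crawler list, and the verdicts helper are illustrative assumptions, and the standard-library matcher may differ in edge cases from how each crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt reproducing the failure described above
# (an assumption, not a real site's file).
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

CRAWLERS = ["OAI-SearchBot", "GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]


def verdicts(robots_txt: str, path: str = "/") -> dict[str, str]:
    """Return ALLOWED or BLOCKED for each crawler at the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: "ALLOWED" if parser.can_fetch(bot, path) else "BLOCKED"
        for bot in CRAWLERS
    }


for bot, verdict in verdicts(SAMPLE_ROBOTS).items():
    print(f"{bot}: {verdict}")
```

To audit a live site, read the contents of your own robots.txt into a string and pass it to verdicts in place of SAMPLE_ROBOTS, with a real product path as the second argument.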
Chapter 2: OAI-SearchBot Is Not GPTBot
OpenAI operates two crawlers and they do different jobs. GPTBot collects data to train OpenAI's foundation models. OAI-SearchBot indexes pages for retrieval-time use in ChatGPT, including shopping responses and ad recommendations. Brands that read the AI-crawler-blocking discourse from 2023 to 2024 often defaulted to blocking both, on the reasonable theory that they did not want their content used to train competitors' models.
That logic still applies to GPTBot. It does not apply to OAI-SearchBot. Blocking GPTBot and allowing OAI-SearchBot is the configuration most DTC brands actually want. Your product pages do not get scraped into training data. They do get indexed for live ChatGPT recommendations.
The right default for most DTC brands
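One way to express the split described above (block GPTBot, allow OAI-SearchBot) is sketched here as a robots.txt body held in a Python string, checked with the standard-library parser. The file contents and the can_crawl helper are illustrative assumptions; merge the two groups into your existing file rather than replacing it.

```python
from urllib.robotparser import RobotFileParser

# A sketch of the split configuration: opt out of training crawls,
# stay indexable for ChatGPT retrieval. Adapt paths to your own site.
RECOMMENDED_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def can_crawl(robots_txt: str, bot: str, path: str = "/") -> bool:
    """Check one crawler against a robots.txt body using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


print("GPTBot allowed:", can_crawl(RECOMMENDED_ROBOTS, "GPTBot"))
print("OAI-SearchBot allowed:", can_crawl(RECOMMENDED_ROBOTS, "OAI-SearchBot"))
```

Crawlers not named in the file fall through to any wildcard group, or to default allow when there is none.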
Chapter 3: The CDN-Level Trap
Robots.txt is necessary but not always sufficient. The second place visibility leaks is the CDN. Many bot-management defaults at Cloudflare, Akamai, and Fastly classify OAI-SearchBot as part of the broader AI-crawler category and block it at the WAF layer regardless of what your robots.txt says.
The check is straightforward. Find your CDN's bot-management or WAF rules. Look for any rule that blocks OAI-SearchBot by name, blocks OAI-SearchBot and GPTBot together, or blocks “AI bots” as a category. Explicitly allow OAI-SearchBot. The robots.txt fix means nothing if the CDN drops the request before it reaches your origin.
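As a rough first signal before opening the CDN dashboard, you can compare how the edge responds to a browser user agent versus one carrying the OAI-SearchBot token. This is a sketch under stated assumptions: the user-agent strings below are illustrative (check OpenAI's crawler documentation for the current one), and the helper names are hypothetical. CDNs that verify crawlers by IP address may treat a spoofed request differently from the real crawler, so the CDN's own firewall event log remains the authoritative check.

```python
import urllib.error
import urllib.request

# Assumed user-agent strings for illustration only.
BOT_UA = "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the edge sends for this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 arrive as exceptions; return the code.
        return err.code


def interpret(browser_status: int, bot_status: int) -> str:
    """Compare the two status codes and describe the likely situation."""
    if bot_status == browser_status:
        return "no user-agent-based difference detected"
    if browser_status < 400 <= bot_status:
        return "bot user agent is rejected; review CDN bot-management rules"
    return "responses differ; inspect manually"
```

Calling interpret(fetch_status(url, BROWSER_UA), fetch_status(url, BOT_UA)) against one of your product URLs gives a quick read. A matching status does not prove the real crawler is allowed, only that no simple user-agent rule is in the way.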
Cloudflare specifically
Chapter 4: JSON-LD and Server-Side Rendering
Robots.txt is the binary switch. JSON-LD Product schema with server-side rendering is the quality dial. The crawler can reach your page. What it sees when it gets there is the next problem.
OAI-SearchBot reads structured product data the same way every modern crawler does, through the JSON-LD blob embedded in the page HTML. If your product pages are client-rendered (React or Vue mounted into an empty div), the crawler sees the empty shell and walks away with nothing usable. For products you want surfaced, the JSON-LD has to be present in the HTML the server returns, whether through server-side rendering or static generation. Next.js App Router renders on the server by default. Older SPA implementations often do not.
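A quick way to see what a non-rendering crawler sees is to parse the raw HTML, without executing JavaScript, and look for a Product object in the JSON-LD. The sketch below uses only the standard library; the class and function names and the two sample pages are illustrative assumptions, and it does not handle @graph wrappers or @type given as a list.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_products(raw_html: str) -> list[dict]:
    """Return JSON-LD objects of @type Product found in unrendered HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    products = []
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        items = data if isinstance(data, list) else [data]
        products += [i for i in items if isinstance(i, dict) and i.get("@type") == "Product"]
    return products


# Hypothetical pages: one server-rendered, one empty client-side shell.
SSR_PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Product", "name": "Example Mug"}'
    "</script></head></html>"
)
SPA_SHELL = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'

print(len(find_products(SSR_PAGE)), len(find_products(SPA_SHELL)))
```

Run find_products on the response body of a plain HTTP fetch of your product page. An empty list on a page that shows structured data in the browser suggests the JSON-LD is injected client-side.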
Minimum JSON-LD fields the indexer reads
Product name and brand (canonical names, not marketing copy).
Description (server-rendered, complete, not truncated).
Image URL (canonical, accessible without auth).
Price and price currency (current, not promotional).
Availability (in stock, out of stock, preorder).
GTIN, MPN, or SKU where applicable. Identifiers fix many ambiguity errors.
Aggregate rating and review count. Recommendation rate correlates with review density.
Missing fields do not block indexing, but they reduce the likelihood of a recommendation against a competitor whose schema is complete. The product-feed optimization guide covers the schema-completeness work in detail.
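The field list above can be turned into a simple completeness check. This is a sketch: the property names follow schema.org Product, Offer, and AggregateRating, but grouping them into one checklist, the helper names, and the sample product are illustrative assumptions. It also assumes offers is a single object rather than a list.

```python
# Field paths mirror the list above, using schema.org property names.
REQUIRED_PATHS = [
    ("name",),
    ("brand",),
    ("description",),
    ("image",),
    ("offers", "price"),
    ("offers", "priceCurrency"),
    ("offers", "availability"),
    ("aggregateRating", "ratingValue"),
    ("aggregateRating", "reviewCount"),
]
IDENTIFIERS = ("gtin", "mpn", "sku")  # any one of these counts


def lookup(obj, path):
    """Walk nested dict keys; return None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def missing_fields(product: dict) -> list[str]:
    """List dotted paths that are absent or empty in a Product object."""
    missing = [".".join(p) for p in REQUIRED_PATHS if lookup(product, p) in (None, "", [])]
    if not any(product.get(k) for k in IDENTIFIERS):
        missing.append("gtin|mpn|sku")
    return missing


# Hypothetical product with no availability and no ratings.
example = {
    "@type": "Product",
    "name": "Example Mug",
    "brand": {"@type": "Brand", "name": "Example Co"},
    "description": "A 12 oz ceramic mug.",
    "image": "https://www.example.com/mug.jpg",
    "offers": {"@type": "Offer", "price": "18.00", "priceCurrency": "USD"},
    "sku": "MUG-001",
}
print(missing_fields(example))
# ['offers.availability', 'aggregateRating.ratingValue', 'aggregateRating.reviewCount']
```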
Chapter 5: Verifying Visibility
After the robots.txt fix and the JSON-LD audit, the only honest test of visibility is to ask ChatGPT directly. Pick five questions a buyer in your category would ask. Run them through ChatGPT. Note which brands surface, in what order, with what reasoning. Repeat the same five queries a week later. The variance gives you a sense of how stable your recommendation slot is.
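One simple way to put a number on that week-to-week variance is the overlap between the sets of brands surfaced for the same query. The sketch below uses Jaccard overlap, which is a choice made here for illustration and ignores ranking order; the queries and brand names are hypothetical.

```python
def jaccard(week_one: list[str], week_two: list[str]) -> float:
    """Share of brands common to both runs, from 0.0 (none) to 1.0 (identical)."""
    a, b = set(week_one), set(week_two)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical results: brands surfaced for the same query one week apart.
runs = {
    "best ceramic travel mug": (
        ["BrandA", "BrandB", "YourBrand"],
        ["BrandA", "YourBrand", "BrandC"],
    ),
    "dishwasher-safe mug under $25": (
        ["BrandB", "BrandC"],
        ["BrandB", "BrandC"],
    ),
}

for query, (first, second) in runs.items():
    in_both = "YourBrand" in first and "YourBrand" in second
    print(f"{query}: overlap {jaccard(first, second):.2f}, in both runs: {in_both}")
```

A low overlap on a query where you appear only once suggests the slot is contested rather than held.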
If competitor brands surface and yours does not despite a clean robots.txt and complete JSON-LD, the issue is brand authority, not technical setup. The model recommends the brands it has more signal density for. Reviews, expert citations, editorial coverage, and third-party validation are the inputs that fix this. The technical baseline is necessary but not sufficient. The agent-visibility playbook covers the brand-authority side in detail.
Chapter 6: Other Invisibility Traps
Four failure modes show up after robots.txt is in place. Each is a meaningful drag on recommendation rate. Each is fixable inside a week.
The post-robots.txt checklist
Stale sitemap.xml lastmod dates. The crawler reads cached content as authoritative when the lastmod says nothing changed. Update sitemap.xml on every product change.
Aggressive CDN cache headers. Long max-age values tell the crawler your page has not changed. If you ship daily catalog updates, your TTL should reflect that.
JSON-LD that contradicts the visible page (price mismatch, availability mismatch, stale review count). The crawler reads JSON-LD as canonical; mismatches lose the recommendation.
Missing structured-data fields the indexer specifically benefits from. Aggregate rating, GTIN, and brand entity are the three most-often-missed.
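The first item on the checklist lends itself to an automated check: compare each sitemap lastmod against the date the product actually changed. This is a minimal sketch using the standard library; the stale_entries helper, the sample sitemap, and the last_changed mapping (which would come from your own catalog system) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_entries(sitemap_xml: str, last_changed: dict[str, date]) -> list[str]:
    """Return URLs whose <lastmod> is missing or older than the real change date."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        changed = last_changed.get(loc)
        if changed is None:
            continue  # no catalog record for this URL
        # Compare only the date part; lastmod may carry a full datetime.
        if not lastmod or date.fromisoformat(lastmod[:10]) < changed:
            stale.append(loc)
    return stale


# Hypothetical sitemap and catalog change dates.
SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/mug</loc><lastmod>2025-01-10</lastmod></url>
  <url><loc>https://www.example.com/products/bowl</loc><lastmod>2025-03-02T08:00:00+00:00</lastmod></url>
</urlset>
"""
changes = {
    "https://www.example.com/products/mug": date(2025, 2, 1),
    "https://www.example.com/products/bowl": date(2025, 3, 1),
}
print(stale_entries(SITEMAP, changes))
# ['https://www.example.com/products/mug']
```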
If you are walking through this list against your own site and finding two or three issues, that is normal. Most DTC brands start from a default-blocked or partially-blocked state because the 2023 AI-crawler discourse told operators to block aggressively, and nobody revisited the decision when OpenAI started recommending products in late 2025. The five-minute audit catches the legacy decision before it costs you another quarter of spend. If your unit economics are the question, the ChatGPT Ads unit-economics guide runs the math.
The robots.txt audit is the start. Cresva runs the same visibility check continuously across every AI surface (ChatGPT, Claude, Perplexity, Gemini), surfaces config drift, and flags new CDN bot-management rules before they cost you a quarter of spend.
Agent Visibility Playbook: Getting Recommended by AI
How to monitor, measure, and improve your brand's visibility across ChatGPT, Perplexity, Claude, and Gemini. From tracking agent mentions to optimizing for recommendation.
Optimizing Your Product Feed for AI Agents
A step-by-step playbook for structuring product titles, descriptions, attributes, and schema markup so AI agents can accurately parse, evaluate, and recommend your products over competitors.