OAI-SearchBot and Robots.txt: The Visibility Baseline

The five-minute audit that decides whether ChatGPT can see your products at all. Why OAI-SearchBot is not GPTBot, the Cloudflare default-block trap, the JSON-LD quality dial, and how to verify visibility before spending a dollar on OpenAI Ads.

Cresva Team, May 2026

Chapter 1: The Five-Minute Audit

Open your site's robots.txt file and search for the string OAI-SearchBot. If it is not present, and no wildcard group (User-agent: *) disallows your product paths, your products are visible to ChatGPT by default. That is the right state. If OAI-SearchBot is present under a Disallow directive, your products are structurally invisible to every ChatGPT recommendation, including the paid ones you are about to bid on. Brands have spent six figures on OpenAI Ads while the bot that delivers the recommendation was blocked by a file that takes ten seconds to fix.

This is the most common visibility failure on the channel, and it does not show up in any analytics tool. CPC numbers look normal. Spend draws down. CAC reads high. The cause is a one-line config decision made years ago to block AI crawlers by default. The visibility check below shows the per-crawler verdict for the most common failing file. Read it against your own robots.txt and fix the one that matters.

Robots.txt is binary. Either OAI-SearchBot is allowed and ChatGPT can recommend your products, or it is not and no amount of ad spend changes that. The fix takes ten seconds once the diagnosis is right.

Robots.txt visibility check

Per-crawler verdicts for the most common DTC failure, a robots.txt that explicitly disallows OAI-SearchBot:

OAI-SearchBot: BLOCKED
Indexes pages for ChatGPT retrieval and product recommendations. Block this and ChatGPT cannot recommend you.
Matched rule: User-agent: OAI-SearchBot → Disallow: /

GPTBot: BLOCKED
Collects training data for OpenAI foundation models. Most DTC brands prefer to block this (separate from SearchBot).
Matched rule: User-agent: GPTBot → Disallow: /

ClaudeBot: ALLOWED
Anthropic crawler. Allow if you want surfacing in Claude's product recommendations.
Matched rule: User-agent: *. No rule matches /. Default allow.

PerplexityBot: ALLOWED
Perplexity crawler. Allow if you want surfacing in Perplexity answers.
Matched rule: User-agent: *. No rule matches /. Default allow.

Googlebot: ALLOWED
Google's primary crawler. Reference point; almost certainly already allowed.
Matched rule: User-agent: *. No rule matches /. Default allow.

Critical: OAI-SearchBot is blocked.

Your products are structurally invisible to ChatGPT recommendations, including paid placements. Remove the Disallow under User-agent: OAI-SearchBot before spending on the channel.
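The per-crawler check can be approximated offline with Python's standard-library urllib.robotparser. This is a minimal sketch, not the validator itself: the function name crawler_verdicts and the SAMPLE file are illustrative assumptions, and the standard-library parser may differ in edge cases from what each crawler actually implements.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens covered by the visibility check.
AI_CRAWLERS = ["OAI-SearchBot", "GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]


def crawler_verdicts(robots_txt: str, path: str = "/") -> dict:
    """Return {crawler: "ALLOWED" or "BLOCKED"} for one path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: "ALLOWED" if parser.can_fetch(agent, path) else "BLOCKED"
        for agent in AI_CRAWLERS
    }


# Hypothetical file reproducing the common failure: both OpenAI crawlers disallowed.
SAMPLE = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

if __name__ == "__main__":
    for agent, verdict in crawler_verdicts(SAMPLE).items():
        print(f"{agent}: {verdict}")
```

To run it against a live file, RobotFileParser also offers set_url() and read(), which fetch the file over HTTP. Note that a file which never names OAI-SearchBot can still block it: a wildcard group with Disallow: / applies to every crawler that has no group of its own.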

Chapter 2: OAI-SearchBot Is Not GPTBot

OpenAI operates two crawlers and they do different jobs. GPTBot collects data to train OpenAI's foundation models. OAI-SearchBot indexes pages for retrieval-time use in ChatGPT, including shopping responses and ad recommendations. Brands that read the AI-crawler-blocking discourse from 2023 to 2024 often defaulted to blocking both, on the reasonable theory that they did not want their content used to train competitors' models.

That logic still applies to GPTBot. It does not apply to OAI-SearchBot. Blocking GPTBot and allowing OAI-SearchBot is the configuration most DTC brands actually want. Your product pages do not get scraped into training data. They do get indexed for live ChatGPT recommendations.

The right default for most DTC brands

One group pairs User-agent: GPTBot with Disallow: /. A second group pairs User-agent: OAI-SearchBot with Allow: /. Two distinct User-agent groups, two distinct rules. The visibility check above flags any file that conflates the two.

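If your robots.txt blocks both bots indiscriminately, you have opted out of the channel with a single file. The fix is additive: keep GPTBot blocked if you prefer, but add an explicit OAI-SearchBot Allow group. Placing it above any wildcard rule keeps it easy to find; crawlers that follow the Robots Exclusion Protocol obey the group that names them regardless of its position in the file.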
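The two-group configuration can be sanity-checked before it ships. The sketch below assumes Python's urllib.robotparser is a reasonable stand-in for crawler behavior; RECOMMENDED is an illustrative file and can_crawl is a hypothetical helper, so adapt the paths to your own site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: training crawler blocked, search crawler allowed.
RECOMMENDED = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout
"""


def can_crawl(robots_txt: str, agent: str, path: str = "/") -> bool:
    """True if the named crawler may fetch the path under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    product_path = "/products/example"  # hypothetical product URL path
    print("OAI-SearchBot:", can_crawl(RECOMMENDED, "OAI-SearchBot", product_path))
    print("GPTBot:", can_crawl(RECOMMENDED, "GPTBot", product_path))
```

One design consequence worth knowing: a crawler with its own group no longer reads the wildcard group, so in this file OAI-SearchBot is also allowed on /checkout. If some paths should stay off-limits, repeat those Disallow lines inside the OAI-SearchBot group.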

Chapter 3: The CDN-Level Trap

Robots.txt is necessary but not always sufficient. The second place visibility leaks is the CDN. Many bot-management defaults at Cloudflare, Akamai, and Fastly classify OAI-SearchBot as part of the broader AI-crawler category and block it at the WAF layer regardless of what your robots.txt says.

The check is straightforward. Find your CDN's bot-management or WAF rules. Look for any rule that blocks OAI-SearchBot by name, blocks OAI-SearchBot and GPTBot together, or blocks “AI bots” as a category. Explicitly allow OAI-SearchBot. The robots.txt fix means nothing if the CDN drops the request before it reaches your origin.

Cloudflare specifically

The “AI Scrapers and Crawlers” rule on Cloudflare blocks OAI-SearchBot along with the rest of the AI-crawler category. If you enabled that rule at any time since summer 2024, this is almost certainly your situation. The rule predates OAI-SearchBot's split from GPTBot, and most operators have not revisited it.
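A rough way to spot a User-Agent-based block from outside the CDN dashboard is to request the same page twice, once with a browser User-Agent and once with the crawler's, and compare status codes. The sketch below is an assumption-heavy probe, not a definitive test: both User-Agent strings are placeholders (copy the current OAI-SearchBot string from OpenAI's crawler documentation), the helper names are hypothetical, and CDNs that verify crawlers by IP address may treat a spoofed request differently from the real bot.

```python
import urllib.error
import urllib.request

# ASSUMPTION: placeholder strings; take the real crawler string from OpenAI's docs.
SEARCHBOT_UA = "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Status codes commonly returned by WAF blocks, challenges, and rate limits.
BLOCK_CODES = (401, 403, 406, 429, 503)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code seen when requesting url with this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError


def interpret(browser_status: int, bot_status: int) -> str:
    """Turn the two status codes into a one-line reading."""
    if browser_status >= 400:
        return "page unreachable for everyone; fix that first"
    if bot_status in BLOCK_CODES:
        return "likely blocked or challenged at the CDN/WAF layer"
    if bot_status >= 400:
        return "error for the bot User-Agent; inspect CDN logs"
    return "no User-Agent-based block detected"


# Usage (performs two live requests; the URL is hypothetical):
#   url = "https://www.example.com/products/example"
#   print(interpret(fetch_status(url, BROWSER_UA), fetch_status(url, SEARCHBOT_UA)))
```

Treat a clean result as a hint only. The CDN's own firewall event log is the authoritative record of what happened to real crawler requests.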

Chapter 4: JSON-LD and Server-Side Rendering

Robots.txt is the binary switch. JSON-LD Product schema with server-side rendering is the quality dial. The crawler can reach your page. What it sees when it gets there is the next problem.

OAI-SearchBot reads structured product data the same way every modern crawler does, through the JSON-LD blob embedded in the page HTML. If your product pages are client-rendered (React or Vue mounted into an empty div), the crawler sees the empty shell and walks away with nothing usable. Server-side rendering is not optional for products you want surfaced. Next.js App Router does this by default. Older SPA implementations often do not.
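To see roughly what a non-JavaScript crawler sees, parse the raw HTML response and look for a Product entry in its JSON-LD. The sketch below uses only the Python standard library; the class and function names and both sample pages are illustrative assumptions, and it makes no claim about how OAI-SearchBot itself parses pages.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks from raw HTML without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            try:
                self.blocks.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is treated as absent


def _nodes(block):
    """A block may be one object, a list of objects, or an object with @graph."""
    if isinstance(block, dict) and isinstance(block.get("@graph"), list):
        block = block["@graph"]
    if isinstance(block, list):
        return [node for node in block if isinstance(node, dict)]
    return [block] if isinstance(block, dict) else []


def find_products(html: str) -> list:
    """Return every JSON-LD node whose @type includes Product."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    products = []
    for block in extractor.blocks:
        for node in _nodes(block):
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "Product" in types:
                products.append(node)
    return products


# Hypothetical pages: one server-rendered, one empty client-side shell.
SSR_HTML = ('<html><head><script type="application/ld+json">'
            '{"@context": "https://schema.org", "@type": "Product", "name": "Example Trail Shoe"}'
            '</script></head><body><h1>Example Trail Shoe</h1></body></html>')
CSR_HTML = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

if __name__ == "__main__":
    print("server-rendered:", len(find_products(SSR_HTML)), "product(s)")
    print("client-rendered:", len(find_products(CSR_HTML)), "product(s)")
```

Feed it the body of a plain HTTP GET, not the DOM copied from browser dev tools, since the browser has already run your JavaScript by then.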

Minimum JSON-LD fields the indexer reads

Product name and brand (canonical names, not marketing copy).
Description (server-rendered, complete, not truncated).
Image URL (canonical, accessible without auth).
Price and price currency (current, not promotional).
Availability (in stock, out of stock, preorder).
GTIN, MPN, or SKU where applicable. Identifiers fix many ambiguity errors.
Aggregate rating and review count. Recommendation rate correlates with review density.

Missing fields do not block indexing, but they reduce the likelihood of a recommendation against a competitor whose schema is complete. The product-feed optimization guide covers the schema-completeness work in detail.
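The field list above can be turned into a quick completeness check on a parsed Product object. This is a sketch under assumptions: it uses schema.org property names (offers, aggregateRating, gtin, mpn, sku), the helper names and SAMPLE product are hypothetical, and it only checks that each field is present, not that its value is accurate.

```python
# Any one of these identifiers satisfies the "GTIN, MPN, or SKU" item.
IDENTIFIER_KEYS = ("gtin", "gtin8", "gtin12", "gtin13", "gtin14", "mpn", "sku")


def _first_dict(value) -> dict:
    """schema.org values may be one object or a list; return the first object or {}."""
    if isinstance(value, list):
        value = value[0] if value else None
    return value if isinstance(value, dict) else {}


def missing_fields(product: dict) -> list:
    """List the fields from the minimum set that this Product node lacks."""
    missing = [key for key in ("name", "brand", "description", "image")
               if not product.get(key)]

    offer = _first_dict(product.get("offers"))
    for key in ("price", "priceCurrency", "availability"):
        if offer.get(key) in (None, ""):  # a price of 0 still counts as present
            missing.append(f"offers.{key}")

    if not any(product.get(key) for key in IDENTIFIER_KEYS):
        missing.append("gtin/mpn/sku")

    rating = _first_dict(product.get("aggregateRating"))
    for key in ("ratingValue", "reviewCount"):
        if rating.get(key) in (None, ""):
            missing.append(f"aggregateRating.{key}")
    return missing


# Hypothetical product with no identifier and no rating data.
SAMPLE = {
    "@type": "Product",
    "name": "Example Trail Shoe",
    "brand": {"@type": "Brand", "name": "ExampleCo"},
    "description": "Lightweight trail running shoe with a rock plate.",
    "image": "https://www.example.com/img/trail-shoe.jpg",
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

if __name__ == "__main__":
    print(missing_fields(SAMPLE))
```

Running it across every product page and sorting by the number of missing fields gives a simple worklist for the schema-completeness pass.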

Chapter 5: Verifying Visibility

After the robots.txt fix and the JSON-LD audit, the only honest test of visibility is to ask ChatGPT directly. Pick five questions a buyer in your category would ask. Run them through ChatGPT. Note which brands surface, in what order, with what reasoning. Repeat the same five queries a week later. The variance gives you a sense of how stable your recommendation slot is.
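Once the answers are recorded by hand, the week-over-week comparison takes a few lines. The sketch below is one possible way to quantify it, not a method the channel prescribes: it uses per-brand appearance rate and mean Jaccard overlap between the two runs, and the brand names and results are made-up placeholders.

```python
from collections import Counter


def appearance_rate(run: list) -> dict:
    """run holds one list of surfaced brands per query; return brand -> share of queries."""
    counts = Counter(brand for brands in run for brand in set(brands))
    return {brand: count / len(run) for brand, count in counts.items()}


def slot_stability(first_run: list, second_run: list) -> float:
    """Mean Jaccard overlap of the brand sets surfaced for each query in two runs."""
    if not first_run or len(first_run) != len(second_run):
        raise ValueError("both runs need the same non-empty list of queries")
    overlaps = []
    for a, b in zip(first_run, second_run):
        union = set(a) | set(b)
        overlaps.append(len(set(a) & set(b)) / len(union) if union else 1.0)
    return sum(overlaps) / len(overlaps)


# Placeholder data: brands surfaced for the same five queries, one week apart.
WEEK_1 = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A"], ["A", "C"]]
WEEK_2 = [["A", "B"], ["A", "B"], ["C"], ["A"], ["B", "C"]]

if __name__ == "__main__":
    print("week 1 rates:", appearance_rate(WEEK_1))
    print("week 2 rates:", appearance_rate(WEEK_2))
    print("stability:", round(slot_stability(WEEK_1, WEEK_2), 2))
```

A stability near 1.0 means the same brands keep surfacing for each query; a low value means the recommendation slot is still churning and a single week's result should not be over-read.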

If competitor brands surface and yours does not despite a clean robots.txt and complete JSON-LD, the issue is brand authority, not technical setup. The model recommends the brands it has more signal density for. Reviews, expert citations, editorial coverage, and third-party validation are the inputs that fix this. The technical baseline is necessary but not sufficient. The agent-visibility playbook covers the brand-authority side in detail.

The technical baseline gets the crawler in. Brand-authority density determines whether the model picks you over a competitor with the same baseline. Most DTC brands optimizing for ChatGPT recommendation are missing the technical layer; the ones with the technical layer in place often miss the authority layer next.

Chapter 6: Other Invisibility Traps

Four failure modes show up after robots.txt is in place. Each is a meaningful drag on recommendation rate. Each is fixable inside a week.

The post-robots.txt checklist

Stale sitemap.xml lastmod dates. The crawler reads cached content as authoritative when the lastmod says nothing changed. Update sitemap.xml on every product change.
Aggressive CDN cache headers. Long max-age values tell the crawler your page has not changed. If you ship daily catalog updates, your TTL should reflect that.
JSON-LD that contradicts the visible page (price mismatch, availability mismatch, stale review count). The crawler reads JSON-LD as canonical; mismatches lose the recommendation.
Missing structured-data fields the indexer specifically benefits from. Aggregate rating, GTIN, and brand entity are the three most-often-missed.
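The first two checklist items can be checked mechanically. The sketch below flags sitemap entries whose lastmod is missing or older than a threshold and reads max-age out of a Cache-Control header. The function names, the seven-day threshold, and the sample sitemap are assumptions for illustration; it also assumes lastmod values are full dates or datetimes.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Namespace defined by the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _parse_lastmod(text: str) -> datetime:
    """Parse a full date or datetime; values without a timezone are treated as UTC."""
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def stale_sitemap_urls(sitemap_xml: str, now: datetime, max_age_days: int = 7) -> list:
    """Return URLs whose lastmod is missing or older than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not lastmod or _parse_lastmod(lastmod) < cutoff:
            flagged.append(loc)
    return flagged


def max_age_seconds(cache_control: str):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"max-age=(\d+)", cache_control, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# Hypothetical sitemap with one fresh, one stale, and one undated entry.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/p/fresh</loc><lastmod>2026-05-10</lastmod></url>
  <url><loc>https://www.example.com/p/old</loc><lastmod>2025-11-02T08:00:00Z</lastmod></url>
  <url><loc>https://www.example.com/p/undated</loc></url>
</urlset>
"""

if __name__ == "__main__":
    now = datetime(2026, 5, 12, tzinfo=timezone.utc)
    print(stale_sitemap_urls(SAMPLE_SITEMAP, now))
    print(max_age_seconds("public, max-age=86400"))
```

Pass a timezone-aware now, and pick max_age_days to match how often the catalog actually changes; a store that ships daily updates would want a much tighter threshold than seven days.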

If you are walking through this list against your own site and finding two or three issues, that is normal. Most DTC brands start from a default-blocked or partially-blocked state because the 2023 AI-crawler discourse told operators to block aggressively, and nobody revisited the decision when OpenAI started recommending products in late 2025. The five-minute audit catches the legacy decision before it costs you another quarter of spend. If your unit economics are the question, the ChatGPT Ads unit-economics guide runs the math.

The robots.txt audit is the start. Cresva runs the same visibility check continuously across every AI surface (ChatGPT, Claude, Perplexity, Gemini), surfaces config drift, and flags new CDN bot-management rules before they cost you a quarter of spend.

Written by the Cresva Team. Questions? Email us.