
OAI-SearchBot, Robots.txt, and Why Most Brands Are Invisible to ChatGPT

The binary visibility switch most DTC brands do not know about. The robots.txt audit, the JSON-LD baseline, and the verification check that takes five minutes.

Open your site's robots.txt file. Search for the string `OAI-SearchBot`. If it is not there, and no blanket `User-agent: *` rule disallows everything, your products are visible to ChatGPT by default, which is good. If it is there with a `Disallow`, your products are structurally invisible to every ChatGPT recommendation, including the paid ones you are about to bid on. We have seen brands spend six figures on OpenAI Ads while the bot that delivers the recommendation is blocked by a file that takes ten seconds to fix.

This is the most common visibility failure on the channel, and it does not show up in any analytics tool. Your CPC numbers look normal, your spend draws down, your CAC reads high, and the cause is a one-line config decision made years ago to block AI crawlers by default. This post is the five-minute audit every DTC brand should run before spending another dollar.

OAI-SearchBot is not GPTBot. The distinction matters.

OpenAI operates two crawlers and they do different jobs. `GPTBot` collects data to train OpenAI's foundation models. `OAI-SearchBot` indexes pages for retrieval-time use in ChatGPT, including in shopping responses and ad recommendations. Brands that read the AI-crawler-blocking discourse from 2023-2024 often defaulted to blocking both, on the reasonable theory that they did not want their content used to train competitors' models. That logic still applies to GPTBot. It does not apply to OAI-SearchBot.

Blocking `GPTBot` and allowing `OAI-SearchBot` is the configuration most DTC brands actually want. Your product pages do not get scraped into training data. They do get indexed for live ChatGPT recommendations. If your robots.txt blocks both indiscriminately, you opted out of the channel with a single config file. The OpenAI merchant documentation covers the distinction publicly, but most operators have not read it because the merchant docs only landed in late 2025.

Two crawlers, two jobs

Same operator, different purposes. The configuration most DTC brands want is Allow on one, Disallow on the other.

Allow: OAI-SearchBot
Indexes pages for retrieval inside ChatGPT, including shopping responses and ad recommendations. This is the crawler that makes you visible to OpenAI Ads.

Disallow: GPTBot
Collects data to train OpenAI's foundation models. Not involved in live recommendations. Blocking this does not affect your visibility on the ad channel.

The three-line robots.txt audit

Run this against your live site's robots.txt right now. Replace `yourbrand.com` with your domain.

curl -sL https://yourbrand.com/robots.txt | grep -iE 'user-agent|allow'

Read the output carefully. The right configuration looks like this:

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

Three things to check. First, OAI-SearchBot is explicitly allowed. Second, GPTBot is explicitly disallowed if you do not want your content used in foundation-model training. Third, look for a blanket `User-agent: *` group with `Disallow: /`. Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the most specific group that matches its name, and the wildcard group applies only to bots that have no named group of their own. A blanket disallow therefore does not override an explicit OAI-SearchBot group, wherever it sits in the file. But if your file has `User-agent: *` followed by `Disallow: /` and no OAI-SearchBot group, every crawler including OAI-SearchBot reads that as a hard no.
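The same three checks can be scripted. What follows is a minimal sketch using Python's standard-library `urllib.robotparser`; the function name `audit_robots` and the sample robots.txt bodies are illustrative assumptions, not anything OpenAI publishes. Python's parser approximates crawler behavior (it applies the first matching rule within a group rather than the longest match), so treat the result as a sanity check rather than a guarantee of what OAI-SearchBot does.

```python
from urllib.robotparser import RobotFileParser

BOTS = ("OAI-SearchBot", "GPTBot")


def audit_robots(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {bot_name: allowed} for a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in BOTS}


# A blanket block with no bot-specific group: both crawlers are shut out.
blanket = "User-agent: *\nDisallow: /\n"
print(audit_robots(blanket))   # {'OAI-SearchBot': False, 'GPTBot': False}

# The same blanket block plus a named OAI-SearchBot group: the named group wins.
targeted = blanket + "\nUser-agent: OAI-SearchBot\nAllow: /\n"
print(audit_robots(targeted))  # {'OAI-SearchBot': True, 'GPTBot': False}
```

To run it against a live site, download the robots.txt body first (for example with the curl command above) and pass the text to `audit_robots`.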

If you are using Cloudflare or another CDN-level bot-management product, this is the second place visibility leaks. Many bot-management defaults block `OAI-SearchBot` as part of the broader AI-crawler category. Check your Cloudflare bot-management rules and explicitly allow OAI-SearchBot at the WAF level. The robots.txt fix is necessary but not sufficient if the CDN is dropping the request before it reaches your origin.
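One rough way to spot an edge-level block is to request the same page twice, once with a browser-like user agent and once with an OAI-SearchBot-like one, and compare status codes. The sketch below assumes the two user-agent strings shown (check OpenAI's crawler documentation for the current value) and uses hypothetical helper names. A request from your own machine is not the real crawler: bot-management products that verify source IPs may treat it differently, so a "bot-blocked" result is a strong hint and an "ok" result is not proof.

```python
import urllib.error
import urllib.request

# Assumption: illustrative user-agent strings, not authoritative values.
BOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; "
          "OAI-SearchBot/1.0; +https://openai.com/searchbot")
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/124.0 Safari/537.36")


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still what we want


def classify(browser_status: int, bot_status: int) -> str:
    """Compare the two responses and name the most likely situation."""
    if browser_status >= 400:
        return "page-broken"   # fails for everyone, not a bot rule
    if bot_status in (401, 403, 429, 503):
        return "bot-blocked"   # typical block or challenge responses
    if bot_status >= 400:
        return "bot-error"
    return "ok"


def check_edge(url: str) -> str:
    """Fetch the URL with both user agents and classify the difference."""
    return classify(fetch_status(url, BROWSER_UA), fetch_status(url, BOT_UA))
```

Calling `check_edge("https://yourbrand.com/products/example")` on a product URL (the path here is a placeholder) returns one of the four labels.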

JSON-LD Product schema and server-side rendering

If robots.txt is the binary switch, JSON-LD Product schema with server-side rendering is the quality dial. The crawler can reach your page. What it sees when it gets there is the next problem.

OAI-SearchBot reads structured product data the same way every other modern crawler does, through the JSON-LD blob embedded in the page HTML. If your product pages are client-rendered (React or Vue mounted into an empty div), the crawler sees the empty shell and walks away with nothing usable. Server-side rendering is not optional for products you want surfaced. Next.js App Router does this by default. Older SPA implementations often do not.

The minimum JSON-LD payload OpenAI's indexer benefits from includes the product name, brand, description, image, price, availability, GTIN where applicable, and review aggregate. The OpenAI merchant page is explicit about the fields they parse. Missing fields do not block indexing, but they meaningfully reduce the likelihood of a recommendation against a competitor whose schema is complete. Schema completeness is one of the variables that turns up in the math piece's decision rules under the Amazon-presence section, because brands with poor schema lose to Amazon listings whose schema is always complete.
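To see what a non-rendering crawler would find, parse the raw HTML your server returns (no JavaScript executed) and list which of the fields above are missing from the Product JSON-LD. This is a sketch under assumptions: the helper names are hypothetical, the field list mirrors the paragraph above rather than any official OpenAI schema, and only the first offer is inspected.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks, self._buf, self._active = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._active, self._buf = True, []

    def handle_data(self, data):
        if self._active:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._active:
            self._active = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, which is itself a finding


def find_products(blocks):
    """Walk dicts, lists, and @graph arrays looking for Product nodes."""
    stack, products = list(blocks), []
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            types = node.get("@type")
            if "Product" in (types if isinstance(types, list) else [types]):
                products.append(node)
            stack.extend(node.get("@graph", []))
    return products


def missing_fields(product: dict) -> list[str]:
    offers = product.get("offers") or {}
    if isinstance(offers, list):
        offers = offers[0] if offers else {}
    if not isinstance(offers, dict):
        offers = {}
    checks = {
        "name": bool(product.get("name")),
        "brand": bool(product.get("brand")),
        "description": bool(product.get("description")),
        "image": bool(product.get("image")),
        "price": offers.get("price") not in (None, ""),
        "availability": bool(offers.get("availability")),
        "gtin": any(product.get(k) for k in ("gtin", "gtin8", "gtin12", "gtin13", "gtin14")),
        "aggregateRating": bool(product.get("aggregateRating")),
    }
    return [name for name, present in checks.items() if not present]


def audit_html(html: str) -> list[list[str]]:
    """Return one missing-field list per Product found in the raw HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return [missing_fields(p) for p in find_products(extractor.blocks)]


SAMPLE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe",
 "brand": {"@type": "Brand", "name": "ExampleCo"},
 "description": "Light trail runner.", "image": "https://example.com/shoe.jpg",
 "offers": {"@type": "Offer", "price": "129.00", "priceCurrency": "USD",
            "availability": "https://schema.org/InStock"}}
</script></head><body><div id="root"></div></body></html>"""

print(audit_html(SAMPLE))
```

An empty result means no Product JSON-LD was found in the raw HTML at all, which is what a client-rendered shell looks like to the crawler.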

How to verify your visibility (the practical check)

After the robots.txt fix and the JSON-LD audit, the only honest test of visibility is to ask ChatGPT directly. Pick five questions a buyer in your category would ask. Run them through ChatGPT. Note which brands surface, in what order, with what reasoning. Repeat the same five queries a week later. The variance gives you a sense of how stable your recommendation slot is.
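If you record the brands each query surfaces by hand, a few lines of Python can turn two weekly runs into a rough stability score. The sketch below assumes you have typed the results into dictionaries yourself; it does not call ChatGPT, the function names are hypothetical, and Jaccard overlap is just one reasonable way to quantify the week-to-week variance described above.

```python
def jaccard(first: list[str], second: list[str]) -> float:
    """Share of brands common to both runs (1.0 means identical sets)."""
    a, b = set(first), set(second)
    return len(a & b) / len(a | b) if a | b else 1.0


def stability_report(run_a: dict[str, list[str]],
                     run_b: dict[str, list[str]],
                     brand: str) -> dict[str, dict]:
    """Per query: overlap between the two runs and whether `brand` surfaced."""
    report = {}
    for query, first in run_a.items():
        second = run_b.get(query, [])
        report[query] = {
            "overlap": round(jaccard(first, second), 2),
            "surfaced": (brand in first, brand in second),
        }
    return report


# Placeholder data: brands noted by hand for one query, a week apart.
week_1 = {"best trail running shoes": ["BrandA", "BrandB", "BrandC"]}
week_2 = {"best trail running shoes": ["BrandA", "BrandC", "BrandD"]}
print(stability_report(week_1, week_2, brand="BrandD"))
```

Low overlap across most queries suggests the recommendation set is still volatile; a brand that appears in one run and not the other has not yet secured a stable slot.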

Things to watch for. If competitor brands surface and yours does not despite a clean robots.txt and complete JSON-LD, the issue is likely brand authority, not technical. The model is recommending the brands it has more signal density for. Reviews, expert citations, editorial coverage, and third-party validation are the inputs that fix this; the technical baseline is necessary but not sufficient.

If your brand surfaces with stale information (old pricing, outdated SKUs, last year's positioning), your product feed may be fresh while the crawler is reading a cached or out-of-date copy. Check your sitemap.xml lastmod dates and confirm your CDN cache headers are not telling crawlers to ignore updates. The technical surface for this is identical to the SEO crawl-freshness work most operators already do; the bot you are optimizing for is just different.
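The header side of that check is easy to automate once you have the response headers for robots.txt, sitemap.xml, or a feed URL (for example from `curl -sI`). The sketch below is an assumption-laden helper, not a standard tool: `freshness_issues` is a made-up name, and the 24-hour ceiling is the threshold this post uses for robots.txt and sitemap.xml.

```python
import re

MAX_AGE_LIMIT = 24 * 60 * 60  # 24 hours, in seconds


def freshness_issues(headers: dict[str, str]) -> list[str]:
    """Flag response headers that can leave a crawler reading stale content."""
    h = {key.lower(): value for key, value in headers.items()}
    issues = []

    cache_control = h.get("cache-control", "")
    # Shared caches such as CDNs prefer s-maxage over max-age when both are set.
    match = (re.search(r"s-maxage\s*=\s*(\d+)", cache_control, re.I)
             or re.search(r"max-age\s*=\s*(\d+)", cache_control, re.I))
    if match and int(match.group(1)) > MAX_AGE_LIMIT:
        issues.append("cache lifetime over 24h")

    if "last-modified" not in h:
        issues.append("missing Last-Modified")
    if "etag" not in h:
        issues.append("missing ETag")
    return issues


print(freshness_issues({"Cache-Control": "public, max-age=604800"}))
```

An empty list means none of the three problems was detected; it does not prove the crawler's copy is current, only that the headers are not the obvious cause.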

If you want the same loop running continuously rather than as a quarterly audit, the OpenAI Ads page shows what Cresva agents do here. The mechanical version of the check is the same; the operator just does not have to remember to run it.

Beyond robots.txt: the other invisibility traps

Four failure modes show up after the robots.txt fix is in place. Each is a meaningful drag on recommendation rate and each is fixable inside a week.

Four post-robots.txt traps to audit

Each fixable inside a week. Run through all four before concluding the channel is working or not working for you.

01. Missing GTINs

OpenAI's indexer prefers products with globally unique identifiers (UPC, EAN, ISBN) so it can disambiguate your product from competitor SKUs. Founder-led and private-label brands often ship without them and get fuzzy-matched. Add GTINs where you can; document absence with `gtin: ''` rather than omitting the field.

02. Stale or thin inventory feeds

If your product feed updates daily but the crawler's snapshot is three weeks old, recommendations point at out-of-stock or repriced SKUs. Check CDN cache headers on robots.txt and sitemap.xml (max 24h), and confirm your feed serves `Last-Modified` and `ETag`.

03. Cloudflare or WAF-level AI-bot blocks

Many bot-management products block `OAI-SearchBot` by default as part of the broader AI-crawler category. The robots.txt fix does not override WAF rules. Audit Cloudflare bot-management and explicitly allow OAI-SearchBot at the WAF level.

04. Heavy JavaScript-driven content

The crawler reads the HTML your server returns; it does not execute client-side scripts. If price, description, or reviews only render after JavaScript runs, server-side rendering is the only fix. Common on older Shopify themes that overuse client-side personalization.
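For the missing-GTIN trap, it helps to confirm that the identifiers you do add are well formed before they go into your schema. GTIN-8, GTIN-12 (UPC), GTIN-13 (EAN, ISBN-13), and GTIN-14 all end in a GS1 mod-10 check digit, so a typo is cheap to catch. A minimal sketch with a hypothetical function name follows; it checks structure only, not whether the number is actually registered to your product.

```python
def is_valid_gtin(code: str) -> bool:
    """Validate the GS1 mod-10 check digit of a GTIN-8/12/13/14 string."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(is_valid_gtin("4006381333931"))  # True  (GTIN-13)
print(is_valid_gtin("036000291452"))   # True  (GTIN-12 / UPC-A)
print(is_valid_gtin("4006381333932"))  # False (wrong check digit)
```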

If you are walking through this list against your own site and finding two or three issues, that is normal. Most DTC brands are starting from a default-blocked or partially-blocked state because the AI-crawler discourse of 2023 told operators to block aggressively, and nobody went back to revisit the decision when OpenAI started recommending products. The five-minute audit is what catches that legacy decision before it costs you another quarter of spend.

Robots.txt is binary. JSON-LD plus SSR is the quality dial. WAF-level bot rules are the second binary switch nobody checks. Run the three audits in this post and you have done the visibility baseline that most brands paying for OpenAI Ads have not done. If you want to fold this into the same continuous monitoring loop your team already runs on Meta and Google, you can try Cresva for free.

Cresva monitors your AI-crawler visibility continuously, not as a quarterly audit. Robots.txt config drift, schema gaps, WAF-level bot rules. The technical baseline that decides whether the model sees your brand at all.

Frequently asked questions

Should I block GPTBot if I am running OpenAI Ads?
Yes, if you want to prevent your content from being used in foundation-model training. The two crawlers do different jobs. GPTBot trains models, OAI-SearchBot serves recommendations in ChatGPT including paid ones. Blocking GPTBot does not affect your visibility on the ad channel. The most common DTC configuration is `Disallow GPTBot` and `Allow OAI-SearchBot`.
How long after I fix robots.txt does my visibility actually update?
Variable. OAI-SearchBot recrawls active pages roughly weekly for high-traffic sites and less often for lower-traffic ones. You cannot force a recrawl, but you can encourage one by updating your sitemap.xml with current lastmod dates and resubmitting it through whatever SEO tooling you already use, the same submission paths most search engines respect. Visibility updates in ChatGPT recommendations typically follow the recrawl by a few days.
What if I use Shopify and do not control robots.txt directly?
Shopify exposes a robots.txt.liquid file in your theme that lets you add custom rules. The default Shopify robots.txt does not block OAI-SearchBot, but if your team customized it or a third-party app added rules, the custom file is where to check. Look in your theme settings under Code Editor → robots.txt.liquid and confirm no `Disallow` rule applies to OAI-SearchBot or to the bot's user-agent pattern.
Does any of this affect SEO or Google rankings?
No, in either direction. OAI-SearchBot is OpenAI's own crawler; Googlebot is separate and unaffected. Your Google rankings stay the same regardless of how you configure OAI-SearchBot access. Conversely, allowing OAI-SearchBot does not surface your products in Google AI Overviews; those are part of Google Search and depend on Googlebot access. Google's separate `Google-Extended` robots.txt token controls whether your content is used for Gemini models and does not affect Search.
Is this technical work worth doing if my AOV does not clear the math test from the cost post?
Yes. Even if paid OpenAI Ads does not work for your unit economics today, the same technical baseline determines whether you surface in organic ChatGPT recommendations, which is free. The cost math (covered in the real cost of ChatGPT ads) decides whether you bid. The visibility audit decides whether you exist on the surface at all.

Written by the Cresva Team
