Shopify generates both your robots.txt and sitemap.xml automatically, and you can't delete or fully replace either one. That's not a limitation. For most stores it's actually the right default. But "automatic" doesn't mean "optimal," and in 2026 there are crawlers hitting your store that didn't exist two years ago.
This guide covers what Shopify's defaults actually do, which rules you can change, how to customize robots.txt for both traditional search engines and AI crawlers, and the sitemap structure that controls what gets indexed. No fluff. Just the technical details you need.
If you're not sure whether search engines and AI models are actually finding your store, run a free AI authority check first. It'll show you exactly how visible your store is to ChatGPT, Perplexity, and other AI systems that are now driving purchase recommendations.
How Shopify's Default robots.txt Works
Every Shopify store has a robots.txt file at yourstore.com/robots.txt. Shopify generates it automatically, and until mid-2021 you couldn't touch it. Now you can customize it, but you need to understand the defaults before you start changing things.
The default robots.txt does two jobs: it tells search engine crawlers which parts of your store to index, and it points them to your sitemap. Here's what Shopify blocks by default:
| Blocked Path | Why It's Blocked | Can You Override? |
|---|---|---|
| /admin | Store backend, no public value | No |
| /cart | Dynamic per-user, creates duplicate content | No |
| /checkout | Secure checkout process, not indexable | No |
| /orders | Customer order data, private | No |
| /account | Customer account pages, private | No |
| /collections/*+* | Filtered/sorted collection URLs, duplicate content | Yes (via robots.txt.liquid) |
| /*/collections/*+* | Same filters under locale paths | Yes (via robots.txt.liquid) |
| /blogs/*/*.atom | Blog RSS feeds, not useful for indexing | Yes (via robots.txt.liquid) |
| /search | Internal search results, thin/duplicate content | Yes (via robots.txt.liquid) |
| /*?*variant=* | Product variant URLs, duplicate of main product | Yes (via robots.txt.liquid) |
The non-overridable blocks (/admin, /cart, /checkout, /orders, /account) make sense. These are private pages that should never appear in search results. Shopify hardcodes them regardless of what you put in your robots.txt.liquid file.
The overridable blocks are where you have room to make decisions. The collections/*+* filter blocks, for instance, prevent Google from indexing filtered collection pages like /collections/shoes?color=red&size=10. That's usually the right call. But if you've built SEO-specific filtered landing pages, you might want those indexed.
How to Edit robots.txt on Shopify
Shopify uses a Liquid template to generate robots.txt. Here's the process:
- Go to Online Store > Themes
- Click Actions > Edit Code (or the three-dot menu > Edit Code)
- Under the Templates folder, click Add a new template
- Select robots.txt from the template type dropdown
- Shopify creates robots.txt.liquid pre-filled with the default rules
- Edit the rules, then save
The template uses Liquid syntax. Shopify exposes its base rules through the robots.default_groups object, and the default template loops over those groups to output them. Everything you add inside or below that loop stacks on top of the defaults: new Disallow rules, new Allow rules, and additional Sitemap references.
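As a sketch of the pattern Shopify documents for this file, here's a robots.txt.liquid that reproduces the defaults and appends one extra Disallow rule to the general (*) crawler group. The /collections/*/tagged/ path is an illustrative target, not a requirement:

```liquid
{%- comment -%}
  Keep Shopify's default rules, then append a custom
  Disallow rule to the general (*) user agent group.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/*/tagged/' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

Save the template, then load yourstore.com/robots.txt to confirm the new rule appears under User-agent: *.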
My opinion: most Shopify stores should leave the default robots.txt alone. The defaults are well-thought-out and handle the common duplicate content issues. The only stores that need customization are those dealing with AI crawler control, international storefronts with complex locale paths, or very large catalogs where crawl budget is a real concern.
AI Crawlers and robots.txt in 2026
This is where things get interesting. Two years ago, the only crawlers you cared about were Googlebot and Bingbot. Now there's a growing list of AI crawlers hitting your store, and Shopify's default robots.txt says nothing about them.
Here are the AI crawlers you should know about:
| Crawler | Operated By | Purpose | Respects robots.txt? |
|---|---|---|---|
| GPTBot | OpenAI | Training data and browsing for ChatGPT | Yes |
| ChatGPT-User | OpenAI | Real-time browsing when users ask ChatGPT to visit a URL | Yes |
| ClaudeBot | Anthropic | Training data for Claude | Yes |
| PerplexityBot | Perplexity AI | Real-time web search for AI answers | Yes |
| Google-Extended | Google | Training data for Gemini and AI Overviews | Yes |
| Bytespider | ByteDance | Training data for TikTok's AI features | Inconsistent |
| Amazonbot | Amazon | Training data for Alexa and Amazon AI | Yes |
| FacebookBot | Meta | Training data for Meta AI features | Yes |
Here's the decision you need to make: do you want AI systems to index your store or not?
If you block GPTBot, ChatGPT can't browse your product pages. That means when someone asks "what's the best [your product category]?" ChatGPT won't have fresh data from your store to inform its answer. You become invisible to a growing purchase channel.
My strong opinion: most ecommerce stores should allow all AI crawlers. The upside of AI visibility far outweighs the theoretical downside of AI training on your product descriptions. Your product data isn't a trade secret. It's the same information customers see. And every AI crawl is a chance for your store to end up in AI-powered purchase recommendations.
For a deeper look at how AI search engines use your data, read our guide on what GEO means for Shopify stores.
The exception is content publishers or stores with extensive proprietary editorial content. If your blog content is your competitive moat, you might want to allow ChatGPT-User (real-time browsing that links back to you) while blocking GPTBot (training data extraction). That lets ChatGPT cite your content in real-time answers without using it for training.
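As a sketch, that split policy comes out as plain robots.txt directives like these (in Shopify you'd emit them from robots.txt.liquid, below the default rules):

```
# Block OpenAI's training-data crawler
User-agent: GPTBot
Disallow: /

# Allow real-time browsing, which cites and links back to you
User-agent: ChatGPT-User
Allow: /
```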
Shopify's Sitemap Structure Explained
Your sitemap lives at yourstore.com/sitemap.xml. Unlike robots.txt, you can't customize it at all. Shopify generates and maintains it automatically.
The main sitemap.xml is actually a sitemap index file that links to child sitemaps:
| Child Sitemap | Contains | Auto-included? |
|---|---|---|
| sitemap_products_1.xml | All published product URLs | Yes, if you have products |
| sitemap_collections_1.xml | All published collection URLs | Yes, if you have collections |
| sitemap_pages_1.xml | All published static pages | Yes, if you have pages |
| sitemap_blogs_1.xml | All published blog post URLs | Yes, if you have blog posts |
Each child sitemap caps at 5,000 URLs. If you have more than 5,000 products, Shopify creates sitemap_products_2.xml, and so on. Each entry includes the URL and a lastmod timestamp that updates whenever the page content changes.
The sitemap also includes image references for products. Every product image gets a <image:image> tag inside the product sitemap entry, which helps Google Image Search index your product photos. This happens automatically. You don't need to do anything.
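For illustration only, a single product entry inside sitemap_products_1.xml looks roughly like this (URLs and timestamp are invented; the exact set of tags can vary):

```xml
<url>
  <loc>https://yourstore.com/products/example-product</loc>
  <lastmod>2026-01-15T09:00:00-05:00</lastmod>
  <image:image>
    <image:loc>https://cdn.shopify.com/s/files/example-product.jpg</image:loc>
    <image:title>Example Product</image:title>
  </image:image>
</url>
```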
What the Sitemap Excludes
Shopify automatically excludes several page types from the sitemap:
- Password-protected pages. If your store or specific pages are behind a password, they won't appear.
- Pages set to "Hidden from search engines." The visibility toggle in the Shopify page editor controls this.
- Out-of-stock products (if configured). If you've set products to unpublish when out of stock, they'll drop from the sitemap.
- Duplicate product URLs. Shopify products can be reached via /products/item and /collections/category/products/item. Only the canonical /products/item URL appears in the sitemap.
- Cart, checkout, account, and admin pages. These match the robots.txt blocks.
This automatic exclusion of duplicate product URLs is one of the most underrated things Shopify does. Duplicate content is a common technical SEO problem, and Shopify handles it at the sitemap level without you lifting a finger.
Are search engines and AI actually finding your store?
A clean robots.txt and sitemap are table stakes. The real question is whether Google, ChatGPT, Perplexity, and other AI systems are recommending your products when customers ask. Run your store through True Margin's free AI Authority Checker to see exactly where you stand across every major AI platform.
Crawl Budget: Does It Matter for Shopify?
Crawl budget is how many pages a search engine will crawl on your site in a given period. It's a real concern for sites with millions of pages. For most Shopify stores? It's not something you need to worry about.
Google has said explicitly that crawl budget is only a concern for sites with more than a few thousand URLs that change frequently. If your store has under 10,000 products, your crawl budget is fine. Shopify's default robots.txt blocks the main sources of crawl waste (filtered collections, variant URLs, search pages), so the pages Google does crawl are the ones that actually matter.
Where crawl budget does matter on Shopify:
- Stores with 50,000+ products. At this scale, Google may not crawl every product page regularly. Make sure your highest-value products have internal links and aren't orphaned deep in collections.
- Stores generating thousands of tag pages. Shopify creates a URL for every product tag. If you use 200 tags across 100 collections, that's 20,000 thin tag pages that soak up crawl budget without providing SEO value. Block /collections/*/tag in your robots.txt.liquid.
- Stores with complex internationalization. Multiple locales with Shopify Markets means every page has multiple URL versions. Make sure hreflang tags are correct so Google doesn't crawl every locale as a separate page.
Sitemap Gotchas That Quietly Hurt Your SEO
Because you can't edit Shopify's sitemap directly, most store owners assume it's fine. Usually it is. But there are a few scenarios where the auto-generated sitemap creates problems.
1. Unpublished Products Still Showing
If you unpublish a product from your Online Store sales channel but leave it published on other channels (like the Buy Button or Wholesale), the product page still exists and may still appear in the sitemap. Confirm the product is set to "Hidden" on all sales channels if you want it fully removed.
2. Redirect Chains in the Sitemap
When you change a product's URL handle, Shopify creates a 301 redirect from the old URL to the new one. But the sitemap updates to the new URL immediately. The problem comes when you change the handle again. You might end up with redirect chains (old URL > intermediate URL > current URL) that slow down crawling. Check Google Search Console's coverage report for redirect issues periodically.
3. Blog Posts Without Dates in URLs
Shopify blog URLs follow the pattern /blogs/news/post-title. There's no date component, which means a post from 2022 and a post from 2026 look structurally identical to crawlers. The lastmod tag in the sitemap helps, but it updates every time you edit a post, not based on original publish date. For time-sensitive content, this can confuse freshness signals.
Connecting robots.txt and Sitemap to AI Visibility
Here's what most Shopify SEO guides miss: your robots.txt and sitemap don't just affect Google rankings. They directly impact whether AI systems can access, index, and cite your store.
The connection works like this. AI systems need to read your pages to recommend them. If your robots.txt blocks an AI crawler, that AI literally cannot see your product pages. Your sitemap tells these crawlers which pages exist and when they were last updated. Together, these two files are the gatekeepers for both traditional search and AI discovery.
The stores that will win in 2026 are treating their robots.txt as a strategic document, not a default they never look at. Here's how GEO differs from traditional SEO and why that matters for technical decisions like crawler access.
AI crawlers are especially reliant on structured data that's embedded in your product pages. Your sitemap brings them to the page. Your schema markup gives them the structured information they need to generate accurate product recommendations. These two things work together. A perfect sitemap is useless if the pages it points to have no structured data.
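To make the pairing concrete, here's a minimal, hypothetical JSON-LD Product snippet of the kind AI crawlers parse once the sitemap has led them to the page (every value below is a placeholder):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Trail Runner",
  "image": "https://cdn.shopify.com/s/files/example-trail-runner.jpg",
  "description": "Lightweight trail running shoe with a grippy outsole.",
  "offers": {
    "@type": "Offer",
    "price": "89.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```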
Practical robots.txt Customizations for 2026
My opinion: there are exactly four robots.txt customizations worth making on Shopify in 2026. Everything else is either handled by the defaults or irrelevant for stores under 100,000 pages.
1. Explicitly Allow AI Crawlers
Even though Shopify's default robots.txt doesn't block AI crawlers (because it only addresses User-agent: *), adding explicit allow rules makes your intent clear and future-proofs against any changes Shopify might make.
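A sketch of what those explicit allow rules look like in the rendered robots.txt (emitted from robots.txt.liquid below the defaults; extend the list to whichever crawlers matter to you):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```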
2. Block Tag Pages
If you use product tags, add Disallow: /collections/*/tagged/ to prevent crawl waste on thin tag-filtered pages. These pages rarely rank and eat crawl budget on larger stores.
3. Block Internal Search
Shopify's default blocks /search, but if you use a third-party search app that creates /pages/search-results or similar paths, block those too. Internal search results pages are thin content.
4. Add External Sitemaps
If you use a headless CMS or external blog platform alongside Shopify, you can add additional sitemap references in your robots.txt.liquid. This is useful when your content lives across multiple systems but you want everything discoverable from a single robots.txt.
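Because anything outside Liquid tags in robots.txt.liquid is emitted verbatim, an extra Sitemap reference is just a plain line after the default loop. A sketch, with a placeholder blog URL:

```liquid
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}
  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

Sitemap: https://blog.yourstore.com/sitemap.xml
```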
How to Verify Everything Is Working
After making any robots.txt changes, verify them. Don't just save and assume.
- Visit yourstore.com/robots.txt directly. Confirm your changes appear. Shopify caches this file, so clear your cache or use an incognito window.
- Use Google's robots.txt Tester. In Search Console, go to Settings > robots.txt and test specific URLs against your rules. This shows you exactly what Google can and can't crawl.
- Check your sitemap for errors. Visit yourstore.com/sitemap.xml and click through each child sitemap. Look for missing products, unexpected pages, or very old lastmod dates.
- Monitor Google Search Console's coverage report. After changes, watch for spikes in "Excluded" pages. A sudden jump means you might be blocking something you didn't intend to.
- Test AI visibility. Run an AI authority check to see if ChatGPT and Perplexity can access and cite your product pages. If you recently allowed AI crawlers, it may take a few weeks before changes reflect in AI responses.
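For a quick local sanity check before you even hit Search Console, Python's built-in urllib.robotparser can evaluate a robots.txt body against specific user agents and paths. This sketch uses a made-up ruleset, not your store's actual file:

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt body: blocks GPTBot everywhere,
# blocks /admin and /cart for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is blocked everywhere.
print(parser.can_fetch("GPTBot", "/products/example"))    # False
# Other crawlers can reach product pages...
print(parser.can_fetch("Googlebot", "/products/example")) # True
# ...but not the cart path.
print(parser.can_fetch("Googlebot", "/cart"))             # False
```

Swap in the body of your live robots.txt and the exact paths you care about to confirm each crawler sees what you intend.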
Understanding how AI engines discover and cite your content is just as important as the technical setup. For context on how Reddit and third-party mentions factor into AI recommendations, see our guide on Reddit's role in AI citations.
Common Mistakes to Avoid
I see the same robots.txt and sitemap mistakes on Shopify stores over and over. Here are the ones that actually cost you traffic.
- Blocking Googlebot from CSS/JS files. Some older SEO advice says to block crawlers from your theme assets. Don't. Google needs to render your pages to understand them. Blocking CSS and JavaScript files means Google sees a broken page.
- Adding noindex and Disallow to the same pages. If you Disallow a page in robots.txt, Google can't crawl it, which means it can't see the noindex tag. Use one or the other. For pages you want permanently de-indexed, noindex is stronger. For pages you just want to save crawl budget on, Disallow is appropriate.
- Forgetting to submit the sitemap. Yes, Google will find it through the robots.txt reference. But manual submission in Search Console gives you immediate crawl data and error reporting. It takes 30 seconds. Do it.
- Blocking all bots while the store is in development. If you launch with a password page and then remove it, the robots.txt updates automatically. But if you added a blanket Disallow: / in robots.txt.liquid during development and forgot to remove it, your live store is invisible to every crawler. I've seen this happen more times than I'd like to admit.
- Ignoring AI crawlers entirely. In 2026, not making a decision about AI crawlers is itself a decision. The default is to allow them. If that's what you want, great. But know that it's happening, and make it intentional.
Frequently Asked Questions
Can you edit robots.txt on Shopify?
Yes. Since June 2021, Shopify lets you customize robots.txt through a theme file called robots.txt.liquid. Navigate to Online Store > Themes > Edit Code, then create a new template named robots.txt.liquid. You can add or remove Disallow rules, but Shopify enforces certain blocks (like /admin and /checkout) that can't be overridden.
Where is the Shopify sitemap located?
Every Shopify store has an auto-generated sitemap at yourstore.com/sitemap.xml. It's a sitemap index that links to child sitemaps for products, collections, pages, and blog posts. Shopify updates it automatically whenever you publish or remove content.
Does Shopify robots.txt block AI crawlers?
No. Shopify's default robots.txt does not block AI crawlers like GPTBot, ClaudeBot, or PerplexityBot. If you want to block specific AI crawlers, add custom rules through robots.txt.liquid. If you want to allow them (which I recommend for ecommerce), the defaults already do that.
How often does Shopify update the sitemap?
Shopify regenerates the sitemap automatically whenever you add, edit, or remove products, collections, pages, or blog posts. Changes typically show up within minutes. You don't need to manually trigger updates, though submitting the sitemap in Google Search Console speeds up discovery.
Should I submit my Shopify sitemap to Google Search Console?
Absolutely. While Google will find your sitemap through robots.txt, manually submitting it gives you crawl data, error reporting, and faster initial indexing. Submit yourstore.com/sitemap.xml once in Search Console. Google re-checks it automatically after that.
What pages does Shopify exclude from the sitemap?
Shopify excludes password-protected pages, pages marked as hidden from search engines, checkout and cart pages, admin pages, and duplicate product URLs created by collection paths. Only published, indexable content appears. Out-of-stock products are also excluded if you've configured them to unpublish automatically.

