🍱 Lunchbox Hands

robots.txt Generator

Build robots.txt files with user-agent rules and presets

Build your robots.txt file with a visual form. Add user-agent rules, allow and disallow paths, and use presets like Block AI Bots. Copy or download the result. Everything runs locally in your browser.

User-agent: *
Allow: /

What a robots.txt file does (and doesn't do)

A robots.txt file sits at the root of your domain (https://example.com/robots.txt) and tells crawlers which paths they may fetch. Google, Bing, and most engines check it before crawling. A missing file won't cause a penalty, but a careless one can accidentally block your whole site — surprisingly common right after a staging site goes live.

Crucial distinction: Disallow blocks crawling, not indexing. If Google has already indexed a URL and you later Disallow it, the page won't drop from results on its own. To remove a page you need a noindex meta tag or an X-Robots-Tag HTTP header, and Google must be able to crawl the page to see either one. Blocking crawling of a page you want removed is a classic mistake.

Once generated, declare your sitemap with a Sitemap: directive — build one with the Sitemap Generator. Then run a full check with the SEO Analyzer, which verifies your robots.txt is reachable and well-formed. For the wider context, see the complete on-page SEO checklist.

How to use this tool

  1. Choose a preset (Allow All, Block AI Bots, Block Everything) or start from scratch.
  2. Add user-agent rules: * for all bots, or named crawlers like Googlebot or GPTBot.
  3. Add Disallow paths for anything you don't want crawled (e.g. /admin/, /staging/).
  4. Add an Allow rule if you need to open a specific path inside a disallowed directory.
  5. Add your Sitemap URL at the bottom, then copy or download and upload to your domain root.
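The steps above amount to assembling groups of lines. Here is a minimal Python sketch of that assembly, not the tool's actual code: the names Group and build_robots_txt are hypothetical, and the paths are the examples from the steps plus an assumed /admin/help/ exception.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """One robots.txt group (hypothetical helper): user-agents plus rules."""
    user_agents: List[str]
    disallow: List[str] = field(default_factory=list)
    allow: List[str] = field(default_factory=list)


def build_robots_txt(groups: List[Group], sitemap: Optional[str] = None) -> str:
    """Render groups and an optional sitemap URL as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {ua}" for ua in group.user_agents]
        # Allow lines are written first so that parsers which stop at the
        # first matching rule see the exception before the broader Disallow.
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


robots_txt = build_robots_txt(
    [Group(["*"], disallow=["/admin/", "/staging/"], allow=["/admin/help/"])],
    sitemap="https://example.com/sitemap.xml",
)
print(robots_txt)
```

Google and RFC 9309 pick the most specific (longest) matching rule regardless of order, but some simpler parsers use the first match, which is why this sketch emits Allow before Disallow.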

Common rules and when to use them

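  • Block AI training crawlers. Add User-agent: GPTBot with Disallow: / (and the same for CCBot, anthropic-ai, etc.) to ask those crawlers not to collect your content for model training. Compliance is voluntary, so this only stops bots that honor robots.txt. The Block AI Bots preset handles it.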
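  • Block admin areas. Disallow: /wp-admin/ or /admin/ stops compliant crawlers from fetching login and dashboard pages. It does not hide or protect them: robots.txt is publicly readable and a disallowed URL can still be indexed if other pages link to it, so keep real access control on the server.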
  • Block internal search. Result pages like /search?q= create thin duplicate content. Rules match by path prefix, so Disallow: /search covers /search?q= and every other URL that starts with /search.
  • Reference your sitemap. Add Sitemap: https://example.com/sitemap.xml so every crawler finds it, not just the ones you submit to manually.
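To see how these rules combine, here is a small sketch that parses a sample file with Python's standard urllib.robotparser and asks what each bot may fetch. The file contents, the example.com host, and the allowed helper are assumptions for illustration, and this parser is simpler than Google's, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Sample file combining the rules listed above (illustrative paths).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may this user-agent fetch this path?"""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("GPTBot", "/blog/post"))          # False: GPTBot is blocked everywhere
print(allowed("Googlebot", "/blog/post"))       # True: no rule matches
print(allowed("Googlebot", "/admin/login"))     # False: under /admin/
print(allowed("Googlebot", "/search?q=shoes"))  # False: starts with /search
print(parser.site_maps())                       # ['https://example.com/sitemap.xml']
```

site_maps() requires Python 3.8 or later.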

Frequently asked questions

Where do I upload my robots.txt file?

It must live at the root of your domain, at https://yourdomain.com/robots.txt. It cannot be in a subdirectory. On most servers that means the public web root (public_html, www, or dist). On Cloudflare Pages, Vercel, and Netlify, put it wherever your framework keeps static files (commonly a public or static folder) so that it lands at the top level of the build output and is served from the root.
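Because crawlers only look at the root, the robots.txt location for any page can be derived from its URL. A minimal sketch, where robots_url is a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one robots.txt location that applies to page_url."""
    parts = urlsplit(page_url)
    # Scheme and host (including any port) are kept; the path, query, and
    # fragment are replaced, because the file is only read from the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?id=7"))
print(robots_url("https://shop.yourdomain.com/cart"))
```

The second call shows that a subdomain gets its own file: rules on yourdomain.com do not apply to shop.yourdomain.com.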

Does Disallow remove a page from Google's index?

No. Disallow blocks crawling — it stops Google fetching the page. But an already-indexed URL (or one found via a sitemap or backlink) can stay in the index. To remove a page from results, use a noindex meta tag or HTTP header, and make sure the page is NOT blocked from crawling so Google can read that tag.
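The conflict described here can be checked mechanically. The sketch below, using only the Python standard library, looks for noindex in a page's meta tag or X-Robots-Tag header and flags the case where robots.txt stops the crawler from ever seeing it. The function names and sample inputs are assumptions for illustration, and the substring check is deliberately loose.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(html: str, headers: dict) -> bool:
    """True if the page asks not to be indexed via meta tag or header."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    header = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header = value.lower()
    return "noindex" in header or any("noindex" in d for d in finder.directives)


def noindex_is_hidden(robots_txt, agent, url, html, headers) -> bool:
    """True when a page carries noindex but the crawler may not fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return has_noindex(html, headers) and not parser.can_fetch(agent, url)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
blocking = "User-agent: *\nDisallow: /old/\n"
print(noindex_is_hidden(blocking, "Googlebot", "https://example.com/old/page", page, {}))
```

A True result means the Disallow rule should be lifted for that URL until the page has been recrawled and dropped.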

How do I block AI crawlers in robots.txt?

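Add a separate User-agent block for each AI bot you want to block, with Disallow: / under each. GPTBot (OpenAI), CCBot (Common Crawl), anthropic-ai, Google-Extended, and PerplexityBot are the common ones. Google-Extended is a control token rather than a separate crawler: it controls whether Google may use your pages for its Gemini AI models and does not affect crawling or ranking in Google Search. The generator’s "Block AI Bots" preset writes these rules for you.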

Can I have more than one User-agent block?

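Yes. You can stack multiple User-agent lines before a shared set of rules, or write separate blocks per bot. Googlebot, Bingbot, and GPTBot each follow their own block; use * as a catch-all for bots not explicitly listed. Groups are not merged: a bot that has its own block ignores the * block entirely, so repeat any shared rules inside each named block.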
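A short sketch of how group selection plays out, using Python's standard urllib.robotparser. The file contents, the /drafts/ and /private/ paths, and the SomeOtherBot name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Two stacked User-agent lines share one rule set; * catches everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

results = {
    agent: parser.can_fetch(agent, "https://example.com/private/report")
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot")
}
print(results)
# {'Googlebot': True, 'Bingbot': True, 'SomeOtherBot': False}
```

Googlebot and Bingbot can fetch /private/report because their own group says nothing about /private/, and the * group's rules are not applied to them. Only the unlisted bot falls back to *.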

How do I check that my robots.txt is working?

Google Search Console has a robots.txt report under Settings. You can also just visit https://yourdomain.com/robots.txt in a browser. To verify a specific page is crawlable after editing, use the URL Inspection tool in Search Console.
