robots.txt Generator

control crawlers. one file.

generate a robots.txt file with custom rules, sitemap URL, and crawl-delay. presets for common configurations.

robots.txt preview
User-agent: *
Allow: /

What this tool does

Build a robots.txt file by toggling which paths crawlers can and cannot fetch. The tool outputs a plain-text file ready to drop into your domain's root.
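As a rough illustration of what a generator like this does, the sketch below assembles the file from a list of rule groups. The function name `build_robots_txt` and the dictionary keys are hypothetical, chosen for this example; they are not taken from the tool itself.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Assemble robots.txt text from rule groups (hypothetical structure).

    Each group is a dict with 'user_agent' and optional 'allow',
    'disallow' (lists of paths) and 'crawl_delay' (seconds).
    """
    lines = []
    for group in groups:
        lines.append(f"User-agent: {group['user_agent']}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    # Exactly one trailing newline, no trailing blank lines.
    return "\n".join(lines).rstrip("\n") + "\n"


print(build_robots_txt(
    [{"user_agent": "*", "disallow": ["/admin/"]}],
    sitemap_url="https://example.com/sitemap.xml",
))
```

Crawl-delay is a non-standard directive; some crawlers honor it and others, Googlebot among them, ignore it, so treat it as a hint rather than a guarantee.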

What robots.txt does and does not do

The robots.txt file at example.com/robots.txt tells web crawlers which paths they are allowed to fetch. It is the polite-request system of the open web — well-behaved crawlers (Googlebot, Bingbot, the major SEO bots) follow it. Malicious or aggressive scrapers ignore it entirely.

That distinction is critical. robots.txt is not a security mechanism. Anything in your Disallow list is fully reachable by anyone who guesses or discovers the URL. Sensitive content needs authentication, not just a robots exclusion.

The basic syntax

User-agent: *
Disallow: /admin/
Disallow: /draft/
Allow: /draft/public-preview

Sitemap: https://example.com/sitemap.xml

  • User-agent names the crawler that the rules following it apply to. * means all crawlers.
  • Disallow lists path prefixes the crawler should not fetch.
  • Allow creates an exception to a broader Disallow.
  • Sitemap points crawlers at your sitemap.xml location. It is independent of the allow/disallow logic.
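How an Allow exception overrides a broader Disallow can be sketched as a longest-prefix match, which is the precedence the Robots Exclusion Protocol standard (RFC 9309) describes. The helper `is_allowed` below is a hypothetical, simplified matcher written for this example: it handles plain path prefixes only and ignores the `*` and `$` wildcards.

```python
def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    `rules` is a list of (directive, prefix) pairs for one User-agent
    group. The longest matching prefix wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_prefers_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/draft/"),
    ("allow", "/draft/public-preview"),
]

for path in ("/admin/login", "/draft/post-1", "/draft/public-preview", "/blog/"):
    print(path, is_allowed(RULES, path))
```

With these rules, only `/draft/public-preview` and `/blog/` come back as allowed: the Allow prefix is longer than `/draft/`, so it takes precedence regardless of the order in which the lines appear.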

Disallow vs noindex: a critical distinction

robots.txt “Disallow” and HTML <meta name="robots" content="noindex"> sound like they do the same thing. They do not.

Disallow tells the crawler not to fetch the URL. Without fetching it, the crawler cannot read its contents — but the URL might still appear in search results based on links pointing to it, with no preview text and a generic description.

noindex tells the crawler to fetch the page but not include it in the search index. Counterintuitive but important: a page with noindex stays out of search results because the crawler read the directive. A disallowed page might still appear, with no useful metadata.

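For pages you want truly absent from search results, use noindex in the page's HTML head, and leave the page crawlable: if the same URL is also disallowed, the crawler never fetches it and never sees the directive. Reserve robots.txt Disallow for paths whose content should not be fetched at all (private API endpoints, search result pages, infinitely paginated archives).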
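To make the point concrete that noindex only works once the page has been fetched, here is a minimal sketch of how a crawler might read the directive out of downloaded HTML. The class `RobotsMetaParser` and function `has_noindex` are hypothetical names for this example; real crawlers also consider other signals, such as HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """Return True if the fetched HTML asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```

The function needs the page body as input, which is exactly why a URL blocked in robots.txt cannot communicate noindex: the crawler never obtains the HTML to parse.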

Common mistake

Adding Disallow: /admin/ to a production robots.txt does not protect your admin pages. It just announces the path to anyone reading robots.txt, which is a public file. To keep both search engines and strangers out of admin URLs, require authentication and let the redirect-to-login do the work.
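The following sketch shows how little effort it takes to turn a public robots.txt into a list of paths worth probing. The function name `disallowed_paths` and the sample file are assumptions made for this example.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE = """\
User-agent: *
Disallow: /admin/   # meant to hide the admin area
Disallow: /draft/
Allow: /draft/public-preview
"""

print(disallowed_paths(SAMPLE))
```

Anyone can run the same few lines against a live site, so a Disallow entry should never be the only thing standing between a visitor and sensitive content.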

Per-bot rules

You can target specific crawlers with their own rules:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /private/preview
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

The last two groups are increasingly common in 2026: they block AI training crawlers from ingesting site content into model training corpora. Examples of the user-agent tokens involved are OpenAI's GPTBot, Common Crawl's CCBot, Anthropic's ClaudeBot, and Google's Google-Extended token. Each AI company that respects robots.txt publishes the token its crawler answers to, and the list keeps growing, so check each vendor's documentation for the current names.
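Per-bot groups can be checked with Python's standard-library `urllib.robotparser`. The sketch below feeds it a file like the one above and asks what each crawler may fetch; `ExampleBot` and the `allowed` helper are made-up names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /private/preview
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parser whether `agent` may fetch `path` on example.com."""
    return parser.can_fetch(agent, "https://example.com" + path)


for agent, path in [
    ("Googlebot", "/private/preview"),
    ("Googlebot", "/private/report"),
    ("GPTBot", "/blog/"),
    ("ExampleBot", "/private/report"),
    ("ExampleBot", "/blog/"),
]:
    print(agent, path, allowed(agent, path))
```

A crawler that has its own group uses that group instead of the `*` group, which is why the Googlebot group repeats the Disallow line. Note that Python's parser applies the first matching rule in file order rather than the longest match, so placing Allow before Disallow within a group, as this example does, keeps its answers in line with crawlers that use longest-match precedence.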

Where it goes

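At the root of your domain: example.com/robots.txt. Not in a subdirectory, not at any other path. The file applies only to the host that serves it, so a subdomain such as blog.example.com needs its own robots.txt. Well-behaved crawlers request that exact location before crawling anything else on the host.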
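Because the location is fixed, the robots.txt URL for any page can be derived from the page's scheme and host alone. A minimal sketch, using a hypothetical helper named `robots_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs `page_url`.

    Keeps the scheme and host (including any port), and replaces the
    path, query, and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
print(robots_url("https://shop.example.com:8443/cart"))
```

The second call yields a different file from the first, since a subdomain or a non-default port counts as a separate host with its own robots.txt.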
