Where do I put the robots.txt file?

Save it as 'robots.txt' in the root directory of your website (e.g., https://example.com/robots.txt).

Does robots.txt block pages from Google?

It tells crawlers not to access certain paths, but it does not remove pages from search results. For removal, use the noindex meta tag on a page that crawlers are still allowed to fetch.

Can I block AI crawlers?

Yes. The tool includes a preset that blocks common AI training crawlers like GPTBot, CCBot, Google-Extended, and others.

What happens if I do not have a robots.txt?

Without a robots.txt, all crawlers assume they are allowed to access all public URLs on your site.

control crawlers. one file.

generate a robots.txt file with custom rules, sitemap URL, and crawl-delay. presets for common configurations.


What this tool does

Build a robots.txt file by toggling which paths crawlers can and cannot fetch. Outputs the plain-text file ready to drop in your domain's root.
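As a rough illustration of what such a generator does, here is a minimal Python sketch. The function name `build_robots_txt` and its parameters are hypothetical and are not this tool's actual code. Note that Crawl-delay is a non-standard directive: some crawlers honor it, and Googlebot ignores it.

```python
def build_robots_txt(groups, sitemap_url=None, crawl_delay=None):
    """Render a robots.txt string.

    groups: list of (user_agent, rules) pairs, where rules is a list of
            (directive, path) pairs such as ("Disallow", "/admin/").
    sitemap_url and crawl_delay are optional, mirroring the form fields.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        if crawl_delay is not None:
            # Non-standard directive; only some crawlers read it.
            lines.append(f"Crawl-delay: {crawl_delay}")
        lines.append("")  # blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        [("*", [("Disallow", "/admin/"), ("Allow", "/")])],
        sitemap_url="https://example.com/sitemap.xml",
    ))
```

Save the returned string as robots.txt at the site root.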

What robots.txt does and does not do

The robots.txt file at example.com/robots.txt tells web crawlers which paths they are allowed to fetch. It is the polite-request system of the open web — well-behaved crawlers (Googlebot, Bingbot, the major SEO bots) follow it. Malicious or aggressive scrapers ignore it entirely.

That distinction is critical. robots.txt is not a security mechanism. Anything in your Disallow list is fully reachable by anyone who guesses or discovers the URL. Sensitive content needs authentication, not just a robots exclusion.

The basic syntax

User-agent: *
Disallow: /admin/
Disallow: /draft/
Allow: /draft/public-preview

Sitemap: https://example.com/sitemap.xml

User-agent selects which crawler the rules that follow apply to. * means all crawlers.
Disallow lists path prefixes the crawler should not fetch.
Allow creates an exception to a broader Disallow.
Sitemap points crawlers at your sitemap.xml location. It is independent of the allow/disallow logic.
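To make the Allow-as-exception behavior concrete, here is a minimal Python sketch of the "most specific rule wins" matching described in RFC 9309: the longest matching path decides, and Allow wins a tie. The helper `is_allowed` is a hypothetical name, and the sketch handles plain path prefixes only, not the * and $ wildcards.

```python
# The rules from the example above, as (directive, path-prefix) pairs.
RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/draft/"),
    ("allow", "/draft/public-preview"),
]


def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    The longest matching prefix decides; Allow wins when an Allow and a
    Disallow of equal length both match; no match at all means allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rule has no effect
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ["/draft/public-preview", "/draft/secret", "/admin/login", "/blog/"]:
        print(path, is_allowed(RULES, path))
```

Under these rules only /draft/secret and /admin/login are blocked; /draft/public-preview is fetchable because the Allow rule is longer than the Disallow rule that also matches it.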

Disallow vs noindex: a critical distinction

robots.txt "Disallow" and HTML <meta name="robots" content="noindex"> sound like they do the same thing. They do not.

Disallow tells the crawler not to fetch the URL. Without fetching it, the crawler cannot read its contents — but the URL might still appear in search results based on links pointing to it, with no preview text and a generic description.

noindex lets the crawler fetch the page but tells it not to include the page in the search index. Counterintuitive but important: a page with noindex stays out of search results because the crawler read the directive. A disallowed page might still appear with no useful metadata.

For pages you want truly absent from search results, use noindex in the page's HTML head, and do not also disallow the page: a crawler that is not allowed to fetch it never sees the noindex. Reserve robots.txt Disallow for paths whose content should not be fetched at all (private API endpoints, search result pages, infinitely-paginated archives).
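The point that noindex lives in the page itself, so it only works when the page is fetched, can be sketched in Python with the standard-library HTML parser. The names `RobotsMetaParser` and `has_noindex` are hypothetical; this is an illustration of what a crawler has to do, not how any particular search engine implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the fetched HTML carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

The check needs the HTML as input, which is exactly what a Disallow rule prevents a compliant crawler from obtaining.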

Common mistake

Adding Disallow: /admin/ to a production robots.txt does not protect your admin pages. It just announces the path to anyone reading robots.txt — which is a public file. If you actually need to keep search engines out of admin URLs, use auth and let the redirect-to-login do the work.

Per-bot rules

You can target specific crawlers with their own rules:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /private/preview
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

The last two groups are increasingly common in 2026: they block AI training crawlers such as OpenAI's GPTBot and Common Crawl's CCBot (others include Anthropic's anthropic-ai and Google's Google-Extended token) from ingesting site content into model training corpora. Each AI company that respects robots.txt publishes the user-agent name to target; the list keeps growing.
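One way to sanity-check per-bot groups is Python's standard-library `urllib.robotparser`. The sketch below feeds it the example file and asks what each crawler may fetch; the agent name SomeOtherBot and the example.com URLs are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /private/preview
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def make_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    checks = [
        ("GPTBot", "https://example.com/"),
        ("Googlebot", "https://example.com/private/preview"),
        ("Googlebot", "https://example.com/private/notes"),
        ("SomeOtherBot", "https://example.com/private/notes"),
        ("SomeOtherBot", "https://example.com/blog"),
    ]
    for agent, url in checks:
        print(agent, url, parser.can_fetch(agent, url))
```

A crawler with its own group uses that group and ignores the * group. One caveat: Python's parser applies the first matching rule in file order rather than the longest match, so it agrees with longest-match crawlers here only because the Googlebot group lists Allow before Disallow.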

Where it goes

At the root of your domain: example.com/robots.txt. Not in a subdirectory, not at any other path. Compliant crawlers check that exact location before crawling anything else on the host, and each subdomain needs its own file.
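A small Python sketch of how a crawler can derive that location from any page URL: keep the scheme and host, replace everything else with /robots.txt. The function name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Path, query, and fragment are dropped; scheme and host are kept.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/post?id=1"))
    print(robots_url("https://shop.example.com/cart"))
```

Because the host is part of the result, a page on shop.example.com is governed by the file on shop.example.com, not by the one on example.com.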
