Robots.txt Best Practices: Control Crawling Without Blocking SEO

Robots.txt controls crawling, not indexing. Learn how to use it without accidentally hiding important pages from search engines.

What robots.txt actually does

Robots.txt is a crawl-control file. It tells well-behaved crawlers which paths they are allowed to request, but it does not remove URLs from the search index by itself.

That distinction matters. A URL blocked in robots.txt can still be discovered through links and appear in search with limited information. When the goal is removal from the index, use a noindex directive on a page crawlers are still allowed to fetch, or redirect the URL.
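Python's standard-library `urllib.robotparser` answers exactly the question robots.txt answers: may this user agent request this URL? The sketch below assumes a hypothetical `/private-reports/` rule on example.com and shows that the answer is only a crawl permission. The `noindex_is_visible` helper is an illustrative name, not a library function; it encodes the fact that a crawler can only obey a noindex tag on a page it is allowed to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-reports/
"""

def make_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def noindex_is_visible(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: a noindex tag only works if the crawler may fetch the page."""
    return parser.can_fetch(agent, url)

parser = make_parser(ROBOTS_TXT)
blocked = "https://www.example.com/private-reports/q3.html"
open_page = "https://www.example.com/blog/robots-guide.html"

# can_fetch() reports crawl permission only; it says nothing about whether
# the URL is already indexed or will show up in search results.
print(parser.can_fetch("Googlebot", blocked))    # False
print(parser.can_fetch("Googlebot", open_page))  # True

# A noindex tag on the blocked page would never be read.
print(noindex_is_visible(parser, blocked))       # False
```

On a live site you would load the real file with `set_url()` and `read()` instead of parsing a string.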

Safe rules to follow

Serve the file at the root of each host (for example, https://www.example.com/robots.txt) and return a 200 response.
Declare your XML sitemap with an absolute URL so crawlers can discover canonical URLs faster.
Block crawl traps, internal search pages, and endless parameter combinations.
Do not block CSS, JavaScript, images, or other resources Google needs to render and evaluate the page.
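The safe rules above can be checked mechanically. The sketch below is a minimal check under stated assumptions: the host, sitemap URL, and the `/search` and `/calendar/` trap paths are hypothetical, and the status handling is a simplified reading of RFC 9309 plus Google's documented treatment of 429 as a server error. Python's `urllib.robotparser` does not implement Google's `*` and `$` wildcards or longest-match precedence, so the sample sticks to plain path prefixes.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url: str) -> str:
    """robots.txt lives at the root of each scheme + host (+ port) combination."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

def interpret_status(status: int) -> str:
    """Simplified crawler reaction to the /robots.txt status (RFC 9309; Google treats 429 like 5xx)."""
    if 200 <= status < 300:
        return "parse rules"
    if 300 <= status < 400:
        return "follow redirect"
    if status == 429 or 500 <= status < 600:
        return "assume everything is disallowed"
    if 400 <= status < 500:
        return "assume no restrictions"
    return "unknown"

# Hypothetical file applying the safe rules: trap paths blocked,
# sitemap declared with an absolute URL, rendering assets left crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /calendar/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(robots_url("https://www.example.com/blog/post?id=7"))  # https://www.example.com/robots.txt
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']

# Rendering resources should stay crawlable; trap URLs should not be.
for path in ["/assets/site.css", "/assets/app.js", "/images/hero.jpg",
             "/search?q=shoes", "/calendar/2031/01/"]:
    allowed = parser.can_fetch("Googlebot", "https://www.example.com" + path)
    print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```

The status check is why the 200 response matters: a file that errors with a 5xx can make a crawler back off the whole site.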

Common mistakes

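Leaving a staging Disallow: / rule in place after launch, which blocks crawling of the entire site.
Blocking pages whose canonical or noindex tags need to be seen; a crawler cannot read tags on a page it is not allowed to fetch.
Treating robots.txt as a security feature for private URLs: the file is publicly readable, and only well-behaved crawlers obey it.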
Testing only the homepage and missing blocked templates deeper in the site.
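Two of these mistakes are easy to catch before launch: a leftover site-wide Disallow: / and rules that only affect deeper templates. The sketch below is hypothetical: `audit_robots`, the sample template URLs, and the overly broad `/shop/` rule are illustrative, and it relies on the simple prefix matching in `urllib.robotparser` rather than Google's full matcher.

```python
from urllib.robotparser import RobotFileParser

def audit_robots(robots_text: str, site: str, template_paths: dict[str, str],
                 agent: str = "Googlebot") -> list[str]:
    """Hypothetical check: warn about a blocked root and about blocked sample URLs per template."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    warnings = []
    # A blocked root usually means a staging "Disallow: /" survived the launch.
    if not parser.can_fetch(agent, site + "/"):
        warnings.append("site root is blocked: check for a leftover 'Disallow: /'")
    # Deeper templates can be blocked even when the homepage is fine.
    for template, path in template_paths.items():
        if not parser.can_fetch(agent, site + path):
            warnings.append(f"{template} template is blocked: {path}")
    return warnings

SITE = "https://www.example.com"
# Hypothetical sample URLs, one per page template.
TEMPLATES = {
    "article": "/blog/robots-txt-guide",
    "product": "/shop/blue-widget",
    "category": "/category/widgets",
}
STAGING = "User-agent: *\nDisallow: /\n"
LIVE = "User-agent: *\nDisallow: /search\nDisallow: /shop/\n"

print(audit_robots(LIVE, SITE, TEMPLATES))
# ['product template is blocked: /shop/blue-widget']
for warning in audit_robots(STAGING, SITE, TEMPLATES):
    print(warning)
```

The LIVE file passes a homepage-only test, which is exactly why sampling one URL per template is worth the extra lines.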

How to validate changes

After editing robots.txt, run a fresh DomainLens audit and inspect important URLs in Google Search Console. Confirm that the rendered page, canonical target, sitemap, and robots rules agree.
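Part of that agreement can be checked automatically: every URL listed in the XML sitemap should be crawlable under the current robots rules, since the sitemap is meant to list canonical pages you want crawled. The sketch below assumes an inline sitemap and robots.txt with hypothetical URLs; `sitemap_conflicts` is an illustrative helper, not a DomainLens or Search Console feature.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap and robots.txt for example.com.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/guides/robots-txt</loc></url>
  <url><loc>https://www.example.com/archive/2019-pricing</loc></url>
</urlset>"""

ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Sitemap: https://www.example.com/sitemap.xml
"""

def sitemap_conflicts(sitemap_xml: str, robots_text: str, agent: str = "Googlebot") -> list[str]:
    """List sitemap URLs that robots.txt prevents the given agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return [url for url in locs if not parser.can_fetch(agent, url)]

print(sitemap_conflicts(SITEMAP_XML, ROBOTS_TXT))
# ['https://www.example.com/archive/2019-pricing']
```

Any URL this returns is sending mixed signals: either drop it from the sitemap or lift the rule that blocks it.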

For large sites, review server logs after deployment. A clean robots.txt file should reduce wasted crawling without hiding pages that need to rank.
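A first pass over server logs can be as simple as counting where Googlebot spends its requests. The sketch below assumes logs in the combined log format used by default in many Apache and Nginx setups, with hypothetical sample lines; `crawl_summary` is an illustrative helper that groups hits by first path segment and flags requests to paths the new rules disallow. User-agent strings can be spoofed, so a real review should confirm Googlebot via reverse DNS before trusting the counts.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Combined log format: IP, identity, user, [time], "request", status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawl_summary(log_lines, robots_text, site="https://www.example.com"):
    """Hypothetical helper: Googlebot hits per top-level section, and hits on now-disallowed paths."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    sections, disallowed = Counter(), Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        path = match["path"]
        # "/search?q=red" -> "/search", "/blog/post" -> "/blog", "/" -> "/"
        section = "/" + path.lstrip("/").split("/", 1)[0].split("?", 1)[0]
        sections[section] += 1
        if not parser.can_fetch("Googlebot", site + path):
            disallowed[section] += 1
    return sections, disallowed

BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
LOGS = [
    f'198.51.100.7 - - [10/Mar/2025:10:00:01 +0000] "GET /search?q=red HTTP/1.1" 200 512 "-" "{BOT}"',
    f'198.51.100.7 - - [10/Mar/2025:10:00:02 +0000] "GET /search?q=blue HTTP/1.1" 200 498 "-" "{BOT}"',
    f'198.51.100.7 - - [10/Mar/2025:10:00:03 +0000] "GET /blog/robots-guide HTTP/1.1" 200 9120 "-" "{BOT}"',
    '203.0.113.9 - - [10/Mar/2025:10:00:04 +0000] "GET /blog/robots-guide HTTP/1.1" 200 9120 "-" "Mozilla/5.0"',
]

sections, disallowed = crawl_summary(LOGS, "User-agent: *\nDisallow: /search\n")
print(sections.most_common())  # [('/search', 2), ('/blog', 1)]
print(dict(disallowed))        # {'/search': 2}
```

Some hits on disallowed paths right after a deployment are expected, because Google generally caches robots.txt for up to 24 hours; hits that persist suggest the rule does not match the URLs you meant to block.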
