Technical SEO

Configure Robots.txt Properly — Step-by-Step Guide

Ensure robots.txt doesn't block important pages. Allow search engine bots and AI retrieval bots access to key content.

Difficulty: Medium · Impact: Critical · Time: 20 min · Applies to: Online, Local, Hybrid
Pro Tip

Use the robots.txt tester in Search Console (Settings > robots.txt) to verify specific URLs aren't blocked before deploying changes. A single misplaced wildcard can block crawling of your entire site.

Warning

robots.txt does NOT remove pages from Google's index; it only prevents crawling. If a page is already indexed and you then block it in robots.txt, it may stay indexed without its content. To deindex a page, use a noindex meta tag instead, and leave the page crawlable so Google can actually see the tag.
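To deindex a page correctly, add the directive to the page itself rather than to robots.txt, for example:

```html
<!-- In the page's <head>; Google must be able to crawl the page to see this,
     so do NOT also block the page's URL in robots.txt -->
<meta name="robots" content="noindex">
```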

Step-by-Step Guide

1

Review current robots.txt at yoursite.com/robots.txt

Type your domain followed by /robots.txt in your browser (e.g., example.com/robots.txt). If you see a blank page or 404, you don't have one yet — create a robots.txt file in your site's root directory.
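If you're creating the file from scratch, a minimal permissive robots.txt is enough to start (the sitemap URL is a placeholder to adapt):

```
# Allow all crawlers everywhere
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```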

2

Verify important pages aren't accidentally blocked

Check that your homepage, key service pages, blog posts, and sitemap are all accessible. Common mistake: blocking /wp-admin/ too broadly and accidentally blocking /wp-admin/admin-ajax.php (needed for WordPress AJAX).
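The standard WordPress-safe pattern blocks the admin area while carving out the AJAX endpoint:

```
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
```

Google applies the most specific (longest) matching rule, so the Allow wins for admin-ajax.php even though the broader Disallow also matches.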

3

Use the AI prompt to generate an optimized version

Copy the AI prompt from this task, fill in your site details, and generate an optimized robots.txt. Make sure to allow AI retrieval bots (ChatGPT-User, PerplexityBot, ClaudeBot) while optionally blocking AI training bots (GPTBot, Google-Extended).
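As a sketch of that split (verify current bot names against each vendor's documentation, since they change over time):

```
# AI retrieval bots (fetch pages to answer user queries) — allowed
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# AI training crawlers — blocked (optional)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```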

4

Test with Google's robots.txt tester in Search Console

Go to Search Console > Settings > robots.txt. Enter specific URLs to test whether they're blocked or allowed. Test your key pages, sitemap, CSS, and JS files.
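Alongside Search Console, you can sanity-check rules locally with Python's standard-library robots.txt parser. The rules and URLs below are examples; note that this parser applies the first matching rule rather than the longest one, so list Allow lines before the broader Disallow they override:

```python
# Sketch: test example URLs against draft robots.txt rules before deploying.
from urllib import robotparser

rules = """
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Key pages should be fetchable; the admin area should not.
print(rp.can_fetch("*", "https://example.com/blog/post"))               # True
print(rp.can_fetch("*", "https://example.com/wp-admin/settings"))       # False
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php")) # True
```

Google's own parser uses longest-match precedence, so a file that passes here may still behave slightly differently for Googlebot; treat this as a pre-deploy smoke test, not a replacement for the Search Console check.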

5

Deploy and monitor crawl stats

Upload the new robots.txt to your site root. In Search Console, go to Settings > Crawl stats to monitor how Google is crawling your site after the change. Watch for any unexpected drops in crawl activity.


AI Prompt

Generate an optimized robots.txt file for my [WEBSITE TYPE] website.

My website structure:
- CMS: [WordPress/Shopify/Next.js/etc.]
- Important directories to index: [LIST]
- Directories to block: [admin panels, staging, cart/checkout, internal search results]
- Sitemap location: [URL]

Requirements:
1. Allow Googlebot, Bingbot, and AI retrieval bots (ChatGPT-User, PerplexityBot, ClaudeBot)
2. Block AI training bots if desired
3. Block low-value pages (search results, tag archives, author archives)
4. Include sitemap reference
5. Add helpful comments explaining each rule
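For a typical WordPress blog, the generated file might look something like this (the domain and paths are placeholders to adapt to your site):

```
# Default: allow all well-behaved crawlers
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/   # admin panel
Disallow: /?s=         # internal search results
Disallow: /tag/        # tag archives
Disallow: /author/     # author archives

# Block AI training crawlers; retrieval bots (ChatGPT-User,
# PerplexityBot, ClaudeBot) are not listed, so they inherit the * rules
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
```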

Tools & Resources

Google Robots.txt Tester
Robots.txt Validator — Merkle

Learn More

Robots.txt Guide — Ahrefs (article)
Robots.txt Specifications — Google (official)

