WordPress Robots.txt Tips

One way to control which pages of your site search engines index is to place a robots.txt file at the root of your website. This powerful tool can be intimidating to webmasters because, if you don’t know what you’re doing, you may block every page of your site from being indexed. In this post, I will cover some important tips for optimizing the robots.txt file of your WordPress site. If your site doesn’t have a robots.txt file yet, read on to learn how to create one. If you already have one, use these tips to make sure that it doesn’t contain errors.

1. One suggestion I can offer is to visit the most popular sites in your niche. Do a Google search and see which sites come up on the first page of the results. Then visit each site and look at its robots.txt file to get an idea of how it controls indexing. To see a robots.txt file, simply add /robots.txt to the end of the site’s root URL.

2. WordPress sites have a tendency to create duplicate content. Your individual posts also show up under categories, tags, archives, and a few other places. Search engines don’t like duplicate content, so it’s a good idea to block these listing URLs if you’re using WordPress.

3. When adding your robots.txt file to your site, make sure to place it in the root of your website. For example, you should place it at seobeginer.com/robots.txt, not seobeginer.com/blog/robots.txt. As with tip 1, the more robots.txt files you read on other sites, the more comfortable you will become building your own.

4. Don’t use an “Allow” directive in your robots.txt file unless it is necessary. Only list the files and directories that you don’t want to be indexed. Everything else can still be crawled and indexed, as long as it is linked on your site.
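Tips 1 and 3 both rely on the fact that robots.txt is read from the root of a site. As a minimal sketch in Python, the helper below (robots_url is a hypothetical name, not part of WordPress or any library) works out where that file should live for any page URL, so you can paste the result into your browser:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep only the scheme and host; drop any path, query string, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A post inside /blog/ still maps to the robots.txt at the site root.
    print(robots_url("http://seobeginer.com/blog/some-post/?p=1"))
```

The post URL in the demo is made up for illustration. Whatever path you pass in, the result points at the root, which is the only location tip 3 recommends.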

This is an example of a good robots.txt file:

User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/

User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

Sitemap: http://seobeginer.com/sitemap.xml
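Several of the rules above use the wildcard characters * and $. The Python sketch below shows roughly how such a pattern is compared with a URL, assuming the Googlebot-style extensions where * matches any run of characters and a trailing $ anchors the end of the URL. rule_matches is a hypothetical helper written for this post, not part of any crawler, and it deliberately ignores how Allow and Disallow rules are prioritized against each other.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check one Disallow/Allow pattern against a URL path plus query string.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL, as Googlebot-style wildcards do.
    """
    if not pattern:
        # An empty "Disallow:" blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules match from the start of the path, which is what re.match does.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("/*?", "/?s=wordpress"),
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?x=1"),
        ("/tag/", "/blog/tag/seo/"),
    ]:
        print(pattern, path, rule_matches(pattern, path))
```

The paths in the demo are invented examples. Note that /index.php?x=1 is not caught by /*.php$ because of the end anchor, but it is still caught by the separate /*? rule.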

Comments

  1. SiRu says:

    Quick Question.

    If you Disallow: /wp-content/, then you are blocking the images as well, right?

    Google does index images if you have set the correct alt text :)

    Your thoughts please.

    SiRu

    • Ricky says:

      Hi Siru, look at this part

      User-agent: Googlebot-Image
      Disallow: /wp-includes/

      This code will let Google Image bot index all the files except for the wp-includes folder. You don’t need to add anything else.

  2. Pankaj Gupta says:

    Thanks for this great tutorial. Will update my robots.txt file.

  3. H. Marahrens says:

    Disallow: /*?

    Any reason why search query links should be disallowed for Googlebot?

    I’m just curious!

  4. Pankaj Gupta says:

    @H. Marahrens,
    It is not necessary to disallow them. If you allow them, Google will start showing those search result pages too, but that may cause a duplicate content issue.

  5. Rajesh says:

    Thanks for this great tutorial. I have made my robots.txt.
    Disallowing tag and category pages is necessary to keep duplicate content away from Google.

  6. Kostas says:

    Thanks for this tutorial. I changed my robots.txt and I will test my new settings.

  7. Nyx says:

    I’m curious… if you’ve had a live WordPress blog for a while, but are just now adding these disallows to your robots.txt, will Google eventually DE-index the duplicate content already caused by NOT having them in the robots.txt to begin with? TIA for any opinions!

  8. Quick question: This will prevent bots from crawling pages based on the robots.txt file, but what about pages that were indexed earlier? Will those be taken care of as well?

    • Ricky says:

      Yeah sure Harsh,
      It will prevent bots from crawling the pages you disallow in the robots.txt file. However, it will take a while to see the changes in the SERPs.

  9. Renu Kanchan says:

    Quick Question,
    I have some pages with an extension .html?lid=1, .html?lid=2 and so on. I want to disallow all those pages in robots.txt file. How can I do this?
    Your Thoughts Please!
    Renu
