Blog SEO: Get Your Blog out of the Supplemental Index

A little blog SEO tip today: check whether, like my blog, you have a “supplemental index problem”.

Check out my Google search result: as you can see, I have about 106 pages in the supplemental index. Change chrisg.com to your own domain to test your own result.

So what is this “Supplemental Index”?

I like the answers from Tropical SEO best:

  • The Google Supplemental index is the Siberian work camp for web pages.
  • The Google Supplemental index is where they put web pages with little trust.
  • The Google Supplemental index is where they put web pages that aren’t going to rank for anything important.

Essentially, Google throws your pages into the supplemental index when it is not sure what to do with them but doesn’t want to throw them away.

Why am I in “Supplemental Index” hell?

In a nutshell, it’s that old SEO monster, “duplicate content”. On my blog the internal linkage and archives were confusing Googlebot by throwing up the same content over and over. Graywolf did a wonderful video on this.

How do I get my blog out of “Supplemental Index”?

The standard answer seems to be to use robots.txt to stop Google indexing junk pages. I didn’t want to add a robots.txt, so I looked for a plugin and came across a recommendation from Ogletree. He hacked a WordPress plugin that seemed ideal, but then in the comments I saw he provided some template code. Just what I was looking for. With a little tweak so that it outputs a comment to explain what is going on, here is what I added to my header template just above the Title tag:


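<?php
// Index single posts, category pages, static pages and the home page,
// but only the first page of each (not page 2, 3, ... of a listing).
if ((is_single() || is_category() || is_page() || is_home()) && !is_paged()) {
    echo "<!-- ok google, index me! -->";
} else {
    // Everything else (date archives, paged listings, etc.): ask robots
    // not to index the page but still follow its links.
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
}
?>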

What this does is output a special instruction telling search engines to ignore a page unless it is the homepage, a single article, a static page or a category page (and then only the first page of a listing). My main problem was the date archives, so hopefully this will sort it. We shall see!
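
To make it easier to see which pages end up with the noindex tag, here is a rough model of the same condition in Python. It is only a sketch: `PageContext` and `robots_meta` are hypothetical names invented for illustration, the boolean fields merely stand in for the WordPress conditional tags used in the template, and it assumes that page 2 of the home page reports both `is_home` and `is_paged`.

```python
from dataclasses import dataclass
from typing import Optional

NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


@dataclass
class PageContext:
    """Hypothetical stand-in for WordPress's conditional tags on one request."""
    is_single: bool = False    # a single post
    is_category: bool = False  # a category archive
    is_page: bool = False      # a static page
    is_home: bool = False      # the blog home page
    is_paged: bool = False     # page 2 or later of a listing


def robots_meta(ctx: PageContext) -> Optional[str]:
    """Return the noindex tag for pages the template hides, else None."""
    wanted_type = ctx.is_single or ctx.is_category or ctx.is_page or ctx.is_home
    if wanted_type and not ctx.is_paged:
        return None  # left indexable, no robots tag emitted
    return NOINDEX_TAG


examples = {
    "single post": PageContext(is_single=True),
    "home, page 1": PageContext(is_home=True),
    "home, page 2": PageContext(is_home=True, is_paged=True),
    "date archive": PageContext(),  # none of the four wanted types
}
for name, ctx in examples.items():
    print(f"{name}: {'noindex' if robots_meta(ctx) else 'index'}")
```

Under those assumptions, a date archive matches none of the four wanted types and so gets the noindex tag, as does any "page 2" listing, while single posts and the first page of the home page are left alone.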



Table of contents for Blog SEO

  1. Blog SEO: Get Your Blog out of the Supplemental Index
  2. Blog SEO: Boost Your Search Rankings With Internal Links

45 Comments so far

  1. Ivan Brezak Brkan 

    What pages did Googlebot exactly see as junk pages and basically how can I implement this for example in Expression Engine. Do I need to use robots.txt (and if so, how?). Thanks in advance Chris, this is a very interesting post!

  2. You can see the pages Google didn’t like in the linked query above. I don’t know what you can do with expression engine, I guess though robots.txt would do it but someone else would have to guide you as I wimped out from using it myself ;)

  3. Ivan Brezak Brkan 

    Heh, don’t worry. I read the article on TropicalSEO and basically what I have to add to my “watchlist” is unique meta data for each and every page! :) Now if only I could get Googlebot to spider my blog at night (CET time)…

  4. Heh good luck! :)

  5. Ashwin 

    Another way to complicate this simple thing is to add a robots.txt file, which I did. I’m not sure whether it worked as my blog traffic was not stable. I’m gonna try this thing and remove the robots.txt shit.

  6. Well it seems the pro seo technique is still robots.txt but for me it was just one more thing I didn’t want to have to learn

  7. Eran 

    there is a great post on this problem at:
    http://www.seomoz.org/blog/how-i-escaped-googles-supplemental-hell

    maybe it will help…

  8. BSN 

    Any suggestions for Blogger?

    Of ~1250 pages indexed, ~1140 are in the supplemental. It looks like it’s mostly the pages for individual posts. I’m reading that Categories may help, so I’m wondering if going back to archived posts and adding labels is worth the effort? Any other suggestions?

  9. I wouldn’t think adding categories would help, make sure they have unique titles, and you are not showing the same posts in several archive/category/date pages

  10. My main issue seems to be a common one — printer friendly pages. I’ve gotten positive feedback from readers, so I’m keeping them, but will be adding a line to robots.txt to ignore anything in /print/.

  11. Yeah that was why I have avoided the printer friendly thing although I have been informed how to do it with CSS I just never got around to doing it :S

  12. Chris, that’s really a good one. I was looking for this kind of stuff as the date archive was getting indexed too. Can you tell me what to do about tags? I am using UTW; what would the syntax be for that, is_tag?

    Awaiting your reply.

  13. Hmm, could be but I haven’t used that plugin myself … I am sure the developer or somebody else could help you out?

  14. mel 

    All my results appear to be my comments. Is this a problem I should look at fixing or does it not matter too much? I don’t think I want folks finding comments when searching in google anyway – it’s my content not my comments I’d be worried about? Or am I missing the point?

  15. I think it is actually your comment feed that is being indexed mel?

  16. mark 

    Modifying the robots.txt file is real easy guys. Here’s a sample:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /logs/
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    This tells all bots to avoid these directories.

    This article might also help some people as it shows how to (in WordPress) create unique title and meta tags – amongst other things.

    I also think 302 and possibly 301 redirects may temporarily get you tossed into the supplemental index…

  17. Cool thanks Mark

  18. Creating unique title tags for each and every page that you do want indexed is another way to avoid the supplemental index.

    If you’re using Wordpress, install the SEO Title Tag plugin – http://www.netconcepts.com/seo-title-tag-plugin/

    Do keyword research for your blog and put as many different relevant terms as possible. If you have a recurring column, such as recaps or roundups, then put the date in the title tag to make it unique. Here’s a free keyword research tool – http://www.keyworddiscovery.com/search.html – this tool can help give you ideas for posts as well when you see what people are searching for.

  19. mark 

    @Ashisha: That’s a good call, I too use the UTW plugin.

    Add this to your robots.txt file:

    User-agent: *
    Disallow: /tag/

  20. Chris – I was the same way with getting around to a print stylesheet, and got lazy. I’m using the wp-print plugin, with some custom formatting, and it works great. Since it uses a re-write to make the print version a directory path I can block it with:

    User-agent: *
    Disallow: /print/

  21. Colbs 

    Man! I have 10,400 in the supplemental index from my main site. I dont even use wordpress! Gonna have to get that fixed. Thanks for the info!!

  22. @Mark: I need one favor. My blog is in a subdirectory. Can you help me out in making the robots.txt file? I can email you what I have made.

  23. mark 

    @Ashish: sure, goto my blog and use the contact form. I’ll see what I can do.

  24. @Mark: Thanks, I will do that.

    One more thing: why not just index the single pages and ignore the rest? Everything other than a single page is a duplicate, right?

  25. It looks like a great solution. I’ve just implemented it.

    It seems that there is a variation of this solution that doesn’t use the WordPress API and so is portable to all CMSs.

    Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.

  26. mark 

    @Ashisha, I don’t think that’s a good idea, you’re using a shotgun approach with that when you should use a sniper rifle and be specific on your target when disallowing.

  27. I checked this out and my site only has 2 pages indexed on Google, the home page and the feed. How can this be possible? I’ve been confused why my PR has been 1 for so long. I’ve been writing since last November and have around 100 posts. How can I get google to index the rest of the pages??

  28. @Justin – You might want to try getting google to index using a sitemap?
    http://www.google.com/webmasters/sitemaps/

  29. Brian 

    One thing I haven’t seen mentioned here is that Google doesn’t index everyone just because “you” want them to. They have like a billion pages to worry about storing on their computers and they have to prioritize everything. They look at your PR and the higher you are, the more content they’ll put in their primary search results. If you are a PR1 then it doesn’t matter how you clean up your site because they still aren’t going to index more than a few pages. As your PR goes up, they will put more of your pages in their primary index. Of course, cleaning up your site, adding robots.txt and eliminating duplicate content all help improve your site quality. But if you have a site with lots of pages, then you need a high PR to get Google to rank them all.

  30. bm 

    This is real fun. When Google’s engineer tells you “the main determinant of whether a url is in our main web index or in the supplemental index is PageRank” (http://www.mattcutts.com/blog/infrastructure-status-january-2007/ – 4th paragraph) – there are still people believing in fairies and “old SEO monsters of duplicate content”.

  31. Rob O. 

    2Dolphins (http://www.2dolphins.com) has 347 supplemental results hits on Google. I wonder if this is because, with a Blogger-based blog, the meta data is the same on every page? Each page does have a unique and (fairly) descriptive title and I am using Blogger labels on most posts…

  32. c_v 

    Chris,

    I find it very kind of you not only to share your tip, but to answer comments as well.

    I do not know you, but wish you the best and hope you have a wonderful life.

    c_v

  33. When I used that modified plugin and set up my robots.txt my site fell a little bit in the rankings for a few days but then came back with even better rankings. I now only have a few pages in the supp index. They are very short posts that I made. It is very important to have more words on your pages. It really helps to have at least 300-500 words of unique content per page. It also helps to have a unique title and description on each page. That along with more links and links from authority sites is what will get you to rank in Google.

    Here are the two blog posts the guy was talking about.

    http://www.ogletreeseo.com/157.html

    http://www.ogletreeseo.com/146.html

  34. cyntax 

    Has anyone used the URL removal tool at http://www.google.com/webmasters/sitemaps/ ?

    I tried to remove my comments and feeds, which are supplemental results right now, but no luck so far. I’m adding a robots.txt with

    User-agent: *
    Disallow: /wordpress/comments/
    Disallow: /wordpress/feed/

  35. trk 

    That is clever!
    Just want to add that, if like me, you are going to do a copy and paste, take care of the formatting (the quotes).

  36. cyntax 

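    Also, if you cut and paste the code above, you will get a syntax error because of the slanted quotes. Try this:

    <?php
    if((is_single() || is_category() || is_page() || is_home()) && (!is_paged())){
    echo "<!-- ok google, index me! -->";
    }else{
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
    }
    ?>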

  37. There is some great information in your post as well as the comments. I’ve already added a meta and differentiating title to my pages and I’ll probably look into the robots.txt as some of your users have suggested.

  38. mark 

    @Guilherme Zuhlke O’Connor: “Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.”

    That’s not necessary. Archives are nothing more than a different way to get to the same spot or post, that’s not duplicate content.

  39. @cyntax – unfortunately Wordpress just does this, anyone know how I can stop the reformatting happening? :o

  40. It seems all the tag pages are considered good ones and the single post pages are considered supplemental :( Only now do I understand why I am getting so few hits from Google, when I am currently getting a very high number of hits from Yahoo and MSN by comparison. Any idea how to get the single posts out of the supplemental index and back to being good pages?

  41. All I can suggest is follow the advice above and in the comments, but the main thing would be to try to get good links to the single pages

  42. Another thing that could potentially cause this is your WordPress feed being indexed. For some reason Google occasionally does this. You’re best off blocking it in robots.txt.

  43. D4n 

    The problem with Blogger seems to be the archive pages. If you have the widget to browse by Year / Month and so on, under this you have the item pages, which are fine. They have their own URL’s and won’t be duplicating any content, providing you don’t have every article on your main page. But, if you click on e.g. a Month name, you’ll see all articles for that month on one page … hence the duplication.

    So the only way around this is to add a meta tag to the page header for archive pages to get Google to ignore them. Thankfully Blogger allows you to identify archive pages in the template, so it should just be a question of adding the following code somewhere in the <head> section of your template:

    Will have to see if that helps at all the next time Google comes around.

    Very useful article … thanks!

  44. Nathan 

    Chris, ever since you published this article I’ve been trying to figure out why my site had all its pages listed as supplemental. I’ve had a very robust robots.txt file installed for over a month. The only bad thing was I wasn’t using excerpts on my categories. Then today I ran across 2 articles of interest that really explain what might be going on. I’d be curious to see what you think about these:

    http://www.searchengineguide.com/wallace/2005/0209_dw1.html

    and

    http://www.mattcutts.com/blog/google-hell/

    - Nathan

  45. Jas 

    My site went from tens of thousands of pages in the main index to two in a matter of what seems like weeks. Supp hell for sure. I’m guessing my server migration ended up on a bad neighborhood IP. I’m going to contact Google and see what’s up.
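
Several of the comments above suggest robots.txt rules. Before uploading a file like that, it can help to check which URLs the rules would actually block. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rules are the ones quoted in the comments, while `blocked_urls`, the `example.com` addresses and the date-based post URL are assumptions made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the comments above; adjust to your own site layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /tag/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical URLs, for illustration only.
SAMPLE_URLS = [
    "http://example.com/tag/seo/",
    "http://example.com/wp-admin/options.php",
    "http://example.com/2007/02/some-post/",
]
print(blocked_urls(ROBOTS_TXT, SAMPLE_URLS))
```

Because Disallow rules match on the start of the URL path, the tag and admin URLs are reported as blocked while the post URL is not. A rule such as `Disallow: /tag/` therefore only helps if your tag pages really do live under that path.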


About Chris Garrett

Chris Garrett is a blogging and internet marketing consultant. This blog is here to help you make the most out of the web.
