Blog SEO: Get Your Blog out of the Supplemental Index

A little blog SEO tip today: check whether, like my blog, you have a “supplemental index problem”.

Check out my Google search result: as you can see, I have about 106 pages in the supplemental index. Change chrisg.com to your own domain to test your own result.

So what is this “Supplemental Index”?

I like the answers from Tropical SEO best:

  • The Google Supplemental index is the Siberian work camp for web pages.
  • The Google Supplemental index is where they put web pages with little trust.
  • The Google Supplemental index is where they put web pages that aren’t going to rank for anything important.

Essentially, Google throws your pages into the supplemental index when it is not sure what to do with them but doesn’t want to throw them away.

Why am I in “Supplemental Index” hell?

In a nutshell, it’s that old SEO monster, “duplicate content”. On my blog the internal linking and archives were confusing Googlebot by throwing up the same content over and over. Graywolf did a wonderful video on this.

How do I get my blog out of “Supplemental Index”?

The standard answer seems to be to use robots.txt to stop Google indexing junk pages. I didn’t want to add a robots.txt, so I looked for a plugin and came across a recommendation from Ogletree. He hacked a WordPress plugin that seemed ideal, but then in the comments I saw he provided some template code. Just what I was looking for. With a little tweak so that it outputs a comment to explain what is going on, here is what I added to my header template just above the title tag:


<?php
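// Leave the page indexable only if it is the first page of the home page,
// a single post, a static page or a category archive.
if ((is_single() || is_category() || is_page() || is_home()) && !is_paged()) {
    echo "<!-- ok google, index me! -->";
} else {
    // Everything else (date archives, page 2+ of a listing, etc.)
    // is marked noindex, but its links can still be followed.
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
}
?>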

What this does is output a special instruction telling search engines to ignore a page unless it is the homepage, a single article, a static page or a category archive, and even then only the first page of a paged listing is left indexable. My main problem was the date archives, so hopefully this will sort it. We shall see!
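
To make the rule easier to see, here is a minimal sketch of the same decision as a plain PHP function that runs outside WordPress. The function name robots_directive and its two boolean flags are hypothetical and only for illustration; they stand in for what the conditional tags in the template would report. The 'index,follow' value simply represents the default case where the template prints no robots meta tag at all.

```php
<?php
// Hypothetical helper, not part of WordPress: returns the robots
// directive a page would effectively get under the rule described above.
function robots_directive(bool $is_content_view, bool $is_paged): string
{
    // First page of a content view (home, single post, static page,
    // category archive): leave it indexable.
    if ($is_content_view && !$is_paged) {
        return 'index,follow';
    }
    // Anything else (date archives, page 2+ of any listing) is kept
    // out of the index, but its links may still be followed.
    return 'noindex,follow';
}

// Illustrative page types. The flags are assumptions standing in for
// is_single(), is_category(), is_page(), is_home() and is_paged().
$examples = [
    'single post'      => [true, false],
    'category, page 1' => [true, false],
    'category, page 2' => [true, true],
    'date archive'     => [false, false],
];

foreach ($examples as $label => [$content, $paged]) {
    echo $label . ': ' . robots_directive($content, $paged) . "\n";
}
```

Running this from the command line prints one line per page type, which is a quick way to check that date archives and paged listings end up as noindex before touching the live template.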



45 Comments so far

  1. Ivan Brezak Brkan April 26th, 2007 8:41 am

    What pages exactly did Googlebot see as junk pages, and how can I implement this in, for example, Expression Engine? Do I need to use robots.txt (and if so, how)? Thanks in advance Chris, this is a very interesting post!

  2. Chris Garrett April 26th, 2007 8:44 am

    You can see the pages Google didn’t like in the linked query above. I don’t know what you can do with Expression Engine. I guess robots.txt would do it, but someone else would have to guide you as I wimped out from using it myself ;)

  3. Ivan Brezak Brkan April 26th, 2007 8:49 am

    Heh, don’t worry. I read the article on TropicalSEO and basically what I have to add to my “watchlist” is unique meta data for each and every page! :) Now if only I could get Googlebot to spider my blog at night (CET time)…

  4. Chris Garrett April 26th, 2007 8:54 am

    Heh good luck! :)

  5. Ashwin April 26th, 2007 11:04 am

    Another way to complicate this simple thing is to add a robots.txt file, which I did. I’m not sure whether it worked as my blog traffic was not stable. I’m gonna try this thing and remove the robots.txt shit.

  6. Chris Garrett April 26th, 2007 11:14 am

    Well, it seems the pro SEO technique is still robots.txt, but for me it was just one more thing I didn’t want to have to learn.

  7. Eran April 26th, 2007 12:03 pm

    there is a great post on this problem at:
    http://www.seomoz.org/blog/how-i-escaped-googles-supplemental-hell

    maybe it will help…

  8. BSN April 26th, 2007 12:56 pm

    Any suggestions for Blogger?

    Of ~1250 pages indexed, ~1140 are in the supplemental. It looks like it’s mostly the pages for individual posts. I’m reading that Categories may help, so I’m wondering if going back to archived posts and adding labels is worth the effort? Any other suggestions?

  9. Chris Garrett April 26th, 2007 1:00 pm

    I wouldn’t think adding categories would help. Make sure the posts have unique titles, and that you are not showing the same posts in several archive/category/date pages.

  10. Tony D. Clark April 26th, 2007 1:06 pm

    My main issue seems to be a common one — printer friendly pages. I’ve gotten positive feedback from readers, so I’m keeping them, but will be adding a line to robots.txt to ignore anything in /print/.

  11. Chris Garrett April 26th, 2007 1:08 pm

    Yeah, that was why I have avoided the printer friendly thing. I have been informed how to do it with CSS, I just never got around to doing it :S

  12. Ashish Mohta April 26th, 2007 1:31 pm

    Chris, that’s really a good one. I was looking for this kind of stuff as the date archive was getting indexed too. What about tags? I am using UTW; what would the syntax be for that, is_tag?

    Awaiting your reply.

  13. Chris Garrett April 26th, 2007 1:38 pm

    Hmm, could be but I haven’t used that plugin myself … I am sure the developer or somebody else could help you out?

  14. mel April 26th, 2007 1:53 pm

    All my results appear to be my comments. Is this a problem I should look at fixing or does it not matter too much? I don’t think I want folks finding comments when searching in google anyway - it’s my content not my comments I’d be worried about? Or am I missing the point?

  15. Chris Garrett April 26th, 2007 1:55 pm

    I think it is actually your comment feed that is being indexed mel?

  16. mark April 26th, 2007 2:45 pm

    Modifying the robots.txt file is real easy guys. Here’s a sample:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /logs/
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    This tells all bots to avoid these directories.

    This article might also help some people as it shows how to (in WordPress) create unique title and meta tags - amongst other things.

    I also think 302 and possibly 301 redirects may temporarily get you tossed into the supplemental index…

  17. Chris Garrett April 26th, 2007 2:48 pm

    Cool thanks Mark

  18. Nathania Johnson April 26th, 2007 3:20 pm

    Creating unique title tags for each and every page that you do want indexed is another way to avoid the supplemental index.

    If you’re using Wordpress, install the SEO Title Tag plugin - http://www.netconcepts.com/seo-title-tag-plugin/

    Do keyword research for your blog and put as many different relevant terms as possible. If you have a recurring column, such as recaps or roundups, then put the date in the title tag to make it unique. Here’s a free keyword research tool - http://www.keyworddiscovery.com/search.html - this tool can help give you ideas for posts as well when you see what people are searching for.

  19. mark April 26th, 2007 3:30 pm

    @Ashish: That’s a good call, I too use the UTW plugin.

    Add this to your robots.txt file:

    User-agent: *
    Disallow: /tag/

  20. Tony D. Clark April 26th, 2007 3:59 pm

    Chris - I was the same way with getting around to a print stylesheet, and got lazy. I’m using the wp-print plugin, with some custom formatting, and it works great. Since it uses a re-write to make the print version a directory path I can block it with:

    User-agent: *
    Disallow: /print/

  21. Colbs April 26th, 2007 4:21 pm

    Man! I have 10,400 in the supplemental index from my main site. I don’t even use WordPress! Gonna have to get that fixed. Thanks for the info!!

  22. Ashish Mohta April 26th, 2007 4:42 pm

    @Mark: I need one favor. My blog is in a subdirectory. Can you help me out in making the robots.txt file? I can email you what I have made.

  23. mark April 26th, 2007 5:24 pm

    @Ashish: sure, go to my blog and use the contact form. I’ll see what I can do.

  24. Ashish Mohta April 26th, 2007 5:32 pm

    @Mark: Thanks, I will do that.

    One more thing: why not just index the single pages and ignore the rest of them? Everything other than a single page is a duplicate, right?

  25. Guilherme Zuhlke O'Connor April 26th, 2007 6:02 pm

    It looks like a great solution. I’ve just implemented it.

    It seems that there is a variation of this solution that doesn’t use the WordPress API, and so is portable to all CMSs:

    Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.

  26. mark April 26th, 2007 6:10 pm

    @Ashish, I don’t think that’s a good idea. You’re using a shotgun approach with that when you should use a sniper rifle and be specific about your target when disallowing.

  27. Justin Consuegra April 26th, 2007 8:38 pm

    I checked this out and my site only has 2 pages indexed on Google, the home page and the feed. How can this be possible? I’ve been confused why my PR has been 1 for so long. I’ve been writing since last November and have around 100 posts. How can I get google to index the rest of the pages??

  28. Chris Garrett April 26th, 2007 10:07 pm

    @Justin - You might want to try getting google to index using a sitemap?
    http://www.google.com/webmasters/sitemaps/

  29. Brian April 27th, 2007 12:09 am

    One thing I haven’t seen mentioned here is that Google doesn’t index everything just because “you” want them to. They have like a billion pages to worry about storing on their computers and they have to prioritize everything. They look at your PR rank, and the higher you are the more content they’ll put in their primary search results. If you are a PR1 then it doesn’t matter how you clean up your site because they still aren’t going to index more than a few pages. As your PR ranking goes up they will put more of your pages in their primary ranking. Of course, cleaning up your site, adding robots.txt and eliminating duplicate content all help improve your site quality. But if you have a site with lots of pages, then you need a high PR to get Google to rank them all.

  30. bm April 27th, 2007 1:11 am

    This is real fun. When Google’s engineer tells you “the main determinant of whether a url is in our main web index or in the supplemental index is PageRank” (http://www.mattcutts.com/blog/infrastructure-status-january-2007/ - 4th paragraph) - there are still people believing in fairies and “old SEO monsters of duplicate content”.

  31. Rob O. April 27th, 2007 1:33 am

    2Dolphins has 347 supplemental results on Google. I wonder if this is because, with a Blogger-based blog, the meta data is the same on every page? Each page does have a unique and (fairly) descriptive title and I am using Blogger labels on most posts…

  32. c_v April 27th, 2007 2:42 am

    Chris,

    I find it very kind of you not only to share your tip, but to answer comments as well.

    I do not know you, but wish you the best and hope you have a wonderful life.

    c_v

  33. ogletree April 27th, 2007 3:08 am

    When I used that modified plugin and set up my robots.txt my site fell a little bit in the rankings for a few days but then came back with even better rankings. I now only have a few pages in the supp index. They are very short posts that I made. It is very important to have more words on your pages. It really helps to have at least 300-500 words of unique content per page. It also helps to have a unique title and description on each page. That along with more links and links from authority sites is what will get you to rank in Google.

    Here are the two blog posts the guy was talking about.

    http://www.ogletreeseo.com/157.html

    http://www.ogletreeseo.com/146.html

  34. cyntax April 27th, 2007 3:10 am

    Has anyone used the URL removal tool at http://www.google.com/webmasters/sitemaps/ ?

    I tried to remove my comments and feeds, which are supplemental results right now, but no luck so far. I’m adding a robots.txt with

    User-agent: *
    Disallow: /wordpress/comments/
    Disallow: /wordpress/feed/

  35. trk April 27th, 2007 3:10 am

    That is clever!
    Just want to add that, if like me, you are going to do a copy and paste, take care of the formatting (the quotes).

  36. cyntax April 27th, 2007 3:16 am

    Also, if you cut and paste the code above, you will get a syntax error because of the slanted quotes. Try this:

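    <?php
    if((is_single() || is_category() || is_page() || is_home()) && (!is_paged())){
    echo "<!-- ok google, index me! -->";
    }else{
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
    }
    ?>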

  37. Jeff L. April 27th, 2007 3:19 am

    There is some great information in your post as well as the comments. I’ve already added a meta and differentiating title to my pages and I’ll probably look into the robots.txt as some of your users have suggested.

  38. mark April 27th, 2007 4:13 am

    @Guilherme Zuhlke O’Connor: “Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.”

    That’s not necessary. Archives are nothing more than a different way to get to the same spot or post, that’s not duplicate content.

  39. Chris Garrett April 27th, 2007 8:31 am

    @cyntax - unfortunately Wordpress just does this, anyone know how I can stop the reformatting happening? :o

  40. Mr.Byte April 28th, 2007 6:30 am

    It seems all the tag pages are considered good ones and the single post pages are considered supplemental :( Only now do I understand why I am getting very few hits from Google, while I am currently getting a very high number of hits from Yahoo and MSN by comparison. Any idea how to get the single posts out of the supplemental index and back to being good pages?

  41. Chris Garrett April 28th, 2007 10:46 am

    All I can suggest is follow the advice above and in the comments, but the main thing would be to try to get good links to the single pages

  42. Motorcycle Guy April 29th, 2007 1:55 pm

    Another thing that could potentially cause this is your WordPress feed being indexed. For some reason Google occasionally does this. You’re best off blocking it in robots.txt.

  43. D4n April 29th, 2007 4:49 pm

    The problem with Blogger seems to be the archive pages. If you have the widget to browse by Year / Month and so on, under this you have the item pages, which are fine. They have their own URLs and won’t be duplicating any content, providing you don’t have every article on your main page. But if you click on, e.g., a month name, you’ll see all articles for that month on one page … hence the duplication.

    So the only way around this is to add a meta tag to the page header for archive pages to get Google to ignore them. Thankfully Blogger allows you to identify archive pages in the template, so it should just be a question of adding the following code somewhere in the <head> section of your template:

    Will have to see if that helps at all the next time Google comes around.

    Very useful article … thanks!

  44. Nathan May 3rd, 2007 3:20 pm

    Chris, ever since you published this article I’ve been trying to figure out why my site had all its pages listed as supplemental. I’ve had a very robust robots.txt file installed for over a month. The only bad thing was I wasn’t using excerpts on my categories. Then today I ran across 2 articles of interest that really explain what might be going on. I’d be curious to see what you think about these:

    http://www.searchengineguide.com/wallace/2005/0209_dw1.html

    and

    http://www.mattcutts.com/blog/google-hell/

    - Nathan

  45. Jas May 4th, 2007 3:29 pm

    My site went from tens of thousands of pages in the main index to two in a matter of what seems like weeks. Supp hell for sure. I’m guessing my server migration ended up on a bad neighborhood IP. I’m going to contact Google and see what’s up.

About Chris Garrett

Chris Garrett is a blogging and internet marketing consultant. This blog is here to help you make the most out of the web.
