Blog SEO: Get Your Blog out of the Supplemental Index

A little blog SEO tip today: check to see if, like my blog, you have a “supplemental index problem”.

Check out my Google search result: as you can see, I have about 106 pages in the supplemental index. Change chrisg.com to your own domain to test your own result.

So what is this “Supplemental Index”?

I like the answers from Tropical SEO best:

  • The Google Supplemental index is the Siberian work camp for web pages.
  • The Google Supplemental index is where they put web pages with little trust.
  • The Google Supplemental index is where they put web pages that aren’t going to rank for anything important.

Essentially, Google throws your pages into the supplemental index when it is not sure what to do with them but doesn’t want to throw them away.

Why am I in “Supplemental Index” hell?

In a nutshell, it’s that old SEO monster, “duplicate content”. On my blog the internal linkage and archives were confusing Googlebot by throwing up the same content over and over. Graywolf did a wonderful video on this.

How do I get my blog out of “Supplemental Index”?

The standard answer seems to be to use robots.txt to stop Google indexing junk pages. I didn’t want to add a robots.txt, so I looked for a plugin and came across a recommendation from Ogletree. He hacked a WordPress plugin that seemed ideal, but then in the comments I saw he provided some template code. Just what I was looking for. With a little tweak so it outputs a comment to explain what is going on, here is what I added to my header template just above the Title tag:


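<?php
// Index single posts, categories, static pages and the homepage,
// but only their first page (is_paged() is true on page 2 and later).
if ((is_single() || is_category() || is_page() || is_home()) && !is_paged()) {
    echo "<!-- ok google, index me! -->";
} else {
    // Everything else (date archives, paged listings, etc.):
    // keep it out of the index but still follow its links.
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
}
?>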

What this does is output a special instruction telling search engines to ignore a page unless it is the homepage, a single article, a static page or a category page (and, for listings, only the first page counts). My main problem was the date archives, so hopefully this will sort it. We shall see!
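
To make the rule easier to eyeball, here is a minimal sketch of the same decision as a standalone function. The name robots_hint and its two boolean parameters are made up for illustration and are not part of WordPress; in a real template you would pass in the results of the same conditional tags used above.

```php
<?php
// Hypothetical helper (not a WordPress function): returns the markup to print.
// $is_indexable_type: true for a single post, category, static page or the homepage.
// $is_paged: true on page 2, 3, ... of a listing.
function robots_hint($is_indexable_type, $is_paged) {
    if ($is_indexable_type && !$is_paged) {
        return "<!-- ok google, index me! -->\n";
    }
    return "<!-- google, please ignore - thanks! -->\n"
         . "<meta name=\"robots\" content=\"noindex,follow\">\n";
}

// Inside a WordPress header template the call would look roughly like:
// echo robots_hint(is_single() || is_category() || is_page() || is_home(), is_paged());

// Standalone demonstration of three typical cases:
$cases = array(
    'single post'        => array(true, false),
    'category, page 2'   => array(true, true),
    'date archive'       => array(false, false),
);
foreach ($cases as $label => $flags) {
    echo $label . ":\n" . robots_hint($flags[0], $flags[1]);
}
```

Only the first case should come out without the noindex tag; the paged category and the date archive both get it.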



Table of contents for Blog SEO

  1. Blog SEO: Get Your Blog out of the Supplemental Index
  2. Blog SEO: Boost Your Search Rankings With Internal Links
View Comments to Blog SEO: Get Your Blog out of the Supplemental Index
  1. Ivan Brezak Brkan
    April 26, 2007 | 8:41 am

    Which pages exactly did Googlebot see as junk pages, and how can I implement this in, for example, Expression Engine? Do I need to use robots.txt (and if so, how)? Thanks in advance Chris, this is a very interesting post!

  2. Chris Garrett
    April 26, 2007 | 8:44 am

    You can see the pages Google didn’t like in the linked query above. I don’t know what you can do with Expression Engine. I guess robots.txt would do it, but someone else would have to guide you as I wimped out of using it myself ;)

  3. Ivan Brezak Brkan
    April 26, 2007 | 8:49 am

    Heh, don’t worry. I read the article on TropicalSEO and basically what I have to add to my “watchlist” is unique meta data for each and every page! :) Now if only I could get Googlebot to spider my blog at night (CET time)…

  4. Chris Garrett
    April 26, 2007 | 8:54 am

    Heh good luck! :)

  5. Ashwin
    April 26, 2007 | 11:04 am

    Another way to complicate this simple thing is to add a robots.txt file, which I did. I’m not sure whether it worked as my blog traffic was not stable. I’m gonna try this thing and remove the robots.txt shit.

  6. Chris Garrett
    April 26, 2007 | 11:14 am

    Well, it seems the pro SEO technique is still robots.txt, but for me it was just one more thing I didn’t want to have to learn.

  7. Eran
    April 26, 2007 | 12:03 pm

    there is a great post on this problem at:
    http://www.seomoz.org/blog/how-i-escaped-googles-supplemental-hell

    maybe it will help…

  8. BSN
    April 26, 2007 | 12:56 pm

    Any suggestions for Blogger?

    Of ~1250 pages indexed, ~1140 are in the supplemental. It looks like it’s mostly the pages for individual posts. I’m reading that Categories may help, so I’m wondering if going back to archived posts and adding labels is worth the effort? Any other suggestions?

  9. Chris Garrett
    April 26, 2007 | 1:00 pm

    I wouldn’t think adding categories would help. Make sure your posts have unique titles and that you are not showing the same posts in several archive/category/date pages.

  10. Tony D. Clark
    April 26, 2007 | 1:06 pm

    My main issue seems to be a common one — printer friendly pages. I’ve gotten positive feedback from readers, so I’m keeping them, but will be adding a line to robots.txt to ignore anything in /print/.

  11. Chris Garrett
    April 26, 2007 | 1:08 pm

    Yeah, that is why I have avoided the printer-friendly thing. I have been told how to do it with CSS, I just never got around to doing it :S

  12. Ashish Mohta
    April 26, 2007 | 1:31 pm

    Chris, that’s really a good one. I was looking for this kind of stuff as my date archive was getting indexed too. What about tags? I am using UTW; what would the syntax be for that, is_tag?

    Awaiting your reply.

  13. Chris Garrett
    April 26, 2007 | 1:38 pm

    Hmm, could be but I haven’t used that plugin myself … I am sure the developer or somebody else could help you out?

  14. mel
    April 26, 2007 | 1:53 pm

    All my results appear to be my comments. Is this a problem I should look at fixing or does it not matter too much? I don’t think I want folks finding comments when searching in google anyway – it’s my content not my comments I’d be worried about? Or am I missing the point?

  15. Chris Garrett
    April 26, 2007 | 1:55 pm

    I think it is actually your comment feed that is being indexed mel?

  16. mark
    April 26, 2007 | 2:45 pm

    Modifying the robots.txt file is real easy guys. Here’s a sample:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /logs/
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    This tells all bots to avoid these directories.

    This article might also help some people as it shows how to (in WordPress) create unique title and meta tags – amongst other things.

    I also think 302 and possibly 301 redirects may temporarily get you tossed into the supplemental index…

  17. Chris Garrett
    April 26, 2007 | 2:48 pm

    Cool thanks Mark

  18. Nathania Johnson
    April 26, 2007 | 3:20 pm

    Creating unique title tags for each and every page that you do want indexed is another way to avoid the supplemental index.

    If you’re using WordPress, install the SEO Title Tag plugin – http://www.netconcepts.com/seo-title-tag-plugin/

    Do keyword research for your blog and put as many different relevant terms as possible. If you have a recurring column, such as recaps or roundups, then put the date in the title tag to make it unique. Here’s a free keyword research tool – http://www.keyworddiscovery.com/search.html – this tool can help give you ideas for posts as well when you see what people are searching for.

  19. mark
    April 26, 2007 | 3:30 pm

    @Ashish: That’s a good call, I too use the UTW plugin.

    Add this to your robots.txt file:

    User-agent: *
    Disallow: /tag/

  20. Tony D. Clark
    April 26, 2007 | 3:59 pm

    Chris – I was the same way with getting around to a print stylesheet, and got lazy. I’m using the wp-print plugin, with some custom formatting, and it works great. Since it uses a re-write to make the print version a directory path I can block it with:

    User-agent: *
    Disallow: /print/

  21. Colbs
    April 26, 2007 | 4:21 pm

    Man! I have 10,400 in the supplemental index from my main site. I don’t even use WordPress! Gonna have to get that fixed. Thanks for the info!!

  22. Ashish Mohta
    April 26, 2007 | 4:42 pm

    @Mark: I need one favor. My blog is in a subdirectory; can you help me out in making the robots.txt file? I can email you what I have made.

  23. mark
    April 26, 2007 | 5:24 pm

    @Ashish: sure, go to my blog and use the contact form. I’ll see what I can do.

  24. Ashish Mohta
    April 26, 2007 | 5:32 pm

    @Mark: Thanks, I will do that.

    One more thing: why not just index the single pages and ignore the rest of them? Everything other than a single page is a duplicate, right?

  25. Guilherme Zuhlke O'Connor
    April 26, 2007 | 6:02 pm

    It looks like a great solution. I’ve just implemented it.

    It seems there is a variation of this solution that doesn’t use the WordPress API, so it is portable to all CMSs:

    Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.

  26. mark
    April 26, 2007 | 6:10 pm

    @Ashish, I don’t think that’s a good idea. You’re using a shotgun approach with that, when you should use a sniper rifle and be specific about your target when disallowing.

  27. Justin Consuegra
    April 26, 2007 | 8:38 pm

    I checked this out and my site only has 2 pages indexed on Google, the home page and the feed. How can this be possible? I’ve been confused why my PR has been 1 for so long. I’ve been writing since last November and have around 100 posts. How can I get google to index the rest of the pages??

  28. Chris Garrett
    April 26, 2007 | 10:07 pm

    @Justin – You might want to try getting google to index using a sitemap?
    http://www.google.com/webmasters/sitemaps/

  29. Brian
    April 27, 2007 | 12:09 am

    One thing I haven’t seen mentioned here is that Google doesn’t index everything just because “you” want them to. They have like a billion pages to worry about storing on their computers and they have to prioritize everything. They look at your PR and the higher you are, the more content they’ll put in their primary search results. If you are a PR1 then it doesn’t matter how you clean up your site, because they still aren’t going to index more than a few pages. As your PR goes up they will put more of your pages in their primary index. Of course, cleaning up your site, adding robots.txt and eliminating duplicate content all help improve your site quality. But if you have a site with lots of pages, then you need a high PR to get Google to rank them all.

  30. bm
    April 27, 2007 | 1:11 am

    This is real fun. When Google’s engineer tells you “the main determinant of whether a url is in our main web index or in the supplemental index is PageRank” (http://www.mattcutts.com/blog/infrastructure-status-january-2007/ – 4th paragraph) – there are still people believing in fairies and “old SEO monsters of duplicate content”.

  31. Rob O.
    April 27, 2007 | 1:33 am

    2Dolphins (http://www.2dolphins.com) has 347 supplemental result hits on Google. I wonder if this is because, with a Blogger-based blog, the meta data is the same on every page? Each page does have a unique and (fairly) descriptive title and I am using Blogger labels on most posts…

  32. c_v
    April 27, 2007 | 2:42 am

    Chris,

    I find it very kind of you not only to share your tip, but to answer comments as well.

    I do not know you, but wish you the best and hope you have a wonderful life.

    c_v

  33. ogletree
    April 27, 2007 | 3:08 am

    When I used that modified plugin and set up my robots.txt my site fell a little bit in the rankings for a few days but then came back with even better rankings. I now only have a few pages in the supp index. They are very short posts that I made. It is very important to have more words on your pages. It really helps to have at least 300-500 words of unique content per page. It also helps to have a unique title and description on each page. That along with more links and links from authority sites is what will get you to rank in Google.

    Here are the two blog posts the guy was talking about.

    http://www.ogletreeseo.com/157.html

    http://www.ogletreeseo.com/146.html

  34. cyntax
    April 27, 2007 | 3:10 am

    Has anyone used the URL removal tool at http://www.google.com/webmasters/sitemaps/ ?

    I tried to remove my comments and feeds, which are supplemental results right now, but no luck so far. I’m adding a robots.txt with

    User-agent: *
    Disallow: /wordpress/comments/
    Disallow: /wordpress/feed/

  35. trk
    April 27, 2007 | 3:10 am

    That is clever!
    Just want to add that, if like me, you are going to do a copy and paste, take care of the formatting (the quotes).

  36. cyntax
    April 27, 2007 | 3:16 am

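    Also, if you cut and paste the code above, you will get a syntax error because of the slanted quotes. Try this:

    <?php
    if((is_single() || is_category() || is_page() || is_home()) && (!is_paged())){
    echo "<!-- ok google, index me! -->";
    }else{
    echo "<!-- google, please ignore - thanks! -->";
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
    }
    ?>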

  37. Jeff L.
    April 27, 2007 | 3:19 am

    There is some great information in your post as well as the comments. I’ve already added a meta and differentiating title to my pages and I’ll probably look into the robots.txt as some of your users have suggested.

  38. mark
    April 27, 2007 | 4:13 am

    @Guilherme Zuhlke O’Connor: “Apply the ‘nofollow’ property on links to archives and other pages with duplicate content.”

    That’s not necessary. Archives are nothing more than a different way to get to the same spot or post, that’s not duplicate content.

  39. Chris Garrett
    April 27, 2007 | 8:31 am

    @cyntax – unfortunately WordPress just does this, anyone know how I can stop the reformatting happening? :o

  40. Mr.Byte
    April 28, 2007 | 6:30 am

    It seems all the tag pages are considered good ones and the single post pages are considered supplemental :( Only now do I understand why I am getting so few hits from Google; I am currently getting far more hits from Yahoo and MSN than from Google. Any idea how to get the single posts back from supplemental to good?

  41. Chris Garrett
    April 28, 2007 | 10:46 am

    All I can suggest is follow the advice above and in the comments, but the main thing would be to try to get good links to the single pages

  42. Motorcycle Guy
    April 29, 2007 | 1:55 pm

    Another thing that could potentially cause this is your WordPress feed being indexed. For some reason Google occasionally does this. You’re best off blocking it in robots.txt.

  43. D4n
    April 29, 2007 | 4:49 pm

    The problem with Blogger seems to be the archive pages. If you have the widget to browse by Year / Month and so on, under this you have the item pages, which are fine. They have their own URLs and won’t be duplicating any content, providing you don’t have every article on your main page. But if you click on, e.g., a month name, you’ll see all articles for that month on one page … hence the duplication.

    So the only way around this is to add a meta to the page header for archive pages to get Google to ignore them. Thankfully Blogger allows you to identify archive pages in the template, so it should just be a question of adding the following code somewhere in the <head> section of your template:

    Will have to see if that helps at all the next time Google comes around.

    Very useful article … thanks!

  44. Nathan
    May 3, 2007 | 3:20 pm

    Chris, ever since you published this article I’ve been trying to figure out why my site had all its pages listed as supplemental. I’ve had a very robust robots.txt file installed for over a month. The only bad thing was I wasn’t using excerpts on my categories. Then today I ran across 2 articles of interest that really explain what might be going on. I’d be curious to see what you think about these:

    http://www.searchengineguide.com/wallace/2005/0209_dw1.html

    and

    http://www.mattcutts.com/blog/google-hell/

    - Nathan

  45. Jas
    May 4, 2007 | 3:29 pm

    My site went from tens of thousands of pages in the main index to two in a matter of what seems like weeks. Supp hell for sure. I’m guessing my server migration ended up on a bad neighborhood IP. I’m going to contact Google and see what’s up.
