
Benefiting from Blog Scrapers

A popular gripe of bloggers is the scraper scum. You know, the people who copy your content then slap ads around it or use it as search engine fodder so they can pimp their spam. It might be apparent that these gits annoy me too.

Today I am installing a solution which I hope will go some way towards mitigating the problem. Damian has updated his WP_RssSticky plugin, which I use to put the free ebook download link into my feed, so that it can now insert the title and URL of the current post.

If you wish to insert the post title or post URL into the sticky message, put @@post_title@@ or @@post_url@@ in the sticky message; these will be replaced when the message is displayed.
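For the curious, here is a minimal sketch of what this kind of placeholder substitution involves. It is written in Python purely for illustration and is not WP_RssSticky's actual code (the plugin itself is a WordPress plugin). The function name `render_sticky_message`, the template wording and the example.com URL are all made up, and HTML-escaping the values is my own assumption about how a feed message should be built.

```python
import html
import re

# Matches the two placeholders the plugin supports: @@post_title@@ and @@post_url@@.
PLACEHOLDER = re.compile(r"@@(post_title|post_url)@@")


def render_sticky_message(template: str, post_title: str, post_url: str) -> str:
    """Fill in the placeholders of a sticky-message template for one post.

    Hypothetical helper for illustration only; it is not WP_RssSticky's code.
    """
    # Escape the values because the message ends up as HTML inside a feed item.
    values = {
        "post_title": html.escape(post_title),
        "post_url": html.escape(post_url, quote=True),
    }
    # A single regex pass means a title that happens to contain "@@post_url@@"
    # is not expanded a second time.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    # Hypothetical template and URL, just to show the substitution.
    template = 'This post, <a href="@@post_url@@">@@post_title@@</a>, first appeared on my blog.'
    print(render_sticky_message(template, "Benefiting from Blog Scrapers",
                                "https://example.com/benefiting-from-blog-scrapers/"))
```

Because the link is baked into every feed item, any scraper that copies the item's HTML wholesale copies the link back to the original post along with it.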

Why is this important? Well, now when scrapers steal your content, at least they will be linking back to you. The more they promote their copied-and-pasted spam, the more weight those backlinks will give you. I’m also hoping it might hint to Google and co. that your post is the original.


[Screenshot] … becomes … [Screenshot: post in a feed reader]
It won’t stop them scraping, it won’t make you feel better, but at least it might give you a small amount of satisfaction that you are gaining something out of it.


Comments

  1. I do the same for my feed. I went a step further and also added a digital fingerprint (just a fixed set of characters that is unique in a Google search). This way I can find content scrapers simply by searching Google for this fingerprint.

  2. I like the principle, but I’m a bit wary of being associated with the “bad neighbourhoods” Google always talk about.

    In fact, I’m more cross when a scraper takes the whole content and links back to me, than when they just take the text, because it’s a link that could potentially count against me.

    Mind you, if you’re doing it for human recognition rather than SEO, then it could have more benefit.

  3. @Andreas – Nice idea 🙂

    @Andy – The way I understand bad neighborhoods, they are only a problem when you link out to them, because the people at Google realize you cannot control who links back to you or scrapes your content?

  4. As long as Googlebot visits your site more often than the scrapers’ sites, your post should always be selected as the original, right? That would mean high-ranking, frequently updated sites should be safe from having their posts flagged as duplicates of the scrapers’ copies. The problem is for the smaller blogs – the ones that Googlebot doesn’t visit that often.

    I recently switched to partial feeds to try to combat the problem – I didn’t really want to do that, but it seems like the only way to avoid the possibility that my original post gets flagged as a duplicate.

    Putting links back to the original content is fine as long as the scraper script does not remove all links.

  5. Good idea Chris, I’ve been doing something similar for a long time now. You should check out Joost’s new plugin for this; it’s more full-featured, with more features in the works.

    http://www.joostdevalk.nl/make-the-scrapers-work-for-you/

  6. Ugh, I hate scrapers like Mal Reynolds hates Reavers. 🙂 (Firefly reference, sorry.)

    Google indexes my blog pretty instantly, but I do still worry about scrapes–so many folks report having their own content downgraded to duplicate while the scraper benefits from the credit for the original content. I wonder if there is some timestamp spoofing going on with that. If Google is doing its job properly it should be able to sort it out, but like so many Google things, it seems to be an endless dance.

    Probably, like Reavers, this is something I should quit losing sleep over . . .

  7. Hi Chris,

    Thanks for the post. I’ve seen write-ups on this plugin in a few places, and I had some general questions regarding the title of this post and the benefits of the plugin itself.

    Assuming that web scrapers are primitive and copy the content along with all of its links to other locations on the web, it’s safe to assume that this plugin would work and provide a link back to the original content owner’s website.

    However, wouldn’t it defeat the purpose of a scraper if all the scraped content contained links back to the original posts rather than pointing to their own content (or the scraped site)?

    Regards,

    evolve

  8. Hi Chris,

    I’m the author of the plugin mentioned above; if you want to try it and/or have feature requests, let me know!

  9. I probably should mention this on Joost’s site, but I don’t really see the point of this. In my experience it’s incredibly rare for scrapers to copy links (i.e. they copy the text but not the HTML code), so they likely won’t be linking back to you even if you do use this plugin. Or is there something I’m missing?

  10. @Adam: I’ve had it running for two weeks now and have gained about 40 links… Yes, some scrapers remove links, but loads of them leave all the links inside the article where they are…

  11. @Steve – It has been a little while, but these links are showing up, so I am glad I did it.

    @Google – I did check it out after reading these comments, and it is a cool plugin, but Damian’s does all the same things in a way that suits me better. Perhaps try both to see which you prefer?

    @Sonia – I am always in favor of a geek TV reference 🙂

    @evolve – Most don’t do anything sophisticated. I have been quite successful in tracking down the worst offenders using backlinks and this text.

    @Joost – Thanks 🙂

    @Adam – You would be surprised; a lot do retain links, especially when you link images, early in the post and in the footer in this way. Try it 🙂

    @Joost – I have seen the same thing 🙂 It’s not a 100% solution, but it’s better than nothing, right? 🙂
