Stopping Content Scrapers

by My Web Writers

Anxious discussions about content scraping fill the Internet. The dishonest practice of taking content from one website and reposting it on another hurts the search rankings of many companies because it creates duplicate content. Search engines don’t like duplicate content, so sites showing identical content can drop in ranking position. Some content scraping victims have witnessed this precipitous drop firsthand. If your ranking position drops unexpectedly, consider content scraping as one of the possible culprits.

Some instances of copied content could be seen as a compliment: your text is so good that someone else wanted to use it. Other perpetrators have far more malicious intentions. The blog post Content Scraping: Is Someone Ripping Your Content AND Good Name – Part I? explains:
“Content scrapers are simple programs that scrape content on topic from sites or blogs for posting to the content scraper’s site. The sole purpose of content scrapers is to rip content, post it to a series of junk sites that are slathered with PPC and paid advertising, and make money on clickthroughs.”

Far from being complimentary, this tactic boils down to money. Search engines have modified their algorithms to thwart these practices. Google has chosen to demote sites that it does not deem the originator of the content. Traditionally, Google treats the website on which the content was first indexed as its originator. Google has also made a point of targeting link farms and removing them from its index. Despite the penalties attached to content scraping, those who engage in it keep fine-tuning their approach in the hope of eluding detection, and some succeed.

Given the ramifications of content scraping for search rankings, legitimate online companies have tried to incorporate measures to protect their content. Here are a few ideas.

  • Revisit and revise your content on a regular basis. By keeping it fresh, you decrease the likelihood that someone else will have your current content. If the thought of returning to and tweaking all of your online content seems overwhelming, hire a content writing company like My Web Writers to do that work for you.
  • Put your personal touch on everything that you write. Mention the name of your company within the text. Mention the different levels of service that you provide by name. Identify anything that is unique to your company and include it in your content. Scrapers will most likely leave your site to find someone else whose content is more generic.
  • Include lots of internal links. Scrapers may not want content that always points back to its originator. If that doesn’t deter the scraper, the silver lining could be that the scraper has provided your website with more link juice. Don’t celebrate this possible increase too soon, though. Some more sophisticated content scrapers run your content through a link stripper program that removes all of your hyperlinks, drying up the link juice.
  • The most technical defense against content scraping is through coding. On an Apache server, block a scraper from accessing your content by including “Deny from” followed immediately by that scraper’s IP address in your root .htaccess file. If you want to stick it to them while getting a good laugh, put in coding that sends the scraper to a “dummy” website. That “dummy” website can be the scraper’s own website, a page full of rambling, illogical content, or a site full of pictures. If scrapers give you lemons, make lemonade.
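
For readers who keep a running list of offending addresses, the “Deny from” idea can be sketched as a small Python script that writes out the lines to paste into the root .htaccess file. This is only an illustration under stated assumptions: the function name build_htaccess_block is hypothetical, the two IP addresses are reserved documentation addresses rather than real scrapers, and the apache_24 switch assumes your host may run Apache 2.4, where the older “Deny from” syntax is replaced by “Require not ip”.

```python
import ipaddress


def build_htaccess_block(scraper_ips, apache_24=False):
    """Return .htaccess text that blocks each address in scraper_ips.

    build_htaccess_block is a hypothetical helper written for this
    article; it is not part of Apache or of any library.
    """
    # Validate every entry so a typo never ends up in the live config.
    # ipaddress.ip_address raises ValueError on anything that is not an IP.
    addresses = [str(ipaddress.ip_address(ip.strip())) for ip in scraper_ips]

    if apache_24:
        # Apache 2.4 style: allow everyone except the listed addresses.
        lines = ["<RequireAll>", "    Require all granted"]
        lines += [f"    Require not ip {addr}" for addr in addresses]
        lines.append("</RequireAll>")
    else:
        # Apache 2.2 style, matching the "Deny from" wording above.
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += [f"Deny from {addr}" for addr in addresses]

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses from the ranges reserved for documentation.
    print(build_htaccess_block(["203.0.113.7", "198.51.100.24"]))
```

Paste the printed lines into the .htaccess file by hand and test on a copy first, since a mistake in that file can take the whole site offline. Keep in mind that scrapers often change IP addresses, so the list needs regular upkeep.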

Purveyors of content scraping will keep looking for ways around the defenses that search providers put in place. Given the ramifications of scraped content, online writers can’t afford to be lackadaisical in their efforts to counter this offense. If you want to fend off content scrapers, give these strategies a try.


