How Google Handles Duplicate Content

by Duncan Heath on March 10, 2010 · 33 comments · Blogging


This is a guest post from Duncan, an internet marketer who blogs about everything from on-site optimization to finding the best links on the net.

Duplicate content is a hot topic at the moment, with much speculation about whether it can harm your site, or whether you can actually benefit from scraping content from other sites and placing it on your own. Most webmasters, bloggers and SEO experts agree that accidental internal dupe content, caused by pagination, categorization etc., won’t harm your site power (apart from reducing internal linking power on the dupe pages), unless it is interpreted as manipulative duplication, which can lead to penalties.

Aside from legal ramifications, there seems to be little negative effect from content theft – taking content from other people’s sites and publishing it on your own. Indeed, many people have RSS or other feeds from external sites populating their pages, and do not report any ranking problems for the pages of their site that do have unique copy.

One thing that many people are not clear on, though, is how Google determines which is the original source of the copy and which are the duplicate versions. Here are some of the things it looks at when determining original from duplicate content.

Finders Keepers

Google prides itself on being able to find new content quickly and add it to its index. It claims that its crawlers are getting quicker all the time and that it is constantly striving for more “real-time” search. It stands to reason, then, that if Google has faith in its crawling speed, it should attribute some weighting to the version of content it comes across first. What helps is that the more authoritative sites tend to get crawled more frequently than lesser sites, which are more likely to be the ones scraping content.

Blog Post Dates

A slightly more speculative factor that Google might look at as part of its dupe content algorithm is post dates. Many blogs include a date above or below the content, indicating when it was published. Therefore if one version of a piece of content was marked with a date two weeks prior to another version of the content, it would suggest that the earlier version was the original.

However, some scraping sites automatically backdate their posts by a couple of months to try to trick the search engines. This can turn into a game of cat and mouse, because if Google visits one day and does not find any content, then comes back the next day and finds the content marked with a date from a week ago, chances are something is not right with the site.
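
The consistency check described above can be sketched in a few lines of Python. This is purely an illustration of the reasoning, not a description of how Google actually works; the function name and both inputs (the date shown on the post, and the last crawl date on which the content was absent) are hypothetical.

```python
from datetime import date


def date_looks_backdated(claimed_date, last_crawl_without_content):
    """Return True if a post claims to be older than a crawl that did not find it.

    Both arguments are hypothetical inputs: the date displayed on the post,
    and the most recent date a crawler visited the site and saw no such content.
    """
    return claimed_date < last_crawl_without_content


# The crawler saw nothing on 8 March, then found a post the next day
# that claims to have been published on 2 March.
print(date_looks_backdated(date(2010, 3, 2), date(2010, 3, 8)))  # True
```

A post dated after the last empty crawl would pass this check, so on its own it says nothing about which copy is the original; it only flags dates that cannot be true.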

Authority

As well as more regularly crawling powerful sites, Google also attributes more power to them when trying to distinguish the original source of copy. More respected sites rarely take content from other people, instead creating it themselves or having unique content created for them. Who would you suspect stole the money from the safe, the vicar or the local bum? Don’t answer that.

Backlinks

A big clue to Google about where the content originated is backlinks within the copy. A great deal of content is stolen by automated scraper bots who often maintain links contained within it. This is a good argument by the way for always including at least one internal link in your blogs/articles, because if Google sees a link pointing back to a site that has the same content, chances are that content originated at the arrowhead end of the link.
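
As a rough sketch of that idea, assuming nothing about Google's real algorithm: given several copies of the same article, count how many of the other copies contain a link pointing at each copy's host, and treat the most linked-to host as the likely origin. The names `LinkCollector` and `likely_origin` and the example hosts are made up for illustration; only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the host of every absolute <a href="..."> link in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                host = urlparse(href).netloc
                if host:  # relative links have no host and are skipped
                    self.hosts.add(host)


def likely_origin(copies):
    """Guess the original among duplicate copies by counting links between them.

    `copies` maps a host name to the HTML of its copy of the article.
    A link from one copy to another copy's host counts as one vote for the
    linked-to host. Returns None when no copy links to another; ties go to
    whichever tied host appears first in `copies`.
    """
    if not copies:
        return None
    votes = {host: 0 for host in copies}
    for host, html in copies.items():
        collector = LinkCollector()
        collector.feed(html)
        for linked in collector.hosts:
            if linked != host and linked in votes:
                votes[linked] += 1
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else None


# The scraper kept the internal link, so its copy points back at the source.
article = '<p>See my <a href="http://original.example/older-post">older post</a>.</p>'
copies = {"original.example": article, "scraper.example": article}
print(likely_origin(copies))  # original.example
```

Links from a copy to its own host are ignored, which is why the internal link only becomes a signal once the content has been copied somewhere else.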

Your Thoughts

What other factors do you think go into determining the original from duplicate content?



{ 33 comments… read them below or add one }

1 CJ from Online Technical Writing March 10, 2010 at 8:48 am

Thanks for the information. I have been reading that a lot of people will take a post and duplicate it in an article directory as is. I was always under the impression that duplicate content, no matter where or how it was posted, was a bad thing. I never publish the same content twice for me or for my clients.

I do like the idea of always adding 1 internal link to the original site. As Google strives to do more and more crawling in real time, I’m just betting that they will also come up with a real time way to figure out which content is original and which is not. I just hope that people stealing content are penalized (eventually) to the point where it just isn’t worth it.

2 Armen Shirvanian March 10, 2010 at 8:49 am

Hi Duncan.

I had thought about this a little. It’s cool to see some of the main factors, and they make sense. We just assume that our articles will be viewed as our articles, and these various methods help to maintain that.

I wonder if the “blog post dates” part is lost for those who don’t show the date of the article above the post. Maybe the comment dates would help there.

3 Duncan March 10, 2010 at 9:46 am

An interesting thought, Armen. Google may very well look at the comment dates also. Although it is worth considering that the original source blog may not have any commenters and the scraping site may get loads very quickly.

4 Mike from Computer Tips March 10, 2010 at 9:26 am

Good insights. The tip on including at least one or two internal links in your posts is well worth doing. If someone is going to scrape your content then you might at least pick up a couple of backlinks out of the process. Plus the internal links are a good way to potentially get visitors to explore other posts/pages of your site. In addition Google seems to like lots of internal links.

5 Kathleen from Legitimate Work From Home Jobs March 10, 2010 at 11:35 am

This is probably the best written article on this topic yet. You mentioned scraper bots and how it is a good idea to have a link to your site within the article for backlink purposes. I used to get angry when someone scraped my articles until I started to see some (small) backlink benefit. Unfortunately, the benefit is so small it’s hardly worth mentioning as the scraper’s site is usually sub-par.

6 Tom from Market Samurai Promo Code March 10, 2010 at 1:18 pm

I’d have to agree with Kathleen. I have been trying to find information on duplicate content, but it is really hard to find quality information – which I believe this is!

I’ve had a few MFA sites scrape a couple of my blog posts before but they usually just post a couple of sentences and then link to me so I don’t worry about them.

7 Udegbunam Chukwudi | StrictlyOnlineBiz May 12, 2010 at 11:26 am

Be wary of those sites, because they sometimes send you trackbacks and pingbacks as well. If you make the mistake of accepting these trackbacks, most of them turn around and remove your content from their blogs, creating a one-way link from your site to theirs. When Google sees this, it assumes you’re linking to a bad neighborhood and then penalizes you. I’ve been a victim before :-(

8 Leo Dimilo March 10, 2010 at 2:09 pm

Hi Duncan,

This is actually a topic I covered in my latest newsletter. You see, I syndicate content on a regular basis and sometimes, my articles (which are originally written for my blog) get reprinted by my permission on some websites that are not only trusted but considered highly authoritative by Google. A lot of times, the syndication happens within hours of my post.

What I have found is that 9 times out of 10, the article on the syndicated site will not only outrank my article, but will cause google to filter the original page out of the index.

Furthermore, these sites also tend to get linked to very quickly as well as “tweeted” (I have had articles tweeted in the hundreds) making it look as though the syndicated website was the original.

Now, I could pooh-pooh this, quit syndicating and move on. But it is all a matter of picking your poison and what you are hoping to achieve. In my case, it is visibility for some of my websites and considering that many syndication networks have a huge reach thus giving me the potential for more eyes to reach my website, it is a good deal, even if the original was filtered out.

Just something to think about. I re-purpose content all the time because I know that the more eyes that are on my material, the greater visibility I will gain from it.

9 Duncan March 11, 2010 at 1:50 am

Hi Leo,

Thanks for the different angle on the issue. You’re absolutely right in saying that it’s not always a bad thing for syndicated content to outrank the original. I hope you manage to get a few nice backlinks from those sites though?

Duncan

10 Dave Doolin March 15, 2010 at 10:14 pm

Leo, thanks for these observations from your experience.

I started scheduling “syndicated” articles from colleagues who are currently more dormant than not.

My key question is whether to link back to the original article or not. Personally, I would prefer to do that, but if it would ding either the author or myself, then not.

We’ll see. Otherwise feels like a win/win for us.

11 Leo Dimilo March 30, 2010 at 10:16 pm

@Duncan,

Nope…it doesn’t ding anything. In fact, I have actually created small mini-nets from duplicate content with absolutely no penalty at all and it has actually been more beneficial than not.

Folks have duplicate content all wrong. If you really want to learn how to benefit from duplicate or syndicated content, then you have to test. This is especially true for much-debated topics such as this one, where opinions about whether it is good or not range so wildly.

Personally, I think that most just take something and run with it based on “he said, she said” information…a lot of the time this is to the blogger’s or marketer’s detriment.

12 Steve from Lift Chairs March 10, 2010 at 2:21 pm

I hope Google doesn’t put too much weight behind dates, because as far as disingenuous bloggers go, dates are meaningless.

Usually, I try to spend a few days each month checking my articles to see who is scraping them. I used to take it a little too personally, but from my experience, in the long run, Google does a pretty good job of sorting it out…

13 Blog Angel a.k.a. Joella March 10, 2010 at 8:12 pm

This was very informative. I really wasn’t that clear on just how Google goes about determining which content in duplicate situations is the original. I have started making sure to include in post backlinks to my previous works to help distinguish my content as my own. It gives me some measure of comfort to know they are there.

My latest post: What Do Blog Readers Want? Fast Food Content Or A Home Cooked Meal?

14 Andrew from BloggingGuide March 11, 2010 at 1:19 am

It is really comforting that Google IS doing something with regards to duplicate content. At least we can rest in the comfort of the idea that Google does know and is not allowing it and is doing its best to find out the original one.

15 Kaushik from Instant Fundas March 11, 2010 at 3:49 am

My personal experience tells me Google doesn’t look at blog dates. Also, as @Leo Dimilo says, a syndicated article often outranks the original, pushing it out of the index. This is really, really bad. Instead of trying to be real-time, Google should start improving its crawlers’ ability to identify the original source.

16 KS Chen from Google Adsense Tips March 11, 2010 at 6:34 am

I think the authority of the domain is the key point in determining the original source of the content. The copycat will never be given authority by others or by the search engines. In fact, I still do not know what I can do if I find out that my blog content has been copied on another blog without my permission.

17 Gautam Hans from Blog Godown March 11, 2010 at 8:48 am

I think in-linking is good for two things: first for SEO purposes, and secondly for reducing the effect of content scraping.
I have seen a lot of content scraping on my blog too, but usually I contact them.

At one point, somebody linked to a blog that was scraping from me. I contacted the person who had linked to it and showed him the date of my post, and he was convinced that my content was the original.

18 chandan from work at home jobs March 11, 2010 at 1:05 pm

I think there is a very nice discussion going on here about duplicate content. This is a really big issue for us. In this regard I want to ask everyone for their opinion. I work as an SEO executive, and I do the off-page activities at my firm.

When it comes to submitting articles, I prefer to submit them to web 2.0 sites like squidoo, hubpages, vox, livejournal etc. and link back to the landing page using proper anchor text. I have made accounts on the 10 best web 2.0 sites and submit articles there whenever I get one from a content writer. So my question is: is this better than submitting articles to article directories?

19 Marc March 11, 2010 at 5:32 pm

Great Post.

I have thought about duplicate content quite a lot. I work for a few vacation rental management companies. If you have several condos, whether it’s on management company A or B, the copy can be very similar. The events, attractions, and address can be as well. This can sometimes make adding properties very difficult. Both companies I work for have about 100 properties each but I personally know companies that have over 500 units.

I feel as though Google treats the real estate/vacation rental industry slightly different than other sites. Very popular sites like Homeaway or VRBO (part of the same network) can have the same property (including all amenities, description, and property name) and I have never heard of any penalties from either site. Maybe it has to do with authority?

Basically I just told you I have no idea :)

By the way, I received a link on a very popular blogging site. Over the next 48 hours I received no less than 20 pingbacks from sites scraping the blog owner’s content. So something should be done–not sure what it is though.

20 search engine optimisation March 12, 2010 at 9:43 am

The duplicate content issue is made up of a lot of hype, yes there can be issues, however if your site has a good bit of authority and you set your posts to ping immediately your content is pretty safe. Adding an internal anchor in your posts can be a good idea but I would mix up the anchor text as this can cause penalties for anchor over optimisation.

Syndicated content is a good idea but only if the link points back to the page where the original content is, otherwise Google may index the syndicated content and throw your version into the supplemental index never to be seen again.

It’s good to raise these issues as it can be a problem for newer publishers.

Tim

21 Dennis Edell March 12, 2010 at 12:02 pm

“Most webmasters, bloggers and SEO experts agree that accidental internal dupe content, caused by pagination, categorization etc, won’t harm your site power…”

Interesting, so many people wrong all at once.

Google has stated quite clearly a number of times, THIS is the dupe content to watch out for MOST.

22 Duncan March 13, 2010 at 8:32 am

Care to provide some references, Dennis? As far as my understanding goes, Google cares a great deal more about malicious dupe content and doesn’t expect every webmaster to know about the accidental stuff.

23 Marc March 13, 2010 at 1:22 pm

Duncan,

For the most part I agree with you. It seems as though every blog and website will have some duplicate content. I kinda expect it actually. I would also be interested in some examples….

24 Dennis Edell March 16, 2010 at 2:36 pm

I’ve been thinking about this since I wrote it. I knew someone was going to ask and I wanted to be able to snap one in here. lol

I can’t find them. There was one actually not too long ago…either on their blog or one of their many TOS/FAQ type pages…I’ll keep looking.

They do look at the malicious more for sure, but they also feel a webmaster should take as much care as possible to run things right. This is why they also gave examples of “accidental” duping…

Make sure to no follow/index things like tag pages, archives and all that jazz
One category per article where possible (more than one becomes dupe)
NOT to do full articles on home page (single post page then becomes dupe)
etc., there was more, but that’s the gist.

All on the articles I now can’t remember to find.

You’re right, newbies especially won’t know for some time. Malicious is the worst, but they do emphasize these things pretty highly.

I hope that helps a little.

25 Gail from Support Small Business May 6, 2010 at 5:12 pm

Graywolf published an interesting post about How Google Treats Duplicate Content on Trusted Sites Differently back in 2008. I suspect it is still relevant today.

Google’s official position on Duplicate Content Penalties and the Webmaster Central Blog’s Duplicate Content Penalty Myth post may shed more light on the controversy.

Jill Whalen said there is no duplicate content penalty but that is a rather old (2007) post.

It is easy to see there are a ton of duplicate sites indexed. Just search for a unique title or excerpt. You can see it with spam blog comments. Searches can turn up hundreds to thousands of identical comments still in the index.

The question is: how long do they stay there before they are filtered out?

26 James from Photographic March 16, 2010 at 3:02 pm

I first started wondering about duplicate content in conjunction with a guy who was posting a lot of blog content in a LinkedIn group discussion board I moderate. His blog was basically a Google AdWords click-through farm, and none of the content was original.

They were good, though, at scraping decent content and then generating backlinks… or something else was at play. In every case I checked (ok – it was only three or four checks on my part), the original content ranked lower for SERPs and had lower page rank than the scraper.

27 Textile Articles March 23, 2010 at 12:36 am

People who are busy duplicating other people’s content may benefit today, but ultimately they will be the losers.

28 Rich from Yoga in Reading March 30, 2010 at 2:17 am

I’ve always assumed that Google was quick enough to crawl new content that it would be fairly accurate at determining the original source. It’s not something that’s ever been a particularly big issue for me, with a small website. But I suppose sticking the odd backlink into any copy I write isn’t a bad move.

Another alternative is to put a copy of the article onto a couple of syndication websites, all with links back to the original article so that Google has a pretty clear idea of where the original comes from.

29 Scott from Forex Robot March 30, 2010 at 7:27 pm

I think that Google will place the content that holds the most weight as far as being on an authority site and the amount of backlinks the individual article has.

30 Simon from List of Search Engines April 1, 2010 at 9:48 am

Someone just stole one of my articles, substituting my author bio with their own links.

It’s not fun when that happens, and not much you can do about it when they just post it all over juicer blogs probably not worth the effort to try stopping them.

The thing I hate most about it is they create a lot of duplicate content that my own articles have to compete with, it probably devalues my articles to some degree.

Simon

31 LoneWolf from WWW Ramblings April 7, 2010 at 8:45 am

If your blog is set to ping when an article is posted then Google should have an accurate time frame to work with in determining an original. I also have the Google XML Sitemaps plugin that notifies Google, Bing and Ask.com when it is updated after content changes (it can notify Yahoo too if you have a Yahoo ID).

I guess there is a difference between duplicate content that you’ve produced by syndicating your articles and that which is blatant plagiarism or scraping. It would be nice if Google could distinguish between the two and punish the scrapers.

On that note, how do you find out if someone is scraping your articles? Is there any software that can make this easier?

32 Plus Size Woman June 17, 2010 at 10:02 pm

Great article. However, it will become more and more difficult for the search engines and other bots to detect duplicates. The reason is that, with article spinner software, one can take any interesting article and just substitute words and phrases with synonyms. The two documents, the original and the spun version, are actually the same in substance yet different in the “eyes” of search engines.

33 Alex June 21, 2010 at 12:18 pm

Good effort! I think the most important factor here is the website authority, the rank. Sites like Technorati or CNET Reviews will obviously not copy content from an ordinary blog. Therefore Google should be able to distinguish between the original and the duplicate. Anyhow, I think Google should not index blogs with entirely copied content. Really, this is discouraging bloggers from writing original, quality content. Google should take it into consideration soon. That’s my message to Google. Thanks!