
SEO Dealing with Duplicate Content



Search engines avoid indexing multiple copies of the same content. This is what we call “Duplicate Content”.
Not only may a search engine decline to index such pages, it can also penalise a site for carrying the duplicated content.

Duplicate content will not improve your website's rankings in any of the major search engines and should therefore be avoided.

There are two major cases of Duplicate Content:

1. Duplicate content as a result of overall site structure

  ♦ Print-friendly pages
  ♦ Exactly the same website content on different domains (domain.com -> domain.ie or domain.net)
  ♦ Affiliate pages
  ♦ Navigation links and breadcrumb navigation 
  ♦ Pages with similar content that can be accessed via different URLs
  ♦ Pages with items that are very similar in name and description but differ in colour, size, etc. (e-commerce catalogues)
  ♦ Pages with the same Title or Meta Tags values.
  ♦ Using URL-based session IDs
  ♦ Canonicalization problems (e.g. domain.ie versus www.domain.ie, or /index.htm versus /)

Example using .htaccess to redirect domain.com to www.domain.com:

# send any request whose host is not www.domain.com to the www host
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
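The canonicalization cases listed above (www versus non-www, /index.htm versus /, URL-based session IDs) can also be handled in application code, for example when generating internal links. The following is only a minimal sketch in Python: the host name is the placeholder used above, and the session parameter names and index file names are assumptions that you would adapt to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CANONICAL_HOST = "www.domain.com"                 # placeholder host from the example above
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}  # assumed session ID parameter names
INDEX_FILES = ("index.htm", "index.html")         # assumed default document names


def canonical_url(url):
    """Return one preferred form of a URL on this site."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # Treat /index.htm and / as the same page.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Drop URL-based session IDs, keep every other parameter in order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]

    # Always emit the www host, as the redirect rule does; the fragment is dropped.
    return urlunsplit(("http", CANONICAL_HOST, path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("http://domain.com/index.htm"))
    print(canonical_url("http://domain.com/products/list.php?sid=abc123&colour=red"))
```

A helper like this complements the 301 redirect rather than replacing it: the redirect fixes URLs that visitors and robots request, while the helper keeps your own pages from linking to the non-canonical forms in the first place.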

There are times when a website has to contain duplicate content, as in the case of printer-friendly pages. These can easily be excluded from search engine indexes with a robots meta tag in the page's head:

<meta name="robots" content="noindex, nofollow">
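To confirm that a printer-friendly page really carries the tag, you can inspect its HTML. The sketch below uses only Python's standard-library HTML parser; the class and function names are illustrative, and it checks a string of HTML that you have already fetched by whatever means you prefer.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex, nofollow" becomes {"noindex", "nofollow"}
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindex(html):
    """True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(is_noindex(page))
```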

You can also use the robots.txt file to exclude directories and files from being visited by search engines.
The robots.txt file should be placed in the root folder of your site. Below are a few basic rules that will help you deal with duplicated content:

#forbid all robots from your site
User-agent: *
Disallow: /

Disallow any URLs that start with a certain word:

#disallow googlebot from crawling URLs that start with /blog ( note the leading / )
User-agent: googlebot
Disallow: /blog
#a particular page
Disallow: /page-name.html
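
Before uploading rules like these, it can help to test them locally. The sketch below feeds the rules above to Python's standard-library `urllib.robotparser`; the URLs being tested are made-up examples on the placeholder domain. Note that this parser does plain prefix matching, so it also shows that `/blog` blocks any path that merely starts with those characters.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: googlebot
Disallow: /blog
Disallow: /page-name.html
"""


def build_parser(rules_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


if __name__ == "__main__":
    robots = build_parser(RULES)
    checks = [
        ("googlebot", "http://www.domain.com/blog/my-post"),
        ("googlebot", "http://www.domain.com/blogging-tips.html"),
        ("googlebot", "http://www.domain.com/page-name.html"),
        ("googlebot", "http://www.domain.com/about.html"),
        ("otherbot", "http://www.domain.com/blog/my-post"),
    ]
    for agent, url in checks:
        verdict = "allowed" if robots.can_fetch(agent, url) else "blocked"
        print(agent, url, verdict)
```

If you only want to block the blog directory and not every path beginning with "/blog", writing the rule as `Disallow: /blog/` is the usual way to narrow it.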

You can also use wildcards to disallow any URLs containing a sub-string of your choice (in this case “print=”):

User-agent: googlebot
Disallow: /*print=
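
Wildcards are an extension that not every robot or robots.txt library understands (Python's built-in `urllib.robotparser`, for one, only does prefix matching). To see which URLs a wildcard rule would catch, the sketch below translates a pattern into a regular expression, assuming the semantics Google documents: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function names are illustrative, and other crawlers may interpret the pattern differently.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query, pattern):
    """True if the path (including any query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    for candidate in ("/article.php?id=3&print=1", "/article.php?id=3", "/print=yes"):
        print(candidate, is_blocked(candidate, "/*print="))
```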

2. Duplicate content as a result of content theft

CopyScape is a service that helps you find content thieves by scanning other pages for content similar to that of a given page.
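
How CopyScape measures similarity internally is not described here, but the general idea of comparing two texts can be illustrated with a common technique: split each text into overlapping word sequences ("shingles") and compute the share of sequences they have in common (Jaccard similarity). The sketch below is an illustration of that technique only, not CopyScape's method; the shingle size of 3 is an arbitrary assumption.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets: 0.0 (nothing shared) to 1.0 (identical)."""
    set_a = shingles(text_a, size)
    set_b = shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "search engines avoid indexing multiple copies of the same content"
    copied = "search engines avoid indexing multiple copies of the same text"
    unrelated = "our office is closed on public holidays"
    print(round(similarity(original, copied), 2))
    print(round(similarity(original, unrelated), 2))
```

A score close to 1.0 suggests that one page was largely copied from the other, while a score near 0.0 suggests independent text.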

If you are a victim of content theft and want to take action, first notify the person using the content illicitly by sending them a “Cease and Desist” letter, using the contact information you can gather from their website or from the WHOIS record of the domain name.

Failing that, the search engines have procedures for reporting stolen content:
  Google: http://www.google.com/dmca.html 
  Yahoo!: http://docs.yahoo.com/info/copyright/copyright.html 
  MSN: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_CONC_AboutDMCA.htm

This entry was posted on Thursday, October 18th, 2007 at 7:37 pm and is filed under SEO.



