Duplicate content: what it is and how to avoid SEO issues
Today we are tackling a particularly thorny SEO issue: duplicate content, that is, put simply, content that is repeated identically or in broadly similar form across various web pages, within the same site or on different sites. This practice may have deceptive intent, but it is generally the result of poor optimization or laziness, and it can lead to a worsening of page rankings and, more generally, to difficulty in positioning this content. Here is everything you need to know about what duplicate content is, how to detect it, how to correct the problem and how to prevent it from reappearing.
Duplicate content in SEO, a knot to untangle
To give a definition, duplicate content is content that is reproduced as an identical or very similar copy in multiple locations on the Web, within a single website or across multiple domains; in other words, any content that can be found at more than one web address or URL.
More precisely, Google’s documentation explains that the expression refers to “blocks of important content within or between domains that are identical or very similar”, which can give rise to the serious and frequent SEO error we are discussing.
Portions of text translated into different languages are not considered duplicate content, just as quotations (even of full paragraphs) are not flagged as an error, especially if we use the semantic <cite> markup within the source code.
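To make the point concrete, here is a minimal sketch of that markup (the quoted text and URL are invented for illustration): wrapping a quotation in semantic HTML signals to crawlers that the passage is a citation rather than copied content.

```html
<!-- A quoted passage marked up semantically: the blockquote's cite
     attribute and the <cite> element identify this as a citation,
     not scraped content. The URL and text are hypothetical. -->
<blockquote cite="https://www.example.com/original-article">
  <p>Duplicate content is content that appears in more than one
  location on the web.</p>
</blockquote>
<p>Source: <cite><a href="https://www.example.com/original-article">An example original article</a></cite></p>
```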
Why duplicate content represents an issue and an error
While not technically leading to a penalty, duplicate content can still sometimes negatively affect search engine rankings: when faced with multiple pieces of “significantly similar” content in more than one location on the Internet, Google has difficulty deciding which version is most relevant for a given search query.
In general, duplicate content is considered a problem to be solved because, fundamentally, it adds no value to the user’s experience on the site’s pages, which should be the focal point of every piece of published content. Thinking like a user: would we regularly visit a site that publishes non-original articles, or would we rather go straight to the original source of that information?
In addition to problems in organic search, duplicate content may also violate the Google AdSense publisher policies, which can prevent the use of Google Ads on sites with copied or copyrighted content: for example, on pages that copy and republish content from other sites without adding any original wording or intrinsic value; pages that copy content from others with slight modifications (rewriting it manually, replacing some terms with simple synonyms, or using automated techniques); or sites dedicated to embedding content such as videos, images or other media from other sources, again without adding substantial value for the user.
The different types of duplicate content
Among the examples of non-malicious and non-deceptive duplicate content, Google cites:
- Discussion forums, which can generate both regular pages and “stripped-down” pages aimed at mobile devices.
- Items from an online store displayed or linked via multiple distinct URLs (a case illustrated in the sketch after this list).
- Printer-only versions of web pages.
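The second case in particular lends itself to a concrete sketch. Assuming a hypothetical shop (domain and paths are invented), the same product sheet might be reachable through a category path, a sale path and a tracking-parameter variant; declaring a canonical URL in the page’s <head> is the standard way to tell search engines which address is the preferred one.

```html
<!-- Hypothetical example: the same product page reachable at
     https://www.example-shop.com/shoes/sneaker-model-x
     https://www.example-shop.com/sale/sneaker-model-x
     https://www.example-shop.com/shoes/sneaker-model-x?utm_source=newsletter
     The canonical link below, placed in the <head> of every variant,
     points search engines to the preferred URL. -->
<link rel="canonical" href="https://www.example-shop.com/shoes/sneaker-model-x" />
```

This does not remove the duplicate URLs, but it consolidates their ranking signals onto the version we choose.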
In fact, it must be made clear that there are two broad categories of duplicate content, duplicates within the same site and duplicates on other websites, which of course represent two problems of different order and scale.
Duplicate content on the web
Duplicate content on the Web, or external duplicate content, occurs when an entire piece of content, or a portion of it (such as a paragraph), is repeated across several different sites (overlapping domains).
This error can result from a number of factors: for example, it is common among e-commerce sites that republish, without any variation, the product sheets provided by the original manufacturer of an item for sale; sometimes, however, it can also be a manipulative technique, an “attempt to control search engine rankings or acquire more traffic”.
This is an eventuality that Google knows about and tries to punish (by penalizing the ranking of the sites involved, or even removing those sites from its index), because “deceptive practices like this can cause an unsatisfactory user experience”, showing visitors “the same content repeated over and over within a set of search results”.
Apart from this lack-of-diversity problem for users, externally duplicated content also puts Googlebot in difficulty: faced with identical content at different URLs, it does not initially know which is the original source and is therefore forced to make a decision about which version to show in search results.
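When the duplication is legitimate, for example an article syndicated to a partner site, one documented way to help Googlebot with that decision is the cross-domain canonical, sketched below with invented domains: the republished copy points back to the original source.

```html
<!-- Placed in the <head> of the republished copy on the partner site
     (www.example-partner.com); both URLs are hypothetical. It tells
     crawlers that the original version lives on the source domain. -->
<link rel="canonical" href="https://www.example-original.com/blog/duplicate-content-guide" />
```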