What is the rel canonical and how to set URL canonization

Put us to the test
Put us to the test!
Analyze your site
Select the database

Sometimes a web page can be reached using more than one URL, or a site has several pages with similar content (typical case of alternative desktop and mobile versions, for example): in these situations, Google may find it difficult to understand which is the priority content and has to make a decision about which version to show in Search, considering all other duplicate versions of that page.

Already from this we should understand why the canonical, i.e., the system for signaling to the search engine the priority of a URL from a set of pages with identical or nearly identical content, is important, and why the rel canonical, the tag element with which we can concretely communicate our choice to Google (which, however, may not even accept it, as we shall see) and correctly suggest canonical URLs, is crucial.

What is the rel canonical tag?

The canonical, rel canonical or canonical link tag is a specific attribute that is manually inserted into the HTML code of a page to specify a priority feature of the page and indicate to search engines that that specific URL represents the main copy of a page, its original and final version as opposed to other duplicate, near-duplicate and similar pages.

The rel=canonical snippet is an attribute of the link tag that tells search engines which, among the several available of similar content, is the most representative URL for indexing a site. In practice, through this command we can set a single URL as the canonical version, asking crawlers to crawl that page as a priority, while all other similar URLs are considered duplicate URLs and crawled less frequently.

What the rel=canonical tag looks like

Canonical tags use simple, consistent syntax and should be placed in the <head> section of a Web page; they basically look like this

<link rel=“canonical” href=“https://example.com/sample-page/” />

and each part of that code has a meaning. To be precise, the first part [link rel=”canonical”] represents the suggestion, the indication of the tag, while the second part [href=”https://example.com/sample-page/”] concretely indicates which URL we intend to signal as the main canonical version of the page.
Come appare il tag rel canonical

What canonization means and what are canonical URLs

Therefore, canonization (sometimes also translated as canonicalization) is called the process of selecting the page and URLs deemed to be a priority for a particular site, which is carried out to solve any redundancy problems, that is, duplication of content on different URLs.

The term is derived from the Latin canone, which basically means rule or criterion to be adopted, and thus canonical means that it conforms to this principle.

Staying on the subject of definitions, it follows that we call a canonical URL the version of the page that Google considers most representative among a set of duplicate pages on the same site, which becomes the main source for evaluating content and quality provided.

What is canonical, Google’s explanations of canonical

In July 2023, Google updated its supporting documentation on canonical, dividing the topic into three separate sections and changing much of the content to provide clearer details on how Google search and canonical work.

In the new version, it states that canonization is the process of selecting the representative or canonical URL for a piece of content. Accordingly, a canonical URL is the URL of a page that Google has chosen as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of the otherwise duplicate content in its search results.

There are many reasons why a site may have duplicate content:

  • Regional variants. This is the case, for example, where a single piece of content created for the United States and one for the United Kingdom are accessible from different but essentially similar URLs and in the same language
  • Device variants. Happens when we serve a page with a mobile version and a desktop version.
  • Protocol variants. Can happen with HTTP and HTTPS versions of a site.
  • Site features. They depend, for example, on the results of the sorting and filtering functions of a category page.
  • Accidental variants. This is what can happen, for example, if we accidentally leave the demo version of the site accessible to crawlers.

It should be premised that some duplicate content on a site is normal and not a violation of Google’s anti-spam rules, but the guide points out that having the same content accessible via many different URLs can be a negative user experience and could make it more difficult to monitor content performance in search results.

Canonical and SEO: why it is important to report the canonical URL

First and foremost, the effective use of rel canonical allows us to avoid the presence of duplicate content in the eyes of Google, which can damage a Web site’s search engine ranking, causing the well-known effect of keyword cannibalization or, even more specifically, URL cannibalization, a situation that occurs when the search engine recognizes multiple pages of similar content – or, to put it better, a site offers ambiguous information to the crawler and does not make it easier for it to understand which is the main page to crawl and possibly rank; moreover, if the presence of duplicate content on multiple URLs is very frequent, the search engine may also penalize the site for spam.

Therefore, setting the rel = canonical attribute and checking the consolidation of signals by standardizing URLs and internal linking practices can also help us ensure that search engines direct users to the desired pages, which will benefit SEO.

However, canonization also serves Google and the search engines, because it prevents crawlers from having to crawl the same things over and over again, in light of the fact that Google does not want to crawl or render multiple things unnecessarily, nor serve the same content proposed in different URLs, because these would not be good search results.

The main false myths about canonization

This issue was also the focus of an episode of SEO Mythbusting season 2, Google’s YouTube series addressing the main “false myths” of SEO, which clarified some controversial aspects about canonicalization, first specifying that it does not mean topic grouping, but is a system for prioritizing a URL from a set of pages with identical or nearly identical content to reduce duplication. 

On the occasion, host Martin Splitt’s guest and interlocutor was Rachel Costello (Technical SEO Consultant at Builtvisible and formerly Technical SEO & Content Manager at DeepCrawl, a position she held at the time of registration), who tells how in her experience false myths about this topic include doubts about whether canonization is a signal or a directive, whether it can be used as a redirect, and then again site preferences versus user preferences and more.

Splitt e Costello parlano di canonizzazione

In particular, there are two major misconceptions most prevalent about canonical: first, “people think it’s a directive, set a canonical tag and it will be accepted.” In reality, canonical is an HTML suggestion that a site can set to signal to a search engine which is the main URL to use for a page/content.

Another frequent case of misconception is the use of canonical as a redirection: “If you have a product page that is not available, add a canonical to that category page,” says the expert, adding that “it doesn’t really work that way,” because “the content has to be identical or nearly identical,” as Martin Splitt confirms.

Google’s explanations of rel canonical

And it is Google’s Developer Advocate who clarifies these doubts and explains for good what canonization is: first of all, it is not a directive – that is, an instruction that search engines are required to comply with – but a signal, i.e., a hint, a suggestion, that helps search engines understand what we want to canonicalize (what we want to give importance and priority to in Search), but which the search engines themselves can decide whether to use or not.

Canonization is not a directive

When we talk about canonization, Splitt says, “we’re talking about detecting the same content or very similar content that exists on different addresses and different URLs,” and Google can “do many different things to identify these things“. For example, it can simply crawl multiple pages and find that they deal with the same content, or still see if the URLs use the same links and the same type of context, or just use the canonical tag.

That is, it must be understood that Google uses many different signals to “figure out whether something has the same content or not,” and canonicalization using the canonical tag is just one of them. To make it effective, however, it is necessary to set the canonical tag well: it will not work to “put it on pages that do not have the same content, but it is also not good to put it on each of the identical pages.” Using the canonical tag of a page well avoids completely delegating the choice to Googlebot on the best page to show in the search results: in addition to the specific tag, there are as mentioned other signals that Google takes into account to combine URLs with similar content and operate a deduplication.

Others include inter-page redirects, internal links, outbound links, indications in sitemaps, hreflang, clean or shortened URLs.

Canonization and redirects, SEO theories and reality

Nor should the canonical tag be used to make a redirect, warns Splitt, because it does not serve as a redirect, although there is often confusion about this. This is confirmed by Rachel Costello, who says she has noticed how people try hard to group link equity in one place and page, and then use canonical as a desperate attempt to achieve the goal.

This is another mistake, because — Martin Splitt reiterates — canonical only comes into play and makes sense when “you cross the same content on different platforms or channels in slightly different places, for whatever reason you’re doing it.”

But, in the case of out-of-stock and unavailable products, you simply have to do a redirect “to something similar that makes sense to the user at that point,” or put the page in 404 to tell Google that “this is the current situation but it could come back.”

There are actually some theories in the SEO field that urge using a redirect from duplicate resources to the preferred URL: the 301 redirect actually conveys link equity to the destination page, while there are no official positions about canonical tags. However, in the face of this hypothetical advantage the technique poses some doubts, especially in terms of usability and user experience: with redirects duplicate pages are eliminated, because the user is automatically transferred to the other resource and, for him (and the crawlers) the other page does not exist.

In contrast, rel canonical makes life easier for the crawler (which reads the setting and scans URLs as described), while in practical terms users can visit all URLs, even non-canonical ones.

Canonical and wasted crawl budget

It is important to pay attention to the correct use of the canonical tag, because otherwise we can risk wasting crawl budget.

If we have identical pages and we have not set (or we have done it wrong, or we often reverse the chosen page) canonical, Googlebot will go back to crawling all the content in an unnecessary and detrimental way to the economy of the site, which will also increase its Google crawl load.

Even worse is to use canonical as a redirect, because in this case the search engine is faced with pages marked as identical, but which in fact are not, and so it will continue to pass over all of them.

Duplication and deduplication, the signals for Google

The video then goes on to discuss the technical factors Google takes into account when performing deduplication of content on the same site: these are all automatic signals because the work on duplication and deduplication is done “without a lot of human interaction,” says Splitt, but “Google appreciates content fingerprinting” and tries to understand “what the essence is, what the information is, how it relates to the structure of the site, what is written in the sitemap; in short, we are looking at a number of different factors, mostly technical.”

And, in practice, Google scores on an ongoing basis, so it doesn’t determine these issues just once and always sticks to the same decision: “We always look at the fresh content taken from the crawling, and then we take a look at the page — this changes, this has changed, now it’s very close to the previous version, now something that was a duplication is no longer duplication, because the content has been changed.”

Sometimes, Splitt continues, “especially when virtually everything is shown in the same URL structure and it’s like different language versions of the same thing, but with the same content, then we might end up with a very similar score.” If Google sees two versions, “let’s say a 0.49 and a 0.51 of what we think is a duplicate of the other, then it is really hard to choose which will be the canonical page“.

Complicating matters even more is the fact that the situation can change: Google can crawl differently, or it can change the way the crawler fetches data, and even pages touched before can influence “to sort of jump between these two numbers.”

And then there’s canonical: a clear signal to help search engines and not confuse algorithms engaged in figuring out what the duplication is between analyzed content. “Because if we have two identical pieces of content, how do we know which one to choose?” summarizes Martin Splitt.

Canonization and unique content

Another aspect investigated by this episode concerns the amount of unique content on a page that is necessary for Google to accept it as a canonical version, and according to Splitt, even a small amount of original content that does not exist on other pages may suffice.

However, “if the content is completely different or different enough for the algorithms to decide that it is not a duplicate, then canonical is useless”, says the Googler.

Google and canonical: the search engine may choose a different page

Despite this guidance, however, sometimes Google can still make a different decision and replace the site’s preferred canonical page with one that is better for users.

Another of Mountain View’s public voices (and frequent guest on our blog), John Mueller, discusses this in depth, responding in one of the #AskGoogleWebmasters events to a question posed by a user (@uale75, Italy’s Mariachiara Marsella): “You can indicate your preference on Google using various techniques, but Google may choose a different page as canonical than the one you set, for various reasons. So what are the reasons?” asks the BEM Research co-founder.

John Mueller’s explanation starts with the basics: it is common for a site to have multiple unique URLs that deal with the same topic, and he cites common cases such as those of a site with WWW and no WWW version showing the same content, a home page that is also accessible by adding /index.html, or pages to which URL versions that use lowercase and uppercase letters are directed.

The canonical URL setting

The ideal situation for Google is not to encounter these alternate versions, which is why it is recommended to choose a URL format and use it consistently on the website. For the Google Search system, in fact, it does not make sense to index and show all these versions of pages pointing to the same content, and so Google tries to select and focus on only one.

This is precisely the canonical URL, which can be indicated by the webmaster or chosen automatically by the search engine. More precisely, as the comprehensive official guide also says, Googlebot chooses the page it considers most complete and useful and marks it as canonical.

Canonization according to Google

There are two guidelines the algorithm follows to select a canonical URL:

  • Which URL resembles the one the site wants to use, i.e., what is the site’s preference.
  • Which URL is most useful to users.

I parametri di Google sul canonical

With respect to site preference, Google considers different parameters, such as the annotation rel canonical link that site managers may use, but also redirects, internal links, URLs included in the sitemap, presence of HTTPS protocol (preferred to the old HTTP), all the way down to an, so to speak, aesthetic criteria. That is, Mueller says Google also evaluates which URL seems to be prettier from the point of view of structure and form.

How Google indexes and chooses the canonical URL

When Google indexes a page, it determines the main content (or core element) of each page: if it finds multiple pages that look the same or with very similar main content, it chooses the page that, based on the factors (or signals) gathered from the indexing process is objectively the most complete and useful for search users, marking it as canonical. The canonical page will be crawled more regularly, while duplicates are crawled less frequently to reduce crawling load on sites.

There are a handful of factors that play a role in canonicalization: whether the page is published via HTTP or HTTPS, redirects, presence of the URL in a sitemap, and annotations of rel=”canonical” links. We can indicate our preference to Google using these techniques, but Google might still choose a different page as canonical, for various reasons.

The reason is that indicating a canonical preference is a suggestion, not a rule that Google is obliged to follow.

Google uses the canonical page as the primary source for evaluating content and quality, and a Search result will usually point to the canonical page, unless one of the duplicates is explicitly more suitable for a search user: for example, the result will likely point to the mobile page if the user is on a mobile device, even if the desktop page is marked as canonical.

Evaluations to consolidate URLs

The search engine system considers all these factors for each potential canonical URL and decides which version to canonize, analyzing the one that best puts the parameters together.

Therefore, says the Search Advocate, “if you are a site owner and have a strong preference for the URLs you want to show users in search, you should first make sure you use those preferences consistently throughout your Web site,” Mueller advises.

Methods for signaling a canonization

On a theoretical level and in an ideal situation, “search engines should not come across any of these alternatives: if you have a preference, stick to it“, adds the Googler. As an aside, and back to the real world, limiting problems requires making sure that a site has all these canonical factors aligned similarly: that internal links use the preferred URL format, that the sitemap lists only the preferred URLs, that rel canonical link elements are matched.

Come impostare i canonical sul sito

The more consistent you can be, the higher the likelihood that Google’s systems will follow your directions and choose your preferred URLs as canonical.

How to specify a canonical

Exiting the videos and returning to the new version of the guide, we can then see what suggestions Google invites us to follow to specify a canonical URL on duplicate or very similar pages by indicating preference.

metodi per specificare canonical

Indeed, we have several methods, sorted by how strongly they can affect canonization:

  • Redirects. They are a strong signal that the redirect target should become canonical.
  • Attributes rel=”canonical” to links. They are a strong signal that the specified URL should become canonical.
  • Sitemap inclusion. This is a weak signal that helps URLs included in a sitemap to become canonical.

We can use these solutions in combination, thus making them more effective and increasing the chance that the preferred canonical URL will actually appear in search results.

  1. Using rel=”canonical” link annotations

Google supports canonical rel link annotations provided through two methods. The advice is to choose only one of the paths: although supported, in fact, using both methods simultaneously is more prone to errors (for example, we might provide one URL in the HTTP header and another URL in the rel=”canonical” link element).

  • Link element rel=”canonical” in HTML code.

Also known as a canonical element or canonical element, this is an element used in the head section of HTML to indicate that another page is representative of the page content. For example, if we want https://example.com/dresses/green-dresses to be the canonical URL, even though a variety of URLs can access this content, we can canonize it with these steps:

We add a <link> element with the rel=”canonical” attribute to the <head> section of duplicate pages, pointing to the canonical page, as in the image.

Esempio di inserimenti di canonical

If the canonical page has a mobile variant on a separate URL, we add a rel=”alternate” link element, directing to the mobile version of the page:

Esempio di canonical alternate

We add any hreflang or other element appropriate for the page.

We use absolute paths instead of relative paths with the rel=”canonical” link element. Although relative paths are supported by Google, they can cause problems in the long run (e.g., if we unintentionally allow the test site to be crawled) and therefore are not recommended.

Buon utilizzo di canonical vs cattivo utilizzo

The rel=canonical link element is only accepted if it appears in the <head> section of the HTML, so we need to make sure that at least the <head> section is valid HTML. If we use JavaScript to add the canonical element, we must use the correct injection procedure.

  • HTTP header link rel=”canonical”

If we can change the server configuration, we can use an HTTP rel=”canonical” header instead of an HTML element to indicate the canonical URL for a Search-supported document, including non-HTML documents such as PDF files.

Google currently supports this method only for web search results.

If we publish content in many file formats, such as PDF or Microsoft Word, each on its own URL, we can return an HTTP rel=”canonical” header to indicate to Googlebot what the canonical URL is for non-HTML files. For example, to indicate that the PDF version of the .docx version must be canonical, we could add this HTTP header in the image for the .docx version of the content.

Inserimento canonical header http

As with the rel=”canonical” link element, Google reminds us to use absolute rel=”canonical” URLs in the HTTP header and only double quotes around the URL.

   2. Using a sitemap

In this case, we will choose a canonical URL for each of the pages and submit the list in a sitemap: all pages listed in a sitemap are suggested as canonical, and Google will decide which pages (if any) are duplicate, based on content similarity.

Providing preferred canonical URLs in sitemaps is an easy way to define canonicals for a large site, and sitemaps are a useful way to tell Google which pages we consider most important on the site.

    3. Using redirects

This method is useful for getting rid of existing duplicate pages. All redirect methods (301 and 302, meta-refresh, JavaScript redirects) have the same effect on Google Search, however, the time it takes search engines to notice different redirect methods may differ.

For a faster effect, Google encourages the use of server-side redirects, i.e., HTTP 3xx redirects

If the page can be reached in different ways, as in the example

  • https://example.com/home
  • https://home.example.com
  • https://www.example.com

we can choose one of these URLs as the canonical URL and use redirects to send traffic from the other URLs to that preferred URL.

       4. Other signals

In addition to the explicitly provided methods, Google also uses a number of canonization signals generally based on site configuration: HTTPS preference over HTTP and URLs in hreflang clusters.

  • HTTPS preference to HTTP for canonical URLs.

Google prefers HTTPS pages over the equivalent HTTP pages as canonical, except when there are problems or conflicting signals such as the following:

  1. The HTTPS page has an invalid SSL certificate.
  2. The HTTPS page contains unsafe dependencies (other than images).
  3. The HTTPS page redirects users to or through an HTTP page.
  4. The HTTPS page has a rel=”canonical” link to the HTTP page.

Although Google’s systems prefer HTTPS pages over HTTP pages by default, we can reinforce this behavior by taking one of the following actions:

  1. Add redirects from the HTTP page to the HTTPS page.
  2. Add a rel=”canonical” link from the HTTP page to the HTTPS page.
  3. Implement HST .

To prevent Google from incorrectly canonizing the HTTP page, on the other hand, we need to follow the following steps:

  1. Avoid erroneous TLS/SSL certificates and redirects from HTTPS to HTTP because they cause Google to strongly prefer HTTP. The HSTS implementation cannot ignore this strong preference.
  2. Do not include the HTTP version of pages in the sitemap or hreflang annotations instead of the HTTPS version.
  3. Avoid implementing the SSL/TLS certificate for the wrong host variant – as if example.com serves the certificate for subdomain.example.com. The certificate should match the full URL of the site or be a wildcard certificate that can be used for multiple subdomains on a domain.
  • Prefer URLs in hreflang clusters.

To aid site localization efforts, for canonization purposes Google prefers URLs that are part of hreflang clusters. For example, if https://example.com/de-de/cats and https://example.com/de-ch/cats point to each other with hreflang annotations, but not to https://example.com/de-at/cats, pages for de-de and de-ch will be preferred as canonical instead of the /de-at/ page that does not appear in the hreflang cluster.

When and why to use Canonical Rel

Google’s guidance allows us to understand when to use Rel Canonical: if multiple pages have the same content, if they are optimized for the same keywords (and thus compete with each other), if they have some similar properties such as meta tags, headlines or links in content.

More precisely, you should explicitly choose a canonical page in a set of duplicate or similar pages in these situations and with these goals:

  • To specify the URL to be shown in search results. For example, if we prefer that users reach the page related to green clothes via https://www.example.com/dresses/green/greendress.html instead of https://example.com/dresses/cocktail?gclid=ABCD.
  • To group indicators associated with links for similar or duplicate pages. Allows search engines to group the information they have about individual URLs (e.g., links pointing to URLs) into a single preferred URL. This means that links on other sites that link back to http://example.com/dresses/cocktail?gclid=ABCD are grouped with links that link back to https://www.example.com/dresses/green/greendress.html.
  • To simplify the monitoring of metrics related to a single product or topic. The presence of different URLs makes it more complicated to receive grouped metrics for a specific content.
  • To manage content distributed in syndication. If you syndicate content to publish on other domains, you’ll want to make sure that the URL you prefer appears in search results.
  • To avoid spending time crawling duplicate pages. To optimize site crawling, it is preferable for Googlebot to crawl new (or updated) pages, rather than desktop and mobile versions of the same pages.

The benefits of using canonical should be clear by now: it directly tells Google which page is the most relevant (to us!) to index, it avoids wasted crawl budget because the crawler does not have to crawl duplicate pages, it simplifies tracking metrics related to a single product or topic.

General advice on canonical and canonization

For all canonical methods, Google urges adherence to some simple and intuitive best practices.

  • Do not use the robots.txt file for canonization.
  • Do not use the URL removals tool for canonical because it removes all versions of a URL from Search.
  • Do not specify different URLs as canonical for the same page using the same or different canonization techniques (e.g., do not specify one URL in a Sitemap and another URL for the same page using rel=”canonical”).
  • Do not use noindex to prevent the selection of a canonical page, because this statement is only to exclude the page from the index, not to handle the selection of a canonical page.
  • Specify a canonical page when using hreflang tags, designating a canonical page in the same language or the best substitute language if no canonical page exists for the same language.
  • Use the canonical URL for the link, rather than a duplicate URL, when setting links within the site. Always using the same URL for links helps Google understand what the preference is for the canonical URL.

What happens when Google sets another canonical URL

But what happens in the opposite case, when Google chooses a different URL?

One possible case is, for example, when there is identical content in different languages: if there is a canonical tag pointing to the English version of a page, but the user is in Germany, Google will show the German version of the page.

Such situations only impact, as we note, the URL shown in Google’s search results, and according to John Mueller, “if our systems choose a different URL as canonical, it will rank the same way in Search.”

So, “in the end, it just depends on your evaluations: if you have preferences on URLs, then point them out to the search engines without any possibility of error, but it is also okay to let Google choose.” Also, if it happens that the search engine chooses a different URL “this will not negatively affect the site,” Mueller reassures.

Canonical and page duplication

Duplicate content is a problem that can affect all sites, although it is generally more common for e-Commerce; it is good to remember, first of all, that for a search engine each unique URL represents a single page, and the same homepage could be reachable from different paths (with or without www, with HTTPS or without, with the addition of /index.php and so on) and become a single page for crawlers.

The phenomenon is even more evident when automatically adding tags, which create multiple paths and URLs to get to the same content and multiply URL parameters for searches, sorting, currency options and so on.

Duplicate URLs for eCommerce

The situation is likely to become critical especially for eCommerce sites, where a single product can potentially be reached from multiple different pages based on the users’ browsing path (whether they click from promotional banners on the home page, whether they choose sorting by prices, and so on). Without rel canonical all direct links to these pages are split rather than consolidated, thus generating duplicate content for crawlers.

Valid reasons for having similar or duplicate pages

But it is Google itself that makes it clear that there are valid reasons why a site can have different URLs pointing to the same page or duplicate or very similar pages pointing to different URLs.

Among the most common situations, the guide points out uses:

  • To support multiple device types (typically desktop and mobile):

https://example.com/news/koala-rampage

https://m.example.com/news/koala-rampage

https://amp.example.com/news/koala-rampage

  • To enable dynamic URLs for items such as search parameters or session IDs:

https://www.example.com/products?category=dresses&color=green

https://example.com/dresses/cocktail?gclid=ABCD

https://www.example.com/dresses/green/greendress.html

  • If the blog system automatically saves multiple URLs when the same post is entered in multiple sections:

https://blog.example.com/dresses/green-dresses-are-awesome/

https://blog.example.com/green-things/green-dresses-are-awesome/

  • If the server is configured to publish the same content for www/non-www and/or http/https variants:

http://example.com/green-dresses

https://example.com/green-dresses

http://www.example.com/green-dresses

  • If content proposed in a blog to be syndicated to other sites is copied in part or in full to these domains:

https://news.example.com/green-dresses-for-every-day-155672.html (post distributed in syndication)

https://blog.example.com/dresses/green-dresses-are-awesome/3245/ (original post)

Canonization problems: how to solve the most frequent ones

In some cases, however, incorrect canonization can be problematic for the management of our site: this happens, it should be reiterated, even if we explicitly designate a canonical page, because Google might still choose a different canonical page for various reasons, such as the quality of the content.

Before troubleshooting, however, the official guide invites us to check whether the canonical URL selected by Google makes more sense to our users coming from Google Search than the one we personally set up.

Having done this check, there are several reasons why the canonical URL selected differs from the canonical URL we would prefer to see in Search. The most common problems are:

  1. Language variants without localized annotations

If we have multiple websites that publish essentially the same localized content for different users around the world, we need to follow Google’s guidelines for localized sites: for example, if we have different sites for English-speaking users in the U.S., U.K., and Australia, respectively, but the content is the same, adding hreflang annotations to the pages can help bring up the right pages for users in different regions.

       2. Incorrect canonical elements

Some content management systems (CMS) or CMS plug-ins may incorrectly use canonical techniques to point to URLs on external Web sites. We must therefore check the HTML code with browser developer tools to see if this is the case: if our site indicates an unexpected canonical URL preference, perhaps due to incorrect rel=”canonical” usage or a 3xx redirect, we can contact the CMS provider to report the error.

        3. Incorrectly configured servers

Some incorrect hosting configurations can cause unexpected URL selection between domains. For example:

  • One server may be misconfigured to return example.com content in response to a request for a URL on other.example.
  • Two unrelated web servers may return identical soft 404 pages that Google fails to identify as error pages. If this is the case, we may contact the hosting provider for resolution.

         4. Malicious hacking

Some attacks on Web sites introduce code that returns an HTTP 3xx redirect or inserts an interdomain link rel=”canonical” annotation in the HTML <head> or HTTP header, which typically points to a URL that hosts malicious content or spam. In such cases, Google’s algorithms may choose the malicious or spam-containing URL instead of the URL of the compromised Web site.

         5. An emulated Web site

In rare situations, Google’s algorithms might select a URL from an external site that hosts our content without our permission – copycat sites, i.e., scammy imitators. If we are a victim of this situation-and thus another site is duplicating our content in violation of copyright law-we can contact the site host to request removal and request that Google remove the infringing page from search results by filing a request under the Digital Millennium Copyright Act.

  1. Content distributed in syndication

The canonical link element is not recommended for users who want to avoid duplication by syndication partners – distributed identically across a network of sites as a result of replication agreements that allow copy-pasting-since the pages are often very different. In these cases, the most effective solution is to allow partners to block content indexing, that is, to ask them to set noindex meta robots on the affected pages.

This is one of the most notable technical changes in the July updated version of Google’s official documentation, which previously suggested canonical instead; in fact, as some analysts have noted, the change to the recommendation is something of an admission that the search engine cannot effectively identify duplicate or near-duplicate content between different sites.

The SEO importance of rel canonical on URLs

From what has been written, one should understand the value of canonical and the importance of precisely using rel canonical tags on site URLs: on the one hand, this technique allows the page’s priority to be set for crawlers (shielding it from potential search engine penalty for duplicate content) and, on the other hand, ensures that users from Search arrive on the right page every time.

However, it is true that search engines try to determine which page among the available variants might be the canonical one, but the process does not always work optimally and therefore, from an SEO perspective, it is always best to set a canonical URL in doubtful cases.

Google and canonical URLs

Google’s tools also include specific tools for suggesting canonical URLs for pages on the site, as well as for checking whether the directions have been entered properly. For example, if you are concerned that you have not selected the best canonical URL for your content you can check the page address with Search Console’s URL Inspection tool, which will show the canonization selected by Google.

In any case, when faced with multiple copies of a page on the same site, Google’s systems have to make a choice and take into consideration various parameters to decree which page is canonical: one of the aspects that can make a difference, as revealed by John Mueller, is the length of the URL, because “a shorter, cleaner URL” is more likely to be interpreted as canonical than a longer, more complex address, perhaps with lots of slashes.

Google has removed the info command

The URL Control tool has also been updated over time and now shows every canonical selected by Google for a URL, not just those in the properties managed in the Search Console; at the same time, for the past few years Google has announced the retirement of “info:command“, the info command that was an alternative way to discover canonicals, but was considered little used and less complete than the new system.

What the info command was for

By typing the “info” command in the search engine bar before a domain, Google would show the list of canonical URLs for that specific site. In fact, until 2017, the info command showed not only canonical URLs, but also links to cache, similar pages, links, and other information, possibilities that have been removed precisely now more than five years ago.

Now, to discover canonical URLs one has to use the Search Console and the URL Control tool (and, thus, perform verified access), and there is no longer an option to verify the information in Google’s public search bar.

Iscriviti alla newsletter

Try SEOZoom

7 days for FREE

Discover now all the SEOZoom features!
TOP