How Canonical Tags Work
Last updated: November 20, 2019
Canonicalisation is one of the most misunderstood parts of SEO. We believe there’s a general misconception about what a canonical is, what canonicals are for, and how and when they should be implemented.
The purpose of this article is to clear up this confusion and to explain in plain English exactly how canonicals work.
What is a Canonical Tag?
Canonicalisation is a mechanism for asking search engines to merge the ranking signals from multiple URLs onto a single, preferred URL. A canonical tag is the most common way of sending that signal.
What’s the Purpose of Canonicals?
Canonicalisation is normally used to consolidate duplicate URLs when you have multiple versions of the same page, that is, when the same content can be accessed from different URLs. A prime example is a URL parameter added onto the end of a URL, which often happens on filtered e-commerce category pages or on landing pages with a tracking parameter.
Generally, you don’t want Google to index these URLs. It’s better for Google to apply the SEO signals from a group of identical pages to a single URL. That URL is the canonical, and it is the only one you want Google to index and rank.
This consolidates all the SEO strength on one page. If Google mistakenly indexes duplicate versions of the same page, the ranking value of your preferred page could be diluted across the duplicates.
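To see how parameterised duplicates collapse onto one URL, here is a minimal Python sketch that strips content-neutral query parameters and groups the results. The `IGNORED_PARAMS` set and the function names are hypothetical; which parameters are genuinely content-neutral differs from site to site, so treat this as an audit aid rather than a rule for choosing canonicals.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content on this site.
# Verify this per site before relying on the output.
IGNORED_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def canonical_candidate(url: str) -> str:
    """Return the URL with content-neutral query parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in IGNORED_PARAMS]
    # An empty path is normalised to "/"; any fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                       urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs that collapse to the same canonical candidate."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_candidate(url), []).append(url)
    return groups


# All three URLs end up in one group keyed by https://lucidseo.co.uk/
print(group_duplicates([
    "https://lucidseo.co.uk?source=email",
    "https://lucidseo.co.uk/?utm_source=newsletter",
    "https://lucidseo.co.uk/",
]))
```

Parameters that do change the content (such as a colour filter) are kept, so those URLs stay in separate groups until you decide how to handle them.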
How to Implement Canonical Tags
The most common approach is to use a canonical link tag, which goes in the <head> of a web page. Its ‘href’ attribute references your chosen canonical. For example, if Google crawled https://lucidseo.co.uk?source=email and discovered the canonical below, Google should apply all the ranking signals to the URL referenced in the tag.
<link rel="canonical" href="https://lucidseo.co.uk" />
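If you want to audit which canonical a page declares, a small parser is enough. The sketch below uses Python’s standard-library `html.parser`; the names `CanonicalFinder` and `find_canonicals` are our own. It returns every canonical it finds, because a page that declares more than one sends conflicting signals.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens; valueless attributes are None.
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = '<head><link rel="canonical" href="https://lucidseo.co.uk" /></head>'
print(find_canonicals(page))  # ['https://lucidseo.co.uk']
```

Self-closing tags such as `<link ... />` are handled because `HTMLParser` passes them to `handle_starttag` by default.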
You can also send a rel=canonical via an HTTP Link header. With this approach, rather than placing a tag in the HTML, the canonical is delivered within the server response. Whilst more complicated to implement, this approach helps in specific situations, for example with non-HTML files such as PDFs, which have no <head> to hold a tag.
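The header takes the form `Link: <https://www.example.com/downloads/guide.pdf>; rel="canonical"`, where the URL is a hypothetical example. The sketch below builds that header value and reads a canonical back out of one. The helper names are hypothetical, and the parser is deliberately simple: it assumes no commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <target> followed by zero or more ";param=value" parts.
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]*)*)')


def canonical_link_header(url: str) -> str:
    """Build a Link header value declaring `url` as the canonical."""
    return f'<{url}>; rel="canonical"'


def canonical_from_link_header(value: str):
    """Return the first target whose rel includes "canonical", else None."""
    for match in LINK_VALUE.finditer(value):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.lower() == "rel" and "canonical" in val.strip('"').lower().split():
                return target
    return None


header = canonical_link_header("https://www.example.com/downloads/guide.pdf")
print(header)  # <https://www.example.com/downloads/guide.pdf>; rel="canonical"
print(canonical_from_link_header(header))  # https://www.example.com/downloads/guide.pdf
```

How you attach the header depends on your web server or application framework.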
A 301 redirect is also a canonicalisation signal: it declares the redirect destination as the canonical. For this method to work, the source and destination of the redirect must have very similar content. Otherwise, Google may treat the redirect as a soft 404.
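As an illustration only, here is a minimal WSGI application (Python’s standard web-server interface) that answers retired paths with a 301. The `REDIRECTS` mapping and the example.com URLs are hypothetical; in practice redirects usually live in your web server or CMS configuration.

```python
# Hypothetical mapping from retired paths to the URLs that replaced them.
# Each destination should carry essentially the same content as its source.
REDIRECTS = {
    "/old-canonical-guide": "https://www.example.com/canonical-tags",
    "/services/seo-audit-2018": "https://www.example.com/services/seo-audit",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 for retired paths, 200 for everything else."""
    target = REDIRECTS.get(environ.get("PATH_INFO", "/"))
    if target is not None:
        # A permanent redirect points signals at the destination URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Page content"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it. Absolute URLs are used in the `Location` header so the destination is unambiguous.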
Google also treats the URLs you include in your submitted XML sitemap as suggested canonicals.
For example, say Google crawls three different URLs that all have near-identical content. If one of those URLs is included in your XML sitemap, Google may choose it as the canonical. That said, there is no guarantee that Google will agree, so we wouldn’t recommend relying on an XML sitemap for canonicalisation.
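If you do use a sitemap, it should list only the URLs you want treated as canonical. Here is a sketch, assuming you already know each page’s declared canonical: the `pages` mapping is hypothetical input from your own crawl or CMS, and only self-canonical URLs are written out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Return sitemap XML listing only self-canonical URLs.

    `pages` (hypothetical input) maps each crawled URL to its declared canonical.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(u for u, canonical in pages.items() if u == canonical):
        # ElementTree escapes characters such as & in query strings.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap({
    "https://www.example.com/shoes": "https://www.example.com/shoes",
    "https://www.example.com/shoes?colour=red": "https://www.example.com/shoes",
}))
```

The parameterised URL is left out because it declares a different canonical.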
How to Check Which URL Google Chooses as the Canonical?
Run a search with Google’s cache: operator on a page and look at the URL listed in the summary at the top. This is Google’s selected canonical.
The Search Console ‘URL Inspection’ tool provides lots of indexing information about a URL, including the canonical Google has chosen (‘Google-selected canonical’, under Indexing).
How to Find Out if You Have a Canonicalisation Problem?
The best place to check whether Google is indexing the correct version of your page is the Coverage report in Search Console. This report details which URLs from your site Google has indexed.
Search Console’s Performance report shows which pages from your site generate impressions on Google. Dig into this data to understand which pages Google has indexed and is showing in search.
The Coverage report separates pages that are submitted in your XML sitemap from those that are not. The URLs in the ‘Indexed, not submitted in sitemap’ section can be very useful for finding canonicalisation problems.
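Once you have exported the indexed URLs from Search Console and collected your sitemap URLs, the comparison is a set difference. The sketch below assumes both are plain lists of URLs, and the function names are hypothetical. URLs with a query string are listed first, since parameterised duplicates are a common source of the problem.

```python
from urllib.parse import urlsplit


def indexed_not_submitted(indexed_urls, sitemap_urls):
    """URLs reported as indexed that are missing from your sitemap, sorted."""
    return sorted(set(indexed_urls) - set(sitemap_urls))


def prioritise(urls):
    """Put URLs with a query string (likely parameterised duplicates) first."""
    return sorted(urls, key=lambda url: (not urlsplit(url).query, url))


indexed = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?colour=red",
    "https://www.example.com/blog/old-post",
]
sitemap = ["https://www.example.com/shoes"]
print(prioritise(indexed_not_submitted(indexed, sitemap)))
```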
Why Does Google Ignore Your Canonicals?
It’s important to remember that canonicalisation is just a hint to search engines. Google will process your canonical instructions, but won’t necessarily obey them. If there are conflicting signals, Google may choose a different URL as the canonical.
This is why it’s not wise to rely on canonicals to keep pages out of the index. Some factors that might make Google choose a different canonical:
- When the content across the pages in a canonicalisation group isn’t the same. If Google finds unique content on some of the pages, it may choose to index more than one URL.
- If you link internally to the duplicate URL rather than the canonical URL, Google may misunderstand your preference.
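To check the first factor, you can compare the text of each duplicate with its canonical. This sketch uses Python’s `difflib.SequenceMatcher` as a rough stand-in for similarity. Google does not publish how it compares pages, so the 0.9 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher

# Assumption: Google publishes no similarity threshold; 0.9 is arbitrary.
SIMILARITY_THRESHOLD = 0.9


def similarity(text_a: str, text_b: str) -> float:
    """Rough 0-1 similarity between two pages' visible text, word by word."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def divergent_duplicates(canonical_text, duplicates):
    """Return (url, score) for duplicates that differ noticeably from the canonical.

    `duplicates` maps each canonicalised URL to its extracted page text.
    """
    results = []
    for url, text in duplicates.items():
        score = similarity(canonical_text, text)
        if score < SIMILARITY_THRESHOLD:
            results.append((url, round(score, 2)))
    return results
```

Any URL this flags is one where Google may decide the pages are not true duplicates and index both.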
The Importance of Consistency
When it comes to canonicalisation, consistency between signals is key. If your site sends conflicting canonical signals to Google, it’s more difficult for Google to understand and apply the correct canonicalisation.
Ambiguity is one of the biggest causes of SEO problems. Consider this situation: you 301 redirect page A to page B, but page B includes a rel=canonical that references page C, and page C has an HTTP header canonical that references page A.
You can quickly see how confusing this situation can become, especially for an automated algorithm. Unfortunately, scenarios like this are not that uncommon, especially on huge enterprise websites with hundreds of thousands of URLs that change daily.
Ensuring consistency between the signals you send to Google will help Google understand your intentions.
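Loops like the page A, B, C example can be detected mechanically. The sketch below treats every 301, rel=canonical and HTTP header canonical as an edge in one map (a hypothetical `signals` dict built from your own crawl) and follows it from a starting URL, reporting the path taken and whether it loops.

```python
def resolve(start, signals):
    """Follow canonicalisation signals from `start`.

    `signals` (hypothetical input) maps each URL to the URL it points at,
    whether via 301 redirect, rel=canonical or HTTP header canonical.
    Returns (final_url, path, looped).
    """
    path, seen, url = [start], {start}, start
    while True:
        target = signals.get(url)
        if target is None or target == url:
            # No signal, or a self-referencing canonical: the chain ends here.
            return url, path, False
        if target in seen:
            # We have visited this URL already: the signals form a loop.
            return target, path + [target], True
        path.append(target)
        seen.add(target)
        url = target


# A redirects to B, B canonicalises to C, C's header canonical points at A.
print(resolve("A", {"A": "B", "B": "C", "C": "A"}))
# ('A', ['A', 'B', 'C', 'A'], True)
```

Because each URL is visited at most once, the loop always terminates, even on a large crawl.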
The Role of Information Architecture
If your information architecture conflicts with your canonicalisation setup, Google may choose a different canonical. Consider this situation: page A is more prominent in your information architecture than page B, but you’ve canonicalised page A to page B. Google may ignore your canonical and still rank page A because other signals (such as your internal linking) suggest that page A is important.
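One rough way to spot this conflict is to count internal links. The sketch below assumes you have a list of (source, target) internal links from a site crawl and a map of each URL’s declared canonical, both hypothetical inputs. It flags URLs you have canonicalised away that receive more internal links than their canonical. Link counts are only a proxy for prominence.

```python
from collections import Counter


def conflicting_architecture(internal_links, canonicals):
    """Flag canonicalised URLs that get more internal links than their canonical.

    internal_links: iterable of (source_url, target_url) pairs from a crawl.
    canonicals: maps each URL to its declared canonical URL.
    Returns (url, canonical, links_to_url, links_to_canonical) tuples.
    """
    inbound = Counter(target for _source, target in internal_links)
    flagged = []
    for url, canonical in canonicals.items():
        # Counter returns 0 for URLs that receive no internal links.
        if canonical != url and inbound[url] > inbound[canonical]:
            flagged.append((url, canonical, inbound[url], inbound[canonical]))
    return flagged
```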
Where Does Canonicalisation Go Wrong?
Canonicalisation is a complex topic, which is probably why there are so many misleading ideas about how it should be used. Here are some common mistakes and misunderstandings.
- Google will always obey a rel=canonical
This is not true. Rel=canonical is just a ‘hint’, whereas other mechanisms such as noindex are a ‘directive’. A directive overrides a hint.
- Canonicalisation can keep duplicate content out of the index
Not reliably. Google will decide for itself whether to follow your canonicalisation, so duplicate URLs can still end up indexed. If you canonicalise two URLs that don’t share the same content, Google will probably ignore your canonical.
- Blocking canonical or duplicate URLs in robots.txt
If Google can’t crawl your pages, it can’t process your canonicalisation signals. Don’t block the canonical or the duplicate pages from crawling.
- Not canonicalising to an indexable URL
Don’t canonicalise to a non-indexable URL. Your canonical is of no use if it points at a page that redirects or 404s.
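The last two mistakes can be caught with an audit script. This sketch uses Python’s `urllib.robotparser` to test whether each page and its canonical are crawlable, and a hypothetical `status_codes` map from your own crawl to confirm the canonical returns 200. Note that `urllib.robotparser` matches path prefixes only and does not implement Google’s `*` and `$` wildcards, so its answers may differ from Googlebot’s for rules that use them.

```python
from urllib.robotparser import RobotFileParser


def audit_canonicals(canonicals, robots_txt, status_codes, agent="Googlebot"):
    """Return (page, problem) pairs for blocked or non-200 canonical setups.

    canonicals:   maps each page URL to its declared canonical URL.
    robots_txt:   the text of the site's robots.txt file.
    status_codes: hypothetical map of URL -> HTTP status from your own crawl.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    problems = []
    for page, canonical in canonicals.items():
        # Both the duplicate and its canonical must be crawlable (deduplicated).
        for url in dict.fromkeys((page, canonical)):
            if not robots.can_fetch(agent, url):
                problems.append((page, f"{url} is blocked by robots.txt"))
        status = status_codes.get(canonical)
        if status != 200:
            # Covers redirects, 404s and canonicals missing from the crawl (None).
            problems.append((page, f"canonical {canonical} returned {status}"))
    return problems
```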
Final Canonicalisation Tips
- Don’t rely on canonicalisation to keep duplicate content out of the index
- Make sure other signals align with your canonicalisation
- Monitor the ‘Excluded’ section of the Coverage report in Search Console
- Canonicalise to absolute URLs, exactly as they appear on the site
- Always link internally to your canonical URLs
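The last two tips are easy to check automatically. Here is a minimal sketch with hypothetical function names: `check_canonical_href` resolves a canonical href against its page URL and reports whether it was already absolute, and `links_to_non_canonicals` lists internal links that point at a URL declaring a different canonical.

```python
from urllib.parse import urljoin, urlsplit


def check_canonical_href(page_url, href):
    """Resolve a canonical href against its page; report if it was already absolute."""
    parts = urlsplit(href)
    is_absolute = bool(parts.scheme and parts.netloc)
    return urljoin(page_url, href), is_absolute


def links_to_non_canonicals(internal_links, canonicals):
    """Internal (source, target) links whose target declares a different canonical."""
    return [(source, target) for source, target in internal_links
            if canonicals.get(target, target) != target]


print(check_canonical_href("https://www.example.com/shoes?colour=red", "/shoes"))
# ('https://www.example.com/shoes', False)
```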