AMP pages: how to track user actions between domain and cache
Handling pages and user interactions when AMP pages are served from a cache has long been one of the framework’s trickiest points, and one that most puzzles site owners. The project’s official blog has published a practical guide to solving the tracking problem and, in particular, to following a user’s actions across the original pages and those served from an AMP cache.
User state and AMP cache
The article by Ben Morss, Developer Advocate at Google, starts from an issue that matters to sites using AMP and that can often be problematic: if a user visits a page served from an AMP cache and then returns to the original domain, it may not be easy to recognize that it is the same person.
A concrete example of this situation comes from eCommerce: to reach users more effectively, a site decides to create its product pages with AMP and “when web spiders like those of Google and Bing discover these AMP pages, they store them in AMP caches and show them to users in an iframe on a site like google.com or bing.com, serving the page from an AMP cache such as cdn.ampproject.org or bing-amp.com”.
How to track the actions of users
So far, nothing unusual: but what happens if a user discovers the page on an AMP cache, adds products to the cart and later in the day visits the site again by typing the normal domain? Will those products still be in their shopping cart, or will they find it empty?
AMP cache, how to avoid problems
As we know, one solution to this problem already exists in Signed Exchanges, the signing mechanism that lets the browser show the site’s original URL even for pages served from an AMP cache. It has reached several browsers, but it is not yet supported by Firefox and Safari and is not yet widely adopted by sites.
AMP caches help “speed up your web pages while preserving user privacy”, but they introduce a further level of complexity: “users can access your site not only on your domain, but also on the cache domain”.
Recognizing the users
In the case of the previous example, then, the site can follow standard web practice and keep track of a user’s state by setting a cookie that contains a session ID: each time the user visits pages on the original domain, “the server retrieves the cookie, reads the session ID and restores the user state from the data stored on the server associated with that ID”.
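As a rough illustration of that standard practice, the sketch below reads a session ID from a Cookie header and looks up the stored state. The cookie name `session_id`, the in-memory `SESSIONS` store and the function name are assumptions made for this example, not details from the original article.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory store mapping session IDs to user state.
# A real site would keep this in a database or cache.
SESSIONS = {"abc123": {"cart": ["blue-mug"]}}


def restore_user_state(cookie_header):
    """Return the state stored for the session ID in a Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")       # an absent header is treated as empty
    morsel = cookie.get("session_id")      # "session_id" is an assumed cookie name
    if morsel is None:
        return None                        # no session cookie was sent
    return SESSIONS.get(morsel.value)      # None if the ID is unknown
```

When the Cookie header is missing, as can happen for requests coming from a cache, the lookup simply returns nothing and the server has no way to tell who the user is.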
When the user visits the product page, sees the items and adds them to the cart by clicking on a button, the site sends the data to the server:
<form action-xhr="/add-to-cart" method="POST">
If the user is on the origin, Morss explains, the request arrives with a session cookie and looks partly like this:
POST /add-to-cart HTTP/2.0
However, if “the user visits your site on an AMP cache, the request to your server may actually come from ampproject.org or bing-amp.com, a different domain!”. The browser associates the cookie with the regular domain, which in practice makes it a third-party cookie: most browsers will still send it along without complaint, “but users may have set their browsers to block third-party cookies and some browsers block them under certain circumstances”.
The result would be a request similar to this one, with no Cookie header:
POST /add-to-cart HTTP/2.0
How to fix the user tracking issue
In short, what looks like a secondary detail risks becoming a real problem, because browsers block third-party cookies more and more often. There is, however, a solution, summarized here and explained in more detail in the original article, that lets users move smoothly between the origin and the AMP caches.
We start by identifying users with a session cookie on the origin, in the usual manner; we do the same “in the cache and on a browser that accepts third-party cookies”. In other cases, “whenever a user takes an action that changes the state of the application, redirect them immediately to your origin, where you can access or create a cookie stored in your domain and then make the desired change”.
Redirecting users to the origin page
In other words, “if the user wants to add products to their shopping cart and you can’t read their cookies, don’t panic! Just redirect users to your origin, where you can change their cart as much as you like”.
This redirection is made possible by a specific AMP HTTP header called AMP-Redirect-To: if an AMP page makes a server request using <amp-form> and the server response contains this header, AMP will redirect to the desired page.
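To make the header concrete, here is a minimal sketch of the response headers a server might attach to its reply to the <amp-form> request. The function name and example URLs are hypothetical. Listing AMP-Redirect-To in Access-Control-Expose-Headers and requiring an HTTPS target follow the amp-form documentation as best understood here. The two CORS headers are an assumption about a typical setup for credentialed cross-origin requests, and a real deployment should check the current AMP CORS requirements.

```python
def amp_redirect_headers(redirect_url, request_origin):
    """Build response headers asking AMP to redirect after a form submission."""
    # The redirect target is expected to be an HTTPS URL.
    if not redirect_url.startswith("https://"):
        raise ValueError("AMP-Redirect-To target should be an HTTPS URL")
    return {
        "AMP-Redirect-To": redirect_url,
        # The header must be exposed so the AMP runtime can read it on a
        # cross-origin response.
        "Access-Control-Expose-Headers": "AMP-Redirect-To",
        # Assumed CORS setup: echo the (already validated) requesting origin
        # and allow credentials, so cookies can be sent when the browser permits it.
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
    }
```

For example, `amp_redirect_headers("https://shop.example/cart-change?add=blue-mug", "https://shop-example.cdn.ampproject.org")` returns a dictionary that any web framework could copy onto its response object.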
How to use the HTTP AMP-Redirect-To header
The article describes the entire flow of this process.
- The user navigates to the product page: if the user is on the origin, the page sets a session cookie if one is not already present.
- The user performs an action that changes the contents of the cart.
- The browser sends the change data to the origin via a POST XHR.
- The origin checks whether the request lacked a session cookie and came from the cache.
If this is true:
- The response tells AMP to redirect to a URL on the origin, including a query string describing the user’s change.
- When the origin sees that query string, it reads or creates the cookie, makes the change and redirects the user to a URL on the origin that doesn’t have that annoying query string.
If this is not true:
- We can simply retrieve the user’s session and make the changes. We are either on the origin or in the cache with a browser that allows third-party cookies.
Whether the user starts in the cache or on the origin, at the end of this process a session has been created and the user’s changes are reflected on the server.
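The decision at the heart of this flow can be sketched as a few small functions. Everything named here is an assumption for illustration: the `session_id` cookie name, the `https://shop.example` origin, the `/cart-change` path, the `add` query parameter, and the idea of recognizing a cache request by matching the Origin header against the two cache domains mentioned earlier. The original article may implement these checks differently.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode, urlsplit, urlunsplit

ORIGIN = "https://shop.example"  # hypothetical origin domain
# Assumed cache host suffixes, based on the cache domains named in the text.
AMP_CACHE_SUFFIXES = (".cdn.ampproject.org", ".bing-amp.com")


def has_session_cookie(cookie_header):
    """True if the Cookie header carries the (assumed) session_id cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return "session_id" in cookie


def came_from_amp_cache(origin_header):
    """True if the request's Origin header points at a known AMP cache host."""
    host = urlsplit(origin_header or "").hostname or ""
    return host.endswith(AMP_CACHE_SUFFIXES)


def decide_cart_action(cookie_header, origin_header, item_id):
    """Return ("redirect", url) or ("apply", item_id) for an add-to-cart POST."""
    if not has_session_cookie(cookie_header) and came_from_amp_cache(origin_header):
        # No usable cookie in the cache: send the user to the origin, with the
        # requested change described in the query string.
        return ("redirect", f"{ORIGIN}/cart-change?{urlencode({'add': item_id})}")
    # On the origin, or in a cache whose browser sent the cookie: apply directly.
    return ("apply", item_id)


def without_query(url):
    """URL to redirect to after the origin has applied the change."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

In this sketch, a "redirect" result would be delivered through the AMP-Redirect-To header. Once the origin has read or created the cookie and applied the change, it would send the user on to `without_query(...)` so that the query string does not linger in the address bar.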
AMP and the client ID, a flawed compromise
Those familiar with AMP know that “there is a client ID, which allows analytics packages to trace the path of a user from the cache to the origin”. On the origin, that is, on the site’s regular domain, it is stored in a cookie that lasts a year.
In the cache, too, the client ID is stored in a cookie and, if it is not in the cookie, “it can be created with a call to the client ID API”.
So it might seem “tempting to use this to consistently identify a user”, and sites do in fact often use this solution which, “unfortunately, has some flaws”.
In particular, the client ID “uniquely identifies a single user for certain trips between origin and caches, but not all, and its behavior between sites can be blocked by some browsers”.
You could use the AMP Linker to make “these cross-site trips” more reliable, “since it retains the client ID as a query string parameter”. However, this means that the user’s unique identifier will be visible in the URL: URLs tend to end up in server logs and, sometimes, bad actors get hold of log files. Worse still, the user could very well share the URL publicly, exposing their identifier to the world. In both cases, their session is vulnerable to hijacking, and this is why “we used POST instead of GET in our examples above”.