Googlebot’s IP address list: how to verify the crawler’s visits
It’s a question that can worry site owners, webmasters and SEOs: how do we distinguish genuine crawler visits from requests made by spammers or other troublemakers claiming to be Googlebot? In addition to the classic DNS-based verification, Google now officially publishes the list of IP addresses that Googlebot uses to crawl, giving interested parties a definitive way to check that requests claiming to come from the crawler really do.
The list of Googlebot IP addresses
The list is published on a new page in Google’s official documentation, which states that you can now “check whether a web crawler that accesses the server is really a Google crawler, like Googlebot”, so that spammers and other troublemakers cannot access the site while posing as Googlebot and do damage with their crawling.
Thanks to this addition, it is now possible to “identify Googlebot based on the IP address, matching the IP address of the crawler to the list of Googlebot IP addresses”.
Where to find the Googlebot and Google IP address lists
We can view the complete list of Googlebot IP addresses in a JSON file published by Google. Note, however, that Google may update this list, so it is worth checking the file periodically.
We can also identify all the other Google crawlers by checking whether an IP address found in the server logs matches one of those in the complete list of Google IP addresses. This list, too, may change over time.
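As a rough illustration of how such a match could be automated, the sketch below assumes the published file is a JSON object with a "prefixes" array whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key in CIDR notation. That layout, the helper names load_networks and ip_in_list, and the sample data (reserved documentation ranges, not real Googlebot ranges) are assumptions made for this example and should be checked against the actual file.

```python
import ipaddress
import json

# Hypothetical sample that mimics the assumed layout of the published file.
# These prefixes are reserved documentation ranges, NOT real Googlebot ranges.
SAMPLE_JSON = """
{
  "creationTime": "2000-01-01T00:00:00.000000",
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(raw_json):
    """Parse the JSON text into a list of ip_network objects."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        # Each entry is assumed to hold either an IPv4 or an IPv6 prefix.
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_list(ip, networks):
    """Return True if the address falls inside any of the listed networks."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # the string is not a valid IP address
    # Compare only against networks of the same IP version.
    return any(
        address.version == net.version and address in net for net in networks
    )


if __name__ == "__main__":
    nets = load_networks(SAMPLE_JSON)
    for candidate in ("192.0.2.55", "198.51.100.1", "2001:db8::1"):
        print(candidate, ip_in_list(candidate, nets))
```

In practice the JSON text would come from a freshly downloaded copy of the file rather than from an embedded sample, refreshed periodically since the list may change.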
How to verify Google crawlers
Until now, the only way to do this was to run a reverse DNS lookup on the addresses that had accessed the server, but there are now two options available, as Google’s guide explains.
- Manual method: suitable for one-off checks, based on command line tools and sufficient for most use cases.
- Automatic method: useful for large-scale checks, it uses an automated solution to compare a crawler’s IP address against the published list of Googlebot IP addresses.
The manual process
To verify a crawler manually, we use command line tools. The first step is to run a reverse DNS lookup on the accessing IP address from our logs, using the host command.
The second step is to check that the returned domain name is in either googlebot.com or google.com. The third step is to run a forward DNS lookup on the domain name retrieved in the first step, again using the host command. Finally, we check that the address returned is the same as the original accessing IP address from our logs.
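The same three steps can also be scripted. The sketch below is one possible version that uses Python’s standard socket module in place of the host command; the function names (reverse_lookup, forward_lookup, verify_crawler_ip) and the injectable resolver parameters are hypothetical choices for this example, not part of Google’s guide.

```python
import ipaddress
import socket

# Domains named in the manual procedure above.
ALLOWED_DOMAINS = ("googlebot.com", "google.com")


def reverse_lookup(ip):
    """Reverse DNS: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Forward DNS: host name -> set of IP address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler_ip(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if all three checks of the manual procedure pass."""
    try:
        target = ipaddress.ip_address(ip)
        # Step 1: reverse DNS lookup on the address taken from the logs.
        hostname = reverse(ip).rstrip(".").lower()
    except (ValueError, OSError):
        return False  # invalid address or no reverse record

    # Step 2: the host name must be one of the allowed domains or a
    # subdomain of one; a bare suffix match would accept "evilgooglebot.com".
    if not any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in ALLOWED_DOMAINS
    ):
        return False

    try:
        # Step 3: forward DNS lookup on the host name from step 1.
        resolved = {ipaddress.ip_address(addr) for addr in forward(hostname)}
    except (ValueError, OSError):
        return False

    # Final check: the forward lookup must lead back to the original address.
    return target in resolved


if __name__ == "__main__":
    import sys

    # Usage: python verify_crawler.py <ip-from-logs>
    for candidate in sys.argv[1:]:
        print(candidate, verify_crawler_ip(candidate))
```

The resolver functions are passed in as parameters so that the logic can be exercised without network access; with the defaults, each call performs live DNS queries, so for large log files it would make sense to cache results per IP address.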
Alongside this classic method, it is now possible to identify Googlebot by comparing the IP address of the crawler that accessed the site against the newly published list of Googlebot IP addresses, or to extend the check to all other Google crawlers by using the second list. It is also worth remembering that third-party services, such as Cloudflare and other software, can help manage these aspects.
A new opportunity against unwanted access
Since 2007, Google has offered SEOs and site owners the reverse DNS check as a way to verify that a self-proclaimed Googlebot crawler is actually what it says it is. Now it has also decided to publish the list of IP addresses that Googlebot uses to crawl sites, which expands the options for monitoring crawl activity.
Sites can be slowed down, and potentially even taken offline, by fake bots that crawl and spider them. At the same time, almost no one (apart from exceptional cases) has an interest in preventing the real Google from crawling the site, because this can lead to indexing and ranking problems in Google Search.
In practical terms, then, this list helps us know for sure which bots really are Google’s and which are not, making it easier to identify the deceptive bots that we need to block.