Removing content from Google and the Web, the guide to getting it right
Can a piece of content be permanently removed from Google? And what is the process for removing a page and its URL from the search engine? These are just some of the questions we may face while managing a site. Unfortunately, the events behind them are not rare in the life of a project: outdated content, changes to the site structure or, in the worst cases, the wish to exercise the right to be forgotten or the need to delete hacked pages that are full of spam. Google itself comes to our aid with various tips and techniques for deleting a page from Google and its Index. In the digital age, online visibility is crucial and “the Internet does not forget”: two reasons that should definitely convince us to avoid circulating information that is outdated, inaccurate, or even harmful to our site and our online reputation.
How to remove content from Google
First of all, let us define what it means to remove content from Google.
In this context, when we talk about “removal” we are referring to the action of removing a web page or a specific URL from Google’s search results: the process, which can be done through different tools, does not physically remove the page from the web, but makes it invisible to search engines, thus affecting SEO as well.
What we have just described is the practice of requesting content de-indexing: the content is removed from the search engine's Index, but the URL it belongs to stays active and functioning (so we will not find the information by searching for it on Google, but we can still read it by accessing the direct URL or by browsing the website where it is published). Content deletion is different: it means that the content has been completely removed from the page where it was originally published. In this case, anyone trying to access the content via its original URL finds a 404 error message or a redirect to another resource, indicating that the content or page no longer exists. Over time, such content also disappears from Google’s search results but, in some cases, it may have been shared or published elsewhere, so the problem may persist even though the original page no longer exists.
When and why to request content removal
The choice of a tool or process depends on several factors, in particular on the type of content (whether it is on our site or somewhere else) and on the type of removal, because there is a difference between requesting the removal of information from the entire Web and just from Google Search.
In general, there are several situations in which it may be necessary to seek removal of content from Google and the Web: for example, sensitive personal information posted online without our consent, articles with obvious copyright violations, outdated, inaccurate or inappropriate content that could damage our reputation or that of our company, duplicate content that could harm SEO, or the need for time to make appropriate corrections to a text (this is the case with temporary removals).
As StatusLabs reminds us, there is a lot of content online: according to recent estimates, about 3 million news articles, 7 million blogs, and 500 million tweets (or whatever you call messages on X today) are published every day. With so much content uploaded, it is “inevitable that some of it will not paint us in the best light or report information we do not want shared online.”
From a practical standpoint, it is generally possible to request the removal of any content posted online, but this does not necessarily mean that removal attempts succeed. The types of online content eligible for removal include news articles and features, blog posts, reviews, social media posts and comments, forum discussions, copyrighted content, and personal data (such as photos or contact information).
In any case, it is not a process to be taken lightly and, if done superficially or poorly, could even lead to the disappearance of pages relevant to our site, which is why it is important to carefully consider the SEO implications and make sure to follow the correct procedures to avoid future problems.
Basic techniques for removing online content
The experts at reputationX have compiled a quick list of tips and techniques for achieving “total” content removal, useful for removing personal information from the Internet, cleaning up a company’s online profile or even making negative or inaccurate search results invisible or less visible, calling this activity “a key tool in the online reputation management and brand monitoring toolbox.”
The first avenue is very straightforward: ask the webmaster, publisher or author of the content to remove the offending page or edit the text that we believe is harmful to us. Contact can be made by phone or email (usually human contact works best, but case histories are mixed), and sometimes it may help to appeal to the “altruistic spirit” of the person on the other end or even to offer some kind of “incentive” (from payment to charitable donations, for example). Should we be faced with a wall, and thus a refusal to delete the content, we could at least try to convince the person to add a noindex tag to the page so that it disappears from the search results.
The second avenue is a practical one: that is, unable to get the content removed, we can work to make it less visible through a proactive strategy of promoting other “positive” content about the brand or person whose interests we are curating.
The third avenue, the one we focus on most closely, concerns how to request content removal from Google. In some countries, the company is experimenting with a special tool, called “Results about you”, that allows individual users to manage information about themselves: they can immediately report privacy-violating details (such as email, phone number, or home address) in a search result by selecting “Remove result” from the menu icon next to the result itself. Again, as clarified earlier, Google cannot remove information from a website, but once the request is accepted, Google usually removes the URL from its search results within a few days. And if you’re not on Google, you’re practically invisible, as we often say!
Guide to removing content from Google
Two videos posted on Google’s YouTube channel guide us in depth through the different techniques for removing content from Google Search. In the first, our host is Daniel Waisberg, Search Advocate at the American company, in an episode of the Search Central Lightning Talks (formerly Webmaster Conference Lightning Talks), the virtual talks in which various Googlers give useful advice to site owners and managers, professional SEOs and developers. The second is presented by John Mueller and is part of the #AskGooglebot series.
In both cases, the Googlers give us an overview of the avenues available to us, explaining how the Search Console Removal Tool works, how to request permanent removals, and other useful situations.
Which types of content can be removed and which cannot
First, it is good to give a brief overview of what we can and cannot remove.
As Waisberg explains, it is possible to remove content from our website quickly and easily, and it is also possible to remove it from Google or from the entire web. In addition, we can ask Google to remove content from its properties (such as YouTube, Blogger, and others) if it violates Mountain View’s strict policies.
Moreover, we may ask for the deletion of content that violates Google’s policies even if it is hosted on third-party websites, but this will only lead to its removal from search engine results.
This summary chart, described as a decision tree, is useful for understanding what options we have when making a removal request.
The decision-making process of the removal request
The first question we have to ask ourselves concerns the ownership and control of the content: if it belongs to us, removal is much easier, as we only have to use the right tools.
Next, we have to ask ourselves whether we want to remove (our) content permanently or temporarily: if we need quick and immediate action, we can use the Search Console Temporary Removals tool, which gives us about six months to work on a different or definitive solution.
If we have no control over the content, the process is more complicated, but not impossible in the face of obvious violations of Google’s rules and policies (and not only those). For content hosted on a Google service such as YouTube, Blogger or Ads there are specific troubleshooter tools, while for problems with pages on other websites “the best option is to contact the owner of the site”, although in some cases it is still possible to ask for (and obtain) removal at least from Google Search.
How to perform the permanent removal of the content from Google
Then there are cases when it becomes necessary to permanently remove pages on a site over which we have control from Google Search: in reality, Daniel Waisberg explains, there is no special tool nor can one send a request, but one must implement one or more specific actions directly on the site.
Specifically, we have three ways:
- remove or update the current page content and return an HTTP 404 or 410 status code whenever the content is requested (Not Found and Gone, respectively). Non-HTML files, such as PDFs, must be completely removed from the server;
- protect the content with a password so that it is not accessible by Googlebot or other bots on the web;
- use a noindex meta tag to indicate that the page should not be indexed, although this is a less secure method than the others.
Simply setting up a 301 redirect from the old page or using a robots.txt directive may not be enough to remove the old content from Google's search results.
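The three on-site options above, and the caveat about redirects, can be sketched as a small check on the response a page sends. This is only an illustration, not a Google tool: the function name `removal_signal` and its labels are hypothetical, and treating a 401 or 403 status as a sign of password protection is an assumption (a login form served with a 200 status would not be detected). The `X-Robots-Tag` header is included as the HTTP-header counterpart of the noindex meta tag; the talk itself only mentions the meta tag.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def removal_signal(status, headers, html=""):
    """Return a label for the removal signal a response sends, or None.

    status  -- HTTP status code of the response
    headers -- dict of response headers
    html    -- response body, scanned for a robots meta tag
    """
    if status in (404, 410):
        return "gone"
    # Assumption: protected pages answer 401/403 to anonymous requests.
    if status in (401, 403):
        return "protected"
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return "noindex"
    parser = _RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.directives):
        return "noindex"
    # A redirect or a plain 200 page sends none of the three signals.
    return None
```

Feeding the function the status, headers and body obtained from any HTTP client shows at a glance whether a page we meant to remove is actually sending one of the signals, or whether it merely redirects.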
As this is a process of permanently removing content from Search, Waisberg suggests being careful and evaluating each step thoroughly, since the changes may not be reversible.
How to request a temporary removal from Google
In other cases, we may simply need to ask for a temporary removal of content from Google: in practice, we “hide” the page from the search results for a limited period of time, so that we can correct any problems or errors before settling on a final outcome (either the return of the page to full visibility or its total removal).
Two request options are available: blocking the URL from Google's search results for about six months, or deleting only the cached copy of the page. If we delete the cache, the page description snippet is also removed and stays empty until the next crawl. This option is useful when we need to quickly remove sensitive information from a page and want to update the results in Google Search without removing the page itself from Search, but we have to change the page before Googlebot crawls it again, or we will simply find the same information in the snippet.
Waisberg points out, however, that the Removal tool should not be used in some cases:
- a manual action on the page;
- a hacked website;
- the transfer of the site to another domain;
- the removal of pages that are no longer useful or up to date;
- telling Google which version of a page we want to be indexed.
All of these cases require other types of intervention.
Requesting removal of content owned by others
As we were saying, requesting the removal of content that we do not own or control is more complex (but not impossible).
If that page “lives on a service owned by Google”, we can use the appropriate troubleshooting tools that allow us to “report the content you want to remove from Google services”, providing “as much information as possible to help us investigate” and possibly remove the information.
Among the types of content that Google may remove from the search results are:
- copyright infringement and other legal factors;
- content concerning sexual abuse of minors;
- non-consensual, explicit or intimate personal images;
- involuntary fake pornography;
- financial, medical and personal identification information.
If the content is present on a site that we do not control and is not hosted on Google services, our “best option is to contact the owner of the website to request its removal”, by sending a message via the contact link that is usually available on websites or by looking up the website owner's email through a Google search such as “whois example.com”.
Finding the best approach
To sum up, if we want to remove content from Google or the web in general, we have to ask ourselves two important questions: “Who controls the content?” and “Do I want to remove the content permanently or only for a period of time?”.
The answers will help us understand the best approach to reach our goal and solve the problem.
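The two questions, together with the distinction between Google services and third-party sites discussed earlier, can be written down as a tiny decision function. This is a hedged sketch of the decision tree as this guide summarizes it: the function name, its parameters and the wording of the returned suggestions are all hypothetical.

```python
def choose_removal_approach(controls_content, permanent=True, on_google_service=False):
    """Suggest a removal approach from the answers to the guide's questions.

    controls_content  -- True if we own or manage the page in question
    permanent         -- True for permanent removal, False for a temporary one
    on_google_service -- only relevant when we do not control the content:
                         True if it is hosted on a Google property
                         (YouTube, Blogger and so on)
    """
    if controls_content:
        if permanent:
            return "on-site action: return 404/410, password-protect the page, or add noindex"
        return "temporary removal request in Search Console (about six months)"
    if on_google_service:
        return "report the content through that Google service's troubleshooter"
    return "contact the site owner; for policy or legal violations, also ask Google to remove it from Search"


# Example: an outdated page on our own site that should disappear for good.
print(choose_removal_approach(controls_content=True, permanent=True))
```

Keeping the ownership question first mirrors the order in which the guide asks the questions: the permanent or temporary choice only arises for content we control.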
How to remove a page from the Google Index
A slightly different approach to the problem came recently from John Mueller, who devoted an episode of #AskGooglebot specifically to methods for removing a page from the Google Index, even on a tight deadline.
Specifically, the original question was about the best way to remove old content from the Index, and Mueller gets right into the nitty-gritty: the easiest (but also slowest) way is simply to delete the resource from the site and server and wait for the page to drop out of Google's systems automatically over time. “There is nothing additional you have to do in these cases,” says the Googler, “just make sure that the server returns the correct HTTP status code for a removed page, and most Web sites do this automatically”; in addition, we must also remove references to the page on our website, including links and sitemap files. In terms of waiting time, this process can take up to several weeks before Google’s systems update the information and remove the result from the index and SERPs.
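One of the steps mentioned here, removing references to the deleted page from sitemap files, might be sketched as follows. The example assumes a standard XML sitemap in the sitemaps.org 0.9 format; the function name `drop_urls_from_sitemap` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_urls_from_sitemap(sitemap_xml, removed_urls):
    """Return the sitemap XML without the <url> entries for removed pages."""
    # Keep the default namespace unprefixed when serializing the result.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    removed = {url.strip() for url in removed_urls}
    # Iterate over a copy of the list, since entries are removed from the tree.
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and (loc.text or "").strip() in removed:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/old-page</loc></url>"
    "</urlset>"
)
print(drop_urls_from_sitemap(sitemap, ["https://example.com/old-page"]))
```

Internal links pointing to the removed page still have to be cleaned up separately in the site's templates or content.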
On the other hand, when we need quicker action, the recommended solution to hide a page faster from Google is to use the aforementioned Urgent URL Removal Tools in Search Console, which “cover most situations and cases where you may need to remove old pages from your website,” concludes Mueller (who had already explained some techniques for hiding a site from Search and users).
The Removal Tool in Google Search Console
“We are happy to announce a new version of the Removals Tool” in Google Search Console: with these words, in 2020, Google’s blog greeted the introduction of the renewed tool available to webmasters and SEOs who use the Google-branded suite.
The tool obviously works only on owned and verified sites and serves to:
- Temporarily hide site URLs from search results, temporarily blocking the display of pages from our site.
- See requests for removal of outdated content.
- View URLs blocked by SafeSearch and therefore flagged as pages with adult content.
On the other hand, it is not the right tool to request the permanent removal of our pages or to remove from Google personal information that appears on sites over which we have no control.
How temporary removal works and how long it lasts
Google offers the ability to temporarily hide a site’s pages from Search results. This is useful in certain circumstances, such as when we want to quickly remove a URL from Google Search (and then find a permanent solution) or when, after removing sensitive content from a page, we want that content gone from search results without removing the entire page.
We can choose between two types of action: hide the URL from Google Search results and delete the cached copy of the page, or delete the cached page and description snippet until the page is crawled again.
The temporary removal request can be a quick first step to then proceed to permanently remove a page, giving us a long enough time period to find a solution one way (new content and page visible, or locked with password) or the other (permanent removal). The cache cleanup request, on the other hand, can serve to remove sensitive information from a page and update the preview snippets shown by Google (without removing the page itself from search results).
The URL to be removed may relate to a web page or an image, and a successful request lasts about six months, a period that should be sufficient to work out a permanent solution to the problem.
It should be made clear that temporary removal does not remove content from the Internet and is valid only for Google (not for other search engines), but more importantly it does not really remove the URL from the Google Index and only serves to hide it from search results for a six-month period. To remove the URL permanently, another method must be implemented, such as returning a 404 status code for the page, using a password, prohibiting indexing with a noindex meta tag, and so on.
Important notes on the tool
In addition, the tool guidelines make it clear that blocking a URL does not prevent Google from crawling the page or, therefore, creating a new cache copy before the page is removed or password protected. At the end of the period, the page can reappear in search results. At the same time, if the URL cannot be reached, Google assumes that the page has been removed and the blocking request lapses: all pages identified later will be considered as new pages and may appear in Search results.
Again, the documentation specifies that the Removal Tool is not to be (nor can it be) used for purposes such as:
- Blocking a page on a site that we do not control.
- Permanently removing a URL from Search. Use of the Removal Tool is only one step in this process to permanently delete a URL and, by itself, is not enough.
- Deleting content from the Web. This tool removes content only from Google Search.
- Removing results from other search engines. As before, the tool deletes content only from Google Search.
- Deleting outdated or no longer needed content, for example outdated pages that return a 404 error. If we have recently changed the site and so there are some obsolete URLs in the index, Google crawlers will detect them on the next crawl; the corresponding pages will naturally disappear from search results, so there is no need to request an urgent update.
- Fixing crawling errors reported in the Search Console account. The blocking tool prevents URLs from appearing in Google search results, not in Search Console accounts; there is no need to manually remove URLs from the report because they will disappear over time.
- “Starting from scratch” with the site. If we are concerned that manual action may be taken on the site or if we wish to start over after purchasing someone else’s domain, it is best to submit a reconsideration request to notify Google of our concerns and report what has changed.
- Taking the site “offline” following a compromise. If the site has been hacked and we wish to remove invalid URLs that have been indexed, we should use the URL blocking tool to block any new URLs created by the hacker, e.g., http://www.example.com/buy-cheap-cialis-skq3w598.html. However, it is not advisable to block the entire site or URLs for which we may require indexing, but it is better to delete the URLs created by the hacker and leave it up to Google to crawl the site again.
- Having the correct “version” of the site indexed. Many sites offer the same files or HTML content through different URLs. If this is our case and we do not want duplicates to show up in search results, we should follow one of the recommended canonicalization methods. On the other hand, we should not use the URL blocking tool to block URLs that we do not want to be displayed in search results, because this will not cause the preferred version of a page to be retained, but may instead result in the removal of all versions (http/https and www/non-www) of a URL.
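To make the last point concrete, the short sketch below lists the protocol and hostname variants of a URL that, according to the note above, a single blocking request may end up hiding together. It is purely illustrative: the function name `url_variants` is hypothetical and the example.com address is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """List the http/https and www/non-www versions of a URL."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip a leading "www." to obtain the bare hostname.
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme in ("http", "https"):
        for hostname in (bare, "www." + bare):
            variants.append(
                urlunsplit((scheme, hostname, parts.path, parts.query, parts.fragment))
            )
    return variants


for variant in url_variants("https://www.example.com/page.html"):
    print(variant)
```

Seeing the four addresses side by side makes it clearer why canonicalization, rather than the blocking tool, is the suggested way to pick the preferred version.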
Managing outdated content
Through the tool, we can also read the history of requests made over the past six months through the “Remove Outdated Content” tool, which allows ordinary users (even non-site owners) to report when Google Search shows information that is no longer actually present on a site’s pages. In a nutshell, this section lets us find out whether any Search user has flagged pages as outdated because they no longer contain the information shown by Google; however, a report does not force Google to remove such pages, so this section has an informational function and should not worry us too much, Waisberg says.
From this screen we can then find out the URLs affected, the date and type of request (cache copy removal or removal of the page itself), and most importantly the status of the request.
There are two types of requests: obsolete cache removal indicates that the page still exists, but its content has changed or has been partially removed; Google cleans up the preview snippets in SERP until the next crawl and the page will no longer be shown among the results for queries related to the removed content. Removal of obsolete pages, on the other hand, is used when the page no longer exists and a user has requested its removal from Google.
Pages flagged as adult content
The third feature is related to SafeSearch filtering in Search, which is used to prevent pages with adult or sexually explicit content from appearing among search results; users can report content that should be filtered according to these criteria and, if Google’s verification confirms, the relevant URLs are tagged as adult content and hidden from searches with active filtering.
This feature shows a history of site pages flagged by Google users as adult content through the appropriate suggestions and which Google has, so to speak, confirmed, marking our URLs as “adult content” and hiding them from SERPs for users who have the filter active.
From this screen we can get information about the offending pages and the status of Google’s verification, thus simplifying operations in case of problems with adult content (real or alleged).