Glossary

Robots.txt

The robots.txt file is a standardized plain-text file that webmasters use to tell search engine robots (hence the name) which parts of their site they may crawl.

The file resides in the root directory of a website and tells search engine robots which pages or sections of the site should be excluded from crawling. Some robots also honor a non-standard directive that limits how often they may visit the site. This convention is known as the Robots Exclusion Protocol, or REP.
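As a rough illustration of how a robot might read these rules, the sketch below parses a small robots.txt with Python's standard-library urllib.robotparser module. The file contents, the ExampleBot user agent, and the www.example.com URLs are all made up for the example. Crawl-delay is a non-standard directive that some robots honor and others, including Google's, ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "ExampleBot" is a placeholder user agent; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://www.example.com/admin/login"))  # False
print(parser.can_fetch("ExampleBot", "https://www.example.com/blog/post"))    # True
print(parser.crawl_delay("ExampleBot"))                                       # 10
```

A robot working against a live site would typically call set_url() and read() on the parser to download the file from the site root instead of parsing a string.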

The robots.txt file is particularly useful for keeping robots away from unnecessary or sensitive content, such as administration pages or private data directories, but there are a few things to consider.

First, not all robots comply with the robots.txt guidelines, particularly those with malicious intent, so the file should not be the only tool used to protect sensitive information.

In addition, improper use of the instructions can damage the site's visibility on Google, for example by mistakenly blocking relevant pages from being crawled.
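One way to catch this kind of mistake is to test a list of important URLs against the rules before publishing them. The following is a minimal sketch, again using urllib.robotparser. The find_blocked_urls helper, the rule set, and the URLs are hypothetical, and the standard-library parser uses simple prefix matching, so it may not mirror every rule extension that a particular search engine supports.

```python
from urllib.robotparser import RobotFileParser


def find_blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that the given rules forbid for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical mistake: the author meant to block /private/ but wrote "/p",
# which matches every path that starts with "/p", including /products/.
BROKEN_ROBOTS_TXT = """\
User-agent: *
Disallow: /p
"""

# Pages this imaginary site wants search engines to reach.
IMPORTANT_URLS = [
    "https://www.example.com/",
    "https://www.example.com/products/",
    "https://www.example.com/about/",
]

blocked = find_blocked_urls(BROKEN_ROBOTS_TXT, IMPORTANT_URLS)
print(blocked)  # ['https://www.example.com/products/']
```

An empty result would suggest that none of the listed pages is blocked for that user agent under these rules.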