An SEO Guide to the robots.txt File

A frequently overlooked element of search engine optimization is the configuration of a site’s robots.txt file. A properly configured and structured file can improve a website's SEO performance by keeping search engine crawlers focused on the content you want indexed.

The robots.txt file can be found by entering your website's URL in a browser and adding /robots.txt at the end.

What is the robots.txt file?

Bots are software applications that explore the web automatically for a variety of reasons, most commonly to gather information about websites. When a well-behaved bot arrives at a website, the first file it requests is robots.txt. This plain-text file follows the Robots Exclusion Protocol and sets out the rules of bot behaviour on that particular site. In most cases bots will engage with the website in accordance with those instructions.
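Because the file always lives at the root of the host, its location can be derived from any page URL on the site. The following is a minimal Python sketch of that lookup; the function name robots_url and the example.com address are illustrative assumptions, not part of any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt sits at the root of the host, whatever the page path is,
    # so the path is replaced and the query string and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_url("https://www.example.com/blog/post?id=7"))
# prints https://www.example.com/robots.txt
```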

What is the robots.txt file used for?

There are many reasons to use a robots.txt file. We are going to focus on its function of telling search engine crawlers which URLs they are allowed to access and how. The two basic elements of a robots.txt file are the Allow and Disallow directives, each applied to the bots named in a preceding User-agent line.

Robots.txt can target web crawlers in a variety of configurations, from preventing all bots from accessing all content, to only controlling access to specific directories or specific bots. In a way, it is about providing bots with guidance on how to best work through your website.
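To make these configurations concrete, here is a small sketch that uses Python's standard urllib.robotparser module to read a set of rules and ask whether a given bot may fetch a given path. The rules, the bot name BadBot, the /private/ directory, and the example.com host are all made up for illustration. Python's parser uses simple prefix matching, so a real search engine may interpret more complex rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every bot is kept out of /private/,
# and one specific bot ("BadBot") is kept out of the whole site.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules let `agent` fetch `path` on the example host."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


print(allowed("Googlebot", "/blog/"))                # True
print(allowed("Googlebot", "/private/report.html"))  # False
print(allowed("BadBot", "/blog/"))                   # False
```

The first group applies to any bot without a more specific group of its own, while the second group overrides it for the single named bot.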

Giving these specific directions helps bots complete their tasks in a way that is beneficial to your website. For example, you may allow the Bing bot to access all directories, but only at a speed that doesn’t impact website performance for your visitors. The robots.txt file can accomplish this with a crawl delay, which spreads the bot's requests over a longer period of time. Not every crawler honours this directive; Googlebot, for example, ignores it.
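As a rough illustration of how a crawler might read a crawl delay, the sketch below parses a hypothetical rule group for bingbot with Python's standard urllib.robotparser and looks up the requested pause. The 10-second value, the helper name seconds_between_requests, and the fallback default are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: bingbot may crawl everything,
# but is asked to wait 10 seconds between requests.
RULES = """\
User-agent: bingbot
Allow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def seconds_between_requests(agent: str, default: float = 1.0) -> float:
    """Return the crawl delay requested for `agent`, or `default` if none is set."""
    delay = parser.crawl_delay(agent)  # None when no delay applies to this agent
    return float(delay) if delay is not None else default


print(seconds_between_requests("bingbot"))    # 10.0
print(seconds_between_requests("Googlebot"))  # 1.0 (no rule, so the default)
```

A polite crawler would sleep for that many seconds between consecutive requests to the same site.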

Robots.txt Files are Guides, not the Law

The robots.txt file is not the law; it is a recommended set of instructions for bots. Many of them will follow these directions faithfully, while others may ignore them altogether. In addition, a URL that is blocked from crawling can still end up indexed if other pages link to it. Because of this, if there are pages on your site that you do not want indexed, you should place a noindex tag on them and not rely on the robots.txt file to keep them out of the index.
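One common form of the noindex tag is a robots meta element in the page's head. The sketch below, which uses only Python's standard html.parser, checks whether a page carries that element; the class name, the has_noindex helper, and the sample HTML are illustrative assumptions. Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so a page carrying noindex should not also be disallowed in robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page contains <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks search engines not to index the page."""
    checker = RobotsMetaParser()
    checker.feed(html)
    return checker.noindex


# Hypothetical page markup, used only for illustration.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))  # True
```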

Your robots.txt file is not law. This means that nefarious bots may ignore your instructions and crawl through your pages anyway.

How can the robots.txt file impact organic search?

One key use of robots.txt files is to provide guidance to search engine crawlers so that they can efficiently crawl and index your site. To help these web bots do so effectively, your robots.txt file needs to clearly state which files, folders, or pages should not be crawled.

This is helpful, as it can help keep internal content from ending up on search engines like Google, Yahoo, and Bing. Additionally, it can keep your site's purpose at the forefront of organic search by keeping irrelevant material out of the crawl.

How Does the robots.txt Help SEO?

A properly structured robots.txt file can have a positive impact on your organic search performance by directing search engine crawlers to the content that is important. For example, Googlebot works with a crawl budget for each website, so it will only crawl a limited number of your URLs in a given period. Directing it to specific content and not allowing it to crawl low value or no value content ensures that your site's budget is spent effectively.

Some common types of low value pages you should block include:

  1. paginated pages
  2. category pages if they are just links to the items in the category
  3. search results pages
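  4. css and javascript directories, but only when search engines do not need those files to render your pages (Google advises against blocking resources required for rendering)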

Keep in mind that an incorrectly written robots.txt file can exclude large amounts of your content from being indexed. Thoroughly test any Allow/Disallow statements you add to the file. A handy tool for this is the Robots.txt Optimization Tool.
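Alongside a dedicated tool, a quick scripted check can catch rules that block more than intended. The sketch below is one possible approach using Python's standard urllib.robotparser: it takes a draft robots.txt and a list of URLs that must stay crawlable, and reports any that the draft would block. The draft rules, the URLs, and the blocked_urls helper are hypothetical. Python's parser applies rules in file order and does not handle wildcard patterns, so treat the result as a first check, not as a replica of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, agent: str, must_crawl: list[str]) -> list[str]:
    """Return the URLs from `must_crawl` that `robots_txt` blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(agent, url)]


# Hypothetical draft: "/p" was meant to block pagination under /page/,
# but as a prefix it also matches /products/.
DRAFT = """\
User-agent: *
Disallow: /search
Disallow: /p
"""

# Hypothetical URLs that should always remain crawlable.
IMPORTANT = [
    "https://www.example.com/",
    "https://www.example.com/products/widget",
    "https://www.example.com/about",
]

print(blocked_urls(DRAFT, "Googlebot", IMPORTANT))
# prints ['https://www.example.com/products/widget']
```

Tightening the second rule to "Disallow: /page/" would leave the product URL crawlable, and rerunning the check would then return an empty list.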

When creating or editing your robots.txt file, test it thoroughly and err on the side of allowing content to be crawled rather than blocked.