In one of our earlier articles, we discussed the role of the robots.txt file, which tells search engine robots not to crawl certain directories and files on your website. The robots meta tag works in conjunction with the robots.txt file to give search engine robots additional instructions. While robots.txt provides more general guidelines and largely deals with blocking entire directories, the robots meta tag is a page-specific instruction: it tells the bot whether to index the current page and whether to follow the links on it.
A search engine bot enters your website with the objective of crawling and indexing all of its pages. It first checks for the existence of a robots.txt file in your website's root (home) directory. If it finds one, it notes all the directories and files that you have instructed it not to crawl. The bot then skips those directories and files and crawls the other (allowed) ones. During the crawl, the bot reads each page for indexing, but before doing so, it checks whether the page contains a robots meta tag. If it does, the bot follows the instruction in the tag: if the tag advises it not to index the page, the bot leaves the page out of its index.
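The two-step check described above can be sketched in Python. This is a simplified model, not how any particular search engine is implemented: the function crawl_decision and the class RobotsMetaFinder are hypothetical names, while RobotFileParser (from urllib.robotparser) and HTMLParser (from html.parser) are real standard-library classes. Because a page blocked by robots.txt is never fetched, its robots meta tag is never read.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def crawl_decision(robots_txt, url, html, agent="*"):
    """Hypothetical sketch: return (fetched, indexed) for one page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Step 1: robots.txt decides whether the page is fetched at all.
    if not rules.can_fetch(agent, url):
        return (False, False)  # never fetched, so the meta tag is never seen
    # Step 2: the page's robots meta tag decides whether it is indexed.
    finder = RobotsMetaFinder()
    finder.feed(html)
    return (True, "noindex" not in finder.directives)
```

For example, with a robots.txt that disallows /private/, a page under /private/ is skipped entirely, while a page elsewhere carrying content="noindex, follow" is fetched but not indexed.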
Let us take a look at the syntax of the robots meta tag. A typical robots meta tag looks like this:
<html>
<head>
...
<meta name="robots" content="index, follow">
...
</head>
<body>
- your html code in this section -
</body>
</html>
If you add the above meta tag to the home page (index page) of your website, it tells search engine bots to index that page and to follow the links on it to your other pages. The tag applies only to the page that carries it; each linked page is judged by its own robots meta tag (or by the default, if it has none). If you also have a robots.txt file in your website's root directory, search engine bots will skip the files in the disallowed directories and crawl and index the pages under the other directories, including your home directory.
This tag is not case-sensitive, so the above meta tag can also be written as any of the following:
<meta name="robots" content="INDEX, FOLLOW">
<META NAME="robots" CONTENT="INDEX, FOLLOW">
<META NAME="robots" CONTENT="index, follow">
As you may have noticed in the illustration above, the robots meta tag, like any other <META> tag, should be placed in the HEAD section of your HTML page. You can put it on every page of your website, or only on specific pages, as your requirements dictate.
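To see why the variants above are equivalent, here is a minimal Python sketch that reads robots meta tags the way a bot might, assuming the tag sits inside the HEAD section as recommended. Python's html.parser already lowercases tag and attribute names, so the sketch only needs to lowercase the attribute values. HeadRobotsMeta and robots_meta_values are hypothetical names, and ignoring tags outside HEAD is a simplification of this sketch; real crawlers may be more lenient.

```python
from html.parser import HTMLParser


class HeadRobotsMeta(HTMLParser):
    """Hypothetical reader: robots meta tags inside <head>, case-insensitively."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute *names* arrive lowercased from HTMLParser,
        # so <META NAME=...> and <meta name=...> look the same here.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attrs = dict(attrs)
            # Attribute *values* keep their original case, so normalise them here.
            if (attrs.get("name") or "").lower() == "robots":
                content = (attrs.get("content") or "").lower()
                self.values.append(", ".join(p.strip() for p in content.split(",")))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_meta_values(html):
    """Return the normalised content of every robots meta tag found in <head>."""
    parser = HeadRobotsMeta()
    parser.feed(html)
    return parser.values
```

All four spellings of the tag shown earlier produce the same normalised value, "index, follow".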
The tag has two attributes. The name attribute normally takes the value robots, meaning the directive applies to all search engine robots. You can, of course, target a specific bot by using that bot's name instead of the more general robots. For instance, to give a directive only to Google's crawler, set the name attribute to googlebot. In that case you can have multiple robots meta tags on the same page, each targeting a different bot. The content attribute takes a meaningful combination of the values index, noindex, follow and nofollow, separated by commas: index/noindex controls whether the page itself is indexed, and follow/nofollow controls whether the bot follows the links on the page. The valid combinations are:
a. <meta name="robots" content="index, follow">
b. <meta name="robots" content="index, nofollow">
c. <meta name="robots" content="noindex, follow">
d. <meta name="robots" content="noindex, nofollow">
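The four combinations map onto two independent yes/no decisions. The hypothetical helper parse_robots_content below is a sketch that accepts only the four values discussed here: it turns a content value into an (index, follow) pair and rejects a value that names both sides of a pair, such as "index, noindex".

```python
def parse_robots_content(content):
    """Hypothetical helper: turn e.g. "noindex, follow" into (index, follow) booleans."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    # This sketch only understands the four values listed above.
    unknown = tokens - {"index", "noindex", "follow", "nofollow"}
    if unknown:
        raise ValueError(f"unsupported directive(s): {sorted(unknown)}")
    # Naming both sides of a pair is not a meaningful combination.
    if {"index", "noindex"} <= tokens or {"follow", "nofollow"} <= tokens:
        raise ValueError(f"contradictory directives: {content!r}")
    # A decision that is not mentioned falls back to the permissive default.
    return ("noindex" not in tokens, "nofollow" not in tokens)
```

Combinations a. to d. come out as (True, True), (True, False), (False, True) and (False, False) respectively.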
Note that a page without a robots meta tag is treated as if it carried the default:
<meta name="robots" content="index, follow">
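Putting the default and bot-specific targeting together, the sketch below computes the effective (index, follow) pair for one crawler. It assumes that when both a general robots tag and a bot-specific tag apply, the more restrictive value wins; Google documents this behaviour for conflicting rules, but other search engines may differ. MetaCollector and effective_directives are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: gathers lowercased (name, content) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").lower()
            self.pairs.append((name, content))


def effective_directives(html, bot="googlebot"):
    """Return (index, follow) for one crawler.

    Tags named "robots" apply to every crawler; a tag named after the crawler
    applies only to it. With no matching tag, the default "index, follow" holds.
    Assumption: if several tags apply, the more restrictive value wins.
    """
    collector = MetaCollector()
    collector.feed(html)
    tokens = set()
    for name, content in collector.pairs:
        if name in ("robots", bot.lower()):
            tokens |= {t.strip() for t in content.split(",")}
    return ("noindex" not in tokens, "nofollow" not in tokens)
```

A page with a general "index, follow" tag plus a googlebot tag saying "noindex, follow" would, under this sketch, be kept out of Google's index while remaining indexable by other bots.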
Thus, using the robots meta tag, you can specify an indexing policy for individual pages of your website. This is particularly useful when you need to stop search engine bots from indexing duplicate pages, so that your search engine ranking does not suffer on account of duplicate content.
Note that the instruction you provide in the robots meta tag is only advice to the search engine robot. Whether the robot follows that advice is up to the robot.
Rajeev Kumar is the primary author of How2Lab. He holds a B.Tech. from IIT Kanpur and has several years of experience in IT education and software development. He has taught a wide spectrum of people, including fresh young talent, students of premier engineering colleges and management institutes, and IT professionals.
Rajeev founded Computer Solutions & Web Services Worldwide. He has hands-on experience building a variety of websites and business applications, including SaaS-based ERP and e-commerce systems, and cloud-deployed operations management software for healthcare, manufacturing and other industries.