Robots.txt File and Meta Robots Tag
The robots.txt file is used to give instructions to search engine crawlers (also known as bots or spiders) about which parts of a website they may or may not crawl. This is also known as the Robots Exclusion Protocol. Whenever a search engine bot visits a website, it first checks for a robots.txt file and reads the instructions given in it. If it finds instructions not to crawl certain parts of the website, it skips them.
Syntax for the robots.txt file:
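The basic form is a User-agent line followed by one or more Disallow lines. The example described below is presumably the standard two-line form that blocks the root path for all robots:

```
User-agent: *
Disallow: /
```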
*In this syntax, User-agent specifies the name of a particular robot (or * for all robots), and Disallow gives the path disallowed for crawling (in our example the root path is given, so no robot will crawl any page of the website).
All robots can crawl your website; no folder has been disallowed.
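This rule would be written with an empty Disallow value, like so:

```
User-agent: *
Disallow:
```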
No robot will crawl content under the "cgi-bin" folder.
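That rule looks like this (the "cgi-bin" path is the one named above; adjust it to your own folder):

```
User-agent: *
Disallow: /cgi-bin/
```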
Googlebot will not crawl content under the "images" folder.
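Here the User-agent line names Google's crawler specifically, so the rule applies only to it:

```
User-agent: Googlebot
Disallow: /images/
```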
All bots have been disallowed from visiting any part of the website.
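Disallowing the root path blocks the entire site for every robot:

```
User-agent: *
Disallow: /
```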
In this example we allow only Googlebot to crawl our website and disallow all other bots.
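This combines two rule groups: an empty Disallow for Googlebot, and a root-path Disallow for everyone else. A sketch of how it is typically written:

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```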
Steps for Creating a robots.txt file
- Open any text editor (e.g. Notepad)
- Write your robots instructions
- Save the file as "robots.txt"
- Upload the robots.txt file to the root of your website
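On a Unix-like system, the steps above can be sketched from the command line (the "cgi-bin" rule is just an illustrative example; the upload step depends on your hosting setup):

```shell
# Create a robots.txt file with a sample rule.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /cgi-bin/
EOF

# Sanity-check the file contents before uploading it to the web root,
# where it must be reachable at http://yourdomain.com/robots.txt
cat robots.txt
```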
Meta Robots Tag
The meta robots tag is used to give page-specific instructions to search engine robots: whether to index the content of the page and whether to follow its links. This tag should be placed inside the HEAD section of the page.
Like other meta tags, the robots meta tag uses the same name and content attributes, but with different values:
Syntax & Examples:
<meta name="robots" content="INDEX, NOFOLLOW">
<meta name="robots" content="NOINDEX, FOLLOW">
<meta name="robots" content="NOINDEX, NOFOLLOW">
Note: If no meta robots tag is given, the default is INDEX, FOLLOW.
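A minimal sketch of where the tag sits in a page (the page title and body are placeholders):

```
<html>
<head>
  <title>Example Page</title>
  <!-- Keep this page out of the index, but let robots follow its links -->
  <meta name="robots" content="NOINDEX, FOLLOW">
</head>
<body>
  ...
</body>
</html>
```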