Tuesday, February 9, 2016

What is robots.txt?

The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform a web robot about which areas of the website should not be processed or scanned.

The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.
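For instance, a robots.txt file can point crawlers at a sitemap using the Sitemap directive, a widely supported extension; the URL below is just a placeholder:

Sitemap: https://www.example.com/sitemap.xml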

Each web domain should have its own robots.txt file, placed at the root of the site (for example, https://www.example.com/robots.txt).
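To see the protocol from the crawler's side, here is a minimal sketch using Python's standard-library urllib.robotparser; the example.com URLs are placeholders:

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt file
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Ask whether a given user-agent may fetch a given URL
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))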

Below are some sample robots.txt files.

This example allows all robots complete access to the site:

User-agent: *
Allow:

This example keeps all robots out of the entire site:

User-agent: *
Disallow: /

This example tells all robots not to enter three directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

This example tells two specific robots not to enter one directory (multiple User-agent lines apply the same rules to each listed robot):

User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
User-agent: Googlebot
Disallow: /private/
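As a quick sanity check of that last sample, a short sketch that parses those rules with Python's standard-library urllib.robotparser and tests a couple of paths:

from urllib.robotparser import RobotFileParser

sample = """\
User-agent: BadBot
User-agent: Googlebot
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(sample.splitlines())

# Both listed robots are blocked from /private/; everything else is allowed
print(rp.can_fetch("Googlebot", "/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "/public/page.html"))   # True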
