Internet Spiders and other creepy creatures on the net
Internet spiders are programs running on a local computer; they do not really 'live' on the internet. They are simply programs dedicated to accessing sites in an attempt to find certain information, such as keywords or images for a search engine. At least, the good ones are; there are also spiders, or webbots, that harvest email addresses.
A decent spider accesses a site no more than about once every 30 seconds. There are conventions meant to keep these spiders from overloading a site, and a site can carry 'noindex' and 'robots.txt' messages to tell spiders to stay out of certain areas. The question remains whether or not the program will obey these signs.
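By way of illustration, here is a minimal sketch in Python of what a polite spider might look like. The site, pages and robot name are all hypothetical; the sketch simply checks robots.txt before fetching anything and pauses 30 seconds between requests.
import time
import urllib.request
import urllib.robotparser

SITE = "http://www.example.com"     # hypothetical site
PAGES = ["/", "/about.html"]        # hypothetical pages to visit
USER_AGENT = "ExampleSpider"        # hypothetical robot name

# Read the site's robots.txt so we know which areas are off limits.
robots = urllib.robotparser.RobotFileParser()
robots.set_url(SITE + "/robots.txt")
robots.read()

for page in PAGES:
    url = SITE + page
    if not robots.can_fetch(USER_AGENT, url):
        print("robots.txt forbids", url)
        continue
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        print(url, "->", len(response.read()), "bytes")
    time.sleep(30)    # the pause that makes a spider 'decent'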
These programs are known by several names:
Web Spider
Web Bot
Web Crawler
The file 'robots.txt' should reside in your web root directory, that is, the top-level directory that a browser reaches first.
To disallow all access, this file should contain the following lines:
User-agent: *
Disallow: /
This text has been copied and pasted onto this page; you should copy it in the same way, as robots do not have spelling checkers.
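You do not have to shut spiders out completely. If you only want to keep them away from certain areas, robots.txt can list those directories instead; the directory names below are, of course, only an example.
User-agent: *
Disallow: /private/
Disallow: /images/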
An alternative, which is also wise to use, is an HTML tag in the header section of your document. If you do not want robots to harvest any links from a page, nor index the page itself, you can use this tag:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
These measures do not guarantee anything; they merely notify the harvesting program of the desired behaviour.
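The two instructions can also be mixed. For example, a page can allow indexing while asking that its links be left alone, or the other way round:
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">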
Email addresses that should not be seen by robots can be masked in various ways. Here on H2G2 the only possible way is to replace the '@' sign with its HTML character reference, like this:
emailname&#64;hostaddress.domain
This is no guarantee that your email address will not be harvested. However, you have made it clear that it is not your intention to supply the address to spiders.
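On pages of your own, outside H2G2, a common variation on the same trick is to write every character of the address as an HTML character reference, so that a browser still shows it normally but a simple harvester sees only a string of entities. A small Python sketch, using a made-up address:
# Turn an email address into HTML character references, e.g. 'a' becomes '&#97;'.
def mask_address(address):
    return "".join("&#%d;" % ord(character) for character in address)

# The address below is hypothetical.
print(mask_address("emailname@hostaddress.domain"))
As with robots.txt, a determined harvester can still decode this; it is a polite obstacle rather than real protection.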