Internet Spiders and other creepy creatures on the net


Internet spiders are programs running on a local computer. They do not really 'live' on the internet; they are simply programs dedicated to accessing sites in an attempt to find certain information. For a search engine, this can be keywords or images. Those are the good ones; there are also spiders, or web bots, that harvest email addresses.

A decent spider accesses a site no more than once every 30 seconds. There are rules to keep spiders from overloading a site. A site can also use 'noindex' tags and a 'robots.txt' file to tell spiders not to access certain areas. The question remains whether or not a given program will obey these signs.
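As a rough illustration of what 'decent' behaviour could look like, here is a minimal Python sketch that reads a site's rules and spaces out its visits. It uses the standard library's urllib.robotparser; the function names, the example.com addresses, the 'ExampleSpider' name and the sample rules are made up for illustration, and the 30-second default simply mirrors the interval mentioned above. It only plans visits and does not fetch anything.

```python
from urllib.robotparser import RobotFileParser

# Default pause between visits, taken from the 30-second interval in the text.
DEFAULT_DELAY_SECONDS = 30


def build_parser(robots_txt):
    """Parse the text of a robots.txt file without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_visits(parser, user_agent, urls):
    """Return (url, seconds_from_start) pairs for the URLs the rules allow."""
    # Honour a Crawl-delay line if the site gives one, else use the default.
    delay = parser.crawl_delay(user_agent) or DEFAULT_DELAY_SECONDS
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return [(url, index * delay) for index, url in enumerate(allowed)]


# Hypothetical rules and pages, for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

SAMPLE_URLS = [
    "http://example.com/index.html",
    "http://example.com/private/a.html",
    "http://example.com/about.html",
]

plan = plan_visits(build_parser(SAMPLE_RULES), "ExampleSpider", SAMPLE_URLS)
for url, start in plan:
    print(f"visit {url} at t+{start}s")
```

A real spider would sleep between requests rather than just compute the schedule, but the point is the same: the rules are checked before anything is fetched.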

These programs are known by several names:

  • Web Spider

  • Web Bot

  • Web Crawler

The file 'robots.txt' should reside in your web root directory, that is, the top-level directory a browser reaches first when it visits your site.
To disallow all access, this file should contain the following lines:

User-agent: *
Disallow: /

This text was copied and pasted onto this page. You should copy and paste it as well, as robots do not have spelling checkers.
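If you want to check that the two lines really do shut everything out, one possible test is to feed them to Python's standard urllib.robotparser, which reads the file the way an obedient spider might. The helper name is_allowed, the bot names and the example.com addresses are assumptions made for this sketch.

```python
from urllib.robotparser import RobotFileParser

# The exact two lines from the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, url):
    """Report whether an obedient robot with this name may fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Hypothetical robots and pages: every combination should be refused.
for agent in ("AnyBot", "OtherBot"):
    for url in ("http://example.com/", "http://example.com/page.html"):
        print(agent, url, is_allowed(agent, url))
```

This only shows how a rule-following parser reads the file. A spider that ignores 'robots.txt' is not stopped by it.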

The alternative, which is also wise to use, is an HTML tag in the header section of your document.

If you do not want robots to harvest any links from a page or to index the page, you can use this tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

These do not guarantee anything; they only notify the harvesting program of the desired behavior.
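To show what 'notifying' means in practice, here is a small Python sketch of how a well-behaved harvesting program might look for a robots tag in a page header before indexing it. It relies on the standard html.parser module; the class name, the function name and the sample page are invented for this example, and the sample uses the conventional robots META tag with the values NOINDEX and NOFOLLOW.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(page_html):
    """Return the set of robots directives found in the page, if any."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    return parser.directives


# Hypothetical page, for illustration only.
SAMPLE_PAGE = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
    '<body><a href="/somewhere">a link</a></body></html>'
)

found = robots_directives(SAMPLE_PAGE)
may_index = "noindex" not in found
may_follow_links = "nofollow" not in found
print(found, may_index, may_follow_links)
```

Nothing forces a program to run a check like this, which is exactly why the tag is a request and not a lock.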

Email addresses that robots should not see can be masked in various ways. Here on H2G2 the only possible way is to use:

emailname<IDENTITY TYPE="#64"/>hostaddress.domain

This is no guarantee that your email address will not be harvested. However, you have made it clear that you do not intend to supply the address to spiders.
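The number 64 is the character code of the '@' sign. Outside H2G2, the same idea is usually written in plain HTML as the character reference &#64;, which is an assumption about your own pages and not something H2G2 itself accepts. The Python sketch below, with made-up function names, masks an address this way and then shows how easily a determined harvester can undo it with the standard html module.

```python
import html


def mask_email(address):
    """Replace the '@' sign with its numeric HTML character reference."""
    return address.replace("@", "&#64;")


def unmask(text):
    """What a determined harvester could do: decode the references again."""
    return html.unescape(text)


# The placeholder address from the text above.
masked = mask_email("emailname@hostaddress.domain")
print(masked)
print(unmask(masked))
```

A browser shows the masked form as a normal address, while a naive spider searching the raw page for '@' finds nothing. A spider that decodes the page first sees the address as usual.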

