A Short History of Project Gutenberg and Distributed Proofreading


Project Gutenberg is an effort to digitally reproduce books that are no longer under copyright. Started in 1971 by Michael Hart, it now archives over 6,000 etexts.

Michael Hart wanted everyone to be able to access the books and documents that the Project archived. To that end, he chose to save the documents as plain ASCII text. He also wanted to start with documents that people would want to read and have access to, so the first nine documents were all of American historical significance.

The first document to be added to the archive was the U.S. Declaration of Independence, in December 1971. A year later the United States Bill of Rights was added. By the end of 1979, all nine of those documents had been added to the archive.

Then the real test began. Through the 1980s the project worked on an etext of the Bible. Both testaments were completed and uploaded in August 1989.

During 1991, twelve more etexts were added, including "Alice in Wonderland", "Paradise Lost", and "Aesop's Fables".

Each year during the 1990s the number of books added grew exponentially, with over 350 new etexts added in 1999. Some of this increase can be attributed to scanners with optical character recognition (OCR) software, which eliminated the need for countless hours of typing, but much of it has come from the growing number of people volunteering for the Project.

Up to that time, proofreading and preparing an etext would have been undertaken by one person or a small group of like-minded individuals. In late 2000 that changed. A website came online for a different project, an attempt to use the massive numbers of people on the internet to proofread OCR documents. Called Project Gutenberg's Distributed Proofreaders, the group designed software that allows a person, using only a web browser, to download a single page of OCR text and its matching page image, make changes to the text, and save it. After two such passes, the pages are returned to a project manager, who fixes any problems noted by the proofreaders and then submits the etext to Project Gutenberg.

As of March 2003, the Distributed Proofreaders website has contributed over 1,100 etexts to Project Gutenberg, has over 400 more in process, and has become the main source of new etexts for the Project Gutenberg archive.

Michael Hart's original goal of 10,000 books on Project Gutenberg has not yet been reached, but with the continued exponential increase in the number of volunteers and the number of etexts being added, it should be surpassed by the end of 2004.




Entry A981182




Disclaimer

h2g2 is created by h2g2's users, who are members of the public. The views expressed are theirs and unless specifically stated are not those of the Not Panicking Ltd. Unlike Edited Entries, Entries have not been checked by an Editor. If you consider any Entry to be in breach of the site's House Rules, please register a complaint. For any other comments, please visit the Feedback page.
