A Short History of Project Gutenberg and Distributed Proofreading

Project Gutenberg is an effort to digitally reproduce books that are no longer under copyright. Started in 1971 by Michael Hart, it now archives over 6,000 etexts.

Michael Hart wanted everyone to be able to access the books and documents that the Project archived. To that end, he chose plain ASCII text as the format for the documents. He also wanted to start with documents that people would want to read and have access to. Thus the first nine documents were all of American historical significance.

The first document to be added to the archive was the US Declaration of Independence, in December 1971. A year later the United States Bill of Rights was added. By the end of 1979, all nine documents had been added to the archive.

It was at that time that the real test began. Through the 1980s the project worked to create the etext of the Bible. Both testaments were completed and uploaded in August 1989.

During 1991, 12 more etexts were loaded including Alice in Wonderland, Paradise Lost, and Aesop's Fables.

Each year during the '90s the number of books added grew exponentially, with over 350 new etexts added in 1999. Some of this increase can be attributed to scanners with optical character recognition (OCR) software, which eliminated the need for countless hours of typing, but much of it came from the growth in the number of volunteers working on the Project.

Up to that time, the proofreading and preparation of an etext had been undertaken by one person or a small group of like-minded individuals. In late 2000 a change occurred. A website came online for a different project: an attempt to use the massive numbers of people on the Internet to proofread OCR documents. Called Project Gutenberg's Distributed Proofreaders, the group designed software allowing a person, using only a web browser, to download a single page of OCR text and its matching page image, make changes to the text, and save it. After two such passes, the pages are returned to a project manager, who fixes any problems noted by the proofers and then submits the etext to Project Gutenberg.

As of March 2003, the Distributed Proofreaders website has had over 1,100 etexts posted to Project Gutenberg, has over 400 more in progress, and has become the main source of new etexts for the Project Gutenberg archive.

Michael Hart's original goal of 10,000 books on Project Gutenberg has not yet been achieved, but with the continued exponential increase in the number of volunteers and the number of etexts being added, the goal should be surpassed by the end of 2004.
