The On-Line Books Page

HOW TO PUT BOOKS ON-LINE

A Guide for Beginners

(This guide is adapted from a guide written for the Celebration of Women Writers. We're also looking for interested transcribers for that project.)

We're looking for volunteers to help put books on-line. It's not difficult for one person to enter their favorite text, and once one person has done this, it becomes available to millions of Internet readers. Here's how to go about it.

Find a suitable text, and tell us about it

For The On-Line Books Page, we're looking for complete, English-language books in any subject. You can choose a book from this list of requests, or choose another book that you would like to transcribe.

Any text you choose must either not be copyrighted, or be approved for free on-line use by the copyright holder. In the United States, any work published before 1923 is no longer copyrighted. (In other countries, copyright usually lasts at least 50 years after the author's death, but laws vary.) Note that revised texts, translations, and other derivative works can get a new copyright from the date of their creation. Check the copyright information (usually on the back of the title page) to see what copyrights are claimed. For more details, see this page.

Once you have a text you want to contribute, and have verified that it can go on-line, tell us about it by sending e-mail to spok+books@cs.cmu.edu, giving the author, title, and your name. We'll annotate the in-progress listing, to indicate that you are interested in submitting a particular work, so that other people know that you're working on it.

If you'd rather not start out with a whole book, but would like to try something smaller first, you might want to try the Build-A-Book Project. We select the books, and send out chapters; you type or scan them in, proofread them, and send them back to us; and then we put them in our archive. See this page for more details.

Scan or type in your material

You then need to get your text into electronic form. You can do this by scanning, or just typing it in. If you have access to a scanner, you'll probably want to use it, since scanning is significantly faster than typing for most people. Flatbed scanners are available in many schools, libraries, and workplaces, and can be bought for as little as a few hundred dollars. Many scanners come with optical character recognition (OCR) software, which is quite accurate once you've adjusted your settings properly. (For best results, lay your book flat on the scanner, and close the top lid as much as possible. Then experiment with the brightness level until you find a level that gets all of the letters and little of the other stray marks found in books.) To give you some idea of time, it took me about 3 hours to scan in all of E. Nesbit's Five Children and It using a Silverscan II with OmniPage Professional software.

You can also type the work in if you prefer, or if a good scanner is not available. Typing is generally considerably slower than scanning, and the time required depends on your typing speed. But it can be done by anyone with a computer, without any extra equipment.

If the text includes Greek or other non-Roman characters, you might find it useful to refer to an appropriate font archive.

Check it for accuracy

Errors can -- and inevitably do -- creep into a text, whether you've typed or scanned it in. So you'll want to proofread the electronic copy, or have someone else proofread it, before submitting it.

When academics or professional publishers prepare a research-quality text, they usually have it proofread at least twice, by different people, each carefully comparing the new text with the source text. If you're just planning on supplying the text informally to Internet readers, you don't have to be that rigorous. You should, however, go through the entire text at least once, with the original book handy to check consistency. With scanned works, it may be sufficient just to read the electronic text through at a reasonable speed, checking the book whenever something looks strange and making corrections as needed. Also run the text through a spelling checker for good measure. Errors in a typed text are often less obvious than those in a scanned text, so you may want to be more careful to compare the two texts as you go along. (The proofreading process can be a pleasant opportunity to read or re-read the book yourself.)
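
As a supplement to a spelling checker, a short script can point out places where OCR commonly goes wrong, such as a digit read in place of a letter ("th1s" for "this"). The following is only a minimal sketch in Python, not part of any scanning package; the function name find_suspect_words is made up for illustration, and the rule it applies (flag any word that mixes letters and digits) is an assumption that will also flag legitimate text such as "1st" or "2nd".

```python
import re

# Tokens that contain at least one letter and at least one digit, e.g. "th1s".
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def find_suspect_words(text):
    """Return (line_number, word) pairs for tokens mixing letters and digits."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in MIXED.finditer(line):
            suspects.append((line_no, match.group()))
    return suspects


if __name__ == "__main__":
    sample = "He said th1s was f1ne.\nChapter 12 began."
    for line_no, word in find_suspect_words(sample):
        print(f"line {line_no}: check '{word}' against the book")
```

The script only lists candidates. Each one still needs to be checked against the original book before anything is changed.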

Occasionally, you (or your spelling checker) will come across something that looks like an error in the original source text. We recommend being very cautious about correcting any "errors" in the original book. Writers throughout history have used many spellings and idioms that are not familiar to modern American readers or spell-checking programs. Text, particularly dialogue, can intentionally involve non-standard usage or mechanics. For editions meant for research, many scholars prefer that no changes whatsoever be made in the electronic version of a text, or at least that any changes be explicitly noted. If you want your electronic text to be used for scholarly research, or for preservation, Marc Demarest's essay The Responsible Preparation of Electronic Texts describes what many serious scholars look for in electronic versions of previously published books.

If you mean to prepare texts for a casual reader, you needn't be as picky. To us, corrections of obvious typographical or printing errors, or shifts in line breaks (particularly those that split a word) can be useful if done with care. There can also be good reasons to prepare an electronic version of a text that does not exactly match any previous print edition. Choose the policy that makes the most sense to you. In any case, it's a good idea to include some brief transcriber's notes at the start or end of the text, explaining what you've done and giving publication information on the source text(s) you used.
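
If you do decide to rejoin words that were split across line breaks, it can help to list the candidates first and review them by hand. Here is a rough Python sketch under stated assumptions: the function name find_split_words is hypothetical, and the rule (a line ending in letters plus a hyphen, followed by a line starting with lowercase letters) is a guess that cannot tell a split word like "wonder-ful" from a true hyphenated compound like "well-known".

```python
import re


def find_split_words(text):
    """Return (line_number, first_half, second_half) for possible split words."""
    lines = text.splitlines()
    candidates = []
    for i in range(len(lines) - 1):
        # A line ending in letters followed by a hyphen...
        head = re.search(r"([A-Za-z]+)-$", lines[i].rstrip())
        # ...followed by a line that starts with lowercase letters.
        tail = re.match(r"\s*([a-z]+)", lines[i + 1])
        if head and tail:
            candidates.append((i + 1, head.group(1), tail.group(1)))
    return candidates


if __name__ == "__main__":
    sample = (
        "It was a wonder-\n"
        "ful day, and the well-\n"
        "known author smiled.\n"
        "The end."
    )
    for line_no, first, second in find_split_words(sample):
        print(f"line {line_no}: '{first}-' + '{second}'")
```

The sketch reports the candidates and changes nothing. That leaves the decision to you: "wonder-" plus "ful" should become "wonderful", while "well-" plus "known" should keep its hyphen.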

Publish it!

Once you have your text entered and proofread, "publishing" it on the Internet is easy.

If you already have space on a Web, Gopher, or FTP site, you can just place the work there, and tell us how to get to it. We can then include a link in the listings of the On-Line Books Page. It may also qualify for listing in one of our special exhibits.

Or, you can submit the text to one of many book archives on the Net. (There are a number of archives that are looking for all sorts of texts. We can help you find a suitable one.) Then we'll just link to the copy in those archives. To see examples of some of the texts and archives out there, see the list of archives.

For text formats, plain vanilla text or HTML (the hypertext markup language of the Web) are the formats of choice. Just about everybody can read and store plain ASCII text, so this is the most portable format. HTML lets you mark up the text in interesting ways -- such as adding accents and italics, or including hypertext links to related material -- but is not widely recognized outside the Web, and not supported by some text archives. Other formats are less useful, but it may be possible to convert some of them to plain text. Many academic projects now prepare texts in more detailed formats, like TEI (an SGML format). Since these formats are not so widely readable, many of these projects also provide translations into HTML or plain text.
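
If you plan to submit plain ASCII text, you may want to confirm that no accented letters or other non-ASCII characters slipped in during scanning or typing. Below is a small Python sketch of one way to check; the function name find_non_ascii is made up for illustration, and it assumes you have already read the file into a string.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # ASCII covers code points 0 through 127.
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    sample = "plain text\ncaf\u00e9 au lait"
    for line_no, col, ch in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

Each character it reports is one to either represent in HTML or replace with an ASCII equivalent that you describe in your transcriber's notes.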

That's how it works. Please write us if you have any questions, or if you would like to start working on a book.



Copyright 1995-1998 by Mary Mark Ockerbloom (mmbt@cs.cmu.edu) and John Mark Ockerbloom (spok+books@cs.cmu.edu)