That moment…

…when you want to bang your head against the wall…

I have a copy of a book called Everybody’s Scrap-book of Curious Facts which has been in the family for around 130 years. I remember avidly reading it as a kid, and, even then, it was faded and tattered, with pages fixed together with sticky tape.

Now, it really is falling apart and difficult to read, so I decided to undertake the task of creating a machine-readable copy.

Initially, I started typing it into Word, but gave up last year after 50-odd pages as it was far too time-consuming. Then I recently discovered Boxoft Free OCR, which seems to do a decent job of text extraction, so I’ve been scanning and OCRing another 100 pages, making the necessary (and numerous) corrections, and sticking it into WordPress.

Then, today, while searching to find out whether “leepreehauns” is really an old form of “leprechauns”, I got hits on a couple of pages that already have digitized versions of the book! AARRGH! One is a pure-text OCR page, and the other is image-only. The text page needs loads of corrections, of course, but rather than do the scan-OCR-correct process myself, I reckon I can use the text page to recreate a decent copy, and maybe OCR from the image page if necessary. Also, the two pages have allowed me to include four pages which are missing from my copy. All good, but I wish I’d found this earlier!

Ah, bugger! Knew there’d be a catch: it seems that the online edition differs from mine, so there are sections missing and more cross-referencing will be needed…
