[OpenFontLibrary] processing archive.org specimens

Schrijver eric at authoritism.net
Thu Jun 24 20:07:59 PDT 2010


For example, http://www.archive.org/details/specimenofprinti00casl
(if you are in the book viewer, click ‘View details’)

Then go to ‘View the book’ > ‘All Files: HTTP’.

There you can download the largest file, which contains all the pages as JP2 images.
For books that carry (c) Microsoft watermarks, you need the file whose name ends in _raw (those pages have not been watermarked yet).
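If you want to script this step, here is a minimal Python sketch. It assumes the archive.org metadata endpoint (`https://archive.org/metadata/<identifier>`) returns a `files` list whose entries have `name` and `size` fields, that the page archive has `jp2` in its name, and that the unwatermarked copy has `_raw` in its name. The helper names `list_files` and `pick_page_archive` are hypothetical.

```python
import json
import urllib.request


def list_files(identifier):
    """Fetch the file list for an archive.org item (needs network access)."""
    url = f"https://archive.org/metadata/{identifier}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["files"]


def pick_page_archive(files, watermarked=False):
    """Return the name of the largest JP2 page archive.

    Assumption: page archives have 'jp2' in their name, and for
    Microsoft-watermarked books the clean copy has '_raw' in its name.
    """
    candidates = [f for f in files if "jp2" in f["name"].lower()]
    if watermarked:
        candidates = [f for f in candidates if "_raw" in f["name"]]
    # 'size' may come back as a string in the metadata JSON, so cast it
    return max(candidates, key=lambda f: int(f.get("size", 0)))["name"]
```

Usage would be something like `pick_page_archive(list_files("specimenofprinti00casl"))`, then downloading `https://archive.org/download/<identifier>/<name>`.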

These pages suffer from JPEG 2000 compression artefacts.
You can remove these artefacts with the GREYCstoration algorithm. It used to be available as a separate tool, but it is now part of G’MIC, which, I believe, can be used from within GIMP.
http://gmic.sourceforge.net/gimp.shtml
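GREYCstoration itself is a fairly involved tensor-driven diffusion method, so it is not reproduced here. To illustrate the underlying idea (smooth inside flat areas, leave strong edges alone), here is a Python sketch of the simpler Perona-Malik anisotropic diffusion. This is a swapped-in technique, not GREYCstoration; `kappa` and `step` are illustrative values for a greyscale image scaled to [0, 1].

```python
import numpy as np


def anisotropic_diffusion(img, iterations=10, kappa=0.1, step=0.2):
    """Perona-Malik diffusion of a greyscale image with values in [0, 1].

    Stand-in for GREYCstoration: small differences (noise) are smoothed,
    large differences (letter edges) are left almost untouched.
    """
    def conduction(d):
        # near 1 for small differences, near 0 across strong edges
        return np.exp(-(d / kappa) ** 2)

    out = img.astype(float).copy()
    for _ in range(iterations):
        # replicate the border so edge pixels also have four neighbours
        p = np.pad(out, 1, mode="edge")
        centre = p[1:-1, 1:-1]
        dn = p[:-2, 1:-1] - centre
        ds = p[2:, 1:-1] - centre
        dw = p[1:-1, :-2] - centre
        de = p[1:-1, 2:] - centre
        out = out + step * sum(conduction(d) * d for d in (dn, ds, dw, de))
    return out
```

Keeping `step` at or below 0.25 keeps this explicit four-neighbour update stable; raising `kappa` smooths more aggressively but starts to soften the letter edges as well.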

Now, if you want to do automatic tracing (or hand tracing, for that matter), all you really need is black and white: a bitmap.

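However, a greyscale image carries more information than a bitmap, because of its soft edges: the grey values along a contour record where that contour falls between pixels.

So, to convert to a bitmap accurately, upsample the greyscale image first; the interpolated grey values then let the threshold place each edge more precisely than the original pixel grid allows.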
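A minimal NumPy sketch of upsample-then-threshold, assuming a greyscale array with values in [0, 1] (0 = ink, 1 = paper). It uses plain linear interpolation; the function names and the 0.45 threshold are illustrative choices.

```python
import numpy as np


def upsample_linear(img, factor):
    """Upsample a 2-D greyscale array by an integer factor (separable linear)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    # interpolate along each row, then along each column of the result
    rows = np.array([np.interp(xs, np.arange(w), r) for r in img])
    return np.array([np.interp(ys, np.arange(h), c) for c in rows.T]).T


def to_bitmap(img, factor=2, threshold=0.45):
    """True (ink) where the upsampled pixel is darker than the threshold."""
    return upsample_linear(img.astype(float), factor) < threshold


# a soft edge: black, mid-grey, white
edge = np.array([[0.0, 0.5, 1.0]])
print(to_bitmap(edge, factor=1).astype(int))
print(to_bitmap(edge, factor=3).astype(int))
```

Without upsampling, the mid-grey pixel becomes entirely paper; at factor 3 part of it becomes ink, so the edge lands inside that pixel instead of on the pixel boundary.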

Second, there is a lot of visual information you are not interested in, such as variations in the colour of the page itself, and it can skew the conversion. You can discard as much of it as possible by applying a high-pass filter beforehand.
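One simple way to build such a high-pass filter is to subtract a blurred copy of the page, which approximates the paper background. The NumPy sketch below uses a box blur for that; it is an illustration under that assumption, not mkbitmap’s exact filter, and `radius` should be larger than a letter stroke.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2*radius+1)-wide square window, edges replicated."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    p = np.pad(img.astype(float), radius, mode="edge")
    rows = np.array([np.convolve(r, kernel, mode="valid") for r in p])
    return np.array([np.convolve(c, kernel, mode="valid") for c in rows.T]).T


def highpass(img, radius=8):
    """Remove slow changes in page tone while keeping small detail (letters).

    Subtracting the blurred background and re-centring on 0.5 puts the
    paper near mid-grey wherever it was, and pushes ink well below that.
    """
    return np.clip(img - box_blur(img, radius) + 0.5, 0.0, 1.0)
```

After this step, a threshold a little below 0.5 separates ink from paper regardless of how the page tone drifted across the scan.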

Both steps are combined in the mkbitmap program, which is bundled with potrace:
http://potrace.sourceforge.net/mkbitmap.html
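To script it, here is a Python sketch that shells out to mkbitmap and, optionally, potrace. The flags (`-f` filter radius, `-s` scale factor, `-t` threshold and `-o` output for mkbitmap; `-s` for SVG output in potrace) follow their manual pages as I understand them, and the default values are assumed to mirror mkbitmap’s own defaults, so check `mkbitmap --help` on your system. As far as I know mkbitmap reads PNM (for example PGM) or BMP, so the JP2 pages need converting with another tool first. The helper names are hypothetical.

```python
import shutil
import subprocess


def mkbitmap_cmd(src, dst, filter_radius=4, scale=2, threshold=0.45):
    """Build an mkbitmap call: high-pass filter, upsample, threshold to PBM."""
    return ["mkbitmap", "-f", str(filter_radius), "-s", str(scale),
            "-t", str(threshold), "-o", dst, src]


def potrace_svg_cmd(src, dst):
    """Build a potrace call that traces a PBM bitmap to SVG (-s)."""
    return ["potrace", "-s", "-o", dst, src]


def run(cmd):
    """Run a command, failing early if the program is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Usage: `run(mkbitmap_cmd("page.pgm", "page.pbm"))`, then either edit `page.pbm` by hand or continue with `run(potrace_svg_cmd("page.pbm", "page.svg"))`.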

You don’t have to trace its output with potrace, though.

You can edit the resulting bitmap before tracing, or trace it by hand. (In the end, any automatic tracing will have to be touched up by hand anyway.)

