[poppler] Combine bounding box data and tiff to create pdf?

Leonard Rosenthol lrosenth at adobe.com
Wed May 7 18:49:36 PDT 2014


If you already have PDFs, why are you also storing images? PDF is an open international standard (ISO 32000) that offers not only a richer content model (text, vector graphics and raster images) but also metadata, marginalia and more, using modern compression methods. TIFF, on the other hand, is a proprietary standard (not updated since 1992) that handles only raster images and metadata.

Leonard

From: Mark Ehle <markehle at gmail.com>
Date: Wednesday, May 7, 2014 at 8:27 PM
To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
Subject: [poppler] Combine bounding box data and tiff to create pdf?

Folks -

I am using pdftotext to extract text from PDF files in a digital newspaper archive I am creating for a local public library. So far, it's working great. But I am using up a fair amount of disk space, and I would like to find a way to create an OCR'd PDF from an image and the bounding box data. That way I would not have to store the PDF files as well as the images. Is there a way to do that?

Thanks -

Mark
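For reference, the approach Mark describes can be sketched in Python using only the standard library, under several assumptions: the word boxes come from `pdftotext -bbox` (poppler's XHTML output, one `<word>` element with xMin/yMin/xMax/yMax attributes per word, in PDF points with the origin at the top left); the TIFF has already been decoded to raw 8-bit grayscale bytes by some imaging library (Python's standard library cannot read TIFF); and the scan covers exactly the original page. The sketch writes a one-page PDF that paints the image and overlays each word in text render mode 3 (invisible), which is a common way searchable scanned PDFs are built. The helper names (`parse_bbox_page`, `pdf_string`, `text_layer`, `make_pdf`) are hypothetical and not part of poppler.

```python
import html
import re
import zlib

# Patterns for poppler's `pdftotext -bbox` XHTML output (origin at top-left,
# units in PDF points). Assumes the attribute order poppler writes.
PAGE_RE = re.compile(r'<page width="([\d.]+)" height="([\d.]+)">')
WORD_RE = re.compile(r'<word xMin="([\d.]+)" yMin="([\d.]+)" '
                     r'xMax="([\d.]+)" yMax="([\d.]+)">(.*?)</word>')


def parse_bbox_page(xhtml):
    """Return (page_width, page_height, words) for the first <page> element."""
    page = PAGE_RE.search(xhtml)
    end = xhtml.find("</page>", page.end())
    body = xhtml[page.end():end if end != -1 else len(xhtml)]
    words = [(*map(float, m.groups()[:4]), html.unescape(m.group(5)))
             for m in WORD_RE.finditer(body)]
    return float(page.group(1)), float(page.group(2)), words


def pdf_string(text):
    """Encode text as a PDF literal string; non-Latin-1 characters become '?'."""
    raw = text.encode("latin-1", errors="replace")
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def text_layer(words, page_h):
    """Content-stream operators drawing each word invisibly (render mode 3)."""
    ops = [b"BT 3 Tr"]
    for x0, y0, x1, y1, text in words:
        size = max(y1 - y0, 1.0)  # use the box height as the font size
        # pdftotext measures y downward from the top; PDF measures upward.
        ops.append(b"/F1 %.2f Tf 1 0 0 1 %.2f %.2f Tm %s Tj"
                   % (size, x0, page_h - y1, pdf_string(text)))
    ops.append(b"ET")
    return b"\n".join(ops)


def make_pdf(gray, img_w, img_h, page_w, page_h, words):
    """Build a one-page PDF: the scan as an image plus a hidden text layer.

    `gray` must hold img_w * img_h bytes of 8-bit grayscale, top row first.
    """
    # Stretch the unit-square image over the whole page, then add the text.
    content = (b"q %.2f 0 0 %.2f 0 0 cm /Im1 Do Q\n" % (page_w, page_h)
               + text_layer(words, page_h))
    image = zlib.compress(gray)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
        b"/Resources << /XObject << /Im1 4 0 R >> /Font << /F1 6 0 R >> >> "
        b"/Contents 5 0 R >>" % (page_w, page_h),
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
        b"/Length %d >>\nstream\n" % (img_w, img_h, len(image))
        + image + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
    ]
    pdf = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(pdf))  # byte offset recorded in the xref table
        pdf += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref = len(pdf)
    pdf += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        pdf += b"%010d 00000 n \n" % off
    pdf += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref))
    return bytes(pdf)
```

Typical use would be to run `pdftotext -bbox page.pdf page.html` once, keep the small XHTML file alongside the TIFF, and regenerate the PDF only when someone requests it. The invisible text will not line up exactly with the printed words, since Helvetica's widths differ from the scanned type; per-word horizontal scaling (the `Tz` operator) is one possible refinement. As written, the sketch reads only the first page, and characters outside Latin-1 become `?`.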