[poppler] Poppler 0.8.2 released
Ross Moore
ross at ics.mq.edu.au
Tue Apr 29 17:10:35 PDT 2008
Hello Albert,
On 30/04/2008, at 7:56 AM, Albert Astals Cid wrote:
> Available from
> http://poppler.freedesktop.org/poppler-0.8.2.tar.gz
> Testing, patches and bug reports welcome.
I joined this list recently, to see whether the Poppler versions
of the Xpdf utilities worked any differently from the non-Poppler
versions.
I'm working on a Mac, with Mac OS X v10.4.11, and have successfully
built the utilities from this latest release.
All of pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm
and pdfimages work fine on a simple 1-page PDF that I created
with pdfTeX.
However, all of these fail with a "Bus error" on more
complicated multi-page PDFs, which you can find here:
http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf
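
In case it helps anyone reproduce this, here is a rough sketch of how the
failures could be checked in one pass. It is only an illustration, not part
of Poppler: the helper names (build_command, describe_exit, run_tool) are
made up for this example, it assumes the utilities are on the PATH, and it
covers only the three text-oriented tools, since the others need an output
file or root name. On a POSIX system, a process killed by a signal shows up
as a negative return code, which is how a "Bus error" (SIGBUS) would be seen.

```python
import signal
import subprocess
import sys

# Text-oriented utilities from the report (assumed to be on the PATH).
TOOLS = ["pdfinfo", "pdffonts", "pdftotext"]


def build_command(tool, pdf_path):
    """Return the argument list for one tool (hypothetical helper)."""
    cmd = [tool, pdf_path]
    if tool == "pdftotext":
        # "-" as the output name sends the extracted text to stdout.
        cmd.append("-")
    return cmd


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the process died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by %s" % name
    return "exit status %d" % returncode


def run_tool(tool, pdf_path):
    """Run one tool on one PDF, discarding its output."""
    try:
        result = subprocess.run(
            build_command(tool, pdf_path),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return "not found"
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Usage: python check_pdfs.py 5019-e-cmap.pdf 5019-e-mmap.pdf
    for pdf in sys.argv[1:]:
        for tool in TOOLS:
            print("%-10s %-24s %s" % (tool, pdf, run_tool(tool, pdf)))
```

A line reading "killed by SIGBUS" would correspond to the Bus error
described above.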
I'm particularly interested in pdffonts, pdftohtml, pdftotext
as I want a free tool to be able to correctly extract the text
from documents such as the above PDFs.
They must extract the *complete* textual contents, using the
CMap font-encoding resources that these PDFs contain.
Non-Poppler versions of the utilities, e.g.
rossmoor% pdftotext -v
pdftotext version 3.02
Copyright 1996-2007 Glyph & Cog, LLC
work to some extent, but certainly not completely.
(pdfimages runs, but its output is incomplete and unusable,
and pdftoppm also gives a Bus error.)
For example, this is part of the text extracted from 5019-e-mmap.pdf
using pdftotext (v3.02)
Figure 1: The Moebius strip. Consider the two-sheeted covering
\pi : \BbbS 2 \rightar P and the inverse image \pi - 1 (L)
of one of these circles.
It's pretty good, except that \rightarrow has been truncated
to 8 characters (\rightar). There are many similar instances within
the full text. However, the Poppler version doesn't get far enough
through the document to see this --- at least not for me.
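
A rough way to look for other instances of this truncation in extracted
text is sketched below. It is only an illustration under stated assumptions:
the function name find_truncated, the 8-character limit as a parameter, and
the short list of "known" macro names are all made up for this example; a
real check would need the full set of names used in the document's CMap
resources.

```python
import re

# Hypothetical list of full names expected in the extracted text.
KNOWN_NAMES = ["\\rightarrow", "\\leftarrow", "\\pi", "\\BbbS"]


def find_truncated(text, known_names, limit=8):
    """Return backslash tokens of exactly `limit` characters that are
    proper prefixes of a longer known name."""
    hits = []
    for token in re.findall(r"\\[A-Za-z]+", text):
        if len(token) != limit:
            continue
        for name in known_names:
            if len(name) > len(token) and name.startswith(token):
                hits.append(token)
                break
    return hits


sample = r"\pi : \BbbS 2 \rightar P and the inverse image \pi - 1 (L)"
print(find_truncated(sample, KNOWN_NAMES))
```

On the sample line quoted above, the only token flagged is \rightar.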
BTW, the text selection in Adobe Reader (versions 7.* & 8.*)
does extract the text more completely; so there is either
a bug or a design flaw within the pdftotext utility.
>
> Albert
Hope this helps,
and that you can help me.
Cheers,
Ross
------------------------------------------------------------------------
Ross Moore ross at maths.mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------