[poppler] Branch 'xpdf303merge' - 12 commits - Makefile.am poppler/CairoOutputDev.cc poppler/CairoOutputDev.h poppler/Gfx.cc poppler/OutputDev.cc poppler/OutputDev.h poppler/TextOutputDev.cc poppler/TextOutputDev.h regtest/backends regtest/.gitignore utils/CMakeLists.txt utils/.gitignore utils/Makefile.am utils/pdfextract.1 utils/pdfextract.cc utils/pdfmerge.1
Carlos Garcia Campos
carlosgc at gnome.org
Sat Sep 24 02:39:14 PDT 2011
Excerpts from carlosgc's message of Sat Sep 24 11:25:10 +0200 2011:
> New commits:
> commit f62c2f002c782d3a7887525f031d266aca6eb582
> Author: Carlos Garcia Campos <carlosgc at gnome.org>
> Date: Sat Sep 24 11:20:13 2011 +0200
>
> xpdf303: Parse ActualText in Gfx instead of output devices
>
> Remove beginMarkedContent and endMarkedContent and add beginActualText
> and endActualText. ActualText is parsed in Gfx, which already handles the
> marked content stack, so that the text output dev doesn't need to handle it
> too. The text string is passed to beginActualText(). This change is not
> an exact merge of the xpdf code; I've tried to keep our implementation.
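
For readers who don't have the code at hand, here is a rough sketch of the idea described in the commit message. It is not the actual poppler code: SketchGfx, SketchOutputDev and RecordingDev are made-up stand-ins, and the property dictionary is simplified to a string map. Only the hook names beginActualText() and endActualText() come from the commit message. The point is that the Gfx side keeps the marked content stack and the output device only sees the replacement text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an output device. Only the two hooks named in
// the commit message are modelled; everything else is omitted.
class SketchOutputDev {
public:
    virtual ~SketchOutputDev() = default;
    virtual void beginActualText(const std::string & /*text*/) {}
    virtual void endActualText() {}
};

// Test helper (hypothetical): records the calls it receives.
class RecordingDev : public SketchOutputDev {
public:
    std::vector<std::string> events;
    void beginActualText(const std::string &text) override {
        events.push_back("begin:" + text);
    }
    void endActualText() override { events.push_back("end"); }
};

// Hypothetical stand-in for Gfx: it owns the marked content stack, so the
// output device never has to track nesting itself.
class SketchGfx {
public:
    explicit SketchGfx(SketchOutputDev *dev) : out(dev) {
        assert(dev != nullptr);
    }

    // Called when a marked content sequence starts. The property
    // dictionary is simplified to a string map (an assumption).
    void beginMarkedContent(const std::map<std::string, std::string> &props) {
        auto it = props.find("ActualText");
        const bool hasActualText = (it != props.end());
        if (hasActualText) {
            out->beginActualText(it->second);
        }
        mcStack.push_back(hasActualText);
    }

    // Called when a marked content sequence ends.
    void endMarkedContent() {
        if (mcStack.empty()) {
            return; // unbalanced end marker: ignore it
        }
        const bool hadActualText = mcStack.back();
        mcStack.pop_back();
        if (hadActualText) {
            out->endActualText();
        }
    }

private:
    SketchOutputDev *out;
    std::vector<bool> mcStack; // one entry per open marked content sequence
};
```

With this split, a nested marked content sequence that carries no ActualText is pushed and popped without the output device being notified at all.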
Albert, this commit changed the pdftotext output for 2 of
my pdf files:
- opt-content/microtype.pdf: It fixes this document; we were
extracting "pdf TeX" instead of "pdfTeX".
- opt-content/publikationen.Document.100193.pdf: It's difficult to
say whether it fixes or breaks this one. The output is weird in both
cases and it doesn't match acroread either.
I'm not sure if you have those documents, maybe under another name
(should we identify the pdfs by their md5sum too?), so let me know if
you want them.
It would be great if you could run the tests with your pdfs to see
whether there are more pdfs giving different output and, if the output is
unacceptable for any of them, revert or try to fix the commit.
Regards,
--
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462