[poppler] pdftohtml patch: restore old "raw" command-line option
Albert Astals Cid
aacid at kde.org
Wed Jan 7 05:20:25 PST 2009
On Sunday 05 October 2008, Warren Toomey wrote:
> pdftohtml used to have a "raw" mode which has been removed. In "raw" mode,
> text from a PDF document is processed in the order that it occurs. However,
> the current version of pdftohtml reorders the text to be in increasing
> y-value, i.e. from the top of a page going down to the bottom.
>
> This text reordering plays merry havoc with multi-column pages, as the text
> from the columns becomes interleaved instead of remaining separate.
>
> The attached patch restores the -raw command-line option to pdftohtml. The
> program retains its current behaviour if the -raw option is not used, but
> reverts to the "text as it appears" behaviour with the -raw option enabled.
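
To make the interleaving described above concrete, here is a minimal sketch in
Python. It is not poppler's code: the (x, y, text) fragment tuples, the
coordinates, and the function names are all hypothetical, and it assumes a
plain sort on y then x, which may not match what pdftohtml actually does.

```python
# Hypothetical text fragments as (x, y, text); y grows from the top of the
# page downwards. They are listed in the order they occur in the document:
# the whole left column first, then the whole right column.
fragments = [
    (50, 100, "L1"),
    (50, 120, "L2"),
    (300, 100, "R1"),
    (300, 120, "R2"),
]


def raw_order(frags):
    """Keep the text in the order it occurs ("raw" behaviour)."""
    return [text for _, _, text in frags]


def y_sorted_order(frags):
    """Reorder by increasing y, then x (assumed stand-in for the default)."""
    return [text for _, _, text in sorted(frags, key=lambda f: (f[1], f[0]))]


if __name__ == "__main__":
    print("raw:     ", raw_order(fragments))
    print("y-sorted:", y_sorted_order(fragments))
```

In this toy case the raw order keeps each column together, while the y-sorted
order alternates between the left and right columns line by line.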
I've had a look at all the pdftohtml tarballs available at
http://sourceforge.net/project/showfiles.php?group_id=45839 and none of them
exposed the raw option to the user. Are you sure it is OK to
enable it?
Albert
>
> Cheers,
> Warren