[poppler] poppler util pdftohtml
Jonathan Kew
jfkthame at googlemail.com
Fri Sep 23 04:59:18 PDT 2011
On 23 Sep 2011, at 12:44, Peter A. Kerzum wrote:
> Actually, a consistent ToUnicode mapping should be a good compromise, as
> higher-level software can segment text into regions of different languages
> based solely on their alphabets and then detect and correct text flow for each
> particular region.
>
> This way the example
>
> english WERBEH
>
> should generally work, being decomposed into 2 regions with the latter reversed
But what is the order of those "2 regions"? You cannot tell unless you have some higher-level info... the purely visual presentation is inherently ambiguous.
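To make the ambiguity concrete, here is a rough sketch in Python (not poppler code; the function names `split_runs` and `candidate_orders` and the sample Hebrew word are just assumptions for illustration). It splits a visually ordered line into script runs using the Unicode bidi class of each word's first strong character, reverses the right-to-left runs, and then shows that two different logical orders are equally consistent with the same visual line.

```python
import unicodedata


def word_direction(word):
    """Return 'R' or 'L' based on the first strongly directional character."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "R"
        if bidi == "L":
            return "L"
    # Assumption: words with no strong character (digits, punctuation)
    # are simply treated as left-to-right in this sketch.
    return "L"


def split_runs(visual):
    """Group adjacent words of the same direction into (direction, text) runs."""
    runs = []
    for word in visual.split():
        d = word_direction(word)
        if runs and runs[-1][0] == d:
            runs[-1][1].append(word)
        else:
            runs.append((d, [word]))
    return [(d, " ".join(words)) for d, words in runs]


def candidate_orders(visual):
    """Return the two logical readings of a visually ordered line.

    Each right-to-left run is reversed character by character. The order
    of the runs themselves depends on the paragraph direction, which the
    visual line alone does not reveal, so both readings are returned.
    """
    logical_runs = [
        text[::-1] if d == "R" else text for d, text in split_runs(visual)
    ]
    ltr_paragraph = " ".join(logical_runs)
    rtl_paragraph = " ".join(reversed(logical_runs))
    return ltr_paragraph, rtl_paragraph


# Hypothetical sample: a Hebrew word in logical order, written with
# escapes so the source itself is not reordered by a bidi-aware editor.
hebrew_logical = "\u05e2\u05d1\u05e8\u05d9\u05ea"
# What an extractor sees on the page, left to right ("english WERBEH").
visual_line = "english " + hebrew_logical[::-1]

as_ltr, as_rtl = candidate_orders(visual_line)
print(as_ltr)  # English first, then the Hebrew word
print(as_rtl)  # Hebrew word first, then English
```

Both results contain a correctly reversed Hebrew run; choosing between them needs the paragraph direction, which is exactly the higher-level info that is missing.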
JK