[poppler] poppler util pdftohtml

Jonathan Kew jfkthame at googlemail.com
Fri Sep 23 04:59:18 PDT 2011


On 23 Sep 2011, at 12:44, Peter A. Kerzum wrote:

> Actually, a consistent To-Unicode mapping should be a good compromise, as higher-
> level software can segment text into regions of different languages based
> solely on their alphabets, and then detect and correct the text flow for each
> particular region.
> 
> This way the example
> 
>   english WERBEH
> 
> should generally work, being decomposed into 2 regions with the latter reversed.

But what is the order of those "2 regions"? You cannot tell unless you have some higher-level info... the purely visual presentation is inherently ambiguous.
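To make the ambiguity concrete, here is a minimal Python sketch. It assumes, as the quoted example does, that uppercase Latin letters stand in for Hebrew. The function `to_logical` and its `base_rtl` flag are hypothetical illustrations. This is not poppler's API and not the full Unicode Bidirectional Algorithm (UAX #9). The sketch follows the proposed approach: it splits a visual-order line into same-direction runs and reverses each RTL run. The order of the runs, however, depends on the paragraph's base direction, and the visual glyph stream alone does not supply it.

```python
def to_logical(visual: str, base_rtl: bool) -> str:
    """Recover logical order from a single visual-order line (simplified model).

    Assumption: an all-uppercase word is RTL (a stand-in for Hebrew), and
    anything else is LTR. Only two embedding levels are modelled.
    """
    # Group consecutive words that share a direction into runs.
    runs: list[tuple[list[str], bool]] = []
    for word in visual.split():
        rtl = word.isupper()
        if runs and runs[-1][1] == rtl:
            runs[-1][0].append(word)
        else:
            runs.append(([word], rtl))

    # Within an RTL run, reversing the joined string restores both the
    # character order and the word order.
    texts = []
    for words, rtl in runs:
        text = " ".join(words)
        texts.append(text[::-1] if rtl else text)

    # The run order depends on the base direction, which is exactly the
    # "higher-level info" that the visual layout does not carry.
    if base_rtl:
        texts.reverse()
    return " ".join(texts)


visual = "english WERBEH"
print(to_logical(visual, base_rtl=False))  # english HEBREW
print(to_logical(visual, base_rtl=True))   # HEBREW english
```

Both logical strings render identically, as "english WERBEH". Extraction therefore cannot choose between them without knowing the base direction.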

JK
