[poppler] poppler util pdftohtml

Leonard Rosenthol lrosenth at adobe.com
Fri Sep 23 05:18:56 PDT 2011


And what is the primary reading order for any document?  That matters
not just for semantic analysis but also for things such as
text-to-speech or screen readers (i.e. accessibility).

Leonard

On 9/23/11 7:59 AM, "Jonathan Kew" <jfkthame at googlemail.com> wrote:

>On 23 Sep 2011, at 12:44, Peter A. Kerzum wrote:
>
>> Actually, a consistent To-Unicode mapping should be a good compromise,
>> as higher-level software can segment text into regions of different
>> languages based solely on their alphabets and then detect and correct
>> text flow for each particular region.
>> 
>> This way the example
>> 
>>   english WERBEH
>> 
>> should generally work, being decomposed into 2 regions with the latter
>> reversed
>
>But what is the order of those "2 regions"? You cannot tell unless you
>have some higher-level info... the purely visual presentation is
>inherently ambiguous.
>
>JK
>
>_______________________________________________
>poppler mailing list
>poppler at lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler
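
To make the proposal quoted above concrete, here is a minimal sketch of
the "segment by alphabet, then reverse" idea. It is not poppler code; the
helper names (direction, split_runs, fix_runs) are hypothetical, and it
assumes the input is visually ordered text whose characters already map
to correct Unicode code points. Real Hebrew letters stand in for the
"WERBEH" placeholder.

```python
import unicodedata

# Unicode bidi classes used by right-to-left letters (Hebrew, Arabic, ...).
RTL_CLASSES = {"R", "AL"}


def direction(ch):
    """Return 'rtl', 'ltr', or None for neutral characters (spaces, digits, punctuation)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in RTL_CLASSES:
        return "rtl"
    if bidi == "L":
        return "ltr"
    return None


def split_runs(visual):
    """Split visually ordered text into (direction, text) runs.

    Simplification: neutral characters attach to the preceding run,
    and a leading neutral starts an 'ltr' run.
    """
    runs = []
    for ch in visual:
        d = direction(ch)
        if runs and (d is None or d == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((d or "ltr", ch))
    return runs


def fix_runs(visual):
    """Reverse each right-to-left run; runs stay in their visual order.

    Simplification: plain reversal ignores combining marks.
    """
    return [(d, t[::-1] if d == "rtl" else t) for d, t in split_runs(visual)]


# "english" followed by the Hebrew word for "Hebrew", stored in visual order.
visual = "english \u05ea\u05d9\u05e8\u05d1\u05e2"
print(fix_runs(visual))
```

Note that the sketch only repairs the text inside each run. It returns
the runs in the order they appear on the page and makes no attempt to
decide which one is read first, which is exactly the ambiguity raised in
the replies.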


