[Poppler-bugs] [Bug 55977] handling of rtl text inversion is too naive

Mon Jul 15 01:09:27 PDT 2013

https://bugs.freedesktop.org/show_bug.cgi?id=55977

--- Comment #37 from alex <alexbodn.groups at gmail.com> ---

hello albert,

(In reply to comment #36)
i didn't want to talk so much and deliver nothing.

here is a patch that also covers choosing the reordering algorithm in
pdftotext.

> (In reply to comment #35)
> > (In reply to comment #34)
> > 
> > hello albert,
> > 
> > > (In reply to comment #33)
> > > > (In reply to comment #32)
> > > > > (In reply to comment #31)
> > > > > > (In reply to comment #30)
> > > > > > > (In reply to comment #29)
> > > > > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > > > > characters (European and", why? Can't we use that when ICU is not there?
> > > > > > 
> > > > > > it's not working properly, at all. rather try fribidi.
> > > > > 
> > > > > Trying fribidi is not the correct answer, we have already settled that ICU
> > > > > is better. What I am asking is whether we should keep that code as better
> > > > > than nothing when the dependencies are not there. So in a world where i
> > > > > can't have ICU, does having that code you removed give better results
> > > > > than no code at all?
> > > > sorry, it's not better than nothing. btw, the code before lastly patched
> > > > could be treated as "better than nothing", but not this one.
> > > 
> > > Hmmm, not sure i understand what you mean here, what is "the code before
> > > lastly patched"?
> > i've used the text output before, and it had a very rudimentary, but fairly
> > decent rtl approximation. but this got broken by a more recently applied
> > patch that comes from xpdf. this situation made me work toward a
> > best-of-breed solution.
> > > 
> > > > > > > 
> > > > > > > > as for finding (a) better place(s) to store reordering_mode, i'd like to
> > > > > > > > consult you.
> > > > > > > 
> > > > > > > Sure, what's the question?
> > > > > > > 
> > > > > > > the reordering mode helps take us back to the logical (typing-order) text
> > > > > > > that would be rendered in the visual order shown by the doc.
> > > > > > > on windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > > > > > on pure unicode systems, ReorderingLikeDirect would.
> 
> I never asked, but does this make sense? Does it mean that while reading a
> given text you get different RTL text on Windows than on Linux? Why would
> anyone want that? 
> 
though i can't say why, it is a fact that the windows rtl reordering
algorithm differs from the unicode one. this means that, in corner cases,
the same text is displayed differently on these platforms.
the user is usually not aware of this difference, since (s)he normally
enters the text with wysiwyg tools that handle both saving and rendering it.

so, if i've created some text with ms-word, i'd like to render it to pdf as
word would show it. then, to edit the same text further on a linux machine
and see exactly the same result, i'd need slightly different text.

admittedly, this seems like a lot, but it comes for free and is fully
supported in icu.
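
to make the corner case concrete, here is a minimal sketch in python. it is
not the icu, windows or poppler algorithm; both function names are made up
for illustration, and the only thing it relies on is python's standard
unicodedata module. it takes one line in visual order (left to right on the
page) and recovers the logical order in two ways: a naive full reversal, and
a reversal that keeps digit runs in reading order. the two disagree as soon
as the line contains a number, which is the kind of difference a reordering
mode has to settle.

```python
import unicodedata


def naive_visual_to_logical(visual: str) -> str:
    """Reverse the whole line: the crudest possible RTL inversion."""
    return visual[::-1]


def digit_aware_visual_to_logical(visual: str) -> str:
    """Reverse the line, then restore the left-to-right order of digit runs.

    Illustrative only: a real bidi algorithm handles far more than digits.
    """
    out = []
    run = []  # digits collected from the current numeric run
    for ch in visual[::-1]:
        # EN = European number, AN = Arabic number (Unicode bidi classes)
        if unicodedata.bidirectional(ch) in ("EN", "AN"):
            run.append(ch)
        else:
            out.extend(reversed(run))
            run.clear()
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# hypothetical sample: a hebrew word followed by a number
visual = "123 \u05dd\u05d5\u05dc\u05e9"   # what the page shows, left to right
logical = "\u05e9\u05dc\u05d5\u05dd 123"  # what the author typed

print(naive_visual_to_logical(visual) == logical)        # False: digits come out as 321
print(digit_aware_visual_to_logical(visual) == logical)  # True
```

the sample line is pure rtl text plus one number; mixed rtl/ltr lines need a
real bidi implementation such as the one in icu.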
> 
> > > What do you mean by "systems serving clients on various platforms"?
> > a server application may have clients that run on different operating
> > systems, with different rtl reordering algorithms.
> 
> Sure, this is a nice theoretical scenario, do you really see poppler used in
> that scenario? I guess it could be. I think my biggest brain problem at the
> moment is the one i raised in the previous question: why would i want to see
> different text depending on whether i'm a windows user or not?
> 
the reason to choose which algorithm to use is the same as the reason to
choose no rtl reordering at all and show the text in a dumb terminal window.

my own case is a web server application with clients on various platforms.
> 
> > > > yet, maybe the os checking could be not enough, and the document could
> > > > contain additional hints for choosing this default.
> > > 
> > > By "could" do you mean "I know there is a PDF hint for that" or do you
> > > mean "maybe there is a PDF hint for that"?
> > well, it's just a maybe. the hints i'm talking about may come from some
> > headers mentioning the software used to create the pdf, etc.
> 
> I don't think this is an option, it means guessing and playing like the
> browser and the "web developers" do, it's prone to break.

on reflection, there's no obvious reason to check where the pdf was created,
so i'd leave this to the explicit decision of the user/client.
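
as a rough sketch of that last point: an explicit choice from the user/client
always wins, and the platform only supplies a default, as suggested earlier
in the thread (ReorderingNumbersSpecial on windows, ReorderingLikeDirect
elsewhere). the two mode names come from the discussion above; the enum, both
functions and their parameters are hypothetical and are not poppler or
pdftotext api.

```python
import sys
from enum import Enum
from typing import Optional


class ReorderingMode(Enum):
    # mode names as used in this thread; the enum itself is hypothetical
    NUMBERS_SPECIAL = "ReorderingNumbersSpecial"
    LIKE_DIRECT = "ReorderingLikeDirect"


def default_mode_for_platform(platform: str) -> ReorderingMode:
    """Platform default only; no guessing from the pdf's producer headers."""
    if platform.startswith("win"):
        return ReorderingMode.NUMBERS_SPECIAL
    return ReorderingMode.LIKE_DIRECT


def choose_mode(explicit: Optional[str],
                platform: str = sys.platform) -> ReorderingMode:
    """Return the explicitly requested mode, else the platform default.

    Raises ValueError if `explicit` is not a known mode name.
    """
    if explicit is not None:
        return ReorderingMode(explicit)
    return default_mode_for_platform(platform)


# a linux-targeted request served from a windows host keeps the explicit choice
print(choose_mode("ReorderingLikeDirect", platform="win32"))
```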
