[Poppler-bugs] [Bug 55977] handling of rtl text inversion is too naive

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Mar 24 13:35:08 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=55977

--- Comment #35 from alex <alexbodn.groups at gmail.com> ---
(In reply to comment #34)

Hello Albert,

> (In reply to comment #33)
> > (In reply to comment #32)
> > > (In reply to comment #31)
> > > > (In reply to comment #30)
> > > > > (In reply to comment #29)
> > > > > You are removing lots of code, e.g. the "// Note: This code treats numeric
> > > > > characters (European and", why? Can't we use that when ICU is not there?
> > > > 
> > > > It's not working properly at all. Try fribidi instead.
> > > 
> > > Trying fribidi is not the correct answer; we have already settled that ICU
> > > is better. What I am asking is whether we should keep that code as a
> > > better-than-nothing fallback when the dependencies are not there. So in a
> > > world where I can't have ICU, does the code you removed give better results
> > > than no code at all?
> > Sorry, it's not better than nothing. By the way, the code before lastly
> > patched could be treated as "better than nothing", but not this one.
> 
> Hmmm, not sure I understand what you mean here. What is "the code before
> lastly patched"?
I've used the text output before, and it had a very rudimentary but fairly
decent RTL approximation. But this got broken by a more recently applied patch
that comes from xpdf. This situation made me work on a best-of-breed solution.
> 
> > > > > 
> > > > > > As for finding (a) better place(s) to store reordering_mode, I'd like to
> > > > > > consult you.
> > > > > 
> > > > > Sure, what's the question?
> > > > > 
> > > > The reordering mode lets us get back to the logical (typing-order) text
> > > > that, when displayed, would appear in the order shown by the document.
> > > > On Windows systems, ReorderingNumbersSpecial would normally prevail, while
> > > > on pure Unicode systems ReorderingLikeDirect would.
> > > > This assumption should guide the text conversion according to the client
> > > > requesting it: the local system itself, in the case of a stand-alone or
> > > > local-process application, or the remote client's mode, in the case of a
> > > > server performing the conversion.
> > > > 
> > > > In our case, I'd define a reordering_mode at the TextOutputDev object level,
> > > > defaulting to the local system's mode, which could be overridden in
> > > > getText and findText calls, especially when they are made on behalf of
> > > > other systems. Should the reordering mode be selectable in pdftotext too?
> > > > 
> > > > Should I then move the reordering mode enum to the TextOutputDev object?
> > > 
> > > I still don't see why we really need this enum. If you create an enum, it
> > > means someone will probably end up exposing it as a UI option that lets
> > > users choose, and I don't see any of the programs I use asking me which
> > > RTL handling method I want to use. Can't we just do *the right thing*?
> > The right thing should be the default, based on the OS of the system the
> > dumped text is requested on.
> > As I said, this will be determined at compile time for stand-alone
> > applications, but at run time on systems serving clients on various platforms.
> 
> What do you mean by "systems serving clients on various platforms"?
A server application may have clients that run on different OSes, with
different RTL reordering algorithms.
> 
> > Yet checking the OS may not be enough, and the document could contain
> > additional hints for choosing this default.
> 
> By "could", do you mean "I know there is a PDF hint for that" or
> "maybe there is a PDF hint for that"?
Well, it's just a maybe. The hints I'm talking about may come from headers
mentioning the software used to create the PDF, etc.
> 
> > What I was asking for is your opinion on moving this enum higher in the
> > object hierarchy, so that it can be used as an optional parameter when
> > creating the TextOutputDev object.
> 
> TextOutputDev probably makes more sense. I don't have much experience with
> RTL (as it shows ;-)), but I don't think it would make sense for one TextPage
> of a document to use one method and the next TextPage of the same document
> to use a different one, would it?
Exactly right, and I'll make the change :)
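A minimal C++ sketch of the arrangement agreed on above might look like this: one reordering mode per output device (so every TextPage it produces is handled the same way), a compile-time default taken from the platform, and an optional per-call override for getText/findText-style requests. All names here (ReorderingMode, platformDefaultReorderingMode, TextOutputDevSketch, effectiveMode) are hypothetical and only echo the mode names used in this thread; they are not Poppler's or ICU's actual API, and the reordering itself is not shown.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enum: names echo this thread, not an existing Poppler/ICU API.
enum class ReorderingMode {
    NumbersSpecial, // what Windows systems would normally expect
    LikeDirect      // what "pure Unicode" systems would normally expect
};

// Compile-time default for stand-alone or local-process use.
constexpr ReorderingMode platformDefaultReorderingMode()
{
#ifdef _WIN32
    return ReorderingMode::NumbersSpecial;
#else
    return ReorderingMode::LikeDirect;
#endif
}

// Hypothetical stand-in for TextOutputDev: the mode is fixed once per device,
// so all TextPages produced by it are reordered the same way.
class TextOutputDevSketch {
public:
    explicit TextOutputDevSketch(ReorderingMode mode = platformDefaultReorderingMode())
        : mode_(mode) {}

    ReorderingMode reorderingMode() const { return mode_; }

    // Mode a getText/findText-style call would use: the caller's override
    // (e.g. a server answering a client on another OS), else the device's mode.
    ReorderingMode effectiveMode(std::optional<ReorderingMode> requested = std::nullopt) const
    {
        return requested.value_or(mode_);
    }

private:
    ReorderingMode mode_;
};
```

For example, a server running on Linux could construct the device with the default mode and still pass ReorderingMode::NumbersSpecial for a request coming from a Windows client. Mapping these values onto ICU's reordering modes (via ubidi_setReorderingMode) would presumably happen inside the text extraction code.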

Cheers,
alex
> 
> > > 
> > > Cheers,
> > >   Albert
> > > 
> > thanks,
> > alex
