[poppler] Vertical or horizontal writing?
mpsuzuki at hiroshima-u.ac.jp
Sat Aug 14 16:56:14 PDT 2010
On Sat, 14 Aug 2010 21:18:56 +0100
Albert Astals Cid <aacid at kde.org> wrote:
>On Saturday, 31 July 2010, mpsuzuki at hiroshima-u.ac.jp wrote:
>> Sorry for having been silent for a while. Checking the source,
>> I found the following points.
>> 1) poppler-qt4 page object issue
>> On the other hand, getText() is a device-specific method,
>> found only in TextOutputDev.cc, so changing getText() is
>> easier.
>>
>> 2) TextOutputDev::getText() issue
>> I think raw-ordered text from MS Office's tricky vertical
>> text can be used for text search, but physically
>> laid-out text cannot.
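A toy model may make point 2 concrete. This is not poppler code: the glyph tuples, the two function names, and the row-grouping rule are hypothetical simplifications (poppler's real physical-layout logic is far more involved). It places two vertical columns on a page, emits them in content-stream order, and compares that raw order with a naive top-to-bottom, left-to-right row reading.

```python
from itertools import groupby

# Hypothetical toy model (not poppler's data structures): each glyph is
# (char, x, y), listed in the order the content stream draws it.
# y grows downward, as on a page.
# Two vertical columns read right to left, each top to bottom:
# right column "縦書き" at x=100, left column "検索" at x=90.
GLYPHS = [
    ("縦", 100, 0), ("書", 100, 10), ("き", 100, 20),
    ("検", 90, 0), ("索", 90, 10),
]


def raw_order_text(glyphs):
    """Keep content-stream order, as a rawOrder-style device would."""
    return "".join(ch for ch, _x, _y in glyphs)


def physical_layout_text(glyphs):
    """Naive row reading: rows top to bottom, each row left to right."""
    ordered = sorted(glyphs, key=lambda g: (g[2], g[1]))
    rows = groupby(ordered, key=lambda g: g[2])  # one group per y value
    return "\n".join("".join(ch for ch, _x, _y in row) for _y, row in rows)


print(repr(raw_order_text(GLYPHS)))        # '縦書き検索'
print(repr(physical_layout_text(GLYPHS)))  # '検縦\n索書\nき'
```

Reading by rows interleaves characters from neighbouring columns, so a query such as "縦書き" survives in the raw-ordered string but not in the row-ordered one.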
>Wow, that's a huge mail :D
Sorry, my post was so long that it was hard to find what
I was actually proposing to the poppler maintainers.
>So my understanding is that "proper" CJK searching is a lot
>of work and you advocate for just exposing the raw text to
>the upper layers (users of poppler-qt4) so they can do the
>work if they need it?
Yes. I think exposing the raw text to the upper layers would
be a reasonable starting point for various non-left-to-right
scripts, because it is script-independent.
# As for inserting a space (U+0020) between words,
# I have not yet decided what is best.
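On the open question about U+0020, one option (a hypothetical helper, not something proposed in this thread) is to let the search skip spaces, so whether or not they are inserted does not change what matches. The sketch returns indices into the original string so a caller could map a hit back to glyphs.

```python
from typing import Optional, Tuple


def find_ignoring_spaces(text: str, query: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: find query in text while skipping U+0020.

    Returns (start, end) as indices into the ORIGINAL text, or None.
    """
    # Remember where each non-space character sits in the original text.
    kept = [i for i, ch in enumerate(text) if ch != " "]
    squeezed = "".join(text[i] for i in kept)
    needle = query.replace(" ", "")
    pos = squeezed.find(needle)
    if not needle or pos < 0:
        return None
    # Map the first and last matched characters back to original positions.
    return kept[pos], kept[pos + len(needle) - 1] + 1


text = "縦書き 検索"  # a space was inserted between the two "words"
print(text.find("き検"))                   # -1: a plain search misses it
print(find_ignoring_spaces(text, "き検"))  # (2, 5)
```

The trade-off is that this also lets "ab" match "a b" in Latin text, so it is probably only appropriate for scripts that do not separate words with spaces.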
I have also written a preliminary patch that modifies TextPage::findText()
in TextOutputDev to support a device created in rawOrder mode
(I will post it here if needed). Now I am waiting for Cobra's feedback
to see whether it works for his purpose.
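A rough idea of what searching raw-ordered text involves, assuming one glyph per character and that a match is reported as a single bounding box. The Glyph record and find_in_raw_order are hypothetical names for illustration; this is not the patch mentioned above and does not reproduce the real signature of TextPage::findText().

```python
from typing import List, NamedTuple, Optional, Tuple


class Glyph(NamedTuple):
    """Hypothetical glyph record: one character and its box on the page."""
    ch: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


Box = Tuple[float, float, float, float]


def find_in_raw_order(glyphs: List[Glyph], query: str) -> Optional[Box]:
    """Search glyphs in content-stream order; return the match's union box."""
    text = "".join(g.ch for g in glyphs)  # assumes one glyph per character
    start = text.find(query)
    if not query or start < 0:
        return None
    hit = glyphs[start:start + len(query)]
    return (min(g.x_min for g in hit), min(g.y_min for g in hit),
            max(g.x_max for g in hit), max(g.y_max for g in hit))


# Right column "縦書き" (x 100-110), then left column "検索" (x 90-100).
page = [
    Glyph("縦", 100, 0, 110, 10), Glyph("書", 100, 10, 110, 20),
    Glyph("き", 100, 20, 110, 30), Glyph("検", 90, 0, 100, 10),
    Glyph("索", 90, 10, 100, 20),
]
print(find_in_raw_order(page, "書き"))  # (100, 10, 110, 30)
print(find_in_raw_order(page, "き検"))  # (90, 0, 110, 30)
```

A match that crosses from one column to the next yields one box enclosing both columns; per-line boxes may be preferable for highlighting.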
Regards,
mpsuzuki