[poppler] Vertical or horizontal writing?

mpsuzuki at hiroshima-u.ac.jp
Sat Aug 14 16:56:14 PDT 2010


On Sat, 14 Aug 2010 21:18:56 +0100
Albert Astals Cid <aacid at kde.org> wrote:

>On Saturday, 31 July 2010, mpsuzuki at hiroshima-u.ac.jp wrote:
>> Sorry for the long silence. Checking the source,
>> I found the following points.

>> 1) poppler-qt4 page object issue

>> On the other hand, getText() is a device-specific method,
>> found only in TextOutputDev.cc, so changing getText() is
>> easier.
>> 
>> 2) TextOutputDev::getText() issue
 
>> I think raw-ordered text from MS Office's tricky vertical
>> text can be used for text search, but physically
>> laid-out text cannot.
 
>WoW, that's a huge mail :D

Sorry, my post was so long that it was hard to find my actual
proposal to the poppler maintainers.

>So my understanding is that "proper" CJK searching is a lot
>of work and you advocate for just exposing the raw text to
>the upper layers (users of poppler-qt4) so they can do the
>work if they need it?

Yes. I think exposing the raw text to the upper layers would
be a reasonable starting point for various non-left-to-right
scripts, because it is script-independent.
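
To illustrate why raw order helps, here is a minimal sketch in Python.
It is not poppler code: the glyph records, the column/row grid and both
function names are hypothetical, and the "physical layout" function is
only a crude stand-in for a horizontal line-by-line reconstruction.

```python
# Hypothetical glyph records: (character, column, row), listed in the
# order they appear in the content stream. The text is vertical: each
# column reads top to bottom, and columns advance right to left.
glyphs = [
    ("縦", 1, 0), ("書", 1, 1), ("き", 1, 2),  # right-hand column
    ("文", 0, 0), ("書", 0, 1),                # left-hand column
]


def raw_order_text(glyphs):
    """Concatenate characters in content-stream order."""
    return "".join(ch for ch, _, _ in glyphs)


def physical_layout_text(glyphs):
    """Rebuild the text row by row, left to right (horizontal assumption)."""
    rows = {}
    for ch, col, row in glyphs:
        rows.setdefault(row, []).append((col, ch))
    lines = []
    for row in sorted(rows):
        lines.append("".join(ch for _, ch in sorted(rows[row])))
    return "\n".join(lines)


needle = "縦書き"
print(needle in raw_order_text(glyphs))        # search on raw-order text
print(needle in physical_layout_text(glyphs))  # search on row-by-row text
```

In this toy case the word survives intact in the raw-order string, but
the row-by-row reconstruction interleaves the two columns and the word
can no longer be found as a substring.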

# As for inserting a space (U+0020) between words,
# I have not yet decided what is best.

I have also written a preliminary patch that modifies TextPage::findText()
in TextOutputDev to support a device created in rawOrder mode
(if required, I will post it here). Now I'm waiting for Cobra's feedback
to see if it works for his purpose.
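
To show what a search over raw-ordered words could look like, here is a
rough sketch in Python. It is not the patch and does not reflect the real
TextPage::findText() code: the Word record, the find_text_raw() function
and the coordinates are all hypothetical. Words are joined without
any U+0020 between them, since the spacing question is still open.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record: text plus its bounding box."""
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def find_text_raw(words, needle):
    """Search words in raw (content-stream) order.

    Returns the bounding box (x_min, y_min, x_max, y_max) covering every
    word touched by the first match, or None if there is no match.
    """
    if not needle:
        return None
    joined = "".join(w.text for w in words)
    start = joined.find(needle)
    if start < 0:
        return None
    end = start + len(needle)

    # Collect the words whose character range overlaps [start, end).
    hit = []
    offset = 0
    for w in words:
        w_end = offset + len(w.text)
        if offset < end and w_end > start:
            hit.append(w)
        offset = w_end

    return (min(w.x_min for w in hit), min(w.y_min for w in hit),
            max(w.x_max for w in hit), max(w.y_max for w in hit))


# Two vertical columns, one glyph per word, right-hand column first.
words = [
    Word("縦", 100, 10, 112, 22),
    Word("書", 100, 22, 112, 34),
    Word("き", 100, 34, 112, 46),
    Word("文", 80, 10, 92, 22),
    Word("書", 80, 22, 92, 34),
]

print(find_text_raw(words, "書き"))
print(find_text_raw(words, "横"))
```

A match that runs from the end of one column into the start of the next
yields a single box spanning both columns; a real implementation would
probably want to report one rectangle per column instead.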

Regards,
mpsuzuki

