[poppler] a plan to extend poppler-glib to access the raw text

mpsuzuki at hiroshima-u.ac.jp
Tue Sep 7 00:21:13 PDT 2010


On Tue, 07 Sep 2010 09:04:13 +0200
carlosgc <carlosgc at gnome.org> wrote:
>Excerpts from mpsuzuki's message of mar sep 07 08:42:31 +0200 2010:
>> It dumps the strings collected by the TextSelectionVisitor
>> object. TextSelectionVisitor defines 3 methods to consume the text:
>> visitBlock(), visitLine() and visitWord(). But only the visitLine()
>> method is implemented. Because a "line" is defined by the
>> analysis of the text layout, there are no lines in raw order.
>>
>
>Why not simply use TextOutputDev::getText() like qt4 frontend does?
>TextOutputDev::getSelectionText() is meant for selections, but you
>don't want text in raw order for selections. I would just add a new
>method gchar *poppler_page_get_raw_text (PopplerPage *page);

Oh. If you think that is an acceptable design, I will do so.
I want to add a new method with an argument specifying the
rectangular area from which the text is extracted.
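
To illustrate the rectangle idea, here is a rough, self-contained sketch
in plain C. It is not poppler code: RawWord, Rect, word_in_rect() and
raw_text_in_rect() are hypothetical stand-ins, and the rule that a word
counts only when its box lies entirely inside the rectangle is an
assumption. In poppler itself the text would come from TextOutputDev,
as suggested above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not poppler types: a word with its bounding
 * box, stored in the order it appears in the content stream (raw order). */
typedef struct {
    const char *text;
    double x1, y1, x2, y2;
} RawWord;

typedef struct {
    double x1, y1, x2, y2;
} Rect;

/* Assumption: a word is selected when its box lies entirely inside
 * the rectangle. */
static int word_in_rect(const RawWord *w, const Rect *r)
{
    return w->x1 >= r->x1 && w->x2 <= r->x2 &&
           w->y1 >= r->y1 && w->y2 <= r->y2;
}

/* Hypothetical helper: join the selected words with single spaces,
 * keeping raw order. Returns a malloc()ed string that the caller must
 * free(), or NULL if the allocation fails. */
static char *raw_text_in_rect(const RawWord *words, size_t n, const Rect *r)
{
    size_t len = 1;  /* room for the terminating NUL */
    size_t i;
    char *out, *p;

    assert(r != NULL && (words != NULL || n == 0));

    /* First pass: compute an upper bound on the output length. */
    for (i = 0; i < n; i++)
        if (word_in_rect(&words[i], r))
            len += strlen(words[i].text) + 1;  /* word plus separator */

    out = malloc(len);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the selected words without reordering them. */
    p = out;
    for (i = 0; i < n; i++) {
        size_t wl;

        if (!word_in_rect(&words[i], r))
            continue;
        if (p != out)
            *p++ = ' ';
        wl = strlen(words[i].text);
        memcpy(p, words[i].text, wl);
        p += wl;
    }
    *p = '\0';
    return out;
}
```

No layout analysis is involved: the words are never sorted or grouped
into lines, so the output follows the content stream order.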

Anyway, thank you for enlightening me with your quick reply.

Regards,
mpsuzuki

