[poppler] poppler_page_get_selected_raw_text() for poppler-glib

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Tue Jan 4 03:17:15 PST 2011


Hi,

On Tue, 4 Jan 2011 12:00:09 +0100
Daniel Garcia <danigm at wadobo.com> wrote:

>On Wed, Sep 22, 2010 at 02:11:31PM +0200, carlosgc wrote:
>> Excerpts from suzuki toshiya's message of mié sep 15 12:16:22 +0200 2010:
>> > Hi,
>> 
>> Hi, 
>> 
>> > Attached patches are the introduction of new API to access raw text.
>> > I wish some maintainer of poppler-glib can review it.
>> 
>> Yes, sorry for the delay. 
>> 
>> > poppler-0.15.0_glib-lib.diff
>> > patch to declare new function and its implementation
>> > 
>> 
>> I prefer poppler_page_get_raw_text(), rather than
>> poppler_page_get_selected_raw_text(), and always return the text of
>> the whole page. I don't see why you might want the selected text in
>> raw order.
>
>This patch never got applied... I'll write the
>poppler_page_get_raw_text() function. I don't know if suzuki is still
>interested.

I'm sorry for not answering your question sooner; I
have been too busy to write a proper explanation.

The reason I wanted to pass a rectangle to restrict
the area for raw text extraction is that current
poppler has difficulty extracting text from documents
with complex layouts, such as vertically laid out CJK
text or right-to-left scripts like Arabic and Hebrew.
Supporting these as a built-in feature of poppler
looks like a very long-term task to me. Therefore, I
wanted to provide an API that extracts raw text from
a specified rectangle. I expected that a higher-level
application could move a small window across the page,
collect the raw text fragments together with their
positions, and reconstruct the text by itself.
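
A rough sketch of this idea in C, not taken from the attached
patches. All names here (Rect, Fragment, ExtractFn, sweep_page,
cmp_vertical_rtl, demo_extract) are hypothetical. ExtractFn only
stands in for a rectangle-restricted raw-text call such as the
proposed poppler_page_get_selected_raw_text(); it is not an existing
poppler-glib function. The sort order assumes that y grows downward
and that vertical text columns run from right to left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* A rectangle in page coordinates (same field names as PopplerRectangle). */
typedef struct { double x1, y1, x2, y2; } Rect;

/* One piece of raw text together with the window it was taken from. */
typedef struct { Rect area; char text[64]; } Fragment;

/* Hypothetical extractor: writes the raw text found inside `area` into
 * `buf` and returns the number of bytes written (0 if the area is empty).
 * A real application would wrap the library call here. */
typedef size_t (*ExtractFn)(const Rect *area, char *buf, size_t buf_len,
                            void *user_data);

/* Move a win_w x win_h window over the page, row by row, and keep every
 * non-empty fragment. Returns the number of fragments stored in `out`. */
size_t sweep_page(double page_w, double page_h, double win_w, double win_h,
                  ExtractFn extract, void *user_data,
                  Fragment *out, size_t max_out)
{
    assert(win_w > 0 && win_h > 0 && extract != NULL);
    size_t n = 0;
    for (double y = 0; y < page_h; y += win_h) {
        for (double x = 0; x < page_w; x += win_w) {
            if (n == max_out)
                return n;               /* output array is full */
            Fragment *f = &out[n];
            f->area.x1 = x;
            f->area.y1 = y;
            /* Clip the window to the page edges. */
            f->area.x2 = (x + win_w < page_w) ? x + win_w : page_w;
            f->area.y2 = (y + win_h < page_h) ? y + win_h : page_h;
            if (extract(&f->area, f->text, sizeof f->text, user_data) > 0)
                n++;                    /* keep only non-empty windows */
        }
    }
    return n;
}

/* qsort() comparator for vertical writing: columns from right to left,
 * and within one column from top to bottom (assumes y grows downward). */
int cmp_vertical_rtl(const void *a, const void *b)
{
    const Fragment *fa = a, *fb = b;
    if (fa->area.x1 != fb->area.x1)
        return (fa->area.x1 > fb->area.x1) ? -1 : 1;
    if (fa->area.y1 != fb->area.y1)
        return (fa->area.y1 < fb->area.y1) ? -1 : 1;
    return 0;
}

/* Stand-in extractor for experiments: "extracts" the window origin. */
size_t demo_extract(const Rect *area, char *buf, size_t buf_len,
                    void *user_data)
{
    (void)user_data;
    int n = snprintf(buf, buf_len, "(%.0f,%.0f)", area->x1, area->y1);
    return (n > 0 && (size_t)n < buf_len) ? (size_t)n : 0;
}
```

The application would call sweep_page() with its own extractor and
then qsort() the fragments with a comparator that matches the writing
direction of the document. The library only has to return raw text
for a rectangle; the layout logic stays in the application.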

Does this answer your question?

Regards,
mpsuzuki

