[poppler] poppler_page_get_selected_raw_text() for poppler-glib
suzuki toshiya
mpsuzuki at hiroshima-u.ac.jp
Wed Sep 15 03:16:22 PDT 2010
Hi,
The attached patches introduce a new API to access raw text.
I hope a poppler-glib maintainer can review them.
poppler-0.15.0_glib-lib.diff
patch that declares the new function and adds its implementation
--
At present, poppler does not provide a simple sample program that uses
the function. I have attached a patch that adds a small sample,
"poppler-glib-get-text", for the reviewer. It is only meant to help with
the review; I am not proposing it as an official application.
poppler-0.15.0_glib-demo.diff
patch to add a sample program testing poppler_page_get_selected_text()
and poppler_page_get_selected_raw_text().
Regards,
mpsuzuki
carlosgc wrote:
> Excerpts from mpsuzuki's message of Tue Sep 07 09:21:13 +0200 2010:
>
>> On Tue, 07 Sep 2010 09:04:13 +0200
>> carlosgc <carlosgc at gnome.org> wrote:
>>
>>> Excerpts from mpsuzuki's message of Tue Sep 07 08:42:31 +0200 2010:
>>>
>>>> It dumps the strings collected by the TextSelectionVisitor
>>>> object. TextSelectionVisitor defines 3 methods to consume the
>>>> text: visitBlock(), visitLine() and visitWord(). But only the
>>>> visitLine() method is implemented. Because a "line" is defined by
>>>> the analysis of the text layout, there are no lines in raw order.
>>>>
>>>>
>>> Why not simply use TextOutputDev::getText() like the qt4 frontend does?
>>> TextOutputDev::getSelectionText() is meant for selections, but you
>>> don't want text in raw order for selections. I would just add a new
>>> method gchar *poppler_page_get_raw_text (PopplerPage *page);
>>>
>> Oh. If you think it's an acceptable design, I will do so.
>>
>
> Yes.
>
>
>> I want to add a new method with an argument that specifies the
>> rectangular area from which the text is extracted.
>>
>
> We currently have:
>
> - poppler_page_get_selected_text, that takes a rectangle
> - poppler_page_get_text, that doesn't take a rectangle
>
> We have already broken the API with poppler_page_get_text, so we can
> either add a new parameter to specify the text order, or add
> another method, poppler_page_get_raw_text(). I prefer adding a new
> method because poppler_page_get_text is used in combination with
> poppler_page_get_text_layout().
>
>
>> Anyway, thank you for enlightening me with your quick reply.
>>
>> Regards,
>> mpsuzuki
>>
>
> Regards,
>
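
For readers following the thread, the difference between "raw order"
and layout order can be sketched in plain C without poppler. This is
only an illustration under stated assumptions: the Word struct, the
sample_words data, and the functions raw_order_text() and
layout_order_text() are hypothetical names, not part of poppler or of
the attached patches, and the simple top-to-bottom, left-to-right sort
is a stand-in for poppler's real layout analysis, which is more
involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a word drawn by a PDF content stream. */
typedef struct {
    const char *text;
    double x;   /* left edge of the word */
    double y;   /* vertical position, growing downward */
} Word;

/* Sample page: the array order is the order in which the words were
 * drawn (the "raw order"), which differs from the reading order. */
static const Word sample_words[] = {
    { "world",  60.0, 10.0 },
    { "footer", 60.0, 30.0 },
    { "Hello",  10.0, 10.0 },
    { "page",   10.0, 30.0 },
};

/* Compare two words top-to-bottom, then left-to-right. */
static int compare_layout(const void *a, const void *b)
{
    const Word *wa = a, *wb = b;
    if (wa->y != wb->y) return (wa->y < wb->y) ? -1 : 1;
    if (wa->x != wb->x) return (wa->x < wb->x) ? -1 : 1;
    return 0;
}

/* Join n words with single spaces into out, truncating at cap bytes. */
static void join_words(const Word *words, size_t n, char *out, size_t cap)
{
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strncat(out, " ", cap - strlen(out) - 1);
        strncat(out, words[i].text, cap - strlen(out) - 1);
    }
}

/* Raw order: emit the words exactly as they appear in the stream. */
static void raw_order_text(const Word *words, size_t n, char *out, size_t cap)
{
    join_words(words, n, out, cap);
}

/* Layout order: sort a copy of the words by position, then emit. */
static void layout_order_text(const Word *words, size_t n, char *out, size_t cap)
{
    Word *sorted = malloc(n * sizeof *sorted);
    assert(cap > 0);
    if (sorted == NULL) {
        out[0] = '\0';
        return;
    }
    memcpy(sorted, words, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_layout);
    join_words(sorted, n, out, cap);
    free(sorted);
}
```

Calling raw_order_text() on sample_words keeps the drawing order, while
layout_order_text() regroups the same words by position. Lines only
emerge from the second step, which matches the point in the quoted
discussion that raw order has no notion of lines.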
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poppler-0.15.0_glib-lib.diff
Type: text/x-patch
Size: 2751 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100915/9bbd8682/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poppler-0.15.0_glib-demo.diff
Type: text/x-patch
Size: 5628 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100915/9bbd8682/attachment-0001.bin>