[poppler] poppler_page_get_selected_raw_text() for poppler-glib

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Wed Sep 15 03:16:22 PDT 2010


Hi,

The attached patches introduce a new API for accessing raw text.
I would appreciate it if a poppler-glib maintainer could review them.

poppler-0.15.0_glib-lib.diff
patch to declare new function and its implementation

--

At present, poppler does not ship a simple sample that exercises this
function, so I have attached a patch that adds a small sample program,
"poppler-glib-get-text", for the reviewer. It is only meant to help
with the review; I am not proposing it as an official application.

poppler-0.15.0_glib-demo.diff
patch to add a sample program testing poppler_page_get_selected_text()
and poppler_page_get_selected_raw_text().

Regards,
mpsuzuki



carlosgc wrote:
> Excerpts from mpsuzuki's message of Tue Sep 07 09:21:13 +0200 2010:
>   
>> On Tue, 07 Sep 2010 09:04:13 +0200
>> carlosgc <carlosgc at gnome.org> wrote:
>>     
>>> Excerpts from mpsuzuki's message of Tue Sep 07 08:42:31 +0200 2010:
>>>       
>>>> It dumps the strings collected by the TextSelectionVisitor
>>>> object. TextSelectionVisitor defines 3 methods to consume the
>>>> text: visitBlock(), visitLine() and visitWord(). But only the
>>>> visitLine() method is implemented. Because a "line" is defined by
>>>> the analysis of the text layout, there are no lines in raw order.
>>>>
>>>>         
>>> Why not simply use TextOutputDev::getText() like qt4 frontend does?
>>> TextOutputDev::getSelectionText() is meant for selections, but you
>>> don't want text in raw order for selections. I would just add a new
>>> method gchar *poppler_page_get_raw_text (PopplerPage *page);
>>>       
>> Oh. If you think it's an acceptable design, I will do so.
>>     
>
> Yes.
>
>   
>> I want to add a new method with an argument that specifies the
>> rectangular area from which the text is extracted.
>>     
>
> We currently have:
>
>  - poppler_page_get_selected_text, that takes a rectangle
>  - poppler_page_get_text, that doesn't take a rectangle
>
> We have already broken the API with poppler_page_get_text, so we can
> either add a new parameter to specify the text order, or add another
> method, poppler_page_get_raw_text(). I prefer to add a new method
> because poppler_page_get_text is used in combination with
> poppler_page_get_text_layout()
>
>   
>> Anyway, thank you for enlightening me with a quick reply.
>>
>> Regards,
>> mpsuzuki
>>     
>
> Regards, 
>   
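
For readers following the thread, the difference between "raw order" and
layout order discussed above can be pictured with a small toy model. This
is only a sketch under my own assumptions: ToyWord, toy_join_raw() and
toy_join_layout() are hypothetical names made up for illustration, and
none of this is poppler's TextOutputDev code or part of the attached
patches. "Raw" here means the order in which the words happen to be
stored; "layout" means sorted top-to-bottom, then left-to-right.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a word on a page (hypothetical, not poppler's TextWord). */
typedef struct {
    const char *text;
    double x;   /* left edge of the word */
    double y;   /* top edge of the word, growing downwards */
} ToyWord;

/* Order by line (y) first, then by position within the line (x). */
static int compare_reading_order(const void *a, const void *b)
{
    const ToyWord *wa = (const ToyWord *)a;
    const ToyWord *wb = (const ToyWord *)b;

    if (wa->y != wb->y)
        return (wa->y < wb->y) ? -1 : 1;
    if (wa->x != wb->x)
        return (wa->x < wb->x) ? -1 : 1;
    return 0;
}

/* Join the words with single spaces in the order they are stored.
 * Returns a malloc()ed string owned by the caller, or NULL on failure. */
char *toy_join_raw(const ToyWord *words, size_t n)
{
    size_t len = 1; /* room for the terminating NUL */
    size_t i;
    char *out;

    assert(words != NULL || n == 0);

    for (i = 0; i < n; i++)
        len += strlen(words[i].text) + 1; /* word plus one separator */

    out = malloc(len);
    if (out == NULL)
        return NULL;

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            strcat(out, " ");
        strcat(out, words[i].text);
    }
    return out;
}

/* Join the words after sorting a private copy into reading order.
 * The caller's array is left untouched. */
char *toy_join_layout(const ToyWord *words, size_t n)
{
    ToyWord *copy;
    char *out;

    if (n == 0)
        return toy_join_raw(words, 0);

    copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return NULL;

    memcpy(copy, words, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compare_reading_order);
    out = toy_join_raw(copy, n);
    free(copy);
    return out;
}
```

Given the same array of words, the two functions return different strings
whenever the stored order differs from the visual order. That is also why
a "line" only exists in the second case: it is a product of sorting by
position, not something present in the stored sequence.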

-------------- next part --------------
A non-text attachment was scrubbed...
Name: poppler-0.15.0_glib-lib.diff
Type: text/x-patch
Size: 2751 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100915/9bbd8682/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poppler-0.15.0_glib-demo.diff
Type: text/x-patch
Size: 5628 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100915/9bbd8682/attachment-0001.bin>

