[poppler] a plan to extend poppler-glib to access the raw text

carlosgc carlosgc at gnome.org
Tue Sep 7 00:32:26 PDT 2010


Excerpts from mpsuzuki's message of Tue Sep 07 09:21:13 +0200 2010:
> On Tue, 07 Sep 2010 09:04:13 +0200
> carlosgc <carlosgc at gnome.org> wrote:
> >Excerpts from mpsuzuki's message of Tue Sep 07 08:42:31 +0200 2010:
> >> It dumps the strings collected by the TextSelectionVisitor
> >> object. TextSelectionVisitor defines 3 methods to eat the text:
> >> visitBlock(), visitLine() and visitWord(). But only the visitLine()
> >> method is implemented. Because a "line" is defined by the
> >> analysis of the text layout, there are no lines in raw order.
> >>
> >
> >Why not simply use TextOutputDev::getText() like qt4 frontend does?
> >TextOutputDev::getSelectionText() is meant for selections, but you
> >don't want text in raw order for selections. I would just add a new
> >method gchar *poppler_page_get_raw_text (PopplerPage *page);
> 
> Oh. If you think it's acceptable design, I will do so.

Yes.

> I want to add a new method with an argument to specify the
> rectangular area from which the text is extracted.

We currently have:

 - poppler_page_get_selected_text, that takes a rectangle
 - poppler_page_get_text, that doesn't take a rectangle

We have already broken the API with poppler_page_get_text, so we can
either add a new parameter to it to specify the text order, or add
another method, poppler_page_get_raw_text(). I prefer to add a new
method because poppler_page_get_text is used in combination with
poppler_page_get_text_layout().
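
To illustrate the two-method option, here is a minimal, self-contained
sketch. It is not poppler code: SketchWord, SketchPage and the sketch_*
functions are hypothetical stand-ins, and the "layout" step is assumed to
be a plain sort by position, which is far simpler than what TextOutputDev
really does. It only shows why one page can yield two different strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a word on a page. The array order of the
 * words is taken to be the raw (content stream) order; x and y give
 * the position on the page. None of this is the poppler API. */
typedef struct {
    const char *text;
    int x;
    int y;
} SketchWord;

typedef struct {
    const SketchWord *words;
    size_t n_words;
} SketchPage;

/* Join the words with single spaces into a newly allocated string. */
static char *sketch_join(const SketchWord *words, size_t n)
{
    size_t len = 1; /* terminating NUL */
    for (size_t i = 0; i < n; i++)
        len += strlen(words[i].text) + 1; /* word plus separator */

    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strcat(out, " ");
        strcat(out, words[i].text);
    }
    return out;
}

/* Order words top to bottom, then left to right (assumed layout rule). */
static int sketch_compare_layout(const void *a, const void *b)
{
    const SketchWord *wa = a;
    const SketchWord *wb = b;
    if (wa->y != wb->y)
        return (wa->y > wb->y) - (wa->y < wb->y);
    return (wa->x > wb->x) - (wa->x < wb->x);
}

/* Text in layout order; caller frees the result. */
char *sketch_page_get_text(const SketchPage *page)
{
    if (page->n_words == 0)
        return sketch_join(page->words, 0);

    SketchWord *sorted = malloc(page->n_words * sizeof *sorted);
    if (sorted == NULL)
        return NULL;
    memcpy(sorted, page->words, page->n_words * sizeof *sorted);
    qsort(sorted, page->n_words, sizeof *sorted, sketch_compare_layout);

    char *out = sketch_join(sorted, page->n_words);
    free(sorted);
    return out;
}

/* Text in raw order, with no layout analysis; caller frees the result. */
char *sketch_page_get_raw_text(const SketchPage *page)
{
    return sketch_join(page->words, page->n_words);
}
```

With separate entry points, existing callers of the layout-order function
are not affected, and the raw-order function needs no order argument.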

> Anyway, thank you for enlightening me with quick reply.
> 
> Regards,
> mpsuzuki

Regards, 
-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462