[poppler] poppler_page_get_selected_raw_text() for poppler-glib
Daniel Garcia
danigm at wadobo.com
Wed Jan 5 05:11:05 PST 2011
On Wed, Sep 22, 2010 at 02:11:31PM +0200, carlosgc wrote:
> Excerpts from suzuki toshiya's message of Wed Sep 15 12:16:22 +0200 2010:
> > Hi,
>
> Hi,
>
> > Attached patches are the introduction of new API to access raw text.
> > I wish some maintainer of poppler-glib can review it.
>
> Yes, sorry for the delay.
>
> > poppler-0.15.0_glib-lib.diff
> > patch to declare new function and its implementation
> >
>
> I prefer poppler_page_get_raw_text(), rather than
> poppler_page_get_selected_raw_text(), and always return the text of
> the whole page. I don't see why you might want the selected text in
> raw order.
I've implemented poppler_page_get_raw_text() as you suggested; it always
returns the text of the whole page. Here's the patch.
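
For anyone reviewing the loop without a poppler build at hand, the way the patch joins words can be sketched in isolation. This is only an illustration under stated assumptions: `Word` and `join_raw_words` are hypothetical stand-ins, not poppler API. `Word::text` plays the role of what TextWord::getText() yields, and `Word::has_next` the role of a non-NULL TextWord::getNext().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's TextWord: it carries only the two
// pieces of information the loop in the patch looks at.
struct Word {
    std::string text;  // the word's text
    bool has_next;     // true when the word has a successor
};

// Join the words in the order given: a space follows a word that has a
// successor, a newline follows a word that does not.
std::string join_raw_words(const std::vector<Word> &words)
{
    std::string out;
    for (const Word &w : words) {
        out += w.text;
        out += w.has_next ? ' ' : '\n';
    }
    return out;
}
```

With this sketch, the words "Hello" (has a successor), "world" and "Bye" (no successor) come out as "Hello world", a newline, "Bye", and a final newline.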
-------------- next part --------------
From 389d49e3413ce09601b574308bd6bbd46044e6b3 Mon Sep 17 00:00:00 2001
From: danigm <danigm at wadobo.com>
Date: Wed, 5 Jan 2011 14:07:59 +0100
Subject: [PATCH] [glib] Added poppler_page_get_raw_text function
---
glib/poppler-page.cc | 54 +++++++++++++++++++++++++++++++++++++++++++++++++-
glib/poppler-page.h | 1 +
2 files changed, 54 insertions(+), 1 deletions(-)
diff --git a/glib/poppler-page.cc b/glib/poppler-page.cc
index a8e6b2d..8966f7e 100644
--- a/glib/poppler-page.cc
+++ b/glib/poppler-page.cc
@@ -2117,7 +2117,7 @@ poppler_page_get_crop_box (PopplerPage *page, PopplerRectangle *rect)
* This array must be freed with g_free () when done.
*
* The position in the array represents an offset in the text returned by
- * poppler_page_get_text()
+ * poppler_page_get_raw_text()
*
* Return value: %TRUE if the page contains text, %FALSE otherwise
*
@@ -2200,3 +2200,55 @@ poppler_page_get_text_layout (PopplerPage *page,
return TRUE;
}
+
+/**
+ * poppler_page_get_raw_text:
+ * @page: A #PopplerPage
+ *
+ * Return value: a newly allocated string with the page text in raw
+ * order, or %NULL if the page has no text; free it with g_free ()
+ *
+ **/
+char *
+poppler_page_get_raw_text (PopplerPage *page)
+{
+ TextPage *text;
+ TextWordList *wordlist;
+ TextWord *word, *nextword;
+ char *craw_text;
+ GooString *raw_text;
+ int i = 0;
+
+ g_return_val_if_fail (POPPLER_IS_PAGE (page), NULL);
+
+ text = poppler_page_get_text_page (page);
+ wordlist = text->makeWordList (gFalse);
+ if (wordlist->getLength () <= 0)
+ {
+ delete wordlist;
+ return NULL;
+ }
+ raw_text = new GooString();
+ for (i = 0; i < wordlist->getLength (); i++)
+ {
+ word = wordlist->get (i);
+ raw_text->append (word->getText ());
+
+ nextword = word->getNext ();
+ if (nextword)
+ {
+ raw_text->append (' ');
+ }
+ else
+ {
+ raw_text->append ('\n');
+ }
+ }
+
+ craw_text = g_strdup (raw_text->getCString ());
+
+ delete wordlist;
+ delete raw_text;
+
+ return craw_text;
+}
diff --git a/glib/poppler-page.h b/glib/poppler-page.h
index d40c0ee..333cb23 100644
--- a/glib/poppler-page.h
+++ b/glib/poppler-page.h
@@ -128,6 +128,7 @@ void poppler_page_get_crop_box (PopplerPage *page,
gboolean poppler_page_get_text_layout (PopplerPage *page,
PopplerRectangle **rectangles,
guint *n_rectangles);
+char *poppler_page_get_raw_text (PopplerPage *page);
/* A rectangle on a page, with coordinates in PDF points. */
#define POPPLER_TYPE_RECTANGLE (poppler_rectangle_get_type ())
--
1.7.3.4.742.g987cd