[poppler] [PATCH] TextPage::getText in rawOrder mode.

Daniel Garcia Moreno danigm at yaco.es
Fri May 7 00:50:56 PDT 2010


On Tue, 27-04-2010 at 18:28 +0200, Daniel Garcia Moreno wrote:
> Hi to all:
> 
> I'm reading the poppler code and changing a few things here and there,
> because I'm going to implement the ATK interface for Evince and I need
> to know how to get the text of a PDF file from glib.
> 
> I want to get the text in the order you would read it. I saw that
> pdftotext gets the text in the right order with the "-raw" option. I
> looked at the code and saw that it uses TextOutputDev with
> rawOrder = true.
> 
> It's easy to dump the text to a file using the first argument of the
> TextOutputDev constructor, but I want to get the text as a char *.
> 
> I saw that when rawOrder is set in TextOutputDev you can't use the
> getText method; it always returns an empty GooString:
> 
> ...
>   s = new GooString();
>
>   if (rawOrder) {
>     return s;
>   }
> ...
> 
> So here is the question: is that a bug or an unimplemented feature, or
> is it like that for a reason?
> 
> If you think it's a bug, I could file it and upload a patch to "solve"
> it using the TextWordList.
> 

I filed the bug [1] and attached a patch to it. The patch is also
attached to this mail.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=27999
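
For anyone reading without the attachment, the idea can be sketched
roughly as below. This is not the patch itself. RawWord and rawOrderText
are hypothetical names made up for illustration, and a std::vector stands
in for the TextWordList mentioned above. The sketch assumes that in
rawOrder mode the words are already kept in content-stream order, so
getText would only need to concatenate them instead of returning the
empty string.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one word collected in raw (content-stream)
// order. This is not poppler's TextWord class.
struct RawWord {
    std::string text;  // characters of the word
    bool spaceAfter;   // true if a space separates it from the next word
    bool lineEnd;      // true if the word is the last one on its line
};

// Build the text by walking the words in the order they were stored,
// with no layout analysis. A line end takes precedence over a space.
std::string rawOrderText(const std::vector<RawWord> &words) {
    std::string out;
    for (const RawWord &w : words) {
        out += w.text;
        if (w.lineEnd) {
            out += '\n';
        } else if (w.spaceAfter) {
            out += ' ';
        }
    }
    return out;
}
```

With an empty word list this returns an empty string, which matches the
current behaviour; with words present it returns them in stored order.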


