[poppler] [PATCH] TextPage::getText in rawOrder mode.
Daniel Garcia Moreno
danigm at yaco.es
Fri May 7 00:50:56 PDT 2010
On Tue, 27-04-2010 at 18:28 +0200, Daniel Garcia Moreno wrote:
> Hi to all:
>
> I'm reading the poppler code and making small changes here and there,
> because I'm going to implement the ATK interface for Evince and I need
> to know how to get the text of a PDF file from glib.
>
> I want to get the text in the order you would read it. I saw that
> pdftotext gets the text well ordered with the "-raw" option. I looked at
> the code and saw that it uses TextOutputDev with rawOrder = true.
>
> It's easy to dump the text to a file using the first argument of the
> TextOutputDev constructor, but I want to get the text as a char *.
>
> I saw that with rawOrder set in TextOutputDev you can't use the getText
> method; it always returns an empty GooString:
>
> ...
>   s = new GooString();
>
>   if (rawOrder) {
>     return s;
>   }
> ...
>
> So here is the question: is that a bug or an unimplemented feature, or
> is it like that for some reason?
>
> If you think it's a bug, I could file it and upload a patch to "solve"
> it using the TextWordList.
>
I filed the bug [1] and attached a patch to it. I'm attaching the patch
to this mail too.
[1] https://bugs.freedesktop.org/show_bug.cgi?id=27999
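
For illustration only, the general idea of building the returned string
from a list of words kept in raw (content-stream) order might look roughly
like the sketch below. This is not the attached patch and does not use
poppler's classes: `Word`, its `spaceAfter` and `lineEnd` flags, and
`rawOrderText` are hypothetical stand-ins invented for this example, and
std::string stands in for GooString.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one extracted word (NOT poppler's TextWord).
struct Word {
    std::string text;  // characters of the word
    bool spaceAfter;   // assumed flag: a space follows on the same line
    bool lineEnd;      // assumed flag: this word is the last on its line
};

// Concatenate words in the order they are stored, which in raw mode is
// assumed to be the order they appear in the content stream.
std::string rawOrderText(const std::vector<Word>& words) {
    std::string out;
    for (const Word& w : words) {
        out += w.text;
        if (w.lineEnd) {
            out += '\n';   // a line break takes priority over a space
        } else if (w.spaceAfter) {
            out += ' ';
        }
    }
    return out;
}
```

With this shape, the early return for rawOrder would be replaced by a loop
over the word list instead of handing back an empty string.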