[poppler] TextPage::getText in rawOrder mode.

Daniel Garcia Moreno danigm at yaco.es
Tue Apr 27 09:28:45 PDT 2010


Hi all,

I'm reading the poppler code and changing a few things here and there,
because I'm going to implement the ATK interface for Evince and I need to
know how to get the text of a PDF file from the glib bindings.

I want to get the text in the order you would read it. I saw that
pdftotext gets the text in the right order with the "-raw" option. I
looked at the code and saw that it uses TextOutputDev with rawOrder = true.

It's easy to dump the text to a file using the first argument that the
TextOutputDev constructor receives, but I want to get the text as a
char *.
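
One possible workaround, as a rough sketch only: as far as I can tell,
TextOutputDev also has a constructor that takes a callback
(TextOutputFunc) and a void * stream instead of a file name. The code
below does not link against poppler. The typedef is my assumption of what
that callback signature looks like, and fakeDump() is a hypothetical
stand-in for the output device calling the callback with chunks of text.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Assumed to mirror poppler's TextOutputFunc signature; declared locally
// so the sketch compiles without poppler.
typedef void (*TextOutputFunc)(void *stream, const char *text, int len);

// Callback: append each chunk to the std::string passed in as `stream`.
static void appendToString(void *stream, const char *text, int len) {
  assert(len >= 0);
  static_cast<std::string *>(stream)->append(text, len);
}

// Hypothetical stand-in for the output device: it pushes the text through
// the callback in several chunks, in the order they were produced.
static void fakeDump(TextOutputFunc func, void *stream) {
  const char *chunks[] = {"Hello", " ", "world", "\n"};
  const int n = sizeof(chunks) / sizeof(chunks[0]);
  for (int i = 0; i < n; ++i) {
    func(stream, chunks[i], static_cast<int>(std::strlen(chunks[i])));
  }
}

// Collect everything into a buffer and return it as a heap-allocated,
// NUL-terminated char *. The caller releases it with delete[].
static char *collectText() {
  std::string buf;
  fakeDump(appendToString, &buf);
  char *out = new char[buf.size() + 1];
  std::memcpy(out, buf.c_str(), buf.size() + 1);
  return out;
}
```

With a real device the idea would be to pass appendToString and the
address of the string where fakeDump() is used here. I have not checked
whether that path behaves the same as writing to a file.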

I saw that with rawOrder set in TextOutputDev you can't use the getText
method; it always returns an empty GooString:

...
  s = new GooString();

  if (rawOrder) {
    return s;
  }
...

And here is the question: is that a bug / not-yet-implemented feature, or
is it like that for some reason?

If you think it's a bug, I could file a bug report and upload a patch to
"solve" it using the TextWordList.
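
To make the patch idea a bit more concrete, here is a minimal sketch of
the joining step, assuming the word list hands back the words in
content-stream order when rawOrder is set. It does not use poppler types:
Word and its spaceAfter flag are hypothetical stand-ins for whatever the
real word list exposes, and line breaks are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the word list.
struct Word {
  std::string text;
  bool spaceAfter;  // assumed flag: a space follows this word on the page
};

// Concatenate the words in list order, adding a single space after a
// word only when it is flagged and is not the last one.
static std::string joinWords(const std::vector<Word> &words) {
  std::string out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    out += words[i].text;
    if (words[i].spaceAfter && i + 1 < words.size()) {
      out += ' ';
    }
  }
  return out;
}
```

In getText the rawOrder branch could then append a result like this to s
instead of returning it empty.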


