[poppler] TextPage::getText in rawOrder mode.
Daniel Garcia Moreno
danigm at yaco.es
Tue Apr 27 09:28:45 PDT 2010
Hi all,
I'm reading the poppler code and making small changes here and there,
because I'm going to implement the ATK interface for Evince and I need
to know how to get the text of a PDF file from glib.
I want to get the text in the order you would read it. I saw that
pdftotext gets the text in the right order with the "-raw" option. I
looked at the code and saw that it uses TextOutputDev with
rawOrder = true.
It's easy to dump the text to a file using the first argument of the
TextOutputDev constructor, but I want to get the text as a char *.
I saw that with rawOrder set in TextOutputDev you can't use the getText
method: it always returns an empty GooString:
...
  s = new GooString();

  if (rawOrder) {
    return s;
  }
...
So here is my question: is that a bug or a not-yet-implemented feature,
or is it like that for a reason?
If you think it's a bug, I could file a bug report and upload a patch
that "solves" it using the TextWordList.
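To make the idea concrete, here is a minimal, self-contained sketch of
what such a patch could do: walk the words in the order they were
collected and join them, putting a space between words on the same line
and a newline when the baseline changes. RawWord, joinRawWords and the
lineTolerance parameter are hypothetical stand-ins for illustration, not
poppler types or functions; a real patch would presumably take the words
from the TextWordList instead.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one word as collected in raw (content stream)
// order. This is NOT poppler's TextWord, only an illustration.
struct RawWord {
    std::string text;  // characters of the word
    double base;       // baseline y coordinate of the word
};

// Join the words in the order given. Words whose baselines differ by at
// most lineTolerance are treated as being on the same line and separated
// by a space; otherwise a newline is emitted before the next word.
std::string joinRawWords(const std::vector<RawWord> &words,
                         double lineTolerance = 1.0) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) {
            bool sameLine =
                std::fabs(words[i].base - words[i - 1].base) <= lineTolerance;
            out += sameLine ? ' ' : '\n';
        }
        out += words[i].text;
    }
    return out;
}
```

The result is a std::string, so its c_str() gives the char * I'm after
without going through a file.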