[poppler] How to read textbox positions?
mpsuzuki at hiroshima-u.ac.jp
Sat Dec 30 15:45:16 UTC 2017
I've tried to implement the suggestion; my current patch is attached.
As suggested, most of it is just copied from the Qt frontend and renamed,
except for one point: TextBox.nextWord() looks slightly confusing,
because the returned object is a pointer to a TextBox. I wrote
text_box.next_text_box() and a macro text_box.next_word() which
calls next_text_box() internally.
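
To illustrate the naming idea, here is a minimal sketch. It is not the code from the patch: demo_text_box, set_next(), join_line() and two_word_demo() are hypothetical names, and the alias is written as an inline member function rather than a preprocessor macro.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the proposed text_box.
// The real class would wrap poppler internals; this one only holds a string.
class demo_text_box {
public:
    explicit demo_text_box(std::string text)
        : m_text(std::move(text)), m_next(nullptr) {}

    const std::string &text() const { return m_text; }

    // Hypothetical setter, used here only to build a chain by hand.
    void set_next(demo_text_box *next) { m_next = next; }

    // Descriptive name: what comes back is a pointer to the next box.
    demo_text_box *next_text_box() const { return m_next; }

    // Qt-like name kept as a thin alias that forwards to next_text_box().
    demo_text_box *next_word() const { return next_text_box(); }

private:
    std::string m_text;
    demo_text_box *m_next;  // non-owning link to the following box
};

// Follows the chain from `first` and joins the texts with single spaces.
std::string join_line(const demo_text_box *first) {
    std::string out;
    for (const demo_text_box *b = first; b != nullptr; b = b->next_text_box()) {
        if (!out.empty()) out += ' ';
        out += b->text();
    }
    return out;
}

// Links two boxes and joins them; both accessors return the same pointer.
std::string two_word_demo() {
    demo_text_box second("world");
    demo_text_box first("Hello");
    first.set_next(&second);
    assert(first.next_word() == first.next_text_box());
    return join_line(&first);
}
```

With this shape, callers who know the Qt frontend can keep writing next_word(), while next_text_box() says more plainly what is returned.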
Another point I want to discuss is the design of the list given by
poppler::page::text_list(). In the Qt frontend, Page::textList() returns
QList<TextBox*>. For similarity to the Qt frontend, the current patch
returns std::vector<text_box*>.
But if we return a vector of pointers, the client has to destroy
the objects pointed to by the vector before destroying the vector itself.
Using a vector of text_box (not the pointer but the object itself),
like std::vector<text_box>, could be better, because the destructor
of the vector would internally call the destructor of each text_box object.
(Qt has qDeleteAll(), but I think std::vector does not have such a thing.)
If I'm misunderstanding something about C++, please correct me.
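
The ownership question can be sketched roughly as follows. This is only an illustration under stated assumptions: demo_box is a hypothetical stand-in for text_box that counts live instances, and the std::unique_ptr variant is a third option that is not part of the current patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for text_box. It counts live instances so that
// the effect of each container type on cleanup can be observed.
struct demo_box {
    static int live;
    demo_box() { ++live; }
    demo_box(const demo_box &) { ++live; }
    ~demo_box() { --live; }
};
int demo_box::live = 0;

// Option 1: vector of raw pointers (what the current patch returns).
// The vector destructor frees only the pointers, so the caller has to
// delete every element by hand, like qDeleteAll() would in Qt.
int live_after_raw_pointers(std::size_t n) {
    {
        std::vector<demo_box *> boxes;
        for (std::size_t i = 0; i < n; ++i) boxes.push_back(new demo_box);
        assert(demo_box::live == static_cast<int>(n));
        for (demo_box *b : boxes) delete b;  // manual cleanup is required
    }
    return demo_box::live;
}

// Option 2: vector of values. Destroying the vector destroys each element.
int live_after_values(std::size_t n) {
    {
        std::vector<demo_box> boxes(n);  // n default-constructed elements
        assert(demo_box::live == static_cast<int>(n));
    }
    return demo_box::live;
}

// Option 3 (assumption, not in the patch): vector of std::unique_ptr.
// Pointer semantics are kept, but each element deletes its own object.
int live_after_unique_ptrs(std::size_t n) {
    {
        std::vector<std::unique_ptr<demo_box>> boxes;
        for (std::size_t i = 0; i < n; ++i)
            boxes.push_back(std::unique_ptr<demo_box>(new demo_box));
        assert(demo_box::live == static_cast<int>(n));
    }
    return demo_box::live;
}
```

All three functions return 0 when cleanup works, but only the first one depends on the caller remembering the delete loop. A vector of values requires text_box to be copyable or movable, which may or may not suit the real class.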
Albert Astals Cid wrote:
> On Wednesday, 27 December 2017, at 12:26:25 CET, Jeroen Ooms wrote:
>> Is there a method in poppler-cpp to extract text from a pdf document,
>> including the position of each text box? Currently we use page->text()
>> with page::physical_layout which gives all text per page, but I need
>> more detailed information about each text box per page.
> You want to code the variant of qt5 frontend Poppler::Page::textList() for cpp
> frontend, it shouldn't be that hard getting inspiration (i.e. almost-copying)
> the code, do you have time for it?
>> poppler mailing list
>> poppler at lists.freedesktop.org
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 7215 bytes
Desc: not available