[poppler] How to read textbox positions?

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Sat Dec 30 15:45:16 UTC 2017


Hi,

I've tried to implement the suggestion; my current patch is attached.

As suggested, most of it is simply copied from the Qt frontend and renamed,
with one exception: TextBox.nextWord() looks slightly confusing,
because the object it returns is a pointer to a TextBox. I wrote
text_box.next_text_box(), plus a macro text_box.next_word() which
calls next_text_box() internally.
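
To illustrate the naming idea only, here is a minimal sketch. It is not the
code from the attached patch: the class sketch_text_box, its members, and the
helper join_words() are hypothetical, and the "macro" is assumed here to be a
thin inline forwarding function.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch only; not the text_box class from the attached patch.
class sketch_text_box {
public:
    explicit sketch_text_box(std::string text)
        : m_text(std::move(text)), m_next(nullptr) {}

    const std::string &text() const { return m_text; }
    void set_next(sketch_text_box *next) { m_next = next; }

    // Descriptive name: returns the following box, or nullptr at the end.
    sketch_text_box *next_text_box() const { return m_next; }

    // Alias mirroring Qt's TextBox::nextWord(); it only forwards.
    sketch_text_box *next_word() const { return next_text_box(); }

private:
    std::string m_text;
    sketch_text_box *m_next; // non-owning link to the following box
};

// Walks the chain starting at 'first' and joins the texts with single spaces.
std::string join_words(const sketch_text_box *first) {
    std::string out;
    for (const sketch_text_box *b = first; b != nullptr; b = b->next_word()) {
        if (!out.empty()) out += ' ';
        out += b->text();
    }
    return out;
}
```

With this shape, code ported from the Qt frontend can keep calling
next_word(), while new code can use the more descriptive next_text_box().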

Another point I want to discuss is the design of the list given by
poppler::page::text_list(). In the Qt frontend, Page::textList() returns
QList<TextBox*>. For similarity to the Qt frontend, the current patch
returns std::vector<text_box*>.

But if we return a vector of pointers, the client must destroy
the objects the pointers refer to before destroying the vector itself.
Using a vector of text_box (the objects themselves, not pointers),
i.e. std::vector<text_box>, could be better, because the destructor
of the vector would call the destructor of each text_box object.
(Qt has qDeleteAll(), but I think std::vector has no such helper.)
If I'm misunderstanding something about C++, please correct me.
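
The ownership difference can be checked with a small self-contained sketch.
It does not use poppler at all: demo_text_box is a hypothetical stand-in that
counts live instances, and the three helper functions are made up for this
illustration. The std::unique_ptr variant is a third option that is not in
the patch; it is shown only for comparison.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a text box; counts how many instances are alive.
struct demo_text_box {
    static int live;
    std::string text;
    explicit demo_text_box(std::string t) : text(std::move(t)) { ++live; }
    demo_text_box(const demo_text_box &o) : text(o.text) { ++live; }
    demo_text_box(demo_text_box &&o) noexcept : text(std::move(o.text)) { ++live; }
    demo_text_box &operator=(const demo_text_box &) = default;
    demo_text_box &operator=(demo_text_box &&) = default;
    ~demo_text_box() { --live; }
};
int demo_text_box::live = 0;

// (a) std::vector<T*>: destroying the vector does not destroy the pointees.
// Returns how many objects are still alive after the vector is gone.
int alive_after_raw_pointer_vector() {
    const int before = demo_text_box::live;
    demo_text_box *a = new demo_text_box("Hello");
    demo_text_box *b = new demo_text_box("world");
    {
        std::vector<demo_text_box *> boxes{a, b};
    } // the vector is destroyed here, the two objects are not
    const int still_alive = demo_text_box::live - before;
    delete a; // the manual step that qDeleteAll() performs in Qt
    delete b;
    return still_alive;
}

// (b) std::vector<T>: the vector's destructor destroys every element.
int alive_after_value_vector() {
    const int before = demo_text_box::live;
    {
        std::vector<demo_text_box> boxes;
        boxes.emplace_back("Hello");
        boxes.emplace_back("world");
    }
    return demo_text_box::live - before;
}

// (c) std::vector<std::unique_ptr<T>>: pointer semantics, automatic cleanup.
int alive_after_unique_ptr_vector() {
    const int before = demo_text_box::live;
    {
        std::vector<std::unique_ptr<demo_text_box>> boxes;
        boxes.push_back(std::unique_ptr<demo_text_box>(new demo_text_box("Hello")));
        boxes.push_back(std::unique_ptr<demo_text_box>(new demo_text_box("world")));
    }
    return demo_text_box::live - before;
}
```

In this sketch only variant (a) leaves objects alive after the vector is
destroyed, which matches the concern above: with std::vector<text_box*> the
cleanup loop becomes the client's responsibility.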

Regards,
mpsuzuki


Albert Astals Cid wrote:
> On Wednesday, 27 December 2017, at 12:26:25 CET, Jeroen Ooms wrote:
>> Is there a method in poppler-cpp to extract text from a pdf document,
>> including the position of each text box? Currently we use page->text()
>> with page::physical_layout which gives all text per page, but I need
>> more detailed information about each text box per page.
> 
> You want to code the variant of qt5 frontend Poppler::Page::textList() for cpp 
> frontend, it shouldn't be that hard getting inspiration (i.e. almost-copying) 
> the code, do you have time for it?
> 
> Cheers,
>   Albert
> 
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: add-text_list-to-cpp-frontend_20171230.diff
Type: text/x-patch
Size: 7215 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20171231/26a2c7ca/attachment.bin>

