[poppler] How to read textbox positions?

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Tue Jan 2 07:19:45 UTC 2018


Dear Albert, Adam,

Thank you for the helpful review. I should have more
training on C++... Please let me spend to improve the
patch.

> (Another easy option albeit with some
> overhead to get automatic destruction would be to wrap the actual data
> inside a std::shared_ptr and hence use reference counting.)

Yeah, from some posts in stackoverflow, I found about
shared_ptr... But it seems that current poppler does
not use shared_ptr at all. Is this a coding policy?

Regards,
mpsuzuki

Adam Reichold wrote:
> Hello
> 
> Am 02.01.2018 um 01:06 schrieb Albert Astals Cid:
>> El diumenge, 31 de desembre de 2017, a les 0:45:16 CET, suzuki toshiya va 
>> escriure:
>>> Hi,
>>>
>>> I've tried to implement the suggestion, I attached my current patch.
>>>
>>> As suggested, the most part is just copied from Qt frontend and renamed,
>>> except of one point: TextBox.nextWord() looks slightly confusing,
>>> because the returned object is a pointer to TextBox. I wrote
>>> text_box.next_text_box() and a macro text_box.next_word() which
>>> calls next_text_box() internally.
>>>
>>> Another point I want to discuss is the design of the list give by
>>> poppler::page::text_list(). In Qt frontend, Page::textList() returns
>>> QList<TextBox*>. For similarity, current patch returns
>>> std::vector<text_box*> for similarity to Qt frontend.
>>>
>>> But, if we return the vector of pointers, the client should destruct
>>> the objects pointed by the vector, before destructing vector itself.
>>> Using a vector of text_box (not the pointer but the object itself),
>>> like std::vector<text_box>, could be better, because the destructor
>>> of the vector would internally call the destructor for text_box object.
>>> (Qt has qDeleteAll(), but I think std::vector does not have such).
>>> If I'm misunderstanding about C++, please correct.
>> First you need to fix the text_box class, you either need to make it 
>> uncopiable (and thus making your suggestion not possible) or you need to make 
>> the copy and assignment operators not break your class since at the moment the 
>> default operators will just copy the pointer and everything will break because 
>> you will end up with a dangling pointer.
>>
>> If you go for the second option you'll want the move constructor and move 
>> operator too.
> 
> Should the first option also be viable if text_box becomes a move-only
> type, i.e. no copying but move construction and assignment? I think
> std::vector can handle it. (Another easy option albeit with some
> overhead to get automatic destruction would be to wrap the actual data
> inside a std::shared_ptr and hence use reference counting.)
> 
> Best regards, Adam.
> 
>> Cheers,
>>   Albert
>>
>>> Regards,
>>> mpsuzuki
>>>
>>> Albert Astals Cid wrote:
>>>> El dimecres, 27 de desembre de 2017, a les 12:26:25 CET, Jeroen Ooms va
>>>>
>>>> escriure:
>>>>> Is there a method in poppler-cpp to extract text from a pdf document,
>>>>> including the position of each text box? Currently we use page->text()
>>>>> with page::physical_layout which gives all text per page, but I need
>>>>> more detailed information about each text box per page.
>>>> You want to code the variant of qt5 frontend Poppler::Page::textList() for
>>>> cpp frontend, it shouldn't be that hard getting inspiration (i.e.
>>>> almost-copying) the code, do you have time for it?
>>>>
>>>> Cheers,
>>>>
>>>>   Albert
>>>>> _______________________________________________
>>>>> poppler mailing list
>>>>> poppler at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>
> 



More information about the poppler mailing list