[poppler] How to read textbox positions?

Adam Reichold adam.reichold at t-online.de
Tue Jan 2 17:47:23 UTC 2018


Hello mpsuzuki,

std::shared_ptr was introduced in C++11 and Poppler does use C++11 only
since a relatively short time, so I guess there just was no time to use
it anywhere.

In any case it is a pretty heavy-handed solution due to the atomic
memory accesses on the reference counter which will slow down even
single-threaded programs which do not need it. Hence I would suggest
starting with std::unique_ptr to get a move-only type.

Best regards, Adam.

Am 02.01.2018 um 08:19 schrieb suzuki toshiya:
> Dear Albert, Adam,
> 
> Thank you for the helpful review. I should have more
> training on C++... Please let me spend to improve the
> patch.
> 
>> (Another easy option albeit with some
>> overhead to get automatic destruction would be to wrap the actual data
>> inside a std::shared_ptr and hence use reference counting.)
> 
> Yeah, from some posts in stackoverflow, I found about
> shared_ptr... But it seems that current poppler does
> not use shared_ptr at all. Is this a coding policy?
> 
> Regards,
> mpsuzuki
> 
> Adam Reichold wrote:
>> Hello
>>
>> Am 02.01.2018 um 01:06 schrieb Albert Astals Cid:
>>> El diumenge, 31 de desembre de 2017, a les 0:45:16 CET, suzuki toshiya va 
>>> escriure:
>>>> Hi,
>>>>
>>>> I've tried to implement the suggestion, I attached my current patch.
>>>>
>>>> As suggested, the most part is just copied from Qt frontend and renamed,
>>>> except of one point: TextBox.nextWord() looks slightly confusing,
>>>> because the returned object is a pointer to TextBox. I wrote
>>>> text_box.next_text_box() and a macro text_box.next_word() which
>>>> calls next_text_box() internally.
>>>>
>>>> Another point I want to discuss is the design of the list give by
>>>> poppler::page::text_list(). In Qt frontend, Page::textList() returns
>>>> QList<TextBox*>. For similarity, current patch returns
>>>> std::vector<text_box*> for similarity to Qt frontend.
>>>>
>>>> But, if we return the vector of pointers, the client should destruct
>>>> the objects pointed by the vector, before destructing vector itself.
>>>> Using a vector of text_box (not the pointer but the object itself),
>>>> like std::vector<text_box>, could be better, because the destructor
>>>> of the vector would internally call the destructor for text_box object.
>>>> (Qt has qDeleteAll(), but I think std::vector does not have such).
>>>> If I'm misunderstanding about C++, please correct.
>>> First you need to fix the text_box class, you either need to make it 
>>> uncopiable (and thus making your suggestion not possible) or you need to make 
>>> the copy and assignment operators not break your class since at the moment the 
>>> default operators will just copy the pointer and everything will break because 
>>> you will end up with a dangling pointer.
>>>
>>> If you go for the second option you'll want the move constructor and move 
>>> operator too.
>>
>> Should the first option also be viable if text_box becomes a move-only
>> type, i.e. no copying but move construction and assignment? I think
>> std::vector can handle it. (Another easy option albeit with some
>> overhead to get automatic destruction would be to wrap the actual data
>> inside a std::shared_ptr and hence use reference counting.)
>>
>> Best regards, Adam.
>>
>>> Cheers,
>>>   Albert
>>>
>>>> Regards,
>>>> mpsuzuki
>>>>
>>>> Albert Astals Cid wrote:
>>>>> El dimecres, 27 de desembre de 2017, a les 12:26:25 CET, Jeroen Ooms va
>>>>>
>>>>> escriure:
>>>>>> Is there a method in poppler-cpp to extract text from a pdf document,
>>>>>> including the position of each text box? Currently we use page->text()
>>>>>> with page::physical_layout which gives all text per page, but I need
>>>>>> more detailed information about each text box per page.
>>>>> You want to code the variant of qt5 frontend Poppler::Page::textList() for
>>>>> cpp frontend, it shouldn't be that hard getting inspiration (i.e.
>>>>> almost-copying) the code, do you have time for it?
>>>>>
>>>>> Cheers,
>>>>>
>>>>>   Albert
>>>>>> _______________________________________________
>>>>>> poppler mailing list
>>>>>> poppler at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>>> _______________________________________________
>>>>> poppler mailing list
>>>>> poppler at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 525 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180102/4e2b767c/attachment.sig>


More information about the poppler mailing list