[poppler] How to read textbox positions?

Adam Reichold adam.reichold at t-online.de
Tue Jan 2 05:54:15 UTC 2018


Hello

Am 02.01.2018 um 01:06 schrieb Albert Astals Cid:
> El diumenge, 31 de desembre de 2017, a les 0:45:16 CET, suzuki toshiya va 
> escriure:
>> Hi,
>>
>> I've tried to implement the suggestion, I attached my current patch.
>>
>> As suggested, the most part is just copied from Qt frontend and renamed,
>> except of one point: TextBox.nextWord() looks slightly confusing,
>> because the returned object is a pointer to TextBox. I wrote
>> text_box.next_text_box() and a macro text_box.next_word() which
>> calls next_text_box() internally.
>>
>> Another point I want to discuss is the design of the list give by
>> poppler::page::text_list(). In Qt frontend, Page::textList() returns
>> QList<TextBox*>. For similarity, current patch returns
>> std::vector<text_box*> for similarity to Qt frontend.
>>
>> But, if we return the vector of pointers, the client should destruct
>> the objects pointed by the vector, before destructing vector itself.
>> Using a vector of text_box (not the pointer but the object itself),
>> like std::vector<text_box>, could be better, because the destructor
>> of the vector would internally call the destructor for text_box object.
>> (Qt has qDeleteAll(), but I think std::vector does not have such).
>> If I'm misunderstanding about C++, please correct.
> 
> First you need to fix the text_box class, you either need to make it 
> uncopiable (and thus making your suggestion not possible) or you need to make 
> the copy and assignment operators not break your class since at the moment the 
> default operators will just copy the pointer and everything will break because 
> you will end up with a dangling pointer.
> 
> If you go for the second option you'll want the move constructor and move 
> operator too.

Should the first option also be viable if text_box becomes a move-only
type, i.e. no copying but move construction and assignment? I think
std::vector can handle it. (Another easy option albeit with some
overhead to get automatic destruction would be to wrap the actual data
inside a std::shared_ptr and hence use reference counting.)

Best regards, Adam.

> Cheers,
>   Albert
> 
>>
>> Regards,
>> mpsuzuki
>>
>> Albert Astals Cid wrote:
>>> El dimecres, 27 de desembre de 2017, a les 12:26:25 CET, Jeroen Ooms va
>>>
>>> escriure:
>>>> Is there a method in poppler-cpp to extract text from a pdf document,
>>>> including the position of each text box? Currently we use page->text()
>>>> with page::physical_layout which gives all text per page, but I need
>>>> more detailed information about each text box per page.
>>>
>>> You want to code the variant of qt5 frontend Poppler::Page::textList() for
>>> cpp frontend, it shouldn't be that hard getting inspiration (i.e.
>>> almost-copying) the code, do you have time for it?
>>>
>>> Cheers,
>>>
>>>   Albert
>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/poppler
> 
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 525 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180102/378e8324/attachment.sig>


More information about the poppler mailing list