[poppler] How to read textbox positions?

Albert Astals Cid aacid at kde.org
Tue Jan 2 00:06:21 UTC 2018


El diumenge, 31 de desembre de 2017, a les 0:45:16 CET, suzuki toshiya va 
escriure:
> Hi,
> 
> I've tried to implement the suggestion, I attached my current patch.
> 
> As suggested, the most part is just copied from Qt frontend and renamed,
> except of one point: TextBox.nextWord() looks slightly confusing,
> because the returned object is a pointer to TextBox. I wrote
> text_box.next_text_box() and a macro text_box.next_word() which
> calls next_text_box() internally.
> 
> Another point I want to discuss is the design of the list give by
> poppler::page::text_list(). In Qt frontend, Page::textList() returns
> QList<TextBox*>. For similarity, current patch returns
> std::vector<text_box*> for similarity to Qt frontend.
> 
> But, if we return the vector of pointers, the client should destruct
> the objects pointed by the vector, before destructing vector itself.
> Using a vector of text_box (not the pointer but the object itself),
> like std::vector<text_box>, could be better, because the destructor
> of the vector would internally call the destructor for text_box object.
> (Qt has qDeleteAll(), but I think std::vector does not have such).
> If I'm misunderstanding about C++, please correct.

First you need to fix the text_box class, you either need to make it 
uncopiable (and thus making your suggestion not possible) or you need to make 
the copy and assignment operators not break your class since at the moment the 
default operators will just copy the pointer and everything will break because 
you will end up with a dangling pointer.

If you go for the second option you'll want the move constructor and move 
operator too.

Cheers,
  Albert

> 
> Regards,
> mpsuzuki
> 
> Albert Astals Cid wrote:
> > El dimecres, 27 de desembre de 2017, a les 12:26:25 CET, Jeroen Ooms va
> > 
> > escriure:
> >> Is there a method in poppler-cpp to extract text from a pdf document,
> >> including the position of each text box? Currently we use page->text()
> >> with page::physical_layout which gives all text per page, but I need
> >> more detailed information about each text box per page.
> > 
> > You want to code the variant of qt5 frontend Poppler::Page::textList() for
> > cpp frontend, it shouldn't be that hard getting inspiration (i.e.
> > almost-copying) the code, do you have time for it?
> > 
> > Cheers,
> > 
> >   Albert
> >> 
> >> _______________________________________________
> >> poppler mailing list
> >> poppler at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/poppler
> > 
> > _______________________________________________
> > poppler mailing list
> > poppler at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/poppler




More information about the poppler mailing list