[poppler] extend poppler::text_box to store some font infos

Albert Astals Cid aacid at kde.org
Sun May 6 21:50:27 UTC 2018


On Monday, 19 March 2018, at 17:30:38 CEST, suzuki toshiya wrote:
> Hi,
> 
> Recently I heard that some people want to retrieve the list
> of words from a PDF, as with cpp's poppler::page::text_list(),
> but with font information (e.g. the family name of
> the font).
> 
> Considering that office documents and academic articles
> often use different fonts for section titles and the main
> text, it would be reasonable for people to say
> "I want to retrieve the text boxes, but only
> the text boxes set in Helvetica-Bold".
> 
> What is the right way to do this? During the development
> of poppler::page::text_list(), I once tried to do it.
> https://github.com/mpsuzuki/poppler/commit/8ce2556a62a90c034d7cea8b1dfd26715d03a8f0
> (note: this patch was written before the use of unique_ptr
> stabilized; more fixes are expected in the future)
> 
> However, I feel it's slightly too big. It changes not
> only the cpp frontend code, but also poppler/FontInfo.{cc,h}
> and poppler/TextOutputDev.{cc,h}. I want to ask a few
> questions...
> 
> Q-1) Does a request for text_box with font info fit poppler's
> scope? Is there a better library to ask for such a feature?

We already have it in TextOutputDev, so sure, why not.

> 
> Q-2) If this request fits poppler's scope, is enhancing
> the cpp frontend poppler::page::text_list() the way to
> go? Or would a separate API for this purpose be better?

Well, the API is exactly the problem here. What do you plan to expose, only a 
string? I've seen you've added font_size and wmode too. Is that enough? Also, 
you really need some documentation. If I have a look at that class and see 

int         get_wmode(int i = 0) const;

without any kind of documentation, I wouldn't know what to do with that 
function.
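
To illustrate the kind of documentation meant here, below is a minimal,
self-contained sketch. It is not the code from the patch and not the real
poppler::text_box: the namespace sketch, the class layout, the accessor names
and filter_by_font() are all hypothetical. The only outside fact it relies on
is that PDF uses WMode 0 for horizontal and 1 for vertical writing. The meaning
of the "int i" parameter in the patch is not stated in this thread, so the
sketch does not try to model it.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of poppler

/// Writing mode of the font used by a text box.
/// The numeric values follow the PDF WMode convention.
enum class writing_mode {
    horizontal = 0,  ///< Glyphs advance horizontally.
    vertical = 1     ///< Glyphs advance vertically.
};

/// Hypothetical text box that carries font information next to its text.
class text_box {
public:
    text_box(std::string text, std::string font_name, double font_size,
             writing_mode wmode)
        : text_(std::move(text)),
          font_name_(std::move(font_name)),
          font_size_(font_size),
          wmode_(wmode) {}

    /// Returns the text contained in this box.
    const std::string &text() const { return text_; }

    /// Returns the name of the font, e.g. "Helvetica-Bold".
    const std::string &font_name() const { return font_name_; }

    /// Returns the font size as stored by the caller
    /// (this sketch assumes points).
    double font_size() const { return font_size_; }

    /// Returns whether the font writes horizontally or vertically.
    writing_mode wmode() const { return wmode_; }

private:
    std::string text_;
    std::string font_name_;
    double font_size_;
    writing_mode wmode_;
};

/// Returns copies of the boxes whose font name equals font_name exactly.
inline std::vector<text_box> filter_by_font(const std::vector<text_box> &boxes,
                                            const std::string &font_name) {
    std::vector<text_box> result;
    for (const text_box &box : boxes) {
        if (box.font_name() == font_name) {
            result.push_back(box);
        }
    }
    return result;
}

}  // namespace sketch
```

With accessors documented like this, the "only the text boxes in
Helvetica-Bold" use case from the original mail becomes a single call such as
sketch::filter_by_font(boxes, "Helvetica-Bold"), and a reader can tell what
wmode() returns without reading the implementation.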

> Q-3) My current patch modifies FontInfo and TextOutputDev
> of libpoppler itself. Is such a modification acceptable?

As long as you don't introduce bugs or make it slower for other use cases, 
sure, why wouldn't such modifications be acceptable?

Cheers,
  Albert

> 
> I would appreciate it if the maintainers could give some comments.
> 
> Regards,
> mpsuzuki





