[poppler] extend poppler::text_box to store some font infos

Adam Reichold adam.reichold at t-online.de
Sat May 5 06:36:27 UTC 2018


Hello mpsuzuki,

Again, a commented diff is attached.

Personally, I would like to keep the return value of page::text_list a
proper value type, without pointers into the object, to keep the
ownership simpler.
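
As a minimal sketch of that idea, assuming a hypothetical text_box_sketch
type and a hypothetical filter_by_font helper (neither is the actual
poppler::text_box API nor taken from the attached diff): each box carries
its own copy of the font name, so a returned list can be filtered and kept
around without depending on the lifetime of anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a text box; this is NOT poppler::text_box.
struct text_box_sketch {
    std::string text;       // the word itself
    std::string font_name;  // owned copy, no pointer into page or document data
};

// Return copies of all boxes whose font name equals `wanted`.
std::vector<text_box_sketch> filter_by_font(const std::vector<text_box_sketch> &boxes,
                                            const std::string &wanted)
{
    std::vector<text_box_sketch> result;
    for (const auto &box : boxes) {
        if (box.font_name == wanted) {
            result.push_back(box);
        }
    }
    return result;
}

// Usage: the filtered list stays valid after the source list is emptied,
// because every element owns its strings.
std::size_t count_bold_example()
{
    std::vector<text_box_sketch> boxes = {
        {"Title", "Helvetica-Bold"},
        {"Body", "Times-Roman"},
        {"Section", "Helvetica-Bold"},
    };
    const std::vector<text_box_sketch> bold = filter_by_font(boxes, "Helvetica-Bold");
    boxes.clear();
    assert(bold.front().text == "Title");
    return bold.size();
}
```

The copies cost some memory per box, but the caller never has to reason
about which object must outlive which.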

Best regards,
Adam

On 27.04.2018 at 04:24, suzuki toshiya wrote:
> Hi,
> 
> Although nobody has commented on this, I have updated this patch
> to work with my latest patch(es) for the cpp encoding issue.
> 
> Regards,
> mpsuzuki
> 
> suzuki toshiya wrote:
>> Hi,
>>
>> Recently I heard that some people want to retrieve the list
>> of words from a PDF, as with cpp's poppler::page::text_list(),
>> but with the font information (e.g. the family name of
>> the font).
>>
>> Considering that office documents and academic articles
>> often use different fonts for the section titles and
>> the main text, it would be reasonable for people to
>> expect something like "I want to retrieve the text boxes, but only
>> the text boxes set in Helvetica-Bold".
>>
>> What is the right way to do this? During the development
>> of poppler::page::text_list(), I once tried to do it:
>> https://github.com/mpsuzuki/poppler/commit/8ce2556a62a90c034d7cea8b1dfd26715d03a8f0
>> (note: this patch was written before the use of unique_ptr
>> was stabilized; more fixes are expected in the future)
>>
>> However, I feel it is slightly too big. Its changes are
>> not only to the cpp frontend code, but also to poppler/FontInfo.{cc,h}
>> and poppler/TextOutputDev.{cc,h}. I want to ask a few
>> questions...
>>
>> Q-1) Does a request for text_box with font info fit within poppler's
>> scope? Is there a better library to ask for such a feature?
>>
>> Q-2) If this request fits within poppler's scope, is enhancing
>> the cpp frontend poppler::page::text_list() the way to
>> go? Or would a separate API for this purpose be better?
>>
>> Q-3) My current patch modifies FontInfo and TextOutputDev
>> of libpoppler itself. Is such a modification acceptable?
>>
>> I would appreciate it if the maintainers could give some comments.
>>
>> Regards,
>> mpsuzuki
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: add-cpp-textlist-font-info-comments.diff
Type: text/x-patch
Size: 17392 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180505/fa73d69a/attachment.bin>