[poppler] pdftotext -bbox page size
William Bader
williambader at hotmail.com
Fri Nov 6 20:22:40 UTC 2020
I have a PDF with a TrimBox and a CropBox that do not start at the origin.
It looks like pdftotext -bbox writes the maximum extent of the MediaBox into the <page> element instead of writing the page size.
Can I submit a patch to change it, or to add an option to change it, or to write more values in the <page width=... height=...> line?
For example, I have a PDF where pdfinfo -box says
Page size: 84.95 x 2210.75 pts
Page rot: 0
MediaBox: 0.00 0.00 84.95 2310.50
CropBox: 0.00 99.75 84.95 2310.50
BleedBox: 0.00 99.75 84.95 2310.50
TrimBox: 0.00 99.75 84.95 2310.50
ArtBox: 0.00 99.75 84.95 2310.50
but pdftotext -bbox writes
<page width="84.950000" height="2310.500000">
<word xMin="13.350000" yMin="0.322500" xMax="34.018450" yMax="6.466000">NOTICE</word>
...
<word xMin="22.548900" yMin="2197.922500" xMax="36.779600" yMax="2204.066000">2010)</word>
</page>
so when I assemble the page from the words, I have an extra 99.75 points of empty space at the bottom.
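To make the arithmetic concrete, here is a minimal Python sketch that works only from the pdfinfo -box lines quoted above. It is not poppler code; the helper names parse_box and box_size are hypothetical, and it assumes that the height written by -bbox is the upper y extent of the MediaBox, which is what the output above suggests.

```python
def parse_box(line):
    """Parse a pdfinfo -box line such as 'CropBox: 0.00 99.75 84.95 2310.50'."""
    name, values = line.split(":", 1)
    x1, y1, x2, y2 = (float(v) for v in values.split())
    return name.strip(), (x1, y1, x2, y2)


def box_size(box):
    """Return (width, height) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1, y2 - y1)


# Values copied from the pdfinfo -box output above.
_, media = parse_box("MediaBox: 0.00 0.00 84.95 2310.50")
_, crop = parse_box("CropBox: 0.00 99.75 84.95 2310.50")

# Assumption: -bbox writes the maximum y extent of the MediaBox as the height.
reported_height = media[3]

# The page size that pdfinfo reports matches the CropBox dimensions.
crop_width, crop_height = box_size(crop)

# Difference between what -bbox reports and the actual page height.
extra = reported_height - crop_height

print(f"page size from CropBox: {crop_width:.2f} x {crop_height:.2f} pts")
print(f"extra space at the bottom: {extra:.2f} pts")
```

The CropBox height comes out as the 2210.75 pts that pdfinfo reports as the page size, and the difference from the height in the <page> element is the 99.75 pts of empty space.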
Regards, William