[poppler] pdftotext -bbox page size

William Bader williambader at hotmail.com
Fri Nov 6 20:22:40 UTC 2020


I have a PDF with a TrimBox and a CropBox that do not start at the origin.
It looks like pdftotext -bbox writes the maximum extent of the MediaBox into the <page> element instead of writing the page size.
Can I submit a patch to change it, or to add an option to change it, or to write more values in the <page width=... height=...> line?
For example, I have a PDF where pdfinfo -box says
Page size:      84.95 x 2210.75 pts
Page rot:       0
MediaBox:           0.00     0.00    84.95  2310.50
CropBox:            0.00    99.75    84.95  2310.50
BleedBox:           0.00    99.75    84.95  2310.50
TrimBox:            0.00    99.75    84.95  2310.50
ArtBox:             0.00    99.75    84.95  2310.50
but pdftotext -bbox writes
  <page width="84.950000" height="2310.500000">
    <word xMin="13.350000" yMin="0.322500" xMax="34.018450" yMax="6.466000">NOTICE</word>
...
    <word xMin="22.548900" yMin="2197.922500" xMax="36.779600" yMax="2204.066000">2010)</word>
  </page>
so when I assemble the page from the words, I have an extra 99.75 points of empty space at the bottom.
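
To make the arithmetic concrete, here is a small Python sketch. It is not part of poppler: the helper name box_size and the constants are made up for illustration, and the numbers are copied from the pdfinfo and pdftotext output above. It compares the height that pdftotext -bbox reported with the height of the CropBox, and checks that the lowest word shown fits inside the CropBox height, which is consistent with the word coordinates being relative to the cropped page.

```python
import xml.etree.ElementTree as ET

# Boxes copied from the pdfinfo -box output above, as (x1, y1, x2, y2) in points.
MEDIA_BOX = (0.00, 0.00, 84.95, 2310.50)
CROP_BOX = (0.00, 99.75, 84.95, 2310.50)

# The <page> element as written by pdftotext -bbox, reduced to the two words shown above.
PAGE_XML = """
<page width="84.950000" height="2310.500000">
  <word xMin="13.350000" yMin="0.322500" xMax="34.018450" yMax="6.466000">NOTICE</word>
  <word xMin="22.548900" yMin="2197.922500" xMax="36.779600" yMax="2204.066000">2010)</word>
</page>
"""


def box_size(box):
    """Return (width, height) of a box given as (x1, y1, x2, y2). Illustrative helper."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


page = ET.fromstring(PAGE_XML.strip())
reported_height = float(page.get("height"))

# Height of the visible page according to the CropBox.
crop_width, crop_height = box_size(CROP_BOX)

# Largest yMax among the words, i.e. the bottom edge of the lowest word.
lowest_word = max(float(w.get("yMax")) for w in page.iter("word"))

# Empty space left over when the page is assembled at the reported height.
extra = reported_height - crop_height

print(f"reported height: {reported_height:.2f}")
print(f"CropBox size:    {crop_width:.2f} x {crop_height:.2f}")
print(f"lowest word at:  {lowest_word:.2f}")
print(f"extra space:     {extra:.2f}")
```

With these numbers the CropBox size matches the "Page size" line from pdfinfo, and the difference between the reported height and the CropBox height is the 99.75 points of empty space.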

Regards, William


