[poppler] pdftotext -bbox page size

Albert Astals Cid aacid at kde.org
Fri Nov 6 23:21:44 UTC 2020


Tom, you suggested the reverse change 9 years ago:

https://gitlab.freedesktop.org/poppler/poppler/-/commit/807c1df2bf79c7c6378390b41dc230d80533ae3f

Do you remember why? Anything that William may be missing?

Cheers,
  Albert

On Friday, 6 November 2020, at 21:22:40 CET, William Bader wrote:
> I have a PDF with a TrimBox and a CropBox that do not start at the origin.
> It looks like pdftotext -bbox writes the maximum extent of the MediaBox into the <page> element instead of writing the page size.
> Can I submit a patch to change it, or to add an option to change it, or to write more values in the <page width=... height=...> line?
> For example, I have a pdf where pdfinfo -box says
> Page size:      84.95 x 2210.75 pts
> Page rot:       0
> MediaBox:           0.00     0.00    84.95  2310.50
> CropBox:            0.00    99.75    84.95  2310.50
> BleedBox:           0.00    99.75    84.95  2310.50
> TrimBox:            0.00    99.75    84.95  2310.50
> ArtBox:             0.00    99.75    84.95  2310.50
> but pdftotext -bbox writes
>   <page width="84.950000" height="2310.500000">
>     <word xMin="13.350000" yMin="0.322500" xMax="34.018450" yMax="6.466000">NOTICE</word>
> ...
>     <word xMin="22.548900" yMin="2197.922500" xMax="36.779600" yMax="2204.066000">2010)</word>
>   </page>
> so when I assemble the page from the words, I have an extra 99.75 points of empty space at the bottom.
> 
> Regards, William
> 
> 
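For reference, the arithmetic behind William's numbers can be checked with a short Python sketch. It only uses the box values quoted above; box_size is a hypothetical helper written for this illustration, not part of poppler, and the reading that word coordinates are measured from the top of the cropped page is an assumption drawn from the quoted output.

```python
# Box values copied from the pdfinfo -box output quoted above,
# as (x0, y0, x1, y1) in points.
MEDIA_BOX = (0.00, 0.00, 84.95, 2310.50)
CROP_BOX = (0.00, 99.75, 84.95, 2310.50)


def box_size(box):
    """Return (width, height) of an (x0, y0, x1, y1) box (hypothetical helper)."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)


media_w, media_h = box_size(MEDIA_BOX)  # what <page> currently reports
crop_w, crop_h = box_size(CROP_BOX)     # matches "Page size" from pdfinfo

# Empty space left over when the page is assembled using the MediaBox height.
extra_space = media_h - crop_h

# yMax of the last word quoted from the pdftotext -bbox output.
last_word_ymax = 2204.066

print(f"MediaBox size: {media_w:.2f} x {media_h:.2f}")
print(f"CropBox size:  {crop_w:.2f} x {crop_h:.2f}")
print(f"Extra space:   {extra_space:.2f}")
print(f"Last word fits in CropBox height: {last_word_ymax <= crop_h}")
```

The CropBox height agrees with the "Page size" line from pdfinfo, and the difference from the MediaBox height is the 99.75 points of empty space described above.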





