[poppler] Removing watermark/footer from JSTOR PDFs

Federico Leva (Nemo) nemowiki at gmail.com
Sat Jul 23 00:44:29 PDT 2011


Hello,
you might have heard about 
<http://arstechnica.com/tech-policy/news/2011/07/swartz-supporter-dumps-18592-jstor-docs-on-the-pirate-bay.ars>
We're now going to upload those ~19000 PDFs to the Internet Archive, but 
we need to remove a watermark. Could you please give me a suggestion 
about how to do it? Sadly I don't know anything about PDF manipulation.
We tried pdfimages, which outputs one .pbm file per page plus a .ppm (the 
footer/watermark). Using ImageMagick to recombine the pages into a PDF 
compressed with LZM produced a file almost three times as large as the 
original, so I think it's better to edit the original PDF directly rather 
than convert it to other raster formats.
The PDF looks like this: http://p.defau.lt/?8I_tQEf0Q2SZpi9CJx6I8A
Apparently, we need to remove this image:
     /GxMWCL: 18 0 R, 187 x 248
In other PDFs it looks like this: http://p.defau.lt/?I1lqfJPL8ociEfOpvTfPaA
How can I do it?
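To make the idea of editing the original PDF concrete, here is a rough sketch of one possible approach, not something we have tried. In PDF, an image XObject is painted onto a page by a `/Name Do` operator in the page's content stream, so deleting that operator should stop the image from being drawn without re-rasterising anything. The sketch assumes the content stream has already been decompressed to plain bytes by some other tool, and that the watermark is painted under the name shown above (GxMWCL). The function name strip_xobject_paint and the sample stream are hypothetical. The image object itself (18 0 R) would still be stored in the file; only the drawing call goes away.

```python
import re


def strip_xobject_paint(content: bytes, name: bytes) -> bytes:
    """Remove every `/<name> Do` paint operator from a decompressed
    PDF content stream. Hypothetical helper, for illustration only."""
    # Require whitespace between the name and `Do`, and make sure `Do`
    # is a whole token (followed by whitespace, a delimiter, or the end).
    pattern = re.compile(
        rb"/" + re.escape(name) + rb"\s+Do(?![^\s/\[\]<>(){}%])"
    )
    return pattern.sub(b"", content)


# Made-up content stream: one page image plus the watermark image.
page = (
    b"q 612 0 0 792 0 0 cm /Im1 Do Q\n"
    b"q 187 0 0 248 20 20 cm /GxMWCL Do Q\n"
)
cleaned = strip_xobject_paint(page, b"GxMWCL")
```

The surrounding `q ... cm ... Q` is left in place on purpose: it only saves and restores the graphics state, so it should be harmless once nothing is painted inside it. Since the name seems to differ between PDFs, it would have to be looked up for each file.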
Thank you,
     Federico

