[poppler] Compatibility between poppler's pdfunite and JHOVE.
Leonard Rosenthol
lrosenth at adobe.com
Fri Apr 7 20:49:02 UTC 2017
Pdfunite will not properly merge PDF/A files – in fact, the only program that does the right thing would be Adobe Acrobat. Merging PDF/A files is NOT the same as merging regular PDFs – you need a product that understands the special requirements of PDF/A.
JHOVE is giving you problems for two reasons – 1) the result after merge is not a valid PDF/A and 2) JHOVE doesn’t support newer PDF features that may be present in the merged version.
Leonard
On 4/7/17, 3:44 PM, "Russell McOrmond" <Russell.McOrmond at canadiana.ca> wrote:
Replying to https://lists.freedesktop.org/archives/poppler/2017-April/012147.html
On Fri, Apr 7, 2017 at 1:33 PM, Leonard Rosenthol
<lrosenth at adobe.com> wrote:
> Can I assume that you are aware that JHOVE is NOT a PDF validator in any way? In addition, its support for modern PDF features is quite out of date! And their own site (<http://jhove.openpreservation.org/modules/pdf/>) says as much. I suspect that if you ran these files through a more thorough PDF validation, such as the one in Adobe Acrobat Pro, it would not report any problems.
>
> Leonard
Canadiana runs a preservation platform. We want to identify and
disallow files that aren't encoded in the publicly documented format
or that use features that aren't appropriate for long-term
preservation (PDF/A). What we need is a tool to take multiple PDF/A
files and join them together, with the result also being a PDF/A file.
This is something I presumed pdfunite could do, but that might not be
the case.
If it turns out that poppler is using features that are
inappropriate for preservation then this would mean we need to
discontinue our use of poppler. In that case messaging from JHOVE
would be helpful to know that the problem is with a specific feature
that poppler is using (the current messaging isn't very helpful). At
this point I do not know if the problem is in poppler or JHOVE (or
both).
http://verapdf.org/software/ is far more verbose in its XML output,
but I didn't think its messages would be as helpful. The text format
output offers a simple pass/fail.
russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text MississaugaNews_2/0001.pdf
PASS /opt/wip/Temp/rwm/MississaugaNews_2/0001.pdf
russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text MississaugaNews_2/0002.pdf
PASS /opt/wip/Temp/rwm/MississaugaNews_2/0002.pdf
russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text pdfunite.pdf
FAIL /opt/wip/Temp/rwm/pdfunite.pdf
russell@russell-desktop:/opt/wip/Temp/rwm$
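For anyone wanting to script the same check, here is a minimal sketch that collects the pass/fail lines from the session above. It assumes the text format is always a `PASS` or `FAIL` keyword followed by a path, as shown in that session; the function names `parse_verapdf_text` and `run_verapdf` are hypothetical, and `run_verapdf` only repeats the `verapdf --format text <file>` invocation used above.

```python
import subprocess


def parse_verapdf_text(output: str) -> dict[str, bool]:
    """Map each reported path to True (PASS) or False (FAIL).

    Assumes one 'PASS <path>' or 'FAIL <path>' entry per line, as in the
    session above; any other line is ignored.
    """
    results = {}
    for line in output.splitlines():
        status, _, path = line.strip().partition(" ")
        if status in ("PASS", "FAIL") and path:
            results[path.strip()] = status == "PASS"
    return results


def run_verapdf(pdf_path: str) -> dict[str, bool]:
    """Run 'verapdf --format text <pdf_path>' and parse its output.

    Hypothetical helper: requires verapdf on PATH. The exit status is not
    inspected, only the text written to stdout.
    """
    proc = subprocess.run(
        ["verapdf", "--format", "text", pdf_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_verapdf_text(proc.stdout)


# Output lines copied from the session above.
sample = """PASS /opt/wip/Temp/rwm/MississaugaNews_2/0001.pdf
PASS /opt/wip/Temp/rwm/MississaugaNews_2/0002.pdf
FAIL /opt/wip/Temp/rwm/pdfunite.pdf"""

failed = [path for path, ok in parse_verapdf_text(sample).items() if not ok]
print(failed)
```

This only reports which files failed; finding out why would still need the more verbose XML output.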
--
System Administration and software developer,
Canadiana.org http://www.canadiana.ca