[poppler] Compatibility between poppler's pdfunite and JHOVE.

Leonard Rosenthol lrosenth at adobe.com
Fri Apr 7 20:49:02 UTC 2017


Pdfunite will not properly merge PDF/A files – in fact, the only program that does the right thing is Adobe Acrobat. Merging PDF/A files is NOT the same as merging regular PDFs – you need a product that understands the special requirements of PDF/A.

JHOVE is giving you problems for two reasons – 1) the result after merge is not a valid PDF/A and 2) JHOVE doesn’t support newer PDF features that may be present in the merged version.

Leonard

On 4/7/17, 3:44 PM, "Russell McOrmond" <Russell.McOrmond at canadiana.ca> wrote:

    Replying to https://lists.freedesktop.org/archives/poppler/2017-April/012147.html:
    On Fri, Apr 7, 2017 at 1:33 PM, Leonard Rosenthol
    <lrosenth at adobe.com> wrote:
    
    > Can I assume that you are aware that JHOVE is NOT a PDF validator in any way?  In addition, its support for modern PDF features is quite out of date!  And their own site (<http://jhove.openpreservation.org/modules/pdf/>) says as much.  I suspect that if you ran these files through a more thorough PDF validator, such as the one in Adobe Acrobat Pro, it would not report any problems.
    >
    > Leonard
    
    
      Canadiana runs a preservation platform. We want to identify and
    disallow files that aren't encoded in the publicly documented format
    or that use features that aren't appropriate for long-term
    preservation (PDF/A).  What we need is a tool to take multiple PDF/A
    files and join them together, with the result also being a PDF/A file.
    This is something I presumed pdfunite could do, but that might not be
    the case.
    
      If it turns out that poppler is using features that are
    inappropriate for preservation then this would mean we need to
    discontinue our use of poppler.  In that case messaging from JHOVE
    would be helpful to know that the problem is with a specific feature
    that poppler is using (the current messaging isn't very helpful).  At
    this point I do not know if the problem is in poppler or JHOVE (or
    both).
    
    
    http://verapdf.org/software/ is far more verbose in its XML output,
    but I didn't think its messages would be as helpful.  The text format
    output offers a simple pass/fail.
    
    russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text MississaugaNews_2/0001.pdf
    PASS /opt/wip/Temp/rwm/MississaugaNews_2/0001.pdf
    russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text MississaugaNews_2/0002.pdf
    PASS /opt/wip/Temp/rwm/MississaugaNews_2/0002.pdf
    russell@russell-desktop:/opt/wip/Temp/rwm$ verapdf --format text pdfunite.pdf
    FAIL /opt/wip/Temp/rwm/pdfunite.pdf
    russell@russell-desktop:/opt/wip/Temp/rwm$
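
The pass/fail lines in the session above are regular enough to check mechanically before and after a merge. What follows is only a minimal sketch in Python, assuming every result line has the shape "PASS <path>" or "FAIL <path>" as shown in this thread; the function names parse_verapdf_text and failed_files are hypothetical and are not part of veraPDF or poppler.

```python
from typing import Dict, Iterable, List


def parse_verapdf_text(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each reported path to True (PASS) or False (FAIL).

    Assumption: result lines look like "PASS <path>" or "FAIL <path>".
    Anything else (shell prompts, blank lines) is skipped.
    """
    results: Dict[str, bool] = {}
    for line in lines:
        status, _, path = line.strip().partition(" ")
        if status in ("PASS", "FAIL") and path:
            results[path] = status == "PASS"
    return results


def failed_files(results: Dict[str, bool]) -> List[str]:
    """Return the paths that did not validate, in the order they were seen."""
    return [path for path, ok in results.items() if not ok]


if __name__ == "__main__":
    # Sample lines copied from the session quoted in this thread.
    sample = [
        "PASS /opt/wip/Temp/rwm/MississaugaNews_2/0001.pdf",
        "PASS /opt/wip/Temp/rwm/MississaugaNews_2/0002.pdf",
        "FAIL /opt/wip/Temp/rwm/pdfunite.pdf",
    ]
    for path in failed_files(parse_verapdf_text(sample)):
        print("not valid PDF/A:", path)
```

Fed the captured output of a validation run, a check like this would flag a merged file that fails even though every input passed, which is the situation described in this thread.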
    
    -- 
    System Administration and software developer,
    Canadiana.org   http://www.canadiana.ca
    


