[poppler] Java version of poppler

Thomas Freitag Thomas.Freitag at kabelmail.de
Tue Apr 30 06:11:29 PDT 2013

Hi all!

Several requests here got no real response, so I know there is no real 
demand for this. Nevertheless, I want to announce that last weekend I 
successfully completed and regression-tested the port to Java of the 
part of poppler I need for my purposes.

My cms-images.jar is not an executable but a library, which contains 
methods to
- extract metadata
- modify metadata
- extract thumbnails
- repair structures
- extract PDF pages
In other words, what pdfinfo, pdffonts and pdfseparate do can be done in 
pure Java. For this I ported over 80 % of the poppler core code to Java. 
What is missing are the pure image streams such as DCT and JPEG2000, 
font handling and some of the annotation stuff. I ported only one output 
device, PlateScannerOutputDev, which is an extension of 
PreScanOutputDev, so that I can also extract information about the plate 
names and colorants used and whether overprint is used, which is rare 
but very important in the prepress area.

Here is an example of the output of a small dump tool written using this 
library:

Dump ./010_ReadMe_Ghent_Output_Patch.pdf
dc:creator : Andy Psarianos
dc:description : Ghent Output Patches
dc:format : application/pdf
dc:rights : Copyright © 2007, Ghent PDF Workgroup (http://www.gwg.org). 
You are encouraged to use the test files as described in the associated 
documentation. However, these files remain the property of and are 
copyrighted by the Ghent PDF Workgroup. You are not allowed to change 
this file (including its name, content and metadata) in any way.
dc:subject : GWG,Ghent PDF Workgroup,Ghent Ouput 
Patches,Documentation,Patch 1.0
dc:title : Patch 1.0 — CMYK Overprint Test Documentation
img:CropBox : [0.00, 0.00, 595.28 792.00],[0.00, 0.00, 595.28 
792.00],[0.00, 0.00, 595.28 792.00],[0.00, 0.00, 595.28 792.00]
img:MediaBox : [0.00, 0.00, 595.28 792.00],[0.00, 0.00, 595.28 
792.00],[0.00, 0.00, 595.28 792.00],[0.00, 0.00, 595.28 792.00]
pdf:Keywords : GWG; Ghent PDF Workgroup; Ghent Ouput Patches; 
Documentation; Patch 1.0
pdf:PDFVersion : 1.6
pdf:Producer : Adobe PDF Library 7.0
pdf:Trapped : false
pdfx:GTS_PDFXConformance : PDF/X-3:2002
pdfx:GTS_PDFXVersion : PDF/X-3:2002
photoshop:CaptionWriter : Andy Psarianos, andy at feburman.co.uk
photoshop:ColorMode : CMYK
xmp:CreateDate : 2005-10-10T17:26:10.000Z
xmp:CreatorTool : Adobe InDesign CS2 (4.0.4)
xmp:MetadataDate : 2007-01-18T18:12:51Z
xmp:ModifyDate : 2007-01-18T18:12:51.000Z
xmpMM:DocumentID : adobe:docid:indd:d4032c63-670f-11da-8e2e-82fceca79e51
xmpMM:InstanceID : uuid:b2114939-23fd-f342-9e5c-c48aa66e67b4
xmpMM:RenditionClass : proof:pdf
xmpRights:Marked : true
xmpRights:WebStatement : http://www.gwg.org
xmpTPg:Colorants : 'PANTONE 2405 C' - CMYK(34.00, 100.00, 0.00, 0.00)
xmpTPg:Fonts : (MyriadPro-Semibold / Myriad Pro Light / Bold / Type 1C / 
null / false),(MyriadPro-Regular / Myriad Pro / Regular / Type 1C / null 
/ false),(Verdana / Verdana / Regular / Type 1C / null / 
false),(MyriadPro-SemiboldCond / Myriad Pro Light Cond / Bold Condensed 
/ Type 1C / null / false),(Verdana-Bold / Verdana / Bold / Type 1C / 
null / false)
xmpTPg:HasVisibleOverprint : true
xmpTPg:HasVisibleTransparency : false
xmpTPg:MaxPageSize : 595.28 x 792.00 point
xmpTPg:NPages : 4
xmpTPg:PlateNames : Cyan,Magenta,Yellow,Black,PANTONE 2405 C
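
The dump above prints one property per line as "prefix:name : value", 
with long values wrapped onto following lines. A minimal sketch of how 
such output could be read back into a map, assuming exactly that layout; 
the class DumpLineParser is hypothetical and is not part of 
cms-images.jar:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of cms-images.jar.
final class DumpLineParser {
    // Matches "prefix:name : value", e.g. "pdf:PDFVersion : 1.6".
    private static final Pattern PROPERTY = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9]*:[A-Za-z_][A-Za-z0-9_]*) : (.*)$");

    // Returns the properties in the order they appear in the dump.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> props = new LinkedHashMap<>();
        String current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher m = PROPERTY.matcher(line);
            if (m.matches()) {
                current = m.group(1);
                props.put(current, m.group(2));
            } else if (current != null) {
                // Assumed to be a wrapped continuation of the previous value.
                props.merge(current, line, (a, b) -> a + " " + b);
            }
            // Lines before the first property (the "Dump ..." header) are skipped.
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(List.of(
                "Dump ./010_ReadMe_Ghent_Output_Patch.pdf",
                "pdf:PDFVersion : 1.6",
                "xmpTPg:PlateNames : Cyan,Magenta,Yellow,Black,PANTONE 2405 C"));
        for (String plate : props.get("xmpTPg:PlateNames").split(",")) {
            System.out.println(plate);
        }
    }
}
```

A continuation line that itself happens to look like "prefix:name : value" 
would be misread as a new property, so this is only suitable for quick 
inspection of such dumps.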

The property names are XMP compliant, and the library also contains the 
XMP schema, which can be parsed by the included XMPReader and written by 
the XMPWriter. Even though the schema itself isn't used, the 
implementation was done with JAXB.
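
To illustrate what an XMP-compliant name means: in an XMP packet, a name 
such as dc:format is an XML qualified name whose prefix is bound to a 
namespace URI. The following is only a sketch that reads one simple 
property with the JDK's built-in DOM parser; it does not use JAXB and 
does not show the library's XMPReader. The class XmpPeek and the sample 
packet are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of cms-images.jar.
final class XmpPeek {
    // Dublin Core namespace, conventionally bound to the "dc" prefix in XMP.
    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Returns the text of the first element with the given namespace and
    // local name, or null if there is none. Only the simple element form
    // is handled, not attribute shorthand or rdf:Alt/rdf:Seq containers.
    static String simpleProperty(String xmp, String nsUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagNameNS(nsUri, localName);
            return hits.getLength() == 0 ? null : hits.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        String xmp =
                "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
              + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
              + "<rdf:Description rdf:about=\"\" xmlns:dc=\"" + DC + "\">"
              + "<dc:format>application/pdf</dc:format>"
              + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println("dc:format : " + simpleProperty(xmp, DC, "format"));
    }
}
```

Looking up by namespace URI rather than by the literal prefix means the 
lookup still works when a packet binds the same namespace to a different 
prefix.
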
The cms-images.jar supports not only PDF (the last, but not least, 
format I added) but also JPG, TIFF and PNG at the moment. Because it is 
completely undocumented so far, I don't want to publish it yet, but I 
plan to do so sometime this year. If anyone here knows of anybody who is 
interested in it in the meantime, he can contact me and I'll provide the 
source code or anything else he needs.

