[poppler] Toward JBIG2 support in CairoOutputDev

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Wed Dec 31 19:39:11 PST 2014


Dear Adrian,

Thank you for the detailed explanation of how to pass JBIG2
data to a cairo surface.

> 4a) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID on the image to some unique
> identifier. For your example I suggest "6-0". The namespace is unique to
> the CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID mime type so you do not need any
> prefix like "pdf-jbig2-globals-".

I was not aware of this! Thanks.
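
As a small illustration of the identifier suggested above, the following is a minimal sketch that formats an object number and generation number as "6-0". The helper name makeJBIG2GlobalsId is hypothetical; it is not part of poppler or cairo.

```cpp
#include <string>

// Hypothetical helper (not a poppler or cairo function): build the
// identifier string from a PDF object number and generation number,
// for example 6 and 0 give "6-0". No prefix is added, following the
// suggestion quoted above.
std::string makeJBIG2GlobalsId(int num, int gen) {
    return std::to_string(num) + "-" + std::to_string(gen);
}
```

Two images whose JBIG2Globals entries resolve to the same object would then get the same identifier string.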

> You could
>  1) add a function to JBIG2Stream to return the ref of the stream, or
>  2) walk through the image stream dictionary (provided to
> CairoOutputDev::drawImage) and pick out the global stream.

In my personal opinion, 1) sounds like better encapsulation...

>> Furthermore, we could imagine a worse case: differently
>> chained references to the same object;
>
> The poppler lookup functions should follow the chain and return the
> final ref unless you use the *NF (no follow) variants.
>
> eg
> Object::dictLookup / Object::dictLookupNF

Oh, thanks! I was wondering what "NF" meant, although I
had found that these methods do not follow references to
the final object.
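
To make the difference concrete for myself, here is a toy model of a following lookup versus a no-follow lookup. All types and functions here (ToyRef, ToyObject, ToyXRef, ToyDict, lookup, lookupNF) are hypothetical stand-ins and are not poppler's Object, Dict or XRef classes; the sketch only assumes that a following lookup resolves a chain of references until a non-reference object is reached.

```cpp
#include <map>
#include <string>

// Toy stand-ins for PDF objects; NOT poppler's classes.
struct ToyRef { int num; int gen; };

inline bool operator<(const ToyRef &a, const ToyRef &b) {
    return a.num != b.num ? a.num < b.num : a.gen < b.gen;
}

struct ToyObject {
    bool isRef;         // true: this object is an indirect reference
    ToyRef ref;         // meaningful only when isRef is true
    std::string value;  // meaningful only when isRef is false
};

using ToyXRef = std::map<ToyRef, ToyObject>;       // object table
using ToyDict = std::map<std::string, ToyObject>;  // dictionary entries

// No-follow lookup: return the entry exactly as stored in the dictionary.
ToyObject lookupNF(const ToyDict &dict, const std::string &key) {
    return dict.at(key);
}

// Following lookup: resolve references until a non-reference is found.
// If lastRef is given, it receives the last reference in the chain,
// i.e. the one that points directly at the final object.
ToyObject lookup(const ToyDict &dict, const ToyXRef &xref,
                 const std::string &key, ToyRef *lastRef = nullptr) {
    ToyObject obj = dict.at(key);
    int depth = 0;
    while (obj.isRef && depth++ < 32) {  // guard against reference loops
        if (lastRef) *lastRef = obj.ref;
        auto it = xref.find(obj.ref);
        if (it == xref.end()) break;     // dangling reference: stop here
        obj = it->second;
    }
    return obj;
}
```

In this model, two differently chained references to the same final object report the same last reference, which is the value a unique identifier could be built from.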

In my preliminary patch, I extended the JBIG2Stream class
to hold the last object referring to the Globals stream
object (both the GlobalsStream object and the referring
object are held, so ABI compatibility is broken), and I
find it by repeating XRef::fetch(), like this:


   } else if (!strcmp(name, "JBIG2Decode")) {
+    obj.free();
     if (params->isDict()) {
       params->dictLookup("JBIG2Globals", &globals, recursion);
+      if (globals.isStream()) {
+        params->dictLookupNF("JBIG2Globals", &obj);
+        XRef *xref = params->getDict()->getXRef();
+        Object rObj;
+        while (xref->fetch(obj.getRefNum(), obj.getRefGen(), &rObj)->isRef()) {
+          obj.free();
+          rObj.copy(&obj);
+        }
+        rObj.free();
+      }
     }
-    str = new JBIG2Stream(str, &globals);
+    str = new JBIG2Stream(str, &globals, &obj);

# The JBIG2Stream constructor is extended to receive the
# object referring to the Globals stream.

I'm not sure whether I should add a new public setter to
manage the ref to Globals (the ref to the Globals stream must
be readable by CairoOutputDev, so protected or private
methods would not work for CairoOutputDev). In fact, there
is no public setter for GlobalsStream (only a public getter
is provided), so I think extending the constructor is better.
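
The design choice can be sketched roughly as below. This is only an illustration under my own assumptions: ToyJBIG2Stream and GlobalsRef are hypothetical names, not the real JBIG2Stream interface. The reference is fixed at construction time and is exposed through a public getter only, so an output device can read it but cannot change it.

```cpp
// Hypothetical reference type (object number and generation number).
struct GlobalsRef { int num; int gen; };

// Hypothetical stand-in for a stream class; NOT poppler's JBIG2Stream.
class ToyJBIG2Stream {
public:
    // The reference to the globals stream is supplied once, here.
    ToyJBIG2Stream(bool hasGlobals, GlobalsRef globalsRef)
        : hasGlobals_(hasGlobals), globalsRef_(globalsRef) {}

    // Public getters only; there is deliberately no setter.
    bool hasGlobals() const { return hasGlobals_; }
    GlobalsRef getGlobalsRef() const { return globalsRef_; }

private:
    const bool hasGlobals_;
    const GlobalsRef globalsRef_;
};
```

The cost of this approach is the changed constructor signature, which is where the compatibility break mentioned above comes from.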

Regards,
mpsuzuki

Adrian Johnson wrote:
> On 31/12/14 18:59, suzuki toshiya wrote:
>> Cairo interface to manage JBIG2Globals
>> --------------------------------------
>>
>> In cairo, we can pass 3 kinds of JBIG2-related data
>> via the cairo_surface_set_mime_data() API:
>> 1) JBIG2 data itself (the stream in "5 0 obj" itself, in
>> above example),
>> 2) JBIG2 global data (the stream in "6 0 R" in above example),
>> 3) Unique ID to specify which JBIG2 global data should be
>> used in the decoding process.
>>
>> I do not yet fully understand the official design in cairo,
>> but it seems that the unique-id (3) is passed first, the JBIG2
>> image (1) next, and the JBIG2 global data (2) last. When the
>> JBIG2 image is passed, cairo binds it to the most recently
>> declared unique-id, and when the JBIG2 global data (2) is
>> passed, cairo likewise binds it to the most recently declared
>> unique-id. Therefore, even if we send the same JBIG2 global
>> data (2) repeatedly, only 1 copy of the JBIG2 global data is
>> emitted to the PDF output as long as we don't change the
>> unique-id (3).
> 
> The usage of cairo JBIG2 API would go something like this:
> 
> For each JBIG2 image encountered by CairoOutputDev:
> 1) Create a cairo image surface (CairoOutputDev already does this).
> 
> 2) Set CAIRO_MIME_TYPE_UNIQUE_ID on the image to ensure only one
> instance of each image is embedded (CairoOutputDev already does this).
> 
> 3) Set CAIRO_MIME_TYPE_JBIG2 on the image to the JBIG2 data (the 5 0
> stream in your example above).
> 
> 4) If the JBIG2 stream uses global data:
> 
> 4a) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID on the image to some unique
> identifier. For your example I suggest "6-0". The namespace is unique to
> the CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID mime type so you do not need any
> prefix like "pdf-jbig2-globals-".
> 
> 4b) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL on the image to the global data.
> You only need to do this for one of the images that share the
> same CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID. Setting it on more than one is
> harmless. Cairo will only embed one copy.
> 
> 5) Paint the image (CairoOutputDev already does this).
> 
> 
>> The problem is "how we can determine the unique-id for
>> JBIG2 global data?".
>>
>> Problem to make a unique-id for JBIG2Globals in PDF
>> ---------------------------------------------------
>>
>> The easiest and most straightforward idea would be to use the
>> object reference number and generation number (referring to the
>> JBIG2 global data) to form a unique-id. In the above example,
>> we can declare it as "pdf-jbig2-globals-6-0".
>> But it seems that the current design of JBIG2Stream holds
>> the stream itself, not the indirect object referring
>> to the stream (in the above example, the JBIG2Stream class
>> can access the content of the "6 0 R" stream, but
>> cannot know how it is referred to: the reference number
>> (=6) and generation number (=0)).
> 
> You could
>  1) add a function to JBIG2Stream to return the ref of the stream, or
>  2) walk through the image stream dictionary (provided to
> CairoOutputDev::drawImage) and pick out the global stream.
> 
>> Furthermore, we could imagine a worse case: differently
>> chained references to the same object;
> 
> The poppler lookup functions should follow the chain and return the
> final ref unless you use the *NF (no follow) variants.
> 
> eg
> Object::dictLookup / Object::dictLookupNF
> Object::arrayGet / Object::arrayGetNF
> Dict::lookup / Dict::lookupNF
> 
> 


