[poppler] Toward JBIG2 support in CairoOutputDev

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Tue Jan 6 08:28:32 PST 2015


Hi all,

Here is a preliminary patch against git head that enables
embedding JBIG2 data in a PDF surface via cairo.
I attach the patch and a test PDF that includes JBIG2 data
with Globals (the test PDF is based on cairo's JBIG2 test
data). I would appreciate it if anybody could review it.

Regards,
mpsuzuki

suzuki toshiya wrote:
> Dear Adrian,
>
> Thank you for the detailed explanation of how to pass JBIG2
> data to a cairo surface.
>
>> 4a) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID on the image to some unique
>> identifier. For your example I suggest "6-0". The namespace is unique to
>> the CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID mime type so you do not need any
>> prefix like "pdf-jbig2-globals-".
>
> I was not aware of this! Thanks.
>
>> You could
>>   1) add a function to JBIG2Stream to return the ref of the stream, or
>>   2) walk through the image stream dictionary (provided to
>> CairoOutputDev::drawImage) and pick out the global stream.
>
> In my personal opinion, 1) sounds like better encapsulation...
>
>>> Furthermore, we could imagine a worse case: differently
>>> chained references to the same object;
>>
>> The poppler lookup functions should follow the chain and return the
>> final ref unless you use the *NF (no follow) variants.
>>
>> eg
>> Object::dictLookup / Object::dictLookupNF
>
> Oh, thanks! I was wondering what "NF" meant, although I
> had noticed that these methods do not follow the chain to
> the final object.
>
> In my preliminary patch, I extended the JBIG2Stream class
> to hold the last object referring to the Globals stream
> object (both the GlobalsStream object and the referring
> object are held, so ABI compatibility is broken),
> and I find it by repeating XRef.fetch(), like this:
>
>
>     } else if (!strcmp(name, "JBIG2Decode")) {
> +    obj.free();
>       if (params->isDict()) {
>         params->dictLookup("JBIG2Globals", &globals, recursion);
> +      if (globals.isStream()) {
> +        params->dictLookupNF("JBIG2Globals", &obj);
> +        XRef *xref = params->getDict()->getXRef();
> +        Object rObj;
> +        while (xref->fetch(obj.getRefNum(), obj.getRefGen(), &rObj)->isRef()) {
> +          obj.free();
> +          rObj.copy(&obj);
> +        }
> +        rObj.free();
> +      }
>       }
> -    str = new JBIG2Stream(str, &globals);
> +    str = new JBIG2Stream(str, &globals, &obj);
>
> # The JBIG2Stream constructor is extended to receive the
> # last object referring to the Globals stream.
>
> I'm not sure whether I should add a new public setter
> to manage the ref to Globals. The ref to the Globals
> stream must be readable by CairoOutputDev, so protected
> or private methods would not work for the reading side.
> In fact, there is no public setter for GlobalsStream
> (only a public getter is provided). So I think
> extending the constructor is better.
>
> Regards,
> mpsuzuki
> Adrian Johnson wrote:
>> On 31/12/14 18:59, suzuki toshiya wrote:
>>> Cairo interface to manage JBIG2Globals
>>> --------------------------------------
>>>
>>> In cairo, we can pass three kinds of JBIG2-related data
>>> via the cairo_surface_set_mime_data() API:
>>> 1) the JBIG2 data itself (the stream in "5 0 obj" in the
>>> above example),
>>> 2) the JBIG2 global data (the stream in "6 0 R" in the above example),
>>> 3) a unique-id specifying which JBIG2 global data should be
>>> used in the decoding process.
>>>
>>> Although I do not yet fully understand the official design in
>>> cairo, it seems that the unique-id (3) is passed first, the
>>> JBIG2 image (1) next, and finally the JBIG2 global data (2).
>>> When the JBIG2 image is passed, cairo binds it to the most
>>> recently declared unique-id, and when the JBIG2 global data
>>> (2) is passed, cairo likewise binds it to the most recently
>>> declared unique-id. Therefore, even if we send the same
>>> JBIG2 global data (2) repeatedly, as long as we do not
>>> change the unique-id (3), only one copy of the JBIG2 global
>>> data is emitted to the PDF output.
>>
>> The usage of cairo JBIG2 API would go something like this:
>>
>> For each JBIG2 image encountered by CairoOutputDev:
>> 1) Create a cairo image surface (CairoOutputDev already does this).
>>
>> 2) Set CAIRO_MIME_TYPE_UNIQUE_ID on the image to ensure only one
>> instance of each image is embedded (CairoOutputDev already does this).
>>
>> 3) Set CAIRO_MIME_TYPE_JBIG2 on the image to the JBIG2 data (the 5 0
>> stream in your example above).
>>
>> 4) If the JBIG2 stream uses global data:
>>
>> 4a) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID on the image to some unique
>> identifier. For your example I suggest "6-0". The namespace is unique to
>> the CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID mime type so you do not need any
>> prefix like "pdf-jbig2-globals-".
>>
>> 4b) Set CAIRO_MIME_TYPE_JBIG2_GLOBAL on the image to the global data.
>> You only need to do this for one of the images that share the
>> same CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID. Setting it on more than one is
>> harmless. Cairo will only embed one copy.
>>
>> 5) Paint the image (CairoOutputDev already does this).
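
The per-image steps above can be sketched with a small toy model. This is
not cairo code: ToyImage, ToyPdfOutput, makeJbig2Image and the three key
strings are hypothetical stand-ins, and the real calls would go through
cairo_surface_set_mime_data() with the MIME type constants named in the
steps. The sketch only illustrates the ordering of steps 3, 4a and 4b and
the claim that a shared global ID results in one embedded copy.

```cpp
#include <map>
#include <string>

// Placeholder keys standing in for the cairo MIME type constants
// named in the steps above (assumed names, not cairo's real strings).
const std::string kJbig2 = "jbig2";
const std::string kJbig2Global = "jbig2-global";
const std::string kJbig2GlobalId = "jbig2-global-id";

// Toy stand-in for an image surface carrying MIME data; not a cairo type.
struct ToyImage {
    std::map<std::string, std::string> mime;  // key -> attached bytes
};

// Attach the JBIG2 data and, if present, the globals ID and globals data.
ToyImage makeJbig2Image(const std::string &imageData,
                        const std::string &globalsId,
                        const std::string &globalsData) {
    ToyImage img;
    img.mime[kJbig2] = imageData;                  // step 3
    if (!globalsId.empty()) {
        img.mime[kJbig2GlobalId] = globalsId;      // step 4a
        if (!globalsData.empty()) {
            img.mime[kJbig2Global] = globalsData;  // step 4b
        }
    }
    return img;
}

// Toy stand-in for a PDF backend that keeps one globals stream per ID.
struct ToyPdfOutput {
    std::map<std::string, std::string> globalsById;
    int imagesPainted = 0;

    void paint(const ToyImage &img) {
        ++imagesPainted;
        auto id = img.mime.find(kJbig2GlobalId);
        auto data = img.mime.find(kJbig2Global);
        if (id != img.mime.end() && data != img.mime.end()) {
            // emplace() keeps the first copy and ignores later duplicates.
            globalsById.emplace(id->second, data->second);
        }
    }
};
```

In this model, painting three images that all carry the ID "6-0" leaves a
single entry in globalsById, whether the globals data was attached to one
of them or to all of them.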
>>
>>
>>> The problem is: how can we determine the unique-id for
>>> the JBIG2 global data?
>>>
>>> Problem of making a unique-id for JBIG2Globals in PDF
>>> -----------------------------------------------------
>>>
>>> The easiest and most straightforward idea would be to use the
>>> object number and generation number (of the reference to the
>>> JBIG2 global data) to form a unique-id. In the above example,
>>> we could declare it as "pdf-jbig2-globals-6-0".
>>> But it seems that the current design of JBIG2Stream holds
>>> the stream itself, not the indirect object referring
>>> to the stream (in the above example, the JBIG2Stream class
>>> can access the content of the "6 0 R" stream, but
>>> cannot know how it is referred to, i.e. the object number
>>> (=6) and generation number (=0)).
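
A minimal sketch of the ID construction described above, assuming the object
number and generation number are available. The Ref struct here is a local
stand-in with the two fields the text mentions, and makeGlobalsId is a
hypothetical helper name, not an existing poppler function. The prefix is
optional so that both the bare "6-0" form and the prefixed form from the
example can be produced.

```cpp
#include <string>

// Local stand-in for a PDF indirect reference:
// object number plus generation number.
struct Ref {
    int num;
    int gen;
};

// Build an ID such as "6-0" for the reference "6 0 R".
// An optional prefix can be prepended if a wider namespace is wanted.
std::string makeGlobalsId(const Ref &ref, const std::string &prefix = "") {
    return prefix + std::to_string(ref.num) + "-" + std::to_string(ref.gen);
}
```

For the reference "6 0 R", makeGlobalsId(Ref{6, 0}) gives "6-0", and passing
"pdf-jbig2-globals-" as the prefix gives the longer form quoted above.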
>>
>> You could
>>   1) add a function to JBIG2Stream to return the ref of the stream, or
>>   2) walk through the image stream dictionary (provided to
>> CairoOutputDev::drawImage) and pick out the global stream.
>>
>>> Furthermore, we could imagine a worse case: differently
>>> chained references to the same object;
>>
>> The poppler lookup functions should follow the chain and return the
>> final ref unless you use the *NF (no follow) variants.
>>
>> eg
>> Object::dictLookup / Object::dictLookupNF
>> Object::arrayGet / Object::arrayGetNF
>> Dict::lookup / Dict::lookupNF
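
The difference between following a reference chain and returning the
reference as written can be illustrated with a toy object table. Nothing
below is poppler API: Ref, ToyObject, ObjectTable, lookupNF and
resolveFinalRef are hypothetical names for this sketch, and the depth limit
is an added assumption to guard against reference loops.

```cpp
#include <map>
#include <utility>

// Local stand-in for a PDF indirect reference.
struct Ref {
    int num;
    int gen;
};

// Toy object: either another indirect reference or real content.
struct ToyObject {
    bool isRef;  // true if this object is just another reference
    Ref target;  // meaningful only when isRef is true
};

using ObjectTable = std::map<std::pair<int, int>, ToyObject>;

// "NF"-style lookup: hand back the reference as written, unresolved.
Ref lookupNF(const Ref &written) {
    return written;
}

// Following lookup: walk the chain until an object that is not a
// reference is reached, and return the reference pointing at it.
// maxDepth guards against reference loops in damaged files.
Ref resolveFinalRef(const ObjectTable &table, Ref ref, int maxDepth = 32) {
    for (int i = 0; i < maxDepth; ++i) {
        auto it = table.find({ref.num, ref.gen});
        if (it == table.end() || !it->second.isRef) {
            break;  // missing object or real content: stop here
        }
        ref = it->second.target;
    }
    return ref;
}
```

If object 8 0 merely refers to 6 0, then starting from either "8 0 R" or
"6 0 R" ends at the same final reference, so two differently chained
references to one globals stream would yield the same ID.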
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JBIG2WithGlobals.pdf
Type: application/pdf
Size: 2114 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20150107/ef8a4966/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poppler-cairo-jbig2_20140106a.diff.xz
Type: application/x-xz
Size: 2628 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20150107/ef8a4966/attachment.bin>

