[poppler] Toward to JBIG2 support in CairoOutputDev

Adrian Johnson ajohnson at redneon.com
Tue Jan 20 04:10:07 PST 2015


I've tested the lastest patch. It works fine but I noticed about a 50%
slow down with pdfs containing a lot of JBIG2 images.

I've uploaded a couple of sample pdfs here:
http://people.freedesktop.org/~ajohnson/jbig2/

Below are the timings for each document using git master and git master
with the jbig2 patch. I used the command line
pdftocairo -pdf input.pdf output.pdf


- AnnualReport.pdf
master
real    0m13.562s
user    0m12.620s
sys     0m0.820s

jbig2
real    0m21.045s
user    0m20.000s
sys     0m1.060s
	

- The_Miseries_of_Human_Life.pdf
master
real    0m32.690s
user    0m29.716s
sys     0m2.832s

jbig2
real    0m42.793s
user    0m39.680s
sys     0m2.848s


While it is still worth a small performance reduction to get the
significant reduction in the output file size, it may be worth seeing if
there is a simple solution to the slowdown. My first guess is to try
attaching the global data to only one image instead of to every image.


On 20/01/15 02:42, suzuki toshiya wrote:
> Dear Carlos,
> 
> Sorry! I had a silly mistake, I should pass ref->getRef()
> (in my previous patch, I passed ref->isRef()).
> I attach corrected patch.
> 
> Regards,
> mpsuzuki
> 
> suzuki toshiya wrote:
>> Dear Carlos,
>>
>> Thank you for finding out the points! I attached revised patch.
>> The difference from my previous revision is following.
>>
>> Regards,
>> mpsuzuki
>>
>>
>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>> index a428d2c..17b6b3b 100644
>> --- a/poppler/CairoOutputDev.cc
>> +++ b/poppler/CairoOutputDev.cc
>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>  #endif
>>
>>    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>> -    cairo_status_t status;
>> +    cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>
>>  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>      if (ref && ref->isRef()) {
>> -      Ref imgRef = ref->getRef();
>>        status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>> -                                "poppler-surface-", imgRef);
>> +                                "poppler-surface-", ref->isRef());
>>      }
>>  #endif
>>      if (!status) {
>>
>>
>>
>> Carlos Garcia Campos wrote:
>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>
>>>> The latest patch looks good. I'll wait a week before committing to see
>>>> if anyone else has any comments. This will give me time to do some
>>>> testing of the patch.
>>> Looks good to me as well in general. There's only one thing I've
>>> noticed.
>>>
>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>>    if (!colorMapHasIdentityDecodeMap(colorMap))
>>>>      return;
>>>>
>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>> +  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image))
>>>> +    return;
>>>> +#endif
>>>> +
>>>>    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>> -    cairo_status_t st;
>>>> +    cairo_status_t status;
>>>>
>>>>  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>      if (ref && ref->isRef()) {
>>>>        Ref imgRef = ref->getRef();
>>>> -      GooString *surfaceId = new GooString("poppler-surface-");
>>>> -      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>> -      char *idBuffer = copyString(surfaceId->getCString());
>>>> -      st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>> -                                        (const unsigned char *)idBuffer,
>>>> -                                        surfaceId->getLength(),
>>>> -                                        gfree, idBuffer);
>>>> -      if (st)
>>>> -        gfree(idBuffer);
>>>> -      delete surfaceId;
>>>> +      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>> +                                "poppler-surface-", imgRef);
>>> We could pass ref->getRef() directly here.
>>>
>>>>      }
>>>>  #endif
>>>> +    if (!status) {
>>> status might be uninitialized at this point if cairo version is not
>>> recent enough or the ref && ref->isRef() condition is False.
>>>
>>>> +      status = cairo_surface_set_mime_data (image, mime_type,
>>>> +					    (const unsigned char *)strBuffer, len,
>>>> +					    gfree, strBuffer);
>>>> +    }
>>>  
>>>> -    st = cairo_surface_set_mime_data (image,
>>>> -				      str->getKind() == strDCT ?
>>>> -				      CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>> -				      (const unsigned char *)strBuffer, len,
>>>> -				      gfree, strBuffer);
>>>> -    if (st)
>>>> +    if (status)
>>>>        gfree (strBuffer);
>>>>    }
>>>>  }
>>>
>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>> Dear Adrian,
>>>>>
>>>>> OK - considering that there is a lag to the day that Linux
>>>>> distribution ship the poppler stable branch including
>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>> snapshot of Cairo in current development branch of poppler
>>>>> (and there would be no need to support Cairo snapshot in
>>>>> the day), I decided to follow original version checking.
>>>>>
>>>>> Also I removed the default value (NULL) for Ref object to
>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>> Globals object).
>>>>>
>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>
>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>> previous posts were misunderstanding this year as 2014.
>>>>>
>>>>> Regards,
>>>>> mpsuzuki
>>>>>
>>>>> Adrian Johnson wrote:
>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>> Dear Adrian,
>>>>>>>
>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>> but please let me discuss a few points.
>>>>>>>
>>>>>>> Adrian Johnson wrote:
>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int
>>>>>>>> refGen,
>>>>>>>> +                                                 cairo_surface_t
>>>>>>>> *image)
>>>>>>>> +{
>>>>>>>> +  GooString *globalsStrId;
>>>>>>>> +  char *idBuffer;
>>>>>>>> +  cairo_status_t st;
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +  globalsStrId = new GooString;
>>>>>>>> +  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>> +  idBuffer = copyString(globalsStrId->getCString());
>>>>>>>> +  st = cairo_surface_set_mime_data (image,
>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>
>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>> inside a
>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>>     #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>> ships with 1.13.0~20140204.
>>>>>>>    
>>>>>>> http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>
>>>>>>>
>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>> macros, instead of the version checking.
>>>>>> I'm not keen on making an exception for this case. The target audience
>>>>>> for this is very small. It would only help users that compile a new
>>>>>> version of poppler on an old distribution without also recompiling
>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature that not
>>>>>> many people would be using (so far no one else has requested this
>>>>>> feature).
>>>>>>
>>>>>> If you really want to support distributions using a cairo 1.13 snapshot
>>>>>> I suggest adding a comment before the first occurrence of
>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro
>>>>>> is not used. Also include
>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so
>>>>>> that:
>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>
>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>
>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
>>>>>>>> +                                                 cairo_surface_t
>>>>>>>> *image)
>>>>>>>> +{
>>>>>>>> +  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>> +  Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>> +  char *globalsBuffer;
>>>>>>>> +  int globalsLength;
>>>>>>>> +
>>>>>>>> +  // According to PDF spec, Globals should be stream (or nothing)
>>>>>>>> +
>>>>>>>> +  if (globalsStr->isNull())
>>>>>>>> +    return gTrue;
>>>>>>>> +
>>>>>>>> +  if (!globalsStr->isStream())
>>>>>>>> +    return gFalse;
>>>>>>>> +
>>>>>>>> +  // See JBIG2Stream constructor for valid reference number
>>>>>>>> +  if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>> +    return gFalse;
>>>>>>>>
>>>>>>>> The globalsStr->isStream() check should be sufficient. The reference
>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>
>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>> such case?
>>>>>>>
>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>> invalid.
>>>>>>>
>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>> without the Ref to the globalsStream.
>>>>>>>
>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>
>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>
>>>>>>> Which is the best? Or, no need to case such case?
>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>> preserve compatibility.
>>>>>>
>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>> CairoOutputdev.cc always uses the name "status" for cairo status so we
>>>>>> should be consistent with this convention.
>>>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 
> 
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
> 



More information about the poppler mailing list