[poppler] Toward to JBIG2 support in CairoOutputDev

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Tue Jan 20 04:44:41 PST 2015


Dear Adrian,

Great Thank you for benchmark evaluation.
I will try to analyze the slowdown and
improve the performance.

Regards,
mpsuzuki

Adrian Johnson wrote:
> I've tested the lastest patch. It works fine but I noticed about a 50%
> slow down with pdfs containing a lot of JBIG2 images.
> 
> I've uploaded a couple of sample pdfs here:
> http://people.freedesktop.org/~ajohnson/jbig2/
> 
> Below are the timings for each document using git master and git master
> with the jbig2 patch. I used the command line
> pdftocairo -pdf input.pdf output.pdf
> 
> 
> - AnnualReport.pdf
> master
> real    0m13.562s
> user    0m12.620s
> sys     0m0.820s
> 
> jbig2
> real    0m21.045s
> user    0m20.000s
> sys     0m1.060s
> 	
> 
> - The_Miseries_of_Human_Life.pdf
> master
> real    0m32.690s
> user    0m29.716s
> sys     0m2.832s
> 
> jbig2
> real    0m42.793s
> user    0m39.680s
> sys     0m2.848s
> 
> 
> While it is still worth a small performance reduction to get the
> significant reduction in the output file size, it may be worth seeing if
> there is a simple solution to the slowdown. My first guess is to try
> attaching the global data to only one image instead of to every image.
> 
> 
> On 20/01/15 02:42, suzuki toshiya wrote:
>> Dear Carlos,
>>
>> Sorry! I had a silly mistake, I should pass ref->getRef()
>> (in my previous patch, I passed ref->isRef()).
>> I attach corrected patch.
>>
>> Regards,
>> mpsuzuki
>>
>> suzuki toshiya wrote:
>>> Dear Carlos,
>>>
>>> Thank you for finding out the points! I attached revised patch.
>>> The difference from my previous revision is following.
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>>
>>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>>> index a428d2c..17b6b3b 100644
>>> --- a/poppler/CairoOutputDev.cc
>>> +++ b/poppler/CairoOutputDev.cc
>>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>  #endif
>>>
>>>    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>> -    cairo_status_t status;
>>> +    cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>>
>>>  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>      if (ref && ref->isRef()) {
>>> -      Ref imgRef = ref->getRef();
>>>        status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>> -                                "poppler-surface-", imgRef);
>>> +                                "poppler-surface-", ref->isRef());
>>>      }
>>>  #endif
>>>      if (!status) {
>>>
>>>
>>>
>>> Carlos Garcia Campos wrote:
>>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>>
>>>>> The latest patch looks good. I'll wait a week before committing to see
>>>>> if anyone else has any comments. This will give me time to do some
>>>>> testing of the patch.
>>>> Looks good to me as well in general. There's only one thing I've
>>>> noticed.
>>>>
>>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>>>    if (!colorMapHasIdentityDecodeMap(colorMap))
>>>>>      return;
>>>>>
>>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>>> +  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image))
>>>>> +    return;
>>>>> +#endif
>>>>> +
>>>>>    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>> -    cairo_status_t st;
>>>>> +    cairo_status_t status;
>>>>>
>>>>>  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>      if (ref && ref->isRef()) {
>>>>>        Ref imgRef = ref->getRef();
>>>>> -      GooString *surfaceId = new GooString("poppler-surface-");
>>>>> -      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>>> -      char *idBuffer = copyString(surfaceId->getCString());
>>>>> -      st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>> -                                        (const unsigned char *)idBuffer,
>>>>> -                                        surfaceId->getLength(),
>>>>> -                                        gfree, idBuffer);
>>>>> -      if (st)
>>>>> -        gfree(idBuffer);
>>>>> -      delete surfaceId;
>>>>> +      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>> +                                "poppler-surface-", imgRef);
>>>> We could pass ref->getRef() directly here.
>>>>
>>>>>      }
>>>>>  #endif
>>>>> +    if (!status) {
>>>> status might be uninitialized at this point if cairo version is not
>>>> recent enough or the ref && ref->isRef() condition is False.
>>>>
>>>>> +      status = cairo_surface_set_mime_data (image, mime_type,
>>>>> +					    (const unsigned char *)strBuffer, len,
>>>>> +					    gfree, strBuffer);
>>>>> +    }
>>>>  
>>>>> -    st = cairo_surface_set_mime_data (image,
>>>>> -				      str->getKind() == strDCT ?
>>>>> -				      CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>>> -				      (const unsigned char *)strBuffer, len,
>>>>> -				      gfree, strBuffer);
>>>>> -    if (st)
>>>>> +    if (status)
>>>>>        gfree (strBuffer);
>>>>>    }
>>>>>  }
>>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>>> Dear Adrian,
>>>>>>
>>>>>> OK - considering that there is a lag to the day that Linux
>>>>>> distribution ship the poppler stable branch including
>>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>>> snapshot of Cairo in current development branch of poppler
>>>>>> (and there would be no need to support Cairo snapshot in
>>>>>> the day), I decided to follow original version checking.
>>>>>>
>>>>>> Also I removed the default value (NULL) for Ref object to
>>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>>> Globals object).
>>>>>>
>>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>>
>>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>>> previous posts were misunderstanding this year as 2014.
>>>>>>
>>>>>> Regards,
>>>>>> mpsuzuki
>>>>>>
>>>>>> Adrian Johnson wrote:
>>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>>> Dear Adrian,
>>>>>>>>
>>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>>> but please let me discuss a few points.
>>>>>>>>
>>>>>>>> Adrian Johnson wrote:
>>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int
>>>>>>>>> refGen,
>>>>>>>>> +                                                 cairo_surface_t
>>>>>>>>> *image)
>>>>>>>>> +{
>>>>>>>>> +  GooString *globalsStrId;
>>>>>>>>> +  char *idBuffer;
>>>>>>>>> +  cairo_status_t st;
>>>>>>>>> +
>>>>>>>>> +
>>>>>>>>> +  globalsStrId = new GooString;
>>>>>>>>> +  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>>> +  idBuffer = copyString(globalsStrId->getCString());
>>>>>>>>> +  st = cairo_surface_set_mime_data (image,
>>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>>
>>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>>> inside a
>>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>>>     #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>>> ships with 1.13.0~20140204.
>>>>>>>>    
>>>>>>>> http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>>
>>>>>>>>
>>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>>> macros, instead of the version checking.
>>>>>>> I'm not keen on making an exception for this case. The target audience
>>>>>>> for this is very small. It would only help users that compile a new
>>>>>>> version of poppler on an old distribution without also recompiling
>>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature that not
>>>>>>> many people would be using (so far no one else has requested this
>>>>>>> feature).
>>>>>>>
>>>>>>> If you really want to support distributions using a cairo 1.13 snapshot
>>>>>>> I suggest adding a comment before the first occurrence of
>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro
>>>>>>> is not used. Also include
>>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so
>>>>>>> that:
>>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>>
>>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>>
>>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
>>>>>>>>> +                                                 cairo_surface_t
>>>>>>>>> *image)
>>>>>>>>> +{
>>>>>>>>> +  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>>> +  Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>>> +  char *globalsBuffer;
>>>>>>>>> +  int globalsLength;
>>>>>>>>> +
>>>>>>>>> +  // According to PDF spec, Globals should be stream (or nothing)
>>>>>>>>> +
>>>>>>>>> +  if (globalsStr->isNull())
>>>>>>>>> +    return gTrue;
>>>>>>>>> +
>>>>>>>>> +  if (!globalsStr->isStream())
>>>>>>>>> +    return gFalse;
>>>>>>>>> +
>>>>>>>>> +  // See JBIG2Stream constructor for valid reference number
>>>>>>>>> +  if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>>> +    return gFalse;
>>>>>>>>>
>>>>>>>>> The globalsStr->isStream() check should be sufficient. The reference
>>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>>
>>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>>> such case?
>>>>>>>>
>>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>>> invalid.
>>>>>>>>
>>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>>> without the Ref to the globalsStream.
>>>>>>>>
>>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>>
>>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>>
>>>>>>>> Which is the best? Or, no need to case such case?
>>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>>> preserve compatibility.
>>>>>>>
>>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>>> CairoOutputdev.cc always uses the name "status" for cairo status so we
>>>>>>> should be consistent with this convention.
>>>>>>>
>>>>> _______________________________________________
>>>>> poppler mailing list
>>>>> poppler at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
> 



More information about the poppler mailing list