[poppler] Toward to JBIG2 support in CairoOutputDev

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Wed Jan 21 22:28:29 PST 2015


Dear Adrian,

I've tested with The_Miseries_of_Human_Life.pdf, but
strangely I could not reproduce the slowdown issue.
Furthermore, pdftocairo with JBIG2 patch finishes
the rendering faster than one without JBIG2 patch.
Maybe my machine (Core2Duo 1GHz, RAM 2GB) is quite
slower than yours - in fact, my machine spent 3-5 minutes
to render the PDF by pdftocairo.

I uploaded Linux/i386 static binaries built with "-pg -g -O2" at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2

Also I uploaded sample profiles (gmon.out) at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.out
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.out


Their analysis outputs by gprof are at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.log
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.log

If I made the binaries without profiler (without -pg)
the processing times are almost same. So I think this is
not because the profiler made the rendering slower.

Comparing the profile results, the remarkable difference
would be that 3 functions OUT of poppler spent long time
in original pdftocairo:
* _cairo_pdf_surface_show_page() spent 35.77s (spent 0.08s in patched)
* longest_match() (of zlib) spent 18.72s (spent 0.72s in patched)
* _cairo_image_analyze_color() spent 13.02s (not called in patched)

I guess they are used to reencoding of pixel data from
JBIG2 to Deflate. If it's required, I will make a profiled
Cairo library and check more detail.

--

At present, I don't have an access to the machines that
can render The_Miseries_of_Human_Life.pdf within 1 min.
So, please help me to analyze the cause of slowdown.

Could you build the binary with profiler and show the
profiled results? Of course you can use my binaries to
check if our difference is because of the machines, or,
because of different configurations.

Regards,
mpsuzuki


suzuki toshiya wrote:
> Dear Adrian,
>
> Great Thank you for benchmark evaluation.
> I will try to analyze the slowdown and
> improve the performance.
>
> Regards,
> mpsuzuki
>
> Adrian Johnson wrote:
>> I've tested the lastest patch. It works fine but I noticed about a 50%
>> slow down with pdfs containing a lot of JBIG2 images.
>>
>> I've uploaded a couple of sample pdfs here:
>> http://people.freedesktop.org/~ajohnson/jbig2/
>>
>> Below are the timings for each document using git master and git master
>> with the jbig2 patch. I used the command line
>> pdftocairo -pdf input.pdf output.pdf
>>
>>
>> - AnnualReport.pdf
>> master
>> real    0m13.562s
>> user    0m12.620s
>> sys     0m0.820s
>>
>> jbig2
>> real    0m21.045s
>> user    0m20.000s
>> sys     0m1.060s
>> 	
>>
>> - The_Miseries_of_Human_Life.pdf
>> master
>> real    0m32.690s
>> user    0m29.716s
>> sys     0m2.832s
>>
>> jbig2
>> real    0m42.793s
>> user    0m39.680s
>> sys     0m2.848s
>>
>>
>> While it is still worth a small performance reduction to get the
>> significant reduction in the output file size, it may be worth seeing if
>> there is a simple solution to the slowdown. My first guess is to try
>> attaching the global data to only one image instead of to every image.
>>
>>
>> On 20/01/15 02:42, suzuki toshiya wrote:
>>> Dear Carlos,
>>>
>>> Sorry! I had a silly mistake, I should pass ref->getRef()
>>> (in my previous patch, I passed ref->isRef()).
>>> I attach corrected patch.
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>> suzuki toshiya wrote:
>>>> Dear Carlos,
>>>>
>>>> Thank you for finding out the points! I attached revised patch.
>>>> The difference from my previous revision is following.
>>>>
>>>> Regards,
>>>> mpsuzuki
>>>>
>>>>
>>>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>>>> index a428d2c..17b6b3b 100644
>>>> --- a/poppler/CairoOutputDev.cc
>>>> +++ b/poppler/CairoOutputDev.cc
>>>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>>   #endif
>>>>
>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>> -    cairo_status_t status;
>>>> +    cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>>>
>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>       if (ref && ref->isRef()) {
>>>> -      Ref imgRef = ref->getRef();
>>>>         status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>> -                                "poppler-surface-", imgRef);
>>>> +                                "poppler-surface-", ref->isRef());
>>>>       }
>>>>   #endif
>>>>       if (!status) {
>>>>
>>>>
>>>>
>>>> Carlos Garcia Campos wrote:
>>>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>>>
>>>>>> The latest patch looks good. I'll wait a week before committing to see
>>>>>> if anyone else has any comments. This will give me time to do some
>>>>>> testing of the patch.
>>>>> Looks good to me as well in general. There's only one thing I've
>>>>> noticed.
>>>>>
>>>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>>>>     if (!colorMapHasIdentityDecodeMap(colorMap))
>>>>>>       return;
>>>>>>
>>>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>>>> +  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image))
>>>>>> +    return;
>>>>>> +#endif
>>>>>> +
>>>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>>> -    cairo_status_t st;
>>>>>> +    cairo_status_t status;
>>>>>>
>>>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>>       if (ref && ref->isRef()) {
>>>>>>         Ref imgRef = ref->getRef();
>>>>>> -      GooString *surfaceId = new GooString("poppler-surface-");
>>>>>> -      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>>>> -      char *idBuffer = copyString(surfaceId->getCString());
>>>>>> -      st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>> -                                        (const unsigned char *)idBuffer,
>>>>>> -                                        surfaceId->getLength(),
>>>>>> -                                        gfree, idBuffer);
>>>>>> -      if (st)
>>>>>> -        gfree(idBuffer);
>>>>>> -      delete surfaceId;
>>>>>> +      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>> +                                "poppler-surface-", imgRef);
>>>>> We could pass ref->getRef() directly here.
>>>>>
>>>>>>       }
>>>>>>   #endif
>>>>>> +    if (!status) {
>>>>> status might be uninitialized at this point if cairo version is not
>>>>> recent enough or the ref && ref->isRef() condition is False.
>>>>>
>>>>>> +      status = cairo_surface_set_mime_data (image, mime_type,
>>>>>> +					    (const unsigned char *)strBuffer, len,
>>>>>> +					    gfree, strBuffer);
>>>>>> +    }
>>>>>
>>>>>> -    st = cairo_surface_set_mime_data (image,
>>>>>> -				      str->getKind() == strDCT ?
>>>>>> -				      CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>>>> -				      (const unsigned char *)strBuffer, len,
>>>>>> -				      gfree, strBuffer);
>>>>>> -    if (st)
>>>>>> +    if (status)
>>>>>>         gfree (strBuffer);
>>>>>>     }
>>>>>>   }
>>>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>>>> Dear Adrian,
>>>>>>>
>>>>>>> OK - considering that there is a lag to the day that Linux
>>>>>>> distribution ship the poppler stable branch including
>>>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>>>> snapshot of Cairo in current development branch of poppler
>>>>>>> (and there would be no need to support Cairo snapshot in
>>>>>>> the day), I decided to follow original version checking.
>>>>>>>
>>>>>>> Also I removed the default value (NULL) for Ref object to
>>>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>>>> Globals object).
>>>>>>>
>>>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>>>
>>>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>>>> previous posts were misunderstanding this year as 2014.
>>>>>>>
>>>>>>> Regards,
>>>>>>> mpsuzuki
>>>>>>>
>>>>>>> Adrian Johnson wrote:
>>>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>>>> Dear Adrian,
>>>>>>>>>
>>>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>>>> but please let me discuss a few points.
>>>>>>>>>
>>>>>>>>> Adrian Johnson wrote:
>>>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int
>>>>>>>>>> refGen,
>>>>>>>>>> +                                                 cairo_surface_t
>>>>>>>>>> *image)
>>>>>>>>>> +{
>>>>>>>>>> +  GooString *globalsStrId;
>>>>>>>>>> +  char *idBuffer;
>>>>>>>>>> +  cairo_status_t st;
>>>>>>>>>> +
>>>>>>>>>> +
>>>>>>>>>> +  globalsStrId = new GooString;
>>>>>>>>>> +  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>>>> +  idBuffer = copyString(globalsStrId->getCString());
>>>>>>>>>> +  st = cairo_surface_set_mime_data (image,
>>>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>>>
>>>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>>>> inside a
>>>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>>>>      #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>>>> ships with 1.13.0~20140204.
>>>>>>>>>
>>>>>>>>> http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>>>> macros, instead of the version checking.
>>>>>>>> I'm not keen on making an exception for this case. The target audience
>>>>>>>> for this is very small. It would only help users that compile a new
>>>>>>>> version of poppler on an old distribution without also recompiling
>>>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature that not
>>>>>>>> many people would be using (so far no one else has requested this
>>>>>>>> feature).
>>>>>>>>
>>>>>>>> If you really want to support distributions using a cairo 1.13 snapshot
>>>>>>>> I suggest adding a comment before the first occurrence of
>>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro
>>>>>>>> is not used. Also include
>>>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so
>>>>>>>> that:
>>>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>>>
>>>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>>>
>>>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
>>>>>>>>>> +                                                 cairo_surface_t
>>>>>>>>>> *image)
>>>>>>>>>> +{
>>>>>>>>>> +  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>>>> +  Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>>>> +  char *globalsBuffer;
>>>>>>>>>> +  int globalsLength;
>>>>>>>>>> +
>>>>>>>>>> +  // According to PDF spec, Globals should be stream (or nothing)
>>>>>>>>>> +
>>>>>>>>>> +  if (globalsStr->isNull())
>>>>>>>>>> +    return gTrue;
>>>>>>>>>> +
>>>>>>>>>> +  if (!globalsStr->isStream())
>>>>>>>>>> +    return gFalse;
>>>>>>>>>> +
>>>>>>>>>> +  // See JBIG2Stream constructor for valid reference number
>>>>>>>>>> +  if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>>>> +    return gFalse;
>>>>>>>>>>
>>>>>>>>>> The globalsStr->isStream() check should be sufficient. The reference
>>>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>>>
>>>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>>>> such case?
>>>>>>>>>
>>>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>>>> invalid.
>>>>>>>>>
>>>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>>>> without the Ref to the globalsStream.
>>>>>>>>>
>>>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>>>
>>>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>>>
>>>>>>>>> Which is the best? Or, no need to case such case?
>>>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>>>> preserve compatibility.
>>>>>>>>
>>>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>>>> CairoOutputdev.cc always uses the name "status" for cairo status so we
>>>>>>>> should be consistent with this convention.
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> poppler mailing list
>>>>>> poppler at lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>>
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
>


More information about the poppler mailing list