[poppler] Toward to JBIG2 support in CairoOutputDev

Adrian Johnson ajohnson at redneon.com
Thu Jan 22 02:41:25 PST 2015


I'll take a look and see if I can work out what is going on.

On 22/01/15 16:58, suzuki toshiya wrote:
> Dear Adrian,
> 
> I've tested with The_Miseries_of_Human_Life.pdf, but
> strangely I could not reproduce the slowdown issue.
> Furthermore, pdftocairo with JBIG2 patch finishes
> the rendering faster than one without JBIG2 patch.
> Maybe my machine (Core2Duo 1GHz, RAM 2GB) is quite
> slower than yours - in fact, my machine spent 3-5 minutes
> to render the PDF by pdftocairo.
> 
> I uploaded Linux/i386 static binaries built with "-pg -g -O2" at:
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2
> 
> Also I uploaded sample profiles (gmon.out) at:
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.out
> 
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.out
> 
> 
> 
> Their analysis outputs by gprof are at:
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.log
> 
> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.log
> 
> 
> If I made the binaries without profiler (without -pg)
> the processing times are almost same. So I think this is
> not because the profiler made the rendering slower.
> 
> Comparing the profile results, the remarkable difference
> would be that 3 functions OUT of poppler spent long time
> in original pdftocairo:
> * _cairo_pdf_surface_show_page() spent 35.77s (spent 0.08s in patched)
> * longest_match() (of zlib) spent 18.72s (spent 0.72s in patched)
> * _cairo_image_analyze_color() spent 13.02s (not called in patched)
> 
> I guess they are used to reencoding of pixel data from
> JBIG2 to Deflate. If it's required, I will make a profiled
> Cairo library and check more detail.
> 
> -- 
> 
> At present, I don't have an access to the machines that
> can render The_Miseries_of_Human_Life.pdf within 1 min.
> So, please help me to analyze the cause of slowdown.
> 
> Could you build the binary with profiler and show the
> profiled results? Of course you can use my binaries to
> check if our difference is because of the machines, or,
> because of different configurations.
> 
> Regards,
> mpsuzuki
> 
> 
> suzuki toshiya wrote:
>> Dear Adrian,
>>
>> Great Thank you for benchmark evaluation.
>> I will try to analyze the slowdown and
>> improve the performance.
>>
>> Regards,
>> mpsuzuki
>>
>> Adrian Johnson wrote:
>>> I've tested the lastest patch. It works fine but I noticed about a 50%
>>> slow down with pdfs containing a lot of JBIG2 images.
>>>
>>> I've uploaded a couple of sample pdfs here:
>>> http://people.freedesktop.org/~ajohnson/jbig2/
>>>
>>> Below are the timings for each document using git master and git master
>>> with the jbig2 patch. I used the command line
>>> pdftocairo -pdf input.pdf output.pdf
>>>
>>>
>>> - AnnualReport.pdf
>>> master
>>> real    0m13.562s
>>> user    0m12.620s
>>> sys     0m0.820s
>>>
>>> jbig2
>>> real    0m21.045s
>>> user    0m20.000s
>>> sys     0m1.060s
>>>     
>>>
>>> - The_Miseries_of_Human_Life.pdf
>>> master
>>> real    0m32.690s
>>> user    0m29.716s
>>> sys     0m2.832s
>>>
>>> jbig2
>>> real    0m42.793s
>>> user    0m39.680s
>>> sys     0m2.848s
>>>
>>>
>>> While it is still worth a small performance reduction to get the
>>> significant reduction in the output file size, it may be worth seeing if
>>> there is a simple solution to the slowdown. My first guess is to try
>>> attaching the global data to only one image instead of to every image.
>>>
>>>
>>> On 20/01/15 02:42, suzuki toshiya wrote:
>>>> Dear Carlos,
>>>>
>>>> Sorry! I had a silly mistake, I should pass ref->getRef()
>>>> (in my previous patch, I passed ref->isRef()).
>>>> I attach corrected patch.
>>>>
>>>> Regards,
>>>> mpsuzuki
>>>>
>>>> suzuki toshiya wrote:
>>>>> Dear Carlos,
>>>>>
>>>>> Thank you for finding out the points! I attached revised patch.
>>>>> The difference from my previous revision is following.
>>>>>
>>>>> Regards,
>>>>> mpsuzuki
>>>>>
>>>>>
>>>>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>>>>> index a428d2c..17b6b3b 100644
>>>>> --- a/poppler/CairoOutputDev.cc
>>>>> +++ b/poppler/CairoOutputDev.cc
>>>>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState
>>>>> *state, Stream *str, Object *ref,
>>>>>   #endif
>>>>>
>>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>> -    cairo_status_t status;
>>>>> +    cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>>>>
>>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>       if (ref && ref->isRef()) {
>>>>> -      Ref imgRef = ref->getRef();
>>>>>         status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>> -                                "poppler-surface-", imgRef);
>>>>> +                                "poppler-surface-", ref->isRef());
>>>>>       }
>>>>>   #endif
>>>>>       if (!status) {
>>>>>
>>>>>
>>>>>
>>>>> Carlos Garcia Campos wrote:
>>>>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>>>>
>>>>>>> The latest patch looks good. I'll wait a week before committing
>>>>>>> to see
>>>>>>> if anyone else has any comments. This will give me time to do some
>>>>>>> testing of the patch.
>>>>>> Looks good to me as well in general. There's only one thing I've
>>>>>> noticed.
>>>>>>
>>>>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState
>>>>>>> *state, Stream *str, Object *ref,
>>>>>>>     if (!colorMapHasIdentityDecodeMap(colorMap))
>>>>>>>       return;
>>>>>>>
>>>>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>>>>> +  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str,
>>>>>>> image))
>>>>>>> +    return;
>>>>>>> +#endif
>>>>>>> +
>>>>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>>>> -    cairo_status_t st;
>>>>>>> +    cairo_status_t status;
>>>>>>>
>>>>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>>>       if (ref && ref->isRef()) {
>>>>>>>         Ref imgRef = ref->getRef();
>>>>>>> -      GooString *surfaceId = new GooString("poppler-surface-");
>>>>>>> -      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>>>>> -      char *idBuffer = copyString(surfaceId->getCString());
>>>>>>> -      st = cairo_surface_set_mime_data (image,
>>>>>>> CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>>> -                                        (const unsigned char
>>>>>>> *)idBuffer,
>>>>>>> -                                        surfaceId->getLength(),
>>>>>>> -                                        gfree, idBuffer);
>>>>>>> -      if (st)
>>>>>>> -        gfree(idBuffer);
>>>>>>> -      delete surfaceId;
>>>>>>> +      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>>> +                                "poppler-surface-", imgRef);
>>>>>> We could pass ref->getRef() directly here.
>>>>>>
>>>>>>>       }
>>>>>>>   #endif
>>>>>>> +    if (!status) {
>>>>>> status might be uninitialized at this point if cairo version is not
>>>>>> recent enough or the ref && ref->isRef() condition is False.
>>>>>>
>>>>>>> +      status = cairo_surface_set_mime_data (image, mime_type,
>>>>>>> +                        (const unsigned char *)strBuffer, len,
>>>>>>> +                        gfree, strBuffer);
>>>>>>> +    }
>>>>>>
>>>>>>> -    st = cairo_surface_set_mime_data (image,
>>>>>>> -                      str->getKind() == strDCT ?
>>>>>>> -                      CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>>>>> -                      (const unsigned char *)strBuffer, len,
>>>>>>> -                      gfree, strBuffer);
>>>>>>> -    if (st)
>>>>>>> +    if (status)
>>>>>>>         gfree (strBuffer);
>>>>>>>     }
>>>>>>>   }
>>>>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>>>>> Dear Adrian,
>>>>>>>>
>>>>>>>> OK - considering that there is a lag to the day that Linux
>>>>>>>> distribution ship the poppler stable branch including
>>>>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>>>>> snapshot of Cairo in current development branch of poppler
>>>>>>>> (and there would be no need to support Cairo snapshot in
>>>>>>>> the day), I decided to follow original version checking.
>>>>>>>>
>>>>>>>> Also I removed the default value (NULL) for Ref object to
>>>>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>>>>> Globals object).
>>>>>>>>
>>>>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>>>>
>>>>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>>>>> previous posts were misunderstanding this year as 2014.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> mpsuzuki
>>>>>>>>
>>>>>>>> Adrian Johnson wrote:
>>>>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>>>>> Dear Adrian,
>>>>>>>>>>
>>>>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>>>>> but please let me discuss a few points.
>>>>>>>>>>
>>>>>>>>>> Adrian Johnson wrote:
>>>>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum,
>>>>>>>>>>> int
>>>>>>>>>>> refGen,
>>>>>>>>>>> +                                                
>>>>>>>>>>> cairo_surface_t
>>>>>>>>>>> *image)
>>>>>>>>>>> +{
>>>>>>>>>>> +  GooString *globalsStrId;
>>>>>>>>>>> +  char *idBuffer;
>>>>>>>>>>> +  cairo_status_t st;
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>> +  globalsStrId = new GooString;
>>>>>>>>>>> +  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>>>>> +  idBuffer = copyString(globalsStrId->getCString());
>>>>>>>>>>> +  st = cairo_surface_set_mime_data (image,
>>>>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>>>>
>>>>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>>>>> inside a
>>>>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>>>>>      #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>>>>> ships with 1.13.0~20140204.
>>>>>>>>>>
>>>>>>>>>> http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>>>>> macros, instead of the version checking.
>>>>>>>>> I'm not keen on making an exception for this case. The target
>>>>>>>>> audience
>>>>>>>>> for this is very small. It would only help users that compile a
>>>>>>>>> new
>>>>>>>>> version of poppler on an old distribution without also recompiling
>>>>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature
>>>>>>>>> that not
>>>>>>>>> many people would be using (so far no one else has requested this
>>>>>>>>> feature).
>>>>>>>>>
>>>>>>>>> If you really want to support distributions using a cairo 1.13
>>>>>>>>> snapshot
>>>>>>>>> I suggest adding a comment before the first occurrence of
>>>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the
>>>>>>>>> version macro
>>>>>>>>> is not used. Also include
>>>>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the
>>>>>>>>> comment so
>>>>>>>>> that:
>>>>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>>>>
>>>>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>>>>
>>>>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
>>>>>>>>>>> +                                                
>>>>>>>>>>> cairo_surface_t
>>>>>>>>>>> *image)
>>>>>>>>>>> +{
>>>>>>>>>>> +  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>>>>> +  Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>>>>> +  char *globalsBuffer;
>>>>>>>>>>> +  int globalsLength;
>>>>>>>>>>> +
>>>>>>>>>>> +  // According to PDF spec, Globals should be stream (or
>>>>>>>>>>> nothing)
>>>>>>>>>>> +
>>>>>>>>>>> +  if (globalsStr->isNull())
>>>>>>>>>>> +    return gTrue;
>>>>>>>>>>> +
>>>>>>>>>>> +  if (!globalsStr->isStream())
>>>>>>>>>>> +    return gFalse;
>>>>>>>>>>> +
>>>>>>>>>>> +  // See JBIG2Stream constructor for valid reference number
>>>>>>>>>>> +  if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>>>>> +    return gFalse;
>>>>>>>>>>>
>>>>>>>>>>> The globalsStr->isStream() check should be sufficient. The
>>>>>>>>>>> reference
>>>>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>>>>
>>>>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>>>>> such case?
>>>>>>>>>>
>>>>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>>>>> invalid.
>>>>>>>>>>
>>>>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>>>>> without the Ref to the globalsStream.
>>>>>>>>>>
>>>>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>>>>
>>>>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>>>>
>>>>>>>>>> Which is the best? Or, no need to case such case?
>>>>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>>>>> preserve compatibility.
>>>>>>>>>
>>>>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>>>>> CairoOutputdev.cc always uses the name "status" for cairo
>>>>>>>>> status so we
>>>>>>>>> should be consistent with this convention.
>>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> poppler mailing list
>>>>>>> poppler at lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> poppler mailing list
>>>>> poppler at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>
>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>
>>>
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler



More information about the poppler mailing list