[poppler] Toward to JBIG2 support in CairoOutputDev

Adrian Johnson ajohnson at redneon.com
Fri Jan 23 03:55:54 PST 2015


I can't seem to reproduce the slow down any more. I am assuming
something was not rebuilt correctly when the slowdown occurred. I've
pushed out the patch as I am now confident there is no regression.


On 22/01/15 21:11, Adrian Johnson wrote:
> I'll take a look and see if I can work out what is going on.
> 
> On 22/01/15 16:58, suzuki toshiya wrote:
>> Dear Adrian,
>>
>> I've tested with The_Miseries_of_Human_Life.pdf, but
>> strangely I could not reproduce the slowdown issue.
>> Furthermore, pdftocairo with JBIG2 patch finishes
>> the rendering faster than one without JBIG2 patch.
>> Maybe my machine (Core2Duo 1GHz, RAM 2GB) is quite
>> slower than yours - in fact, my machine spent 3-5 minutes
>> to render the PDF by pdftocairo.
>>
>> I uploaded Linux/i386 static binaries built with "-pg -g -O2" at:
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2
>>
>> Also I uploaded sample profiles (gmon.out) at:
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.out
>>
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.out
>>
>>
>>
>> Their analysis outputs by gprof are at:
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.log
>>
>> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.log
>>
>>
>> If I made the binaries without profiler (without -pg)
>> the processing times are almost same. So I think this is
>> not because the profiler made the rendering slower.
>>
>> Comparing the profile results, the remarkable difference
>> would be that 3 functions OUT of poppler spent long time
>> in original pdftocairo:
>> * _cairo_pdf_surface_show_page() spent 35.77s (spent 0.08s in patched)
>> * longest_match() (of zlib) spent 18.72s (spent 0.72s in patched)
>> * _cairo_image_analyze_color() spent 13.02s (not called in patched)
>>
>> I guess they are used to reencoding of pixel data from
>> JBIG2 to Deflate. If it's required, I will make a profiled
>> Cairo library and check more detail.
>>
>> -- 
>>
>> At present, I don't have an access to the machines that
>> can render The_Miseries_of_Human_Life.pdf within 1 min.
>> So, please help me to analyze the cause of slowdown.
>>
>> Could you build the binary with profiler and show the
>> profiled results? Of course you can use my binaries to
>> check if our difference is because of the machines, or,
>> because of different configurations.
>>
>> Regards,
>> mpsuzuki
>>
>>
>> suzuki toshiya wrote:
>>> Dear Adrian,
>>>
>>> Great Thank you for benchmark evaluation.
>>> I will try to analyze the slowdown and
>>> improve the performance.
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>> Adrian Johnson wrote:
>>>> I've tested the lastest patch. It works fine but I noticed about a 50%
>>>> slow down with pdfs containing a lot of JBIG2 images.
>>>>
>>>> I've uploaded a couple of sample pdfs here:
>>>> http://people.freedesktop.org/~ajohnson/jbig2/
>>>>
>>>> Below are the timings for each document using git master and git master
>>>> with the jbig2 patch. I used the command line
>>>> pdftocairo -pdf input.pdf output.pdf
>>>>
>>>>
>>>> - AnnualReport.pdf
>>>> master
>>>> real    0m13.562s
>>>> user    0m12.620s
>>>> sys     0m0.820s
>>>>
>>>> jbig2
>>>> real    0m21.045s
>>>> user    0m20.000s
>>>> sys     0m1.060s
>>>>     
>>>>
>>>> - The_Miseries_of_Human_Life.pdf
>>>> master
>>>> real    0m32.690s
>>>> user    0m29.716s
>>>> sys     0m2.832s
>>>>
>>>> jbig2
>>>> real    0m42.793s
>>>> user    0m39.680s
>>>> sys     0m2.848s
>>>>
>>>>
>>>> While it is still worth a small performance reduction to get the
>>>> significant reduction in the output file size, it may be worth seeing if
>>>> there is a simple solution to the slowdown. My first guess is to try
>>>> attaching the global data to only one image instead of to every image.
>>>>
>>>>
>>>> On 20/01/15 02:42, suzuki toshiya wrote:
>>>>> Dear Carlos,
>>>>>
>>>>> Sorry! I had a silly mistake, I should pass ref->getRef()
>>>>> (in my previous patch, I passed ref->isRef()).
>>>>> I attach corrected patch.
>>>>>
>>>>> Regards,
>>>>> mpsuzuki
>>>>>
>>>>> suzuki toshiya wrote:
>>>>>> Dear Carlos,
>>>>>>
>>>>>> Thank you for finding out the points! I attached revised patch.
>>>>>> The difference from my previous revision is following.
>>>>>>
>>>>>> Regards,
>>>>>> mpsuzuki
>>>>>>
>>>>>>
>>>>>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>>>>>> index a428d2c..17b6b3b 100644
>>>>>> --- a/poppler/CairoOutputDev.cc
>>>>>> +++ b/poppler/CairoOutputDev.cc
>>>>>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState
>>>>>> *state, Stream *str, Object *ref,
>>>>>>   #endif
>>>>>>
>>>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>>> -    cairo_status_t status;
>>>>>> +    cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>>>>>
>>>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>>       if (ref && ref->isRef()) {
>>>>>> -      Ref imgRef = ref->getRef();
>>>>>>         status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>> -                                "poppler-surface-", imgRef);
>>>>>> +                                "poppler-surface-", ref->isRef());
>>>>>>       }
>>>>>>   #endif
>>>>>>       if (!status) {
>>>>>>
>>>>>>
>>>>>>
>>>>>> Carlos Garcia Campos wrote:
>>>>>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>>>>>
>>>>>>>> The latest patch looks good. I'll wait a week before committing
>>>>>>>> to see
>>>>>>>> if anyone else has any comments. This will give me time to do some
>>>>>>>> testing of the patch.
>>>>>>> Looks good to me as well in general. There's only one thing I've
>>>>>>> noticed.
>>>>>>>
>>>>>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState
>>>>>>>> *state, Stream *str, Object *ref,
>>>>>>>>     if (!colorMapHasIdentityDecodeMap(colorMap))
>>>>>>>>       return;
>>>>>>>>
>>>>>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>>>>>> +  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str,
>>>>>>>> image))
>>>>>>>> +    return;
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>>     if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>>>>>> -    cairo_status_t st;
>>>>>>>> +    cairo_status_t status;
>>>>>>>>
>>>>>>>>   #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>>>>>>       if (ref && ref->isRef()) {
>>>>>>>>         Ref imgRef = ref->getRef();
>>>>>>>> -      GooString *surfaceId = new GooString("poppler-surface-");
>>>>>>>> -      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>>>>>> -      char *idBuffer = copyString(surfaceId->getCString());
>>>>>>>> -      st = cairo_surface_set_mime_data (image,
>>>>>>>> CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>>>> -                                        (const unsigned char
>>>>>>>> *)idBuffer,
>>>>>>>> -                                        surfaceId->getLength(),
>>>>>>>> -                                        gfree, idBuffer);
>>>>>>>> -      if (st)
>>>>>>>> -        gfree(idBuffer);
>>>>>>>> -      delete surfaceId;
>>>>>>>> +      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>>>>>> +                                "poppler-surface-", imgRef);
>>>>>>> We could pass ref->getRef() directly here.
>>>>>>>
>>>>>>>>       }
>>>>>>>>   #endif
>>>>>>>> +    if (!status) {
>>>>>>> status might be uninitialized at this point if cairo version is not
>>>>>>> recent enough or the ref && ref->isRef() condition is False.
>>>>>>>
>>>>>>>> +      status = cairo_surface_set_mime_data (image, mime_type,
>>>>>>>> +                        (const unsigned char *)strBuffer, len,
>>>>>>>> +                        gfree, strBuffer);
>>>>>>>> +    }
>>>>>>>
>>>>>>>> -    st = cairo_surface_set_mime_data (image,
>>>>>>>> -                      str->getKind() == strDCT ?
>>>>>>>> -                      CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>>>>>> -                      (const unsigned char *)strBuffer, len,
>>>>>>>> -                      gfree, strBuffer);
>>>>>>>> -    if (st)
>>>>>>>> +    if (status)
>>>>>>>>         gfree (strBuffer);
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>>>>>> Dear Adrian,
>>>>>>>>>
>>>>>>>>> OK - considering that there is a lag to the day that Linux
>>>>>>>>> distribution ship the poppler stable branch including
>>>>>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>>>>>> snapshot of Cairo in current development branch of poppler
>>>>>>>>> (and there would be no need to support Cairo snapshot in
>>>>>>>>> the day), I decided to follow original version checking.
>>>>>>>>>
>>>>>>>>> Also I removed the default value (NULL) for Ref object to
>>>>>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>>>>>> Globals object).
>>>>>>>>>
>>>>>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>>>>>
>>>>>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>>>>>> previous posts were misunderstanding this year as 2014.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> mpsuzuki
>>>>>>>>>
>>>>>>>>> Adrian Johnson wrote:
>>>>>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>>>>>> Dear Adrian,
>>>>>>>>>>>
>>>>>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>>>>>> but please let me discuss a few points.
>>>>>>>>>>>
>>>>>>>>>>> Adrian Johnson wrote:
>>>>>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum,
>>>>>>>>>>>> int
>>>>>>>>>>>> refGen,
>>>>>>>>>>>> +                                                
>>>>>>>>>>>> cairo_surface_t
>>>>>>>>>>>> *image)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +  GooString *globalsStrId;
>>>>>>>>>>>> +  char *idBuffer;
>>>>>>>>>>>> +  cairo_status_t st;
>>>>>>>>>>>> +
>>>>>>>>>>>> +
>>>>>>>>>>>> +  globalsStrId = new GooString;
>>>>>>>>>>>> +  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>>>>>> +  idBuffer = copyString(globalsStrId->getCString());
>>>>>>>>>>>> +  st = cairo_surface_set_mime_data (image,
>>>>>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>>>>>
>>>>>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>>>>>> inside a
>>>>>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>>>>>>      #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>>>>>> ships with 1.13.0~20140204.
>>>>>>>>>>>
>>>>>>>>>>> http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>>>>>> macros, instead of the version checking.
>>>>>>>>>> I'm not keen on making an exception for this case. The target
>>>>>>>>>> audience
>>>>>>>>>> for this is very small. It would only help users that compile a
>>>>>>>>>> new
>>>>>>>>>> version of poppler on an old distribution without also recompiling
>>>>>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature
>>>>>>>>>> that not
>>>>>>>>>> many people would be using (so far no one else has requested this
>>>>>>>>>> feature).
>>>>>>>>>>
>>>>>>>>>> If you really want to support distributions using a cairo 1.13
>>>>>>>>>> snapshot
>>>>>>>>>> I suggest adding a comment before the first occurrence of
>>>>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the
>>>>>>>>>> version macro
>>>>>>>>>> is not used. Also include
>>>>>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the
>>>>>>>>>> comment so
>>>>>>>>>> that:
>>>>>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>>>>>
>>>>>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>>>>>
>>>>>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
>>>>>>>>>>>> +                                                
>>>>>>>>>>>> cairo_surface_t
>>>>>>>>>>>> *image)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>>>>>> +  Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>>>>>> +  char *globalsBuffer;
>>>>>>>>>>>> +  int globalsLength;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  // According to PDF spec, Globals should be stream (or
>>>>>>>>>>>> nothing)
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (globalsStr->isNull())
>>>>>>>>>>>> +    return gTrue;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (!globalsStr->isStream())
>>>>>>>>>>>> +    return gFalse;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  // See JBIG2Stream constructor for valid reference number
>>>>>>>>>>>> +  if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>>>>>> +    return gFalse;
>>>>>>>>>>>>
>>>>>>>>>>>> The globalsStr->isStream() check should be sufficient. The
>>>>>>>>>>>> reference
>>>>>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>>>>>
>>>>>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>>>>>> such case?
>>>>>>>>>>>
>>>>>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>>>>>> invalid.
>>>>>>>>>>>
>>>>>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>>>>>> without the Ref to the globalsStream.
>>>>>>>>>>>
>>>>>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>>>>>
>>>>>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>>>>>
>>>>>>>>>>> Which is the best? Or, no need to case such case?
>>>>>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>>>>>> preserve compatibility.
>>>>>>>>>>
>>>>>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>>>>>> CairoOutputdev.cc always uses the name "status" for cairo
>>>>>>>>>> status so we
>>>>>>>>>> should be consistent with this convention.
>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> poppler mailing list
>>>>>>>> poppler at lists.freedesktop.org
>>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> poppler mailing list
>>>>>> poppler at lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> poppler mailing list
>>>>> poppler at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
> 



More information about the poppler mailing list