[poppler] Toward to JBIG2 support in CairoOutputDev
Adrian Johnson
ajohnson at redneon.com
Tue Jan 20 04:10:07 PST 2015
I've tested the lastest patch. It works fine but I noticed about a 50%
slow down with pdfs containing a lot of JBIG2 images.
I've uploaded a couple of sample pdfs here:
http://people.freedesktop.org/~ajohnson/jbig2/
Below are the timings for each document using git master and git master
with the jbig2 patch. I used the command line
pdftocairo -pdf input.pdf output.pdf
- AnnualReport.pdf
master
real 0m13.562s
user 0m12.620s
sys 0m0.820s
jbig2
real 0m21.045s
user 0m20.000s
sys 0m1.060s
- The_Miseries_of_Human_Life.pdf
master
real 0m32.690s
user 0m29.716s
sys 0m2.832s
jbig2
real 0m42.793s
user 0m39.680s
sys 0m2.848s
While it is still worth a small performance reduction to get the
significant reduction in the output file size, it may be worth seeing if
there is a simple solution to the slowdown. My first guess is to try
attaching the global data to only one image instead of to every image.
On 20/01/15 02:42, suzuki toshiya wrote:
> Dear Carlos,
>
> Sorry! I had a silly mistake, I should pass ref->getRef()
> (in my previous patch, I passed ref->isRef()).
> I attach corrected patch.
>
> Regards,
> mpsuzuki
>
> suzuki toshiya wrote:
>> Dear Carlos,
>>
>> Thank you for finding out the points! I attached revised patch.
>> The difference from my previous revision is following.
>>
>> Regards,
>> mpsuzuki
>>
>>
>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
>> index a428d2c..17b6b3b 100644
>> --- a/poppler/CairoOutputDev.cc
>> +++ b/poppler/CairoOutputDev.cc
>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>> #endif
>>
>> if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>> - cairo_status_t status;
>> + cairo_status_t status = CAIRO_STATUS_SUCCESS;
>>
>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>> if (ref && ref->isRef()) {
>> - Ref imgRef = ref->getRef();
>> status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>> - "poppler-surface-", imgRef);
>> + "poppler-surface-", ref->isRef());
>> }
>> #endif
>> if (!status) {
>>
>>
>>
>> Carlos Garcia Campos wrote:
>>> Adrian Johnson <ajohnson at redneon.com> writes:
>>>
>>>> The latest patch looks good. I'll wait a week before committing to see
>>>> if anyone else has any comments. This will give me time to do some
>>>> testing of the patch.
>>> Looks good to me as well in general. There's only one thing I've
>>> noticed.
>>>
>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, Stream *str, Object *ref,
>>>> if (!colorMapHasIdentityDecodeMap(colorMap))
>>>> return;
>>>>
>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
>>>> + if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image))
>>>> + return;
>>>> +#endif
>>>> +
>>>> if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
>>>> - cairo_status_t st;
>>>> + cairo_status_t status;
>>>>
>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
>>>> if (ref && ref->isRef()) {
>>>> Ref imgRef = ref->getRef();
>>>> - GooString *surfaceId = new GooString("poppler-surface-");
>>>> - surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
>>>> - char *idBuffer = copyString(surfaceId->getCString());
>>>> - st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>> - (const unsigned char *)idBuffer,
>>>> - surfaceId->getLength(),
>>>> - gfree, idBuffer);
>>>> - if (st)
>>>> - gfree(idBuffer);
>>>> - delete surfaceId;
>>>> + status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
>>>> + "poppler-surface-", imgRef);
>>> We could pass ref->getRef() directly here.
>>>
>>>> }
>>>> #endif
>>>> + if (!status) {
>>> status might be uninitialized at this point if cairo version is not
>>> recent enough or the ref && ref->isRef() condition is False.
>>>
>>>> + status = cairo_surface_set_mime_data (image, mime_type,
>>>> + (const unsigned char *)strBuffer, len,
>>>> + gfree, strBuffer);
>>>> + }
>>>
>>>> - st = cairo_surface_set_mime_data (image,
>>>> - str->getKind() == strDCT ?
>>>> - CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2,
>>>> - (const unsigned char *)strBuffer, len,
>>>> - gfree, strBuffer);
>>>> - if (st)
>>>> + if (status)
>>>> gfree (strBuffer);
>>>> }
>>>> }
>>>
>>>> On 11/01/15 23:05, suzuki toshiya wrote:
>>>>> Dear Adrian,
>>>>>
>>>>> OK - considering that there is a lag to the day that Linux
>>>>> distribution ship the poppler stable branch including
>>>>> CairoOutputDev with JBIG2 support even if I support some
>>>>> snapshot of Cairo in current development branch of poppler
>>>>> (and there would be no need to support Cairo snapshot in
>>>>> the day), I decided to follow original version checking.
>>>>>
>>>>> Also I removed the default value (NULL) for Ref object to
>>>>> JBIG2Stream constructor and removed NULL pointer checking
>>>>> in JBIG2Stream constructor (as same as the argument to pass
>>>>> Globals object).
>>>>>
>>>>> I replaced "st" in CairoOutputDev, by "status".
>>>>>
>>>>> Here I attached revised patch - oh, I'm quite sorry, my
>>>>> previous posts were misunderstanding this year as 2014.
>>>>>
>>>>> Regards,
>>>>> mpsuzuki
>>>>>
>>>>> Adrian Johnson wrote:
>>>>>> On 08/01/15 13:35, suzuki toshiya wrote:
>>>>>>> Dear Adrian,
>>>>>>>
>>>>>>> Great Thank you for prompt review! I attached the revised patch,
>>>>>>> but please let me discuss a few points.
>>>>>>>
>>>>>>> Adrian Johnson wrote:
>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Here I submit a preliminary patch for git head,
>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo.
>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int
>>>>>>>> refGen,
>>>>>>>> + cairo_surface_t
>>>>>>>> *image)
>>>>>>>> +{
>>>>>>>> + GooString *globalsStrId;
>>>>>>>> + char *idBuffer;
>>>>>>>> + cairo_status_t st;
>>>>>>>> +
>>>>>>>> +
>>>>>>>> + globalsStrId = new GooString;
>>>>>>>> + globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
>>>>>>>> + idBuffer = copyString(globalsStrId->getCString());
>>>>>>>> + st = cairo_surface_set_mime_data (image,
>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,
>>>>>>>>
>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be
>>>>>>>> inside a
>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
>>>>>>> If it's acceptable, I want to use the conditional like
>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
>>>>>>> The official release of cairo including JBIG2 support is 1.14.0,
>>>>>>> however, some distributions use a snapshot before 1.14 which
>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support)
>>>>>>> ships with 1.13.0~20140204.
>>>>>>>
>>>>>>> http://packages.ubuntu.com/search?suite=trusty§ion=all&arch=any&keywords=libcairo2&searchon=names
>>>>>>>
>>>>>>>
>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2
>>>>>>> macros, instead of the version checking.
>>>>>> I'm not keen on making an exception for this case. The target audience
>>>>>> for this is very small. It would only help users that compile a new
>>>>>> version of poppler on an old distribution without also recompiling
>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature that not
>>>>>> many people would be using (so far no one else has requested this
>>>>>> feature).
>>>>>>
>>>>>> If you really want to support distributions using a cairo 1.13 snapshot
>>>>>> I suggest adding a comment before the first occurrence of
>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro
>>>>>> is not used. Also include
>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so
>>>>>> that:
>>>>>> a) searching for CAIRO_VERSION will find it, and
>>>>>> b) to document that JBIG2 support is in 1.14 or later.
>>>>>>
>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax
>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
>>>>>>> syntax. Is there any reason why the order is reverted?
>>>>>> No reason. Feel free to change it to num-gen order.
>>>>>>
>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream *str,
>>>>>>>> + cairo_surface_t
>>>>>>>> *image)
>>>>>>>> +{
>>>>>>>> + JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
>>>>>>>> + Object* globalsStr = jbig2Str->getGlobalsStream();
>>>>>>>> + char *globalsBuffer;
>>>>>>>> + int globalsLength;
>>>>>>>> +
>>>>>>>> + // According to PDF spec, Globals should be stream (or nothing)
>>>>>>>> +
>>>>>>>> + if (globalsStr->isNull())
>>>>>>>> + return gTrue;
>>>>>>>> +
>>>>>>>> + if (!globalsStr->isStream())
>>>>>>>> + return gFalse;
>>>>>>>> +
>>>>>>>> + // See JBIG2Stream constructor for valid reference number
>>>>>>>> + if (!jbig2Str->getGlobalsStreamRefNum())
>>>>>>>> + return gFalse;
>>>>>>>>
>>>>>>>> The globalsStr->isStream() check should be sufficient. The reference
>>>>>>>> number should always be valid of the global stream is valid.
>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
>>>>>>> it must be always valid as far as GlobalsStream is used" is
>>>>>>> intuitive solution, but public JBIG2Stream constructor has not
>>>>>>> requested the Ref to GlobalsStream before in earlier version.
>>>>>>>
>>>>>>> If some developers use JBIG2Stream constructor with valid
>>>>>>> GlobalsStream but without Ref to it (as the syntax before my
>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream
>>>>>>> is valid but Ref to it is invalid (if I supply the default
>>>>>>> value for the argument to pass Ref). How do we detect or prevent
>>>>>>> such case?
>>>>>>>
>>>>>>> A) Supplying the default value (NULL pointer) in the argument
>>>>>>> to pass the Ref, and JBIG2Stream returns an error when
>>>>>>> globalsStream is valid stream but its Ref is not passed - and
>>>>>>> prevent the case that globalsStream is valid but its Ref is
>>>>>>> invalid.
>>>>>>>
>>>>>>> B) Not supplying the default value, and prevent the construction
>>>>>>> without the Ref to the globalsStream.
>>>>>>>
>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if
>>>>>>> Ref to globalsStream is explicitly set.
>>>>>>>
>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
>>>>>>> internally when Ref is not passed, and deal the case as Ref to
>>>>>>> the globalsStream is not initialized (my last patch design)
>>>>>>>
>>>>>>> Which is the best? Or, no need to case such case?
>>>>>> JBIG2Stream is not part of the public API so there is no need to
>>>>>> preserve compatibility.
>>>>>>
>>>>>> Something else I just noticed - could you rename "st" to "status".
>>>>>> CairoOutputdev.cc always uses the name "status" for cairo status so we
>>>>>> should be consistent with this convention.
>>>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
>
More information about the poppler
mailing list