[poppler] Marked content, Gfx.cc

Nils Höglund nils.hoglund at gmail.com
Thu Feb 11 04:50:53 PST 2010


Should I file a bug report?
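
For anyone who wants to see the imbalance without building poppler, here is a minimal, self-contained sketch of the behaviour described in the quoted message below. It is only a model: CountingDev, Op, replayWithoutElse and replayWithElse are hypothetical names, not poppler code, and it assumes that a begin operator without a properties dictionary is not forwarded to the output device unless the else branch is added.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an output device: it only counts the calls it receives.
struct CountingDev {
    int begins = 0;
    int ends = 0;
    void beginMarkedContent(const char * /*name*/, const void * /*properties*/) { ++begins; }
    void endMarkedContent() { ++ends; }
};

// Hypothetical model of one marked-content operator in a content stream.
struct Op {
    bool isEnd;   // true for an end operator (EMC), false for a begin operator (BMC/BDC)
    bool hasDict; // true when the begin operator carries a properties dictionary
};

// Models the dispatch without the else branch:
// a begin operator is forwarded only when it has a dictionary.
void replayWithoutElse(const std::vector<Op> &ops, CountingDev &dev) {
    for (const Op &op : ops) {
        if (op.isEnd) {
            dev.endMarkedContent();
        } else if (op.hasDict) {
            dev.beginMarkedContent("Tag", &op);
        }
    }
}

// Models the dispatch with the else branch:
// every begin operator is forwarded, with a null dictionary if none was given.
void replayWithElse(const std::vector<Op> &ops, CountingDev &dev) {
    for (const Op &op : ops) {
        if (op.isEnd) {
            dev.endMarkedContent();
        } else if (op.hasDict) {
            dev.beginMarkedContent("Tag", &op);
        } else {
            dev.beginMarkedContent("Tag", nullptr);
        }
    }
}

// Usage: one begin without a dictionary and one with, each followed by an end.
void selfCheck() {
    const std::vector<Op> ops = {{false, false}, {true, false}, {false, true}, {true, false}};

    CountingDev without;
    replayWithoutElse(ops, without);
    assert(without.begins != without.ends); // unbalanced: one begin was dropped

    CountingDev with;
    replayWithElse(ops, with);
    assert(with.begins == with.ends);       // balanced
}
```

An output device that keeps a stack of open marked-content sections would pop more often than it pushes in the first case, which is why the two call counts need to agree.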

Kind regards,

Nils Höglund

2010/2/10 Nils Höglund <nils.hoglund at gmail.com>:
> Albert,
>
> For example: http://www.bbc.co.uk/guidelines/futuremedia/accessibility/bbc_accessible_pdf_master17.pdf.
> However, the issue seems to be present with many tagged PDF documents.
>
> I have added the member variables int begincounter and int endcounter,
> initialized them to zero in the HtmlOutputDev constructor, and
> implemented the following test methods:
>
> void HtmlOutputDev::beginMarkedContent(char *name, Dict *properties) {
>        begincounter++;
>        fprintf(stderr, "begincounter:%d\n",begincounter);
> }
> void HtmlOutputDev::endMarkedContent(GfxState *state) {
>        endcounter++;
>        fprintf(stderr, "endcounter:%d\n",endcounter);
> }
>
> I run the program pdftohtml and look at the output:
>
> $ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
> begincounter:382
> endcounter:395
> begincounter:383
> endcounter:396
> endcounter:397
> $
>
> I see that the beginMarkedContent method is called 383 times in total
> while the endMarkedContent method is called 397 times in total.
>
> If I add an else-statement at the end of the Gfx::opBeginMarkedContent
> method, the numbers match up:
>
>  if(numArgs == 2 && args[1].isDict ()) {
>    out->beginMarkedContent(args[0].getName(),args[1].getDict());
>  } else {
>    out->beginMarkedContent(args[0].getName(),NULL);
>  }
>
> $ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
> endcounter:395
> begincounter:396
> begincounter:397
> endcounter:396
> endcounter:397
> $
>
> I am using the latest version of poppler in the git repository (master branch).
>
>
> Kind regards,
>
> Nils Höglund
>
> On 9 February 2010 22:11, Albert Astals Cid <aacid at kde.org> wrote:
>> On Tuesday, 9 February 2010, Nils Höglund wrote:
>>> Hi,
>>>
>>> In Gfx::opBeginMarkedContent, I would change the end of the function
>>> to something like:
>>>
>>> if(numArgs == 2 && args[1].isDict ()) {
>>>   out->beginMarkedContent(args[0].getName(),args[1].getDict());
>>> } else if(numArgs == 1) {
>>>   out->beginMarkedContent(args[0].getName(),NULL);
>>> }
>>>
>>> (adding the else clause)
>>>
>>> Otherwise beginMarkedContent and endMarkedContent will be unbalanced
>>> (called a different number of times) in the output device.
>>
>> Do you have any pdf to reproduce the problem?
>>
>> Albert
>>
>>>
>>>
>>> Kind regards,
>>>
>>>
>>> Nils Höglund
>>> _______________________________________________
>>> poppler mailing list
>>> poppler at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>>

