[poppler] Marked content, Gfx.cc

Nils Höglund nils.hoglund at gmail.com
Wed Feb 10 05:25:28 PST 2010


Albert,

Yes, for example: http://www.bbc.co.uk/guidelines/futuremedia/accessibility/bbc_accessible_pdf_master17.pdf
The issue is not specific to that file, though; it seems to be present
in many tagged PDF documents.

To demonstrate it, I added two member variables, int begincounter and
int endcounter, initialized them to zero in the HtmlOutputDev
constructor, and implemented the following test methods:

void HtmlOutputDev::beginMarkedContent(char *name, Dict *properties) {
        begincounter++;
        fprintf(stderr, "begincounter:%d\n",begincounter);
}
void HtmlOutputDev::endMarkedContent(GfxState *state) {
        endcounter++;
        fprintf(stderr, "endcounter:%d\n",endcounter);
}

I then ran pdftohtml and looked at the last lines of what it prints on stderr:

$ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
begincounter:382
endcounter:395
begincounter:383
endcounter:396
endcounter:397
$

I see that the beginMarkedContent method is called 383 times in total
while the endMarkedContent method is called 397 times in total.
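
Two free-running counters only show the mismatch at the end of the run.
As a rough sketch (not poppler code; MarkedContentBalance and its
members are names I made up for illustration), a depth tracker could
flag each endMarkedContent call that arrives without a matching
beginMarkedContent:

```cpp
#include <cstdio>

// Hypothetical helper, not part of poppler: tracks how many marked-content
// sequences are currently open and counts end calls with nothing to close.
struct MarkedContentBalance {
    int depth = 0;          // currently open marked-content sequences
    int unmatchedEnds = 0;  // end calls seen while depth was already zero

    void begin() { ++depth; }

    void end() {
        if (depth == 0) {
            // An end without a begin: report it instead of going negative.
            ++unmatchedEnds;
            std::fprintf(stderr, "unmatched endMarkedContent (#%d)\n", unmatchedEnds);
        } else {
            --depth;
        }
    }

    // True only if every begin was closed and no end was unmatched.
    bool balanced() const { return depth == 0 && unmatchedEnds == 0; }
};
```

Calling begin() from the test beginMarkedContent method and end() from
the test endMarkedContent method would print a message at the first
unmatched end rather than only showing a difference in the totals.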

If I add an else branch at the end of the Gfx::opBeginMarkedContent
method, so that the output device is also notified when the second
operand is missing or is not a dictionary, the numbers match up:

  if(numArgs == 2 && args[1].isDict ()) {
    out->beginMarkedContent(args[0].getName(),args[1].getDict());
  } else {
    out->beginMarkedContent(args[0].getName(),NULL);
  }

$ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
endcounter:395
begincounter:396
begincounter:397
endcounter:396
endcounter:397
$
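
To illustrate why the else branch matters, here is a small standalone
model of the dispatch. It is only a sketch under assumptions: Operand,
CountingDev and run are stand-ins I invented, not poppler types, and I
am assuming the function previously forwarded the call only in the
dictionary case. In PDF, BMC takes just a tag, while BDC takes a tag
plus either an inline dictionary or a name; all of them are closed by
EMC.

```cpp
#include <vector>

// Stand-ins for poppler's operand and output device types (hypothetical).
enum class OperandKind { Name, Dict };

struct Operand {
    OperandKind kind;
    bool isDict() const { return kind == OperandKind::Dict; }
};

struct CountingDev {
    int begins = 0;
    int ends = 0;
    void beginMarkedContent() { ++begins; }
    void endMarkedContent() { ++ends; }
};

// Mirrors the shape of the branch at the end of Gfx::opBeginMarkedContent.
// withElse == false models the assumed behaviour before the change.
void opBeginMarkedContent(CountingDev &out, const std::vector<Operand> &args,
                          bool withElse) {
    if (args.size() == 2 && args[1].isDict()) {
        out.beginMarkedContent();  // BDC with an inline property dictionary
    } else if (withElse) {
        out.beginMarkedContent();  // BMC, or BDC whose second operand is a name
    }
}

void opEndMarkedContent(CountingDev &out) { out.endMarkedContent(); }

// Replays three begin/end pairs: BMC, BDC + dictionary, BDC + name.
CountingDev run(bool withElse) {
    const std::vector<std::vector<Operand>> begins = {
        {Operand{OperandKind::Name}},
        {Operand{OperandKind::Name}, Operand{OperandKind::Dict}},
        {Operand{OperandKind::Name}, Operand{OperandKind::Name}},
    };
    CountingDev dev;
    for (const auto &args : begins) {
        opBeginMarkedContent(dev, args, withElse);
        opEndMarkedContent(dev);
    }
    return dev;
}
```

In this model run(false) reaches the device with fewer begins than
ends, while run(true) delivers one begin per end, which is the same
pattern as in the two pdftohtml runs.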

I am using the latest version of poppler in the git repository (master branch).


Kind regards,

Nils Höglund

On 9 February 2010 22:11, Albert Astals Cid <aacid at kde.org> wrote:
> On Tuesday, 9 February 2010, Nils Höglund wrote:
>> Hi,
>>
>> In Gfx::opBeginMarkedContent, I would change the end of the function
>> to something like:
>>
>> if(numArgs == 2 && args[1].isDict ()) {
>>   out->beginMarkedContent(args[0].getName(),args[1].getDict());
>> } else if(numArgs == 1) {
>>   out->beginMarkedContent(args[0].getName(),NULL);
>> }
>>
>> (adding the else clause)
>>
>> Otherwise beginMarkedContent and endMarkedContent will be unbalanced
>> (called a different number of times) in the output device.
>
> Do you have any pdf to reproduce the problem?
>
> Albert
>
>>
>>
>> Kind regards,
>>
>>
>> Nils Höglund
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>




