[poppler] Marked content, Gfx.cc
Nils Höglund
nils.hoglund at gmail.com
Wed Feb 10 05:25:28 PST 2010
Albert,
For example: http://www.bbc.co.uk/guidelines/futuremedia/accessibility/bbc_accessible_pdf_master17.pdf.
The issue seems to be present in many other tagged PDF documents as well.
I have added the member variables int begincounter and int endcounter
to HtmlOutputDev, initialized them to zero in its constructor, and
implemented the following test methods:
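void HtmlOutputDev::beginMarkedContent(char *name, Dict *properties) {
  // Count every begin that Gfx forwards and print the running total.
  begincounter++;
  fprintf(stderr, "begincounter:%d\n", begincounter);
}
void HtmlOutputDev::endMarkedContent(GfxState *state) {
  // Count every end that Gfx forwards and print the running total.
  endcounter++;
  fprintf(stderr, "endcounter:%d\n", endcounter);
}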
I then run pdftohtml and look at the last lines of the counter output on stderr:
$ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
begincounter:382
endcounter:395
begincounter:383
endcounter:396
endcounter:397
$
So beginMarkedContent is called 383 times in total, while
endMarkedContent is called 397 times in total, 14 more.
If I add an else clause at the end of Gfx::opBeginMarkedContent,
the numbers match up:
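if(numArgs == 2 && args[1].isDict ()) {
  out->beginMarkedContent(args[0].getName(),args[1].getDict());
} else {
  // No inline properties dictionary: still forward the begin so that
  // it pairs with the endMarkedContent call for the matching EMC.
  out->beginMarkedContent(args[0].getName(),NULL);
}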
$ utils/pdftohtml ../bbc_accessible_pdf_master17.pdf 2>&1 1>/dev/null | tail -5
endcounter:395
begincounter:396
begincounter:397
endcounter:396
endcounter:397
$
I am using the latest version of poppler in the git repository (master branch).
Kind regards,
Nils Höglund
On 9 February 2010 22:11, Albert Astals Cid <aacid at kde.org> wrote:
> A Dimarts, 9 de febrer de 2010, Nils Höglund va escriure:
>> Hi,
>>
>> In Gfx::opBeginMarkedContent, I would change the end of the function
>> to something like:
>>
>> if(numArgs == 2 && args[1].isDict ()) {
>> out->beginMarkedContent(args[0].getName(),args[1].getDict());
>> } else if(numArgs == 1) {
>> out->beginMarkedContent(args[0].getName(),NULL);
>> }
>>
>> (adding the else clause)
>>
>> Otherwise beginMarkedContent and endMarkedContent will be unbalanced
>> (called a different number of times) in the output device.
>
> Do you have any pdf to reproduce the problem?
>
> Albert
>
>>
>>
>> Kind regards,
>>
>>
>> Nils Höglund
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>