[poppler] Bug in pdftohtml

Joaquin Cuenca Abela joaquin at cuencaabela.com
Tue Jun 14 10:46:03 PDT 2011


On Tue, Jun 14, 2011 at 7:40 PM, Albert Astals Cid <aacid at kde.org> wrote:
> Please, do not answer me directly, answer the list.
>
> On Tuesday, June 14, 2011, you wrote:
>> On Mon, Jun 13, 2011 at 8:23 PM, Albert Astals Cid <aacid at kde.org> wrote:
>> > On Monday, June 13, 2011, Joaquin Cuenca Abela wrote:
>> >> Hi,
>> >>
>> >> Looking at HtmlOutputDev.cc, line 488, it tries to compute whether
>> >> str1 and str2 overlap:
>> >>
>> >>     if (str2->yMin >= str1->yMin && str2->yMin <= str1->yMax)
>> >>     {
>> >>         vertOverlap = str1->yMax - str2->yMin;
>> >>     } else
>> >>     if (str2->yMax >= str1->yMin && str2->yMax <= str1->yMax)
>> >>     {
>> >>         vertOverlap = str2->yMax - str1->yMin;
>> >>     } else
>> >>     {
>> >>         vertOverlap = 0;
>> >>     }
>> >>
>> >> It seems to me this code lacks a case for when str2 fully contains str1,
>> >> i.e.:
>> >>
>> >> if (str2->yMin < str1->yMin && str2->yMax > str1->yMax)
>> >>   vertOverlap = str2->yMax - str1->yMin;
>> >
>> > Do you have any pdf that gets fixed by applying this "patch"?
>>
>> No, I don't. Is it mandatory to have a PDF for any patch?
>
> No, it is not, but it helps; it proves you know it's fixing something.
>
>>
>> If this is actually a bug, at least it deserves a comment. Why is
>> there a vertOverlap variable that doesn't always have the vertical
>> overlap of these two strings?
>
> I haven't the slightest idea; no one who wrote that code is on the poppler
> list, so having a reason to change it (read: a PDF) helps.

This makes sense. I will try to cook up a PDF that reproduces the problem
this heuristic is trying to fix.
Thanks for your comments!
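
For reference, here is a minimal sketch of what a general vertical-overlap
computation could look like, assuming vertOverlap is meant to be the length
of the intersection of the two [yMin, yMax] intervals. That reading is an
assumption, not something the code in HtmlOutputDev.cc confirms. The Span
struct and the free function below are hypothetical stand-ins and not part
of poppler. Under this reading, when str2 fully contains str1 the result is
simply the height of str1 (str1->yMax - str1->yMin).

```cpp
#include <algorithm>

// Hypothetical stand-in for the string objects compared in the thread;
// only the two fields used by the overlap check are modelled.
struct Span {
    double yMin;
    double yMax;
};

// Length of the intersection of [a.yMin, a.yMax] and [b.yMin, b.yMax].
// Returns 0 when the intervals do not overlap at all.
double vertOverlap(const Span &a, const Span &b) {
    const double top = std::max(a.yMin, b.yMin);     // larger of the two lower bounds
    const double bottom = std::min(a.yMax, b.yMax);  // smaller of the two upper bounds
    return std::max(0.0, bottom - top);
}
```

A single expression like this covers partial overlap, containment in either
direction, and disjoint strings, so no case can be forgotten. Whether the
surrounding heuristic actually expects this value is still an open question.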

Cheers,

-- 
Joaquin Cuenca Abela -- presspeople.com: Fuentes de prensa y comunicados

