[Poppler-bugs] [Bug 98305] New: -xml outputs malformed xml

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Oct 18 09:15:03 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=98305

            Bug ID: 98305
           Summary: -xml outputs malformed xml
           Product: poppler
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: pdftohtml
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: daniel.van.den.ouden at gmail.com

Overview:

    The following PDF causes pdftohtml to output malformed XML:

    http://www.atmel.com/images/Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf

    The resulting XML file has multiple similar errors, the first one on line 71641:

    <text top="180" left="71" width="101" height="15" font="11"><b>Sp<a href="Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.html#876">eed [MHz] </b>(3)</a></text>

    (the closing b and a tags are not in the correct order)
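
    The mis-nesting in the quoted line can be confirmed independently of
pdftohtml. The following is a minimal sketch, assuming Python 3 and its
standard xml.etree.ElementTree module; the helper name well_formed_error is
hypothetical and only used for illustration. It feeds the quoted <text>
element to an XML parser, which should reject it with a mismatched-tag error
because </b> closes while <a> is still open.

```python
import xml.etree.ElementTree as ET

# The <text> element quoted in the report, joined onto one logical line.
SNIPPET = (
    '<text top="180" left="71" width="101" height="15" font="11"><b>Sp<a '
    'href="Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A'
    '_PA_649A_P_6490A_P_datasheet.html#876">eed '
    '[MHz] </b>(3)</a></text>'
)


def well_formed_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message.

    Hypothetical helper for illustration; it is not part of poppler.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# <b> is opened before <a> but closed before </a>, so parsing fails.
print(well_formed_error(SNIPPET))
```

    Any nesting in which the element opened last is closed first passes the
same check, so the helper returns None for such input.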

Steps to Reproduce: 

    1) wget http://www.atmel.com/images/Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf

    2) pdftohtml -q -i -xml Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf output.xml
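
    To locate the first error in the generated file after step 2, a check
along the following lines may help. This is only a sketch, assuming Python 3;
the function name first_xml_error is hypothetical, and the file name passed
to it is whatever pdftohtml actually wrote (assumed here to be output.xml).
It reports the line and column at which the parser gives up.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return (line, column, message) for the first parse error, or None.

    `source` is a file name or an open file object, as accepted by
    ET.parse. Hypothetical helper for illustration only.
    """
    try:
        ET.parse(source)
    except ET.ParseError as exc:
        line, column = exc.position  # line is 1-based, column is 0-based
        return line, column, str(exc)
    return None
```

    Calling first_xml_error("output.xml") on the file from step 2 should
then return a tuple whose first element is the offending line number, or
None once the output is well-formed.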

Actual Results: 

    malformed xml

Expected Results: 

    Well-formed XML. I'm also not sure the link is attached to the correct
piece of text: in the PDF only the text "(3)" is clickable, and none of it is
bold.

Build Date & Hardware: 

    Built on 2016-10-18 from source (0.48.0) on Ubuntu 14.04 LTS

Additional Builds and Platforms: 

    Also occurred in the version of pdftohtml installed via apt-get
(0.28, if I recall correctly)


Cheers,


Daniel
