[Poppler-bugs] [Bug 89239] pdftohtml produces wrongly nested tags

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Mar 13 04:29:01 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=89239

albbas at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #114252|0                           |1
        is obsolete|                            |

--- Comment #8 from albbas at gmail.com ---
Created attachment 114284
  --> https://bugs.freedesktop.org/attachment.cgi?id=114284&action=edit
Fix opening and ending tag mismatch when resulting xml document contains
invalid xml chars

The difference between this patch and patch 114242 is this:
diff --git a/utils/HtmlOutputDev.cc b/utils/HtmlOutputDev.cc
index d725578..4915030 100644
--- a/utils/HtmlOutputDev.cc
+++ b/utils/HtmlOutputDev.cc
@@ -480,14 +480,16 @@ static bool tag_exists( std::list<std::string> tags, std::string tag )

 static void CloseTag(GooString *htext, std::list<std::string> &tags, std::string tag)
 {
+    size_t index = strlen(htext->getCString());
     while( !tags.empty() && tags.back() != tag ) {
         std::string current_tag = tags.back();
-        htext->append(current_tag.c_str(), current_tag.length());
+        htext->insert(index, current_tag.c_str());
+        index += current_tag.length();
         tags.pop_back();
     }
     if( !tags.empty()) {
       std::string current_tag = tags.back();
-      htext->append(current_tag.c_str(), current_tag.length());
+      htext->insert(index, current_tag.c_str());
       tags.pop_back();
     }
 }

When the htext variable contains the characters that trigger the "PCDATA
invalid Char value" errors in xmllint, the append function does not add the
ending tags as expected.
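
For reference, xmllint reports "PCDATA invalid Char value" for code points
outside the XML 1.0 Char production. The following is a minimal sketch of that
check, not code from poppler; the names is_valid_xml_char and
invalid_positions are hypothetical, and the input is assumed to be already
decoded into code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Hypothetical helper, not part of poppler.
bool is_valid_xml_char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Returns the indices of code points that an XML parser would reject.
// Assumes `text` holds one code point per element.
std::vector<std::size_t> invalid_positions(const std::u32string &text) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!is_valid_xml_char(static_cast<std::uint32_t>(text[i]))) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

Most control characters below 0x20 (including NUL) fall outside this range,
which is one way such characters can end up in the extracted text.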

To force the ending tags to be appended to the GooString, the insert function
is used instead.

This produces output that no longer triggers the "opening and ending tag
mismatch" and "premature end of data in tag" errors when run through xmllint.
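
The logic of the patched CloseTag can be sketched outside poppler as follows.
This is an illustration only: std::string stands in for GooString, the name
close_tag is hypothetical, and it assumes that the tags list holds ready-made
closing tags such as "</b>". Note that the sketch takes the insert position
from out.size(), whereas the patch computes it with strlen() on the C string.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Sketch of the patched CloseTag with std::string in place of GooString.
// Assumption: `tags` is a stack of closing tags, innermost tag last.
void close_tag(std::string &out, std::list<std::string> &tags,
               const std::string &tag) {
    // Position at which the closing tags are inserted.
    std::size_t index = out.size();

    // Close every tag that was opened after the requested one.
    while (!tags.empty() && tags.back() != tag) {
        const std::string &current_tag = tags.back();
        out.insert(index, current_tag);
        index += current_tag.length();
        tags.pop_back();
    }

    // Close the requested tag itself, if it is still on the stack.
    if (!tags.empty()) {
        out.insert(index, tags.back());
        tags.pop_back();
    }
}
```

With out set to "<b><i>text" and tags holding "</b>" then "</i>", calling
close_tag(out, tags, "</b>") closes the inner tag first and then the requested
one, so the tags stay properly nested.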

-- 
You are receiving this mail because:
You are the assignee for the bug.