[Libreoffice-commits] core.git: sw/qa writerfilter/source

Mike Kaganski mike.kaganski at collabora.com
Thu Jul 20 09:07:29 UTC 2017


 sw/qa/extras/ooxmlimport/data/tdf108849.docx |binary
 sw/qa/extras/ooxmlimport/ooxmlimport.cxx     |    8 ++++++++
 writerfilter/source/ooxml/model.xml          |   14 +++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

New commits:
commit 4b4cd502806cfc9c9cc9754b8aae18a2c2632cdc
Author: Mike Kaganski <mike.kaganski at collabora.com>
Date:   Tue Jul 18 23:02:32 2017 +0300

    tdf#108849: allow out-of-order sectPr
    
    According to ISO/IEC 29500-1:2016(E) (17.6.17), the final <w:sectPr>
    must be the last child element of the body element. This is also
    enforced in the schema for the CT_Body complex type (Annex A.
    (normative) Schemas – W3C XML Schema, A.1 WordprocessingML, page
    3866), where sectPr is part of an <xsd:sequence>, and thus *must*
    stay at a specific place in the sequence, namely as the last element,
    with at most one instance.
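
The ordering rule described above can be illustrated with a minimal sketch. It assumes the body's direct children are already available as a flat list of element names; `bodySectPrIsWellPlaced` is a hypothetical helper for illustration and is not part of LibreOffice or writerfilter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not LibreOffice code: checks the CT_Body ordering rule
// on a flat list of the body's direct child element names.
bool bodySectPrIsWellPlaced(const std::vector<std::string>& children)
{
    const auto count = std::count(children.begin(), children.end(), "w:sectPr");
    if (count == 0)
        return true;  // no body-level sectPr at all: nothing to misplace
    if (count > 1)
        return false; // at most one instance is allowed
    return children.back() == "w:sectPr"; // it must be the last child
}
```

A document whose body children are `w:sectPr`, `w:p`, `w:p` fails this check, which is the kind of input the bug report is about.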
    
    However, real-life documents (generated by some third-party software)
    have sectPr before other body contents. Unfortunately, MS Word seems
    to allow this standards-violating content, and thus encourages
    creation of non-standard documents by third-party generators.
    
    This patch no longer assumes that the final (body-level) sectPr is
    the last body element, and does not mark the current paragraph as the
    section's last paragraph. Thus, the current section (possibly started
    after a previous paragraph-level sectPr) continues after the final
    sectPr is closed.
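
The effect on section counting can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the writerfilter implementation: the `Event` enum, `countSections`, and the `closeOnBodySectPr` flag are all hypothetical names introduced here.

```cpp
#include <vector>

// Hypothetical event kinds for a toy model of the importer; none of these
// names exist in writerfilter.
enum class Event
{
    Paragraph,  // a <w:p> without section properties
    ParaSectPr, // a <w:p> carrying a paragraph-level <w:sectPr>
    BodySectPr  // the body-level ("final") <w:sectPr>
};

// Counts the sections produced by a stream of body children.
// closeOnBodySectPr == true models treating the body-level sectPr as the end
// of the current section; false models letting the section continue after it.
int countSections(const std::vector<Event>& events, bool closeOnBodySectPr)
{
    int sections = 0;
    bool open = false; // does the current section already hold a paragraph?
    for (Event e : events)
    {
        switch (e)
        {
        case Event::Paragraph:
            open = true;
            break;
        case Event::ParaSectPr:
            // This paragraph is the last one of its section.
            ++sections;
            open = false;
            break;
        case Event::BodySectPr:
            if (closeOnBodySectPr && open)
            {
                ++sections;
                open = false;
            }
            break;
        }
    }
    // Paragraphs left after the last break form one more section.
    return sections + (open ? 1 : 0);
}
```

In this model, the stream `Paragraph, BodySectPr, Paragraph` yields two sections when the body-level sectPr closes the section and one section when it does not, which mirrors the "no extra sections" goal stated in the commit message.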
    
    Change-Id: I8e88288bc6659d77d17986514b3b4fe16a5b45d9
    Reviewed-on: https://gerrit.libreoffice.org/40161
    Tested-by: Jenkins <ci at libreoffice.org>
    Reviewed-by: Mike Kaganski <mike.kaganski at collabora.com>

diff --git a/sw/qa/extras/ooxmlimport/data/tdf108849.docx b/sw/qa/extras/ooxmlimport/data/tdf108849.docx
new file mode 100644
index 000000000000..6f3664374cd8
Binary files /dev/null and b/sw/qa/extras/ooxmlimport/data/tdf108849.docx differ
diff --git a/sw/qa/extras/ooxmlimport/ooxmlimport.cxx b/sw/qa/extras/ooxmlimport/ooxmlimport.cxx
index 7cbeb5cf3e68..384e0e09a053 100644
--- a/sw/qa/extras/ooxmlimport/ooxmlimport.cxx
+++ b/sw/qa/extras/ooxmlimport/ooxmlimport.cxx
@@ -1397,6 +1397,14 @@ DECLARE_OOXMLIMPORT_TEST(testTdf109053, "tdf109053.docx")
     CPPUNIT_ASSERT_EQUAL(getPages(), 2);
 }
 
+DECLARE_OOXMLIMPORT_TEST(testTdf108849, "tdf108849.docx")
+{
+    // sectPr element that is child element of body must be the last child. However, Word accepts it
+    // in wrong places, and we should do the same (bug-to-bug compatibility) without creating extra sections.
+    CPPUNIT_ASSERT_EQUAL(2, getParagraphs());
+    CPPUNIT_ASSERT_EQUAL_MESSAGE("Misplaced body-level sectPr's create extra sections!", 2, getPages());
+}
+
 // tests should only be added to ooxmlIMPORT *if* they fail round-tripping in ooxmlEXPORT
 
 CPPUNIT_PLUGIN_IMPLEMENT();
diff --git a/writerfilter/source/ooxml/model.xml b/writerfilter/source/ooxml/model.xml
index 8f78c8390d75..92e8677a8ecb 100644
--- a/writerfilter/source/ooxml/model.xml
+++ b/writerfilter/source/ooxml/model.xml
@@ -13194,6 +13194,14 @@
         </element>
         <ref name="AG_SectPrAttributes"/>
       </define>
+      <define name="CT_finalSectPr">
+        <ref name="EG_HdrFtrReferences"/>
+        <ref name="EG_SectPrContents"/>
+        <element name="sectPrChange">
+          <ref name="CT_SectPrChange"/>
+        </element>
+        <ref name="AG_SectPrAttributes"/>
+      </define>
       <define name="ST_BrType">
         <choice>
           <!-- Page Break -->
@@ -16307,7 +16315,7 @@
       <define name="CT_Body">
         <ref name="EG_BlockLevelElts"/>
         <element name="sectPr">
-          <ref name="CT_SectPr"/>
+          <ref name="CT_finalSectPr"/>
         </element>
       </define>
       <define name="CT_ShapeDefaults">
@@ -17844,6 +17852,10 @@
       <element name="sectPrChange" tokenid="ooxml:CT_SectPr_sectPrChange"/>
       <action name="start" action="setLastParagraphInSection"/>
     </resource>
+    <resource name="CT_finalSectPr" resource="Properties">
+      <action name="start" action="handleLastParagraphInSection"/>
+      <element name="sectPrChange" tokenid="ooxml:CT_SectPr_sectPrChange"/>
+    </resource>
     <resource name="ST_BrType" resource="List">
       <value tokenid="ooxml:Value_ST_BrType_column">column</value>
       <value tokenid="ooxml:Value_ST_BrType_page">page</value>

