[Libreoffice-bugs] [Bug 108849] New: DOCX IMPORT: Extra pages and wrong page sizes in a specific document

bugzilla-daemon at bugs.documentfoundation.org
Thu Jun 29 08:20:12 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=108849

            Bug ID: 108849
           Summary: DOCX IMPORT: Extra pages and wrong page sizes in a
                    specific document
           Product: LibreOffice
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Keywords: filter:docx
          Severity: normal
          Priority: medium
         Component: Writer
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: mikekaganski at hotmail.com

Created attachment 134374
  --> https://bugs.documentfoundation.org/attachment.cgi?id=134374&action=edit
A sanitized DOCX that has only 2 pages in Word

The attached test document has only 2 pages in Word: the first is 15x10 cm and the
second 25x20 cm (both landscape), and each contains a single paragraph of short text.

When imported into LibreOffice, it has 4 pages: the first (empty) is 10x15 cm
portrait, the second (empty) is 25x20 cm landscape, the third is 15x10 cm landscape
(with the text "Page 1"), and the fourth is Letter-sized (with the text "Page 2").
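The sizes above are given in centimetres, while DOCX stores page dimensions on the
<w:pgSz> element of each section's properties, with w:w and w:h in twentieths of a
point (twips) and an optional w:orient attribute that defaults to portrait. A
minimal Python sketch for converting such an element to centimetres when
inspecting the file might look like this; the helper name page_size_cm and the
sample values are hypothetical and illustrative, not taken from the attachment.

```python
# Hypothetical helper: convert a <w:pgSz> element to centimetres.
# Assumes w:w and w:h are in twips (1/20 pt), as defined for WordprocessingML.
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_CM = 1440 / 2.54  # 1440 twips per inch, 2.54 cm per inch


def page_size_cm(pg_sz: ET.Element) -> tuple[float, float, str]:
    """Return (width_cm, height_cm, orientation) for a <w:pgSz> element."""
    width = int(pg_sz.get(f"{{{W_NS}}}w")) / TWIPS_PER_CM
    height = int(pg_sz.get(f"{{{W_NS}}}h")) / TWIPS_PER_CM
    # w:orient is optional; when absent the page is treated as portrait
    orient = pg_sz.get(f"{{{W_NS}}}orient", "portrait")
    return round(width, 2), round(height, 2), orient


# Illustrative values: roughly a 15x10 cm landscape page
sample = ET.fromstring(
    f'<w:pgSz xmlns:w="{W_NS}" w:w="8505" w:h="5670" w:orient="landscape"/>'
)
print(page_size_cm(sample))
```

Rounding to two decimals hides the small error introduced because whole twips
cannot represent 15 cm exactly.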

The document is a sanitized version of a real-life document generated by a
third-party report generator. It is actually invalid OOXML: the last section is
defined in the wrong place.

According to ISO/IEC 29500-1:2016(E) 17.6.17 sectPr (Document Final Section
Properties), the final <w:sectPr> must be the last child element of the body
element. This is also enforced by the schema for the CT_Body complex type (Annex A
(normative) Schemas – W3C XML Schema, A.1 WordprocessingML, page 3866), where
sectPr is part of an <xsd:sequence> and thus *must* appear at a specific place in
that sequence, namely as the last element, and at most once.
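The CT_Body rule can be checked mechanically. The following is a minimal Python
sketch, not LibreOffice's import code: the function name body_sectpr_problems is
hypothetical, and it only inspects direct children of <w:body>. A sectPr nested in
a paragraph's <w:pPr> (where non-final sections are normally stored) is not a
direct child and is therefore ignored. It does not perform full schema validation.

```python
# Minimal sketch: check that <w:body> has at most one direct <w:sectPr> child
# and that such a child, if present, is the last child of <w:body>.
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
BODY = f"{{{W_NS}}}body"
SECT_PR = f"{{{W_NS}}}sectPr"


def body_sectpr_problems(document_xml) -> list[str]:
    """Return problems with body-level sectPr placement.

    document_xml: the main document part as str or bytes.
    """
    root = ET.fromstring(document_xml)
    body = root.find(BODY)  # <w:body> is a direct child of <w:document>
    if body is None:
        return ["no <w:body> element"]
    children = list(body)
    # Positions of direct children that are <w:sectPr>
    positions = [i for i, child in enumerate(children) if child.tag == SECT_PR]
    problems = []
    if len(positions) > 1:
        problems.append(
            f"{len(positions)} body-level sectPr elements (at most 1 allowed)"
        )
    for i in positions:
        if i != len(children) - 1:
            problems.append(f"sectPr at child index {i} is not the last child of body")
    return problems
```

For example, the main document part, usually stored as word/document.xml inside the
DOCX zip, can be read with zipfile.ZipFile(...).read("word/document.xml") and
passed directly to body_sectpr_problems; a conforming file yields an empty list.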

However, the test document has two sectPr elements placed before the other body
content. Unfortunately, MS Word seems to accept this standards-violating content,
and thus encourages third-party generators to create non-standard documents.
