[Libreoffice-bugs] [Bug 52028] New: Writer 3.6 creates and applies empty automatic styles everywhere, leading to needless file complexity and size

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Jul 13 03:55:42 CEST 2012


https://bugs.freedesktop.org/show_bug.cgi?id=52028

             Bug #: 52028
           Summary: Writer 3.6 creates and applies empty automatic styles
                    everywhere, leading to needless file complexity and
                    size
    Classification: Unclassified
           Product: LibreOffice
           Version: 3.6.0.0.beta3
          Platform: Other
        OS/Version: Mac OS X (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
        AssignedTo: libreoffice-bugs at lists.freedesktop.org
        ReportedBy: bugs at eikota.de


Created attachment 64151
  --> https://bugs.freedesktop.org/attachment.cgi?id=64151
ZIP archive with all test files from test A to C

It took me two days to find out what's going wrong with paragraph and character
styles in LibreOffice 3.6 beta 3, so please take this seriously ;-) I'm sorry I
have to give a rather lengthy explanation, but the matter is not simple ...


== Results ==

When editing even the simplest .odt files, Writer now applies automatic styles
everywhere even if this is completely unnecessary because the text formatting
does not differ at all from the current paragraph style (and/or character
style, if any). Therefore, if you keep editing a .odt file for a while, this
causes the creation of countless automatic styles, most of which are completely
empty (contain no formatting information). This leads to needless .odt file
size and complexity.

Normally this problem is not "visible" at the UI level of Writer; you have to
either look at the .odt file with an editor that can show the contents of an
.odt file directly (e.g., BBEdit) or save the .odt file as a .fodt file, which
every text editor can open. But there are circumstances under which you can see
the result of this bug directly in LibreOffice (and in printing and in PDF
export!):

* Font kerning does not work across the start or end of a character style
span; therefore even the most necessary kerning pairs like A + V or A + T or
T + - (which have a negative kerning value in most fonts to look better) will
no longer get kerned if you insert one of the characters later (see test C
below for further explanation).

* Ligatures don't work across the start or end of a character style span
either; therefore, if you use a font which contains ligature information for
pairs like f + i, they will not get applied if you insert one of these
characters later.

Why is this bug a bug? You may say that it just increases the complexity of
.odt files, but is not "wrong" in a technical sense. Well, first, there are
special circumstances under which this bug makes the documents actually look
wrong (see the note about font kerning and ligatures above -- both get broken
by this bug!). And second, this bug unnecessarily increases the complexity and
size of .odt files, making the contents unnecessarily hard to read and parse,
which is IMHO against the philosophy behind the ODF file format specification
-- unlike Microsoft's strange Office 2007 XML format (.docx etc.), which may be
intentionally complex to make parsing difficult for foreign software, the ODF
file format was designed to be as simple as possible, to make it easy to write
parsers and even to allow human beings to read the XML code directly. This is
counteracted by this bug.
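
If you have no editor that opens .odt archives directly, the inspection
described above can also be scripted. This is only a minimal sketch: it assumes
that the .odt file is an ordinary ZIP archive with the document body stored as
"content.xml", and that the XML uses the usual "office:" prefix. The function
names are made up for illustration.

```python
import zipfile


def read_content_xml(odt):
    """Return content.xml from an .odt file (path or binary file object)."""
    with zipfile.ZipFile(odt) as archive:
        return archive.read("content.xml").decode("utf-8")


def automatic_styles_section(content_xml):
    """Cut out the <office:automatic-styles> ... </office:automatic-styles> part.

    Plain string search, mirroring a manual search in a text editor.
    Assumption: the element uses the "office:" prefix. Returns "" if the
    section is missing or written as an empty element.
    """
    closing = "</office:automatic-styles>"
    start = content_xml.find("<office:automatic-styles")
    end = content_xml.find(closing)
    if start == -1 or end == -1:
        return ""
    return content_xml[start:end + len(closing)]
```

Calling automatic_styles_section(read_content_xml("test.odt")) on a file of
your own (the file name is a placeholder) prints nothing by itself; it returns
the section as a string that you can print or search.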


== Steps to reproduce ==

All the following tests were made with LibreOffice 3.5.5.3 and LibreOffice
3.6.0 beta 3, both with German langpacks installed, both on Mac OS X 10.6.8
German, and both using LibreOffice default settings (I deleted or renamed my
user profile completely before doing every part of these tests!).

=== Test A -- creating a new document ===

1.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
1.2  Create a new empty Writer document with LibreOffice 3.5.5.3.
1.3  Show the "Styles and Formatting" window.
1.4  In the "Styles and Formatting" window, double-click the paragraph style
     "Text body" to select it.
1.5  Click once in the (empty) document text area (this may be important!
     No kidding!).
1.6  Type "Hello world!".
1.7  Save the document in .odt format.
1.8  Save the document in .fodt format.

1.9  Open the .fodt file created with LibreOffice 3.5.5.3
     with a good text editor of your choice.
1.10 Search for "<office:automatic-styles>";
     you will find only one page layout entry
       <style:page-layout style:name="pm1">
         ...
       </style:page-layout>
     but no automatic paragraph or character styles.
1.11 Scroll down to the end of the document;
     you will find that the text contents of the document are very simple:
       <text:p text:style-name="Text_20_body">Hello world!</text:p>
     and this is how it should be.

2.1- Repeat steps 1.1 to 1.8 with LibreOffice 3.6.0 beta 3,
2.8  creating new .odt and .fodt files.

2.9  Open the .fodt file created with LibreOffice 3.6.0 beta 3
     with a good text editor of your choice.
2.10 Search for "<office:automatic-styles>"; you will find one or two (*)
     completely needless automatically created styles, one or both of:
       <style:style style:name="P1" style:family="paragraph"
       style:parent-style-name="Text_20_body">
         <style:text-properties officeooo:paragraph-rsid="001ffdd7"/>
       </style:style>
       <style:style style:name="T1" style:family="text">
         <style:text-properties officeooo:rsid="001ffdd7"/>
       </style:style>
     The paragraph style ("P1") is a child of the "Text body" style, but it
     does not contain any additional style information, i.e. it does not
     at all differ from "Text body", and this is what I call an "empty" style.
     The same is true for the character style ("T1"): it is obvious that it
     does not contain any formatting information, it is empty.
2.11 Scroll down to the end of the document;
     you will find that the text contents of the document look something like:
       <text:p text:style-name="P1">
         <text:span text:style-name="T1">Hello world!</text:span>
       </text:p>
     There are two errors here:
     (a) it is really needless to use "P1" instead of "Text body" directly;
     (b) the complete <text:span ...> </text:span> is nonsense,
     because it does not change the text formatting at all.

(*) Note: under circumstances which are not completely transparent to me
(probably if you don't reset your user profile before doing these tests, or if
you don't click into the main document text area before you type "Hello
world!", or if you use the backspace key while typing "Hello world!", etc.), it
is possible that the file generated by LibreOffice 3.6 beta 3 contains only one
of the two automatic styles, i.e. "P1" *or* "T1" instead of both. But even if
there is just one of these automatic styles, this still is an error, because
both automatic styles are completely needless -- just compare the file
generated with LibreOffice 3.5.5.3!
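
What I call an "empty" automatic style in test A can also be found
mechanically. The following is a rough sketch, not a definitive test: it
assumes that a style is "empty" when its property elements carry nothing but
rsid attributes (the two attribute names are taken from the snippets in step
2.10) and when the style itself has no attributes beyond name, family and
parent. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"

# Attribute local names treated as "no formatting" (assumption, see above).
RSID_ATTRS = {"rsid", "paragraph-rsid"}
NEUTRAL_STYLE_ATTRS = {"name", "family", "parent-style-name"}


def local_name(qname):
    """Strip the '{namespace}' part that ElementTree puts before a name."""
    return qname.rsplit("}", 1)[-1]


def is_rsid_only(style_elem):
    """True if the style adds nothing except rsid bookkeeping attributes."""
    if any(local_name(a) not in NEUTRAL_STYLE_ATTRS for a in style_elem.attrib):
        return False
    for props in style_elem:
        if len(props):  # nested content such as tab stops counts as formatting
            return False
        if any(local_name(a) not in RSID_ATTRS for a in props.attrib):
            return False
    return True


def empty_automatic_styles(xml_text):
    """Return the names of all rsid-only styles in <office:automatic-styles>."""
    root = ET.fromstring(xml_text)
    names = []
    for auto in root.iter("{%s}automatic-styles" % OFFICE):
        for style in auto.findall("{%s}style" % STYLE):
            if is_rsid_only(style):
                names.append(style.get("{%s}name" % STYLE))
    return names
```

Fed with a .fodt file that contains the "P1" and "T1" styles from step 2.10,
this should return both names; a style that sets, say, a bold font weight is
not reported.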


=== Test B -- inserting and appending some text ===

3.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
3.2  Duplicate the .odt file created with LibreOffice 3.5.5.3
     in steps 1.2 to 1.7 above.
3.3  Open it again with LibreOffice 3.5.5.3.
3.4  Click between "Hello" and "world!" and type something like "my dear".
3.5  Click after "world!" and type something else, e.g.
     "You are God’s creation."
3.6  Save the document.
3.7  Save the document in .fodt format.

3.8  Open the .fodt file created with LibreOffice 3.5.5.3
     with a good text editor of your choice.
3.9  Search for "<office:automatic-styles>";
     you will still find only one page layout entry
       <style:page-layout style:name="pm1">
         ...
       </style:page-layout>
     but still no automatic paragraph or character styles.
3.10 Scroll down to the end of the document;
     you will find that the text contents of the document are still simple:
       <text:p text:style-name="Text_20_body">Hello my beloved world!
       You are God’s creation!</text:p>
     and this is how it should be.

4.1- Repeat steps 3.1 to 3.7 with LibreOffice 3.6.0 beta 3,
4.7  creating new .odt and .fodt files.

4.8  Open the .fodt file created with LibreOffice 3.6.0 beta 3
     with a good text editor of your choice.
4.9  Search for "<office:automatic-styles>"; you will find the same useless
     automatically created styles mentioned above in step 2.10,
     plus one or two (*) additional empty character styles, e.g.:
       <style:style style:name="P1" style:family="paragraph"
       style:parent-style-name="Text_20_body">
         <style:text-properties officeooo:paragraph-rsid="001ffdd7"/>
       </style:style>
       <style:style style:name="T1" style:family="text">
        <style:text-properties officeooo:rsid="00214ec2"/>
       </style:style>
       <style:style style:name="T2" style:family="text">
        <style:text-properties officeooo:rsid="002177ac"/>
       </style:style>
     What I said in step 2.10 above is true for all these automatic styles:
     They don't add any formatting, they are "empty", they are useless
     and even confusing.
4.10 Scroll down to the end of the document;
     you will find that the text contents of the document look something like:
       <text:p text:style-name="P1">
         Hello <text:span text:style-name="T1">my dear </text:span>
         world! <text:span text:style-name="T2">You are God’s
         creation!</text:span>
       </text:p>
     or even worse (*):
       <text:p text:style-name="P1">
         <text:span text:style-name="T1">Hello
           <text:span text:style-name="T2">my dear </text:span>
           world! <text:span text:style-name="T3">You are God’s
           creation!</text:span>
         </text:span>
       </text:p>
=>   We see that every insertion or appending of text creates a new <span>
     and a corresponding new automatic character style.
     All these <span>s and automatic styles are a mess, because they are
     useless (don't add any formatting) and just make the simple document
     unnecessarily complex.

(*) Note: whether you find one or two additional automatic character styles,
and therefore two or three automatic character styles in total, depends again
on some circumstances not completely clear to me; see my note after step 2.11
above.
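
To compare the files from test B without reading the XML by eye, one can simply
count automatic styles and spans. The sketch below assumes a flat .fodt file
(or a content.xml) as input; the function name and the keys of the returned
dictionary are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def markup_summary(xml_text):
    """Count automatic <style:style> entries and <text:span> elements."""
    root = ET.fromstring(xml_text)
    # Page layouts (<style:page-layout>) are deliberately not counted here.
    auto_styles = root.findall(".//office:automatic-styles/style:style", NS)
    spans = root.findall(".//text:span", NS)
    return {"automatic_styles": len(auto_styles), "spans": len(spans)}
```

For a file like the one described in step 3.10 both counts should be 0; for a
file like the first variant in step 4.10 one would expect 3 automatic styles
and 2 spans.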


=== Test C -- kerning is broken ===

5.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
5.2  Create a new empty Writer document with LibreOffice 3.5.5.3.
5.3  Type "ATAA".
5.4  Save the document in .odt format.
5.5  Close the document.
5.6  Open the document again with LibreOffice 3.5.5.3.
5.7  Click between the two "A" at the end and type "T".
5.8  Click after the last "A" and type "T".

=>   The document now reads "ATATAT". Depending on some LibreOffice default
     settings (is kerning enabled by default? For me, it is) and on the
     fonts of your operating system you will see (you may need to zoom in
     first!) that there is some negative kerning between every "A" and
     every "T", making the text look equally spaced.

6.1- Repeat steps 5.1 to 5.8 with LibreOffice 3.6.0 beta 3,
6.8  creating a new .odt file.

=>  The document now reads "ATATAT", too. But if you zoom in, you will see
    that this text looks strange! There is kerning around the 1st "T",
    but neither between the 2nd "A" and the 2nd "T" nor between the 2nd
    "T" and the 3rd "A", nor before the last "T".

    Why? Every time we insert or append some text, LibreOffice 3.6
    puts it into a new <span>...</span>, and no kerning is possible
    across the boundary of a <span>; i.e., "AT" gets kerned, but "A<span>T"
    or "A</span>T" does not get kerned. You can prove this if you save
    the .odt file created in steps 6.1 to 6.8 as a .fodt file (with
    LibreOffice 3.6 beta 3, of course) and look at the .fodt file
    with a text editor; the text contents of the document read

    <text:p text:style-name="P1">ATA
      <text:span text:style-name="T1">T</text:span>
      A<text:span text:style-name="T1">T</text:span>
    </text:p>

    or similar, and this is why kerning is working only around
    the first "T", but nowhere else. In the file created with
    LibreOffice 3.5.5.3, the same section just reads

    <text:p text:style-name="Standard">ATATAT</text:p>

    and this is how it should be, making kerning possible.
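
The character pairs that lose their kerning in test C can be listed from the
paragraph XML. This is a sketch under the assumption made above, namely that
no kerning happens across any start or end tag inside the paragraph; it says
nothing about how Writer really lays out text. The function names are
hypothetical.

```python
import xml.etree.ElementTree as ET


def text_runs(elem):
    """Yield the pieces of text in elem, split at every tag boundary."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_runs(child)
        if child.tail:
            yield child.tail


def pairs_split_by_tags(paragraph_xml):
    """Return the (left, right) character pairs that straddle a tag boundary."""
    runs = list(text_runs(ET.fromstring(paragraph_xml)))
    return [(left[-1], right[0]) for left, right in zip(runs, runs[1:])]
```

For the 3.6 paragraph shown above (written on one line, without the line
breaks added here for readability) this gives [("A", "T"), ("T", "A"),
("A", "T")], i.e. exactly the three places where kerning is missing. For the
3.5.5.3 paragraph it gives an empty list.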


=== Test D -- getting everything right. ===

To remove all the unnecessary automatic paragraph and character styles and all
the needless <span>s created by LibreOffice 3.6 beta 3, just open any of the
.odt files created by LibreOffice 3.6 beta 3 with LibreOffice 3.5.5.3, insert a
space, delete it again, save the document -- and all the mess is gone, as you
will see if you save the file as .fodt file and open the latter with a good
text editor. Voilà!

This seems to indicate that there is some "tidy up" code in LibreOffice 3.5
which removes unnecessary automatic styles and <span>s, but just stopped
working for some reason in LibreOffice 3.6 beta 3. We need to re-enable it.
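
To make clear what kind of "tidy up" I mean, here is a small sketch. It is
certainly not the code LibreOffice uses (I have not looked at it); it only
shows the idea of removing a <text:span> while keeping its text, given a set
of style names that have already been judged empty. How that set is obtained
is left out here, and the function names are made up.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS


def _append_text(parent, index, text):
    """Attach text directly before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap_spans(parent, droppable):
    """Remove, in place, every span below parent whose style is in droppable.

    The text and child elements of a removed span are kept in its place.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap_spans(child, droppable)  # tidy nested spans first
        if child.tag == SPAN and child.get(STYLE_NAME) in droppable:
            kept = list(child)
            parent.remove(child)  # note: this also drops child.tail
            _append_text(parent, i, child.text)
            for offset, elem in enumerate(kept):
                parent.insert(i + offset, elem)
            i += len(kept)
            _append_text(parent, i, child.tail)  # so re-attach the tail here
        else:
            i += 1
```

Applied to the paragraph from step 4.10 with the droppable set {"T1", "T2"},
the paragraph ends up with no child elements and one continuous text, which is
what kerning and ligatures need. Spans with other style names are left alone.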


Attached to this bug report you will find a ZIP file containing all sample
documents from test A to C above.

