[Libreoffice] Ridiculous xml?

Ryan Jendoubi ryan.jendoubi at gmail.com
Fri Jul 29 03:30:43 PDT 2011


Hi all,

I recently noticed something very odd about the contents of content.xml.
It looks like a text span has been placed around every individual word,
with a separate span around every individual space:

</text:span><text:span text:style-name="T1">Web</text:span><text:span 
text:style-name="T2">. </text:span><text:span 
text:style-name="T1">But</text:span><text:span text:style-name="T2"> 
</text:span><text:span text:style-name="T1">it</text:span><text:span 
text:style-name="T2"> </text:span><text:span 
text:style-name="T1">is</text:span><text:span text:style-name="T2"> 
</text:span><text:span text:style-name="T1">not</text:span><text:span 
text:style-name="T2">

...etc.

I can only imagine this makes the files somewhat bigger than
necessary?

More of a problem for me is that I was using a shell script to inflate
content.xml and grep it for a certain string within the text. I was
accounting for odd whitespace, but obviously this mad tagging thwarted
such a simple approach. I have now adapted by using a Perl script to
strip all XML tags before grepping, but I'm still curious about why
content.xml appears this way.
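
For illustration, here is a minimal sketch of the same search done in
Python rather than with the shell and Perl scripts mentioned above. It
parses content.xml instead of stripping tags with a regular expression,
then joins the text of every span within a paragraph so that per-word
spans no longer break up a phrase. The function names paragraphs_text
and odt_contains are hypothetical, and the sketch assumes the document
is an ordinary .odt zip archive with content.xml at its top level. It
does not expand <text:s> or <text:tab> elements, so runs of multiple
spaces or tabs encoded that way will be missed.

```python
import re
import zipfile
from xml.etree import ElementTree

# Namespace URI bound to the "text:" prefix in ODF content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraphs_text(content_xml):
    """Return the plain text of each paragraph/heading in content.xml."""
    root = ElementTree.fromstring(content_xml)
    result = []
    for element in root.iter():
        if element.tag in BLOCK_TAGS:
            # itertext() walks all nested spans in document order.
            text = "".join(element.itertext())
            # Collapse odd whitespace so a plain substring search works.
            result.append(re.sub(r"\s+", " ", text).strip())
    return result


def odt_contains(path, needle):
    """Report whether any paragraph of the .odt at path contains needle."""
    with zipfile.ZipFile(path) as archive:
        content = archive.read("content.xml")
    return any(needle in paragraph for paragraph in paragraphs_text(content))
```

With this, odt_contains("report.odt", "But it is") would match the
fragment quoted above even though each word sits in its own span
(report.odt is a placeholder file name).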

Might it be because the file was imported from .doc format? Is it a 
transformation "bug" of some kind?

Bests,

--Ryan

