[Libreoffice] Excel 2003 XML format

Peter Jentsch pjotr at guineapics.de
Wed Mar 23 12:04:36 PDT 2011

Hi Kohei,

I'm currently working on replacing the Java-based XSLT transformations
with libxslt-based ones.

The Office 2003 XML filters are especially interesting: they use a
Saxon (Java) extension function to extract embedded OLE streams from
Word 2003 XML in a way Word 2003 can understand (I think that extension
is not used in the Excel import/export filters). I'm unsure about
the state of the Java-based transformation: for a simple test document
that embeds a BMP into a Writer document, the OLE object doesn't survive
a round trip, neither in LibreOffice nor in OOo 3.2.1, on Windows or
Linux, so I'm unsure whether the whole extension function stuff does
anything meaningful at all (feedback about that, including a bunch of
sample documents, greatly appreciated).

Anyway, with Michael's kind support I was able to port that extension
function to C++/libxslt, and I'm able to export and import documents
now, but I guess I'm far from finished with that. The XSLT scripts need
some tweaking, not so much because they depend on XSLT 2.0 features, but
because libxslt has a different notion about some XSLT features than
Saxon has.

So, I guess I'll be able to provide a patch that completely replaces
the Java-based XSLT transformations with C++/libxslt-based ones in a
reasonable timeframe ... but! That work will mostly benefit small
documents, because we no longer need to start a JVM each time someone
wants to load a document. With large documents, it's the structure of
the XSLT files that affects performance most. I don't expect libxslt to
do wonders with respect to processing the quite complex rules in the
Office 2003 XML filters on a large document.
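
To make that trade-off concrete, here is a back-of-the-envelope sketch
in Python. It is only a toy model: the startup and per-node costs are
made-up assumptions, not measurements of Saxon or libxslt, and the
names (transform_time, speedup, JAVA, LIBXSLT) are hypothetical. It
just shows how a fixed startup cost dominates for small documents and
vanishes into the per-node stylesheet cost for large ones.

```python
# Toy cost model. All numbers are illustrative assumptions, NOT measurements.

def transform_time(nodes, startup_s, per_node_s):
    """Total time = fixed engine startup cost + per-node stylesheet cost."""
    return startup_s + nodes * per_node_s

# Assumed figures: starting a JVM costs about a second, libxslt starts
# almost instantly, and the per-node cost is driven mainly by the
# stylesheet rules, so it is assumed to be similar for both engines.
JAVA = dict(startup_s=1.0, per_node_s=0.0010)
LIBXSLT = dict(startup_s=0.01, per_node_s=0.0008)

def speedup(nodes):
    """How many times faster the assumed libxslt path is for a document."""
    return transform_time(nodes, **JAVA) / transform_time(nodes, **LIBXSLT)

if __name__ == "__main__":
    for nodes in (100, 10_000, 1_000_000):
        print(f"{nodes:>9} nodes: speedup {speedup(nodes):.1f}x")
```

Under these assumed numbers the gain is large for a tiny document and
shrinks towards the ratio of the per-node costs as the document grows,
which is why restructuring the stylesheets matters more for large files.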

I haven't dug very deeply into the Office 2003 import/export filters:
considering that Office 2003 XML will not gain popularity in the
future, I'd personally prefer dropping support for it completely and
focusing on creating a rock-solid OOXML/ODF round-trip experience. But
someone with a large body of Office 2003 XML documents will think
differently about that. Anyway: I'm currently close to where that bug
sits, so yes, assign it to me for the moment. Unfortunately I can't
reliably predict how much time I have available for hacking on
LibreOffice overall, so if I feel I'm unable to fix it I'll have to
pass it on.

With regard to dropping XSLT filters altogether: I see the XSLT
framework as a perfect starting point for implementing individually
crafted special-purpose filters, so I really wouldn't want that to go
away. Maybe it's just not the right platform for a generic bridge to
that other office suite.



On 23.03.11 18:05, Kohei Yoshida wrote:
> On Wed, 2011-03-23 at 16:03 +0000, Michael Meeks wrote:
>> Hi Kohei,
>> On Wed, 2011-03-23 at 10:06 -0400, Kohei Yoshida wrote:
>>> https://bugs.freedesktop.org/show_bug.cgi?id=35543
>>> I'd like to know if anybody has an opinion about this problem.  Honestly
>>> I would LOVE to remove all of our current XSLT-based filters since they
>>> cause major performance issues & not scalable at all.
>> 	I believe the plan was to convert them to XSLT 1.0 and use the much
>> faster built-in XSLT support that Peter is working on.
> Ok.  In that case, is it okay to assign the above bug to you, Peter?
> Kohei
