[Libreoffice] Excel 2003 XML format

Cedric Bosdonnat cbosdonnat at novell.com
Wed Apr 13 00:52:02 PDT 2011

Hi Peter,

On Tue, 2011-04-12 at 23:28 +0200, Peter Jentsch wrote:
> Am Montag, den 11.04.2011, 10:48 +0100 schrieb Michael Meeks:
> > Hi Peter,
> > 
> > On Mon, 2011-04-11 at 00:11 +0200, Peter Jentsch wrote:
> > 	Oh - completely :-) I'm not disagreeing, just trying to find someone
> > who you can work with - so eg. which component: Calc, Writer, Impress
> > are you most interested in ? :-)
> > 
> Well then, that's Writer. 

Then the code for the filter is sitting in two places:
  * import is in the writerfilter module
  * export sits in sw/source/filter/ww8

I'd say the easiest one to get started with is the export filter...
but it has far fewer bugs and missing bits. The idea for the import
filter is the following:
  * a tokenizer sends tokens to the domain mapper
  * the domain mapper is the one actually doing the job on the document
The OOXML tokenizer's code is located in writerfilter/source/ooxml and
the domain mapper is located in writerfilter/source/dmapper. The OOXML
tokenizer is pretty hard to understand at first sight. Here are some
keys to understanding it:
  * SAX handlers are generated from an XML description of the spec
    (the model.xml file). This generation is done using some of the
    many xsl files in the ooxml folder.
  * The generated handlers all end up calling further methods on a
    ContextHandler defined in the ooxml folder.
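
To make the tokenizer / domain mapper split a bit more concrete, here
is a very rough Python sketch of the idea. It is not the writerfilter
code (which is C++ and largely generated): the class names
DomainMapper, ContextHandler and Tokenizer, the ELEMENT_TOKENS table,
the import_document helper and the toy element names "p" and "t" are
all made up for illustration. The hand-written table only stands in
for what the real filter generates from model.xml.

```python
import xml.sax


class DomainMapper:
    """Hypothetical domain mapper: does the actual work on the document."""

    def __init__(self):
        self.paragraphs = []   # the "document" is just a list of strings here
        self._current = None   # text pieces of the paragraph being built

    def start_paragraph(self):
        self._current = []

    def text(self, chars):
        if self._current is not None:
            self._current.append(chars)

    def end_paragraph(self):
        self.paragraphs.append("".join(self._current))
        self._current = None


class ContextHandler:
    """Hypothetical shared handler that forwards tokens to the mapper."""

    def __init__(self, mapper):
        self.mapper = mapper

    def start(self, token):
        if token == "PARAGRAPH":
            self.mapper.start_paragraph()

    def characters(self, chars):
        self.mapper.text(chars)

    def end(self, token):
        if token == "PARAGRAPH":
            self.mapper.end_paragraph()


# Assumption: a hand-written element-to-token table, standing in for the
# handlers that the real filter generates from its XML description.
ELEMENT_TOKENS = {"p": "PARAGRAPH", "t": "TEXT"}


class Tokenizer(xml.sax.ContentHandler):
    """SAX handler that turns XML events into tokens for the ContextHandler."""

    def __init__(self, context):
        super().__init__()
        self.context = context
        self._stack = []   # token (or None) for each currently open element

    def startElement(self, name, attrs):
        token = ELEMENT_TOKENS.get(name)
        self._stack.append(token)
        if token is not None:
            self.context.start(token)

    def characters(self, content):
        # Only text directly inside a TEXT element is forwarded.
        if self._stack and self._stack[-1] == "TEXT":
            self.context.characters(content)

    def endElement(self, name):
        token = self._stack.pop()
        if token is not None:
            self.context.end(token)


def import_document(xml_bytes):
    """Run the toy pipeline and return the paragraphs the mapper built."""
    mapper = DomainMapper()
    xml.sax.parseString(xml_bytes, Tokenizer(ContextHandler(mapper)))
    return mapper.paragraphs
```

For example, import_document(b"<body><p><t>Hello</t></p></body>")
walks the XML through the tokenizer and leaves the mapper holding one
paragraph. The point is only the direction of the calls: the tokenizer
knows about XML, the mapper knows about the document, and the context
handler sits in between.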

Some more information (not much) can be found on this page of the OOo wiki:

> > 	I guess the best thing to do is, either to look for OOXML import or
> > export bugs - which often are disguised round-trip interop problems, I
> > imagine we have a number of them in bugzilla. Failing that, I'm sure we
> > have a number of guys interested in interop problems there that would
> > love to have your help :-)
> I'll just have a look at the ooxml filters and try to figure what's
> happening there and then take a stab at a bug, and then see where I want
> to go from there. 

There are quite a few bugs in that area, and they aren't necessarily
easy to handle. A nice start would be to fix some of the differences
between the OOXML ISO standard and the OOXML Ecma v1 standard: those
differences often include easy-to-hack things.

If you have questions, feel free to ping me on IRC; my nick is


Cedric Bosdonnat
