[Libreoffice] writerfilter vs XSL

Miklos Vajna vmiklos at frugalware.org
Fri May 20 03:37:53 PDT 2011


On Thu, May 19, 2011 at 12:52:41PM +0200, Cedric Bosdonnat <cedric.bosdonnat.ooo at free.fr> wrote:
> As you'll work on the tokenizer, I think it would be nice to introduce
> some kind of tokens dumper replacing the dmapper that would dump what
> goes in the dmapper. That would possibly provide some way to isolate
> whether the import problem comes from the tokenizer (specific to each
> format) or the domain mapper (that would impact all handled formats).

Yes, that makes sense.
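
To make sure we mean the same thing, here is a rough sketch of the idea in
Python (only to keep it short, the real thing would be C++). It assumes the
tokenizer talks to the domain mapper through a small event interface; all
names below (TokenDumper, start_group, toy_tokenizer, ...) are made up for
illustration and are not existing writerfilter API. The dumper implements
the same interface as the mapper and just writes out what it receives:

```python
# Hypothetical sketch: none of these names exist in writerfilter.
import io


class TokenDumper:
    """Stands in for the domain mapper and records every event it receives."""

    def __init__(self, out):
        self.out = out
        self.depth = 0

    def _line(self, text):
        # Indent by nesting depth so the dump is easy to diff.
        self.out.write("  " * self.depth + text + "\n")

    def start_group(self, name):
        self._line("start " + name)
        self.depth += 1

    def end_group(self, name):
        self.depth -= 1
        self._line("end " + name)

    def attribute(self, name, value):
        self._line("attr %s=%s" % (name, value))

    def text(self, chars):
        self._line("text %r" % chars)


def toy_tokenizer(handler):
    """Placeholder for a format-specific tokenizer (RTF, DOCX, ...)."""
    handler.start_group("paragraph")
    handler.attribute("bold", 1)
    handler.text("hello")
    handler.end_group("paragraph")


def dump_tokens(tokenizer):
    """Run a tokenizer against the dumper and return the dump as a string."""
    out = io.StringIO()
    tokenizer(TokenDumper(out))
    return out.getvalue()
```

Comparing such a dump against a stored reference would tell us whether a
regression is in the tokenizer, without involving the domain mapper at all.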

> You would then have a much more reliable way to test that your tokenizer
> is working... but that wouldn't help testing the domain mapper. To test
> that one, I think that mostly conversions like those you are explaining
> are helping.

OK.

> > (I already heard of the xml dumper for the rendered layout, is there
> > something similar for the internal document model?)
> 
> Yes, the ODF is a pretty good representation of the internals... though
> we could surely implement something closer to the actual data
> structures. Let me know if it would be of any use to create such a
> dumper... I'm sure we could come pretty quickly to something useful.

Fine, I'll use ODF for now, then if it turns out to be too much trouble,
we can still work on a dumper.

Another question: writerfilter seems to use a lot of XSL to extract the
required data from the spec. We agreed that this is a problem, as XSL is
hard to maintain. If I follow the same approach, RTF would introduce yet
another bunch of XSL. :)

So, what could be a solution here? Possible ideas from me:

- even with its problems, we have nothing better; introducing new XSL
  code for RTF is not ideal, but let's live with it (the conservative
  one)

- write C++ code to do the transformations at build time (the "I don't
  know any scripting languages" one)

- use Perl or Python to do the transformations (my Perl-fu is weak, but
  it's doable; I would vote for Python, but I'm not sure whether reusing
  our internal Python in the build system is a problem)
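
For the Python option, a minimal sketch of what a build-time generator
could look like, using only the standard library. The spec layout (<token>
elements with name and id attributes) and the generated constants are
invented for the example; the real spec XML is structured differently, so
take this only as an indication of the amount of code involved:

```python
# Hypothetical sketch: the spec layout and the output format are made up.
import xml.etree.ElementTree as ET

SAMPLE_SPEC = """<model>
  <token name="bold" id="1"/>
  <token name="italic" id="2"/>
</model>
"""


def generate_header(spec_xml):
    """Turn <token name=... id=.../> elements into C++ constants."""
    root = ET.fromstring(spec_xml)
    lines = ["// Generated file, do not edit.", "#pragma once", ""]
    for token in root.iter("token"):
        name = token.get("name").upper()
        value = int(token.get("id"))
        lines.append("const int TOKEN_%s = %d;" % (name, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # In a real build this would read the spec file and write a header.
    print(generate_header(SAMPLE_SPEC), end="")
```

The build system would then call the script where it calls the XSLT
processor today.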

Thanks.