[Libreoffice] writerfilter vs XSL
vmiklos at frugalware.org
Fri May 20 03:37:53 PDT 2011
On Thu, May 19, 2011 at 12:52:41PM +0200, Cedric Bosdonnat <cedric.bosdonnat.ooo at free.fr> wrote:
> As you'll work on the tokenizer, I think it would be nice to introduce
> some kind of tokens dumper replacing the dmapper that would dump what
> goes in the dmapper. That would possibly provide some way to isolate
> whether the import problem comes from the tokenizer (specific to each
> format) or the domain mapper (that would impact all handled formats).
Yes, that makes sense.
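
To make the idea concrete, here is a minimal sketch of such a dumper, in Python for brevity. It assumes the tokenizer talks to the domain mapper through a small handler interface. The method names (start_group, end_group, prop, text) and the toy tokenizer are hypothetical and are not writerfilter's actual API. The dumper takes the place of the dmapper and records each event as an indented line, so the output can be compared against an expected dump.

```python
import io


class TokenDumper:
    """Stand-in for the domain mapper: records events instead of building a document.

    All method names here are hypothetical, chosen only for illustration.
    """

    def __init__(self, out):
        self.out = out
        self.depth = 0

    def _emit(self, line):
        # Indent by nesting depth so the dump stays readable and diffable.
        self.out.write("  " * self.depth + line + "\n")

    def start_group(self, name):
        self._emit("<" + name + ">")
        self.depth += 1

    def end_group(self, name):
        self.depth -= 1
        self._emit("</" + name + ">")

    def prop(self, name, value):
        self._emit("prop {}={!r}".format(name, value))

    def text(self, chars):
        self._emit("text {!r}".format(chars))


def dump_tokens(feed):
    """Run a tokenizer (any callable taking a handler) against the dumper."""
    out = io.StringIO()
    feed(TokenDumper(out))
    return out.getvalue()


def toy_tokenizer(handler):
    # Hypothetical tokenizer output for one bold paragraph.
    handler.start_group("paragraph")
    handler.prop("bold", True)
    handler.text("Hello")
    handler.end_group("paragraph")


if __name__ == "__main__":
    print(dump_tokens(toy_tokenizer), end="")
```

If the dump already looks wrong, the problem is in the format-specific tokenizer. If the dump is right and the imported document is still wrong, the domain mapper is the place to look.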
> You would then have a much more reliable way to test that your tokenizer
> is working... but that wouldn't help testing the domain mapper. To test
> that one, I think that mostly conversions like those you are explaining
> are helping.
> > (I already heard of the xml dumper for the rendered layout, is there
> > something similar for the internal document model?)
> Yes, the ODF is a pretty good representation of the internals... though
> we could surely implement something closer to the actual data
> structures. Let me know if it would be of any use to create such a
> dumper... I'm sure we could come pretty quickly to something useful.
Fine, I'll use ODF for now, then if it turns out to be too much trouble,
we can still work on a dumper.
Another question: writerfilter seems to use a lot of XSL to extract the
required data from the spec. We agreed that this is a problem, as XSL is
hard to maintain. If I follow the same approach, RTF would introduce
another bunch of XSL. :)
So, what could be a solution here? Possible ideas from me:
- even with its problems, we have nothing better, introducing new XSL
code for RTF is not the best, but let's live with it. (the
- write C++ code to do the transformations at build time (the "I don't
know any scripting languages" one)
- use Perl or Python to do the transformations (my Perl-fu is weak, but
it's doable; I would vote for Python, but I'm not sure whether reusing
our internal Python in the build system is a problem)
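
As a rough illustration of the Python option, here is a sketch of a build-time script that reads a spec-like XML file and generates a C++ enum from it. The XML layout (a model element containing token elements with name and id attributes) and the generated TOKEN_ names are assumptions made up for this example. They do not reflect the real spec files or the code writerfilter generates today.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a tiny model file listing tokens and their ids.
SPEC = """<model>
  <token name="bold" id="1"/>
  <token name="italic" id="2"/>
</model>"""


def generate_header(spec_xml):
    """Turn the (hypothetical) token list into the text of a C++ enum."""
    root = ET.fromstring(spec_xml)
    lines = ["// Generated file, do not edit.", "enum Token", "{"]
    for tok in root.iter("token"):
        name = tok.get("name").upper()
        value = int(tok.get("id"))
        lines.append("    TOKEN_{} = {},".format(name, value))
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_header(SPEC), end="")
```

This uses only the Python standard library, so it would not pull extra dependencies into the build beyond the interpreter itself.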