Document conversion engine

Sun Jul 8 16:25:56 PDT 2012

Hi Michael,

nice to hear from someone so "up the ranks" like you... makes me feel much
more important :-)

2012/7/6 Michael Meeks <michael.meeks at suse.com>

> Hi Flavio,
>
> On Tue, 2012-07-03 at 11:45 +0100, Flavio Moringa wrote:
> > my name is Flávio Moringa, I'm from Portugal and I'm starting my
> > Masters Dissertation next September (Master in Open Source software -
> > http://moss.dcti.iscte.pt ).
>
>         Welcome :-)
>

Thanks

>
> > I'm not a programmer, so what I'm interested in doing is something in
> > the lines of investigating the main conversion problems, identifying
> > the possible conversion flows, analysing the way the conversion flow
> > is implemented in LibreOffice, and eventually trying to improve this
> > flow somehow.
>
>         So - it will be hard to improve the flow without being a
> programmer I'm afraid :-)
>

well, although I'm not a programmer right now, I've had my fair share of perl,
python, c, bash, java, php... maybe I'm not so "fluent" in programming
right now, but I'm certainly no stranger to it, and definitely not afraid to
do it if the need arises... what I meant was that I probably won't be
able to do a conversion engine by myself... but I can definitely mess
around with code...

>
> > From your reply I assume that testing the filters, and doing
> > regression tests is something I could do, maybe identifying the main
> > conversion issues in groups of documents and kind of creating a "major
> > conversion issues" table, and prioritizing those issues. Is there
> > already something like that?
>
>         There is a useful QA role in prioritising bug reports and
> interoperability issues; we have a real problem with masses of bug
> reports many of which could be duplicates. Having said that -
> interoperability has many, many known feature / impedance mis-matches
> that are non-trivial development problems to fix.
>
>         One thing that -would- be really useful, and that Microsoft have
> internally, is an analysis tool for Microsoft's XML document formats -
> such that we can get a good idea of which attributes are actually used
> much. ie. by analysing and comparing a large corpus of documents out
> there, we can answer questions such as:
>
>         "should we implement surface charts, or 3D doughnut charts ?"
>
>         given whatever amount of feature-development time we have - simply
> by referring to the database of crunched XML files to work out which one
> is used most.
>
>         It'd be nice to have that for ODF as well too of course for when we
> have to make zero-sum back-compatibility decisions; but for
> interoperability crunching those MS documents would be really good.
>
>         Is that something you could do ? a bit of perl, zip extraction, XML
> parsing, etc. ?
>

Yes, it's definitely something I can do... I do believe that the harder
part is getting that "large corpus of documents out there...". At least
in my experience, I've found that it's hard to get users to send us
documents they use... either due to privacy concerns or enterprise
policies... But a tool like that makes a lot of sense.

>
>         Developers are -much- more likely to let themselves be led by
> objective statistics on real documents out there, rather than subjective
> feelings of priority - which can prove rather controversial :-)
>

I can certainly relate to that...

>
>         Thanks !
>

For now then I'll start doing as you suggest and look in bugzilla for
documents with conversion problems to try and compile as many examples as I
can. Then maybe use the latest beta to do the conversion and see which
problems are still there. Then maybe start a perl script that can scrape
the OOXML files to find the most used tags... and start from there...
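
The tag-counting step could look roughly like the sketch below. It is in
Python rather than perl purely for illustration, and it only assumes that an
OOXML file is a zip package whose parts are mostly XML. The function names
(count_tags, count_corpus) and the choice to report the top 20 are made up
for this example, not anything that exists in LibreOffice.

```python
import collections
import sys
import zipfile
import xml.etree.ElementTree as ET


def count_tags(ooxml_file, counts=None):
    """Count element tags in every XML part of one OOXML package.

    ooxml_file may be a path or a binary file-like object.
    Tags are keyed in ElementTree's '{namespace}localname' form.
    """
    if counts is None:
        counts = collections.Counter()
    with zipfile.ZipFile(ooxml_file) as package:
        for name in package.namelist():
            # Skip images, embedded objects and other non-XML parts.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                continue  # ignore parts that are not well-formed XML
            for element in root.iter():
                counts[element.tag] += 1
    return counts


def count_corpus(paths):
    """Accumulate tag counts over many documents (hypothetical helper)."""
    totals = collections.Counter()
    for path in paths:
        try:
            count_tags(path, totals)
        except zipfile.BadZipFile:
            print("skipping (not a zip package): %s" % path, file=sys.stderr)
    return totals


if __name__ == "__main__":
    # Usage: python count_tags.py file1.docx file2.xlsx ...
    for tag, n in count_corpus(sys.argv[1:]).most_common(20):
        print("%8d  %s" % (n, tag))
```

Counting attributes as well would be a small extension of the same loop
(iterating over element.attrib), which is probably closer to what is needed
to compare features like chart types.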

>
>                 Michael.
>
> --
> michael.meeks at suse.com  <><, Pseudo Engineer, itinerant idiot
>
>

Thanks a lot for helping out.
Cheers

-- 
*Flávio Moringa*
Project Leader

Caixa Mágica Software
Energia Open Source
Rua Soeiro Pereira Gomes, Lote 1 - 4.º B,
Edifício Espanha, 1600-196 Lisboa - Portugal
Tel.: +351 217 921 260 Fax: +351 217 921 261
http://www.caixamagica.pt
https://twitter.com/flaviomoringa
https://www.facebook.com/flaviomoringa<https://www.facebook.com/flavio.moringa>
http://pt.linkedin.com/in/flaviomoringa
http://people.caixamagica.pt/flaviomoringa