Roundtripping DOCX file in CLI vs API vs GUI - different results !

Miklos Vajna vmiklos at collabora.co.uk
Wed Oct 16 10:27:35 CEST 2013


Hi Adam,

On Tue, Oct 15, 2013 at 07:05:32PM +0300, Adam Fyne <adam.fyne at cloudon.com> wrote:
> I was wondering if you have encountered this, or know what is the root cause
> of this.
> 
> There are 3 different ways to perform a roundtrip of a DOCX file using
> LibreOffice:
> 
> * GUI - This way you run LibreOffice, open the file, do 'save as',
>   choose 'DOCX' and type the filename
> 
> * API - This way you simply load\import the LibreOffice libraries - and
>   use the API to convert a file from one format to another
> 
> * CLI - This way you type in the console something like
>   './soffice --convert-to'.
> 
>  
> 
> What we have noticed is - that the results of the outputted DOCX files are
> not visually identical in all 3 ways!

One pitfall is that on the export side, we have two DOCX export filters:
one is 'Office Open XML Text', the other is 'Microsoft Word 2007/2010
XML'. They are basically the same filter, but where the ECMA standard
and what Word does differ, the user can choose which one she prefers.

The filter queries its name like this:

http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/docxattributeoutput.cxx#1900

In the past, for some reason, --convert-to used the ECMA filter, while
GUI 'save as' used the Word filter, in case "docx" was given as a file
extension. So this can be one problem -- though I just checked with
latest master, and I don't see this anymore: even --convert-to now uses
the Word filter (which is good).
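
To rule out the filter choice when comparing the CLI against the GUI, the
filter can be named explicitly after the extension in --convert-to. Below is
a minimal Python sketch that only builds (and optionally runs) such a command
line. The helper names are hypothetical, and the two internal filter names
are my assumption of what the filter configuration calls them, so please
verify them against your build.

```python
import subprocess

# Internal filter names (assumed; verify against your build's filter config).
WORD_FILTER = "MS Word 2007 XML"      # UI name: Microsoft Word 2007/2010 XML
ECMA_FILTER = "Office Open XML Text"  # UI name: Office Open XML Text


def build_convert_command(soffice, source, outdir, filter_name=WORD_FILTER):
    """Return the argv list for a headless DOCX export with an explicit filter."""
    return [
        str(soffice),
        "--headless",
        "--convert-to", f"docx:{filter_name}",
        "--outdir", str(outdir),
        str(source),
    ]


def convert(soffice, source, outdir, filter_name=WORD_FILTER):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(
        build_convert_command(soffice, source, outdir, filter_name),
        check=True,
    )
```

Running convert() once with each filter name and diffing the two outputs
should show whether the filter choice explains any of the differences.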

Another pitfall is that in Writer, layout is rendered in an idle loop
(there are at least two iterations, the first should be quick, the
second should be more correct), so if you use --convert-to, we somehow
have to wait for the layout. This kind of issue used to cause problems
with unit tests as well: we wanted to test layout, but layout wasn't
ready right after loading. ;-)

That's why we have this code for unit tests:

http://opengrok.libreoffice.org/xref/core/sw/qa/extras/inc/swmodeltestbase.hxx#92

A third pitfall is that the "main()" of unit tests and that of the
"desktop app" are not the same: it happened in the past that the desktop
app exported images, while unit tests didn't. That's why we have code
like this:

http://opengrok.libreoffice.org/xref/core/test/source/bootstrapfixture.cxx#69

All in all, there can be multiple reasons why such output differs; of
course, ideally none of them should happen.

> Difference in naming of image files in "media" folder
> 
> (e.g. file round trip through CLI may have image1/2/3.png as names whereas
> file round trip through API may have image101/102/103.png or different.

I haven't noticed such a difference earlier, so that's something that
needs tracking down.
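
As a starting point for tracking it down, here is a small Python sketch that
lists the image names inside two exported packages and reports what differs.
It relies only on a DOCX being a ZIP archive with its images under
word/media/; the function names are hypothetical.

```python
import zipfile

MEDIA_PREFIX = "word/media/"


def media_names(docx):
    """Return the sorted file names found under word/media/ in a DOCX package."""
    with zipfile.ZipFile(docx) as package:
        return sorted(
            name[len(MEDIA_PREFIX):]
            for name in package.namelist()
            if name.startswith(MEDIA_PREFIX) and not name.endswith("/")
        )


def compare_media(first_docx, second_docx):
    """Return (names only in the first package, names only in the second)."""
    first = set(media_names(first_docx))
    second = set(media_names(second_docx))
    return sorted(first - second), sorted(second - first)
```

For example, compare_media("cli.docx", "api.docx") on the two roundtripped
files would show which image names each path produced.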

> Entire Table of Contents is lost if round trip through CLI/API whereas it is
> retained (largely) if done through GUI.

We have this writerfilter/source/dmapper/ModelEventListener.cxx class,
which reacts to the "OnFocus" event of the document and does some field
updating -- isn't it possible that in case of CLI/API, that event is
never emitted, and that causes your problem?
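
To check this on the output side without opening Word, one could look for a
TOC field instruction in the exported word/document.xml. The Python sketch
below is only a rough heuristic: it assumes the usual "w:" namespace prefix
and that the field instruction starts with "TOC" in a single w:instrText or
w:fldSimple element, which need not hold for every document. The function
name is hypothetical.

```python
import re
import zipfile

# Complex field: <w:instrText ...> TOC \o "1-3" ...</w:instrText>
COMPLEX_TOC = re.compile(r"<w:instrText[^>]*>\s*TOC\b")
# Simple field: <w:fldSimple w:instr=" TOC \o ...">
SIMPLE_TOC = re.compile(r'<w:fldSimple[^>]*w:instr="\s*TOC\b')


def has_toc_field(docx):
    """Heuristically report whether word/document.xml contains a TOC field."""
    with zipfile.ZipFile(docx) as package:
        xml = package.read("word/document.xml").decode("utf-8")
    return bool(COMPLEX_TOC.search(xml) or SIMPLE_TOC.search(xml))
```

Running this on the GUI, CLI and API outputs of the same input would tell
whether the field itself is dropped or only its rendered content.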

> I don't really understand how come the filter behaves differently between
> CLI\API and GUI.

I guess in all cases it's not the filter itself that differs, but the
layout, the filter name, some API that the filter tries to use, etc.

> If there is some duplicate code and we are doing our fixing in the wrong
> place then we need to be aware of it. 
> 
> Maybe there is some additional code to the filter in the core model that is
> not running in the CLI ?

There is some duplication, as mentioned above for unit tests vs desktop
app (desktop::Desktop::Main() vs test::BootstrapFixture::setUp()), but
your GUI/--convert-to/API cases should all use the desktop main loop, so
this should not affect you.

> We have opened a bug <https://bugs.freedesktop.org/show_bug.cgi?id=70481>
> for this in Bugzilla, 
> 
> but I was wondering if you have any thoughts about this or if you have
> encountered this before.

As written above, not this one, but I saw similar issues, yes.

Reading the bug report, Owen mentions that the situation used to be a bit
better; that is worth investigating as well (e.g. if it's a regression,
which commit caused it). I haven't done that so far.

Hope this helps,

Miklos