[Libreoffice] Unexpected failures (eg. segfaults) using PyUNO and LibreOffice/OpenOffice
mstahl at redhat.com
Fri Oct 21 02:42:11 PDT 2011
On 19/10/11 10:17, Dag Wieers wrote:
> During the course of the LibreOffice conference in Paris, we (the
> unoconv and cloudooo projects) found that some of the issues our users
> were having while doing document conversions using PyUNO and OpenOffice
> and LibreOffice were not related to our own project, but have a
> root-cause in either PyUNO or LibreOffice/OpenOffice.
> The resulting failures are varied and differ from case to case:
> - segfaults
> - various error codes
> - PyUNO crashes
> - memory leaks
> - xslt problems
> And while some of them are reproducible (and consistent), others are
> not, which makes me believe they are related to internal state or timing
> issues of LibreOffice/OpenOffice or related to import/export filters.
> Since these issues are very common and can be triggered very quickly, we
> would like to have developers look at them to see what is the cause and
> how we can fix them.
it is well known that the threading implementation in the OOo
applications is rather unreliable.
currently for thread safety the implementers of UNO APIs are required to
explicitly use low-level synchronization primitives such as mutexes.
not doing it correctly (such as locking a mutex while it should not be
locked, or forgetting to lock a mutex while it should be locked) leads to
very subtle problems that do not show up during ordinary office use, and
are extremely difficult to reproduce.
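to illustrate the explicit-locking style described above, here is a minimal Python sketch. it is not UNO code: the class `DocumentCounter` and the helper `hammer` are made-up names, and the mutex is a plain `threading.Lock`. the point is only that every access to the shared state has to take the lock by hand, and that leaving one of those `with` statements out would fail silently.

```python
import threading


class DocumentCounter:
    """Hypothetical stand-in for an API object that holds shared state."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._count = 0

    def increment(self):
        # The implementer must remember this lock; without it the
        # read-modify-write below could interleave and lose updates.
        with self._mutex:
            self._count += 1

    def value(self):
        with self._mutex:
            return self._count


def hammer(counter, threads=4, iterations=1000):
    """Call increment() from several threads at once and return the total."""

    def work():
        for _ in range(iterations):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value()
```

with the lock in place, `hammer(DocumentCounter())` always returns `threads * iterations`.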
basically the only way for developers to find these issues is via the
subsequenttests, which currently are mostly implemented in Java and
connect to the OOo instance via a UNO remote bridge.
and the only issues that are half-way easy to debug are deadlocks; in
case of missing locks you may get a memory corruption _somewhere_ which
causes some later test to crash, but it is very difficult to track down
the root cause.
also, most of the developers who work on the applications are not
experts in multi-threading issues (those who are tend to work on the
lower-level layers like the URE). for example i discovered once that in
Writer almost all destructors of UNO objects do not lock a mutex but
then call into the Writer core (have partially fixed this for OOo 3.3).
so as a result of all of this driving OOo/LO via remote bridges is
some have suggested the best way out of this is to find a way so that
implementers of UNO APIs do not have to care about thread safety
themselves, but instead there should be a framework that does it
automatically. such a framework has actually existed for many years now (Kay
Ramme's "UNO threading framework"), but most of OOo/LO does not make use
of it (iirc it is used for only some database drivers).
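as a rough picture of what "a framework that does it automatically" could mean, here is a small Python sketch. this is not the UNO threading framework and says nothing about how that framework works; `synchronized`, `_guarded` and `Paragraphs` are invented for illustration. the idea shown is just that the locking is applied from the outside, so the class author writes no mutex code at all.

```python
import functools
import inspect
import threading


def _guarded(method):
    """Wrap one method so that it runs under the instance's lock."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._guard:
            return method(self, *args, **kwargs)

    return wrapper


def synchronized(cls):
    """Class decorator (hypothetical): guard every public method."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init(self, *args, **kwargs):
        # Reentrant, so one guarded method may call another.
        self._guard = threading.RLock()
        original_init(self, *args, **kwargs)

    cls.__init__ = init
    for name, attr in list(vars(cls).items()):
        # Only plain functions with public names are wrapped.
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, _guarded(attr))
    return cls


@synchronized
class Paragraphs:
    """Example class: note that it contains no locking code itself."""

    def __init__(self):
        self.items = []

    def append(self, text):
        self.items.append(text)

    def append_twice(self, text):
        self.append(text)
        self.append(text)

    def count(self):
        return len(self.items)
```

the trade-off in this sketch is one coarse lock per object: simple to reason about, but it serializes all calls on that object.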
of course there may also be problems in PyUNO on top of that; back at
Sun we had nothing that depended on PyUNO so i guess nobody spent much
time debugging it...
> The cloudooo project has tested about 100.000 conversions and
> implemented some techniques to overcome the issues by monitoring the
> libreoffice process for memory leaks and 'endless loops', and retrying
> on failure. In the end this brought the failure rate down from about 10%
> to 1.1%.
yes, there are various ways to minimize the risk of failure, no doubt
you are already doing most of these:
- monitor the OOo instance and restart it
- only connect to an OOo instance from a single thread (should result in
fewer problems, but e.g. with a JVM you still effectively get multiple
connections, don't know about PyUNO)
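the "monitor and restart" idea can be sketched in a few lines of Python. everything named here is an assumption for illustration: `SupervisedWorker` is a made-up class, `DEMO_COMMAND` is a harmless stand-in for whatever command line starts the office process, and `job` is any callable that performs one conversion and raises on failure. it does not watch memory use or detect endless loops; it only restarts the process and retries.

```python
import subprocess
import sys
import time

# Stand-in for the real office command line (assumption): a process that
# simply stays alive for a while.
DEMO_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]


class SupervisedWorker:
    """Keep one child process alive and retry a job when it fails."""

    def __init__(self, command, max_attempts=3, restart_delay=0.0):
        self.command = command
        self.max_attempts = max_attempts
        self.restart_delay = restart_delay
        self.process = None
        self.restarts = 0

    def ensure_running(self):
        # poll() returns None while the child is still alive.
        if self.process is None or self.process.poll() is not None:
            self.process = subprocess.Popen(self.command)

    def stop(self):
        if self.process is not None:
            if self.process.poll() is None:
                self.process.kill()
            self.process.wait()
            self.process = None

    def restart(self):
        self.stop()
        self.restarts += 1

    def run(self, job):
        """Call job(); on any exception, restart the child and try again."""
        last_error = None
        for _ in range(self.max_attempts):
            self.ensure_running()
            try:
                return job()
            except Exception as error:
                last_error = error
                self.restart()
                time.sleep(self.restart_delay)
        raise RuntimeError(
            "job failed after %d attempts" % self.max_attempts
        ) from last_error
```

calling `run()` from only one thread also keeps to the single-connection advice above.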
> Both the cloudooo and unoconv presentations will become available and
> contain some information on both projects and the PyUNO/LO unreliabilities.
> Below is some
> example failure output from a single run, LibreOffice does seem a bit
> more stable than OpenOffice though.
there are a lot of XSLT errors; LO (at least in 3.4) ships a different
XSLT implementation, perhaps that has helped...