[Libreoffice] subsequenttest hang ...
Stephan Bergmann
sbergman at redhat.com
Mon Sep 26 08:33:20 PDT 2011
On 09/24/2011 12:48 PM, Michael Meeks wrote:
> I'm poking at an endless hang in the smoketest:
>
> #2 0xb7d24aec in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
> #3 0xb7f1b6c0 in osl_waitCondition ()
> from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4 0xb72db42a in osl::Condition::wait (this=0xbfffb8c4, pTimeout=0x0)
> at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/osl/conditn.hxx:84
> #5 0xb72d9024 in (anonymous namespace)::Test::test (this=0xb7c16008)
> at /data/opt/libreoffice/core/smoketestoo_native/smoketest.cxx:200
> #6 0xb72d9e2e in CppUnit::TestCaller<<unnamed>::Test>::runTest(void)
> (this=0xb73ac0a8)
> at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/cppunit/TestCaller.h:166
>
> If I were a betting man I'd say this is down to us waiting on a
> condition, and not spinning the main-loop; but (to be honest) this
> remote-control nonsense is somewhat opaque to me. I see no live
> soffice.bin process being controlled. I was slightly amazed to read:
>
> toolkit/source/awt/AsyncCallback::addCallback()
>
> which seems to do nothing / not fire an exception if
> Application::IsInMain() is not true - which is in itself odd.
>
> I have another quiescent thread:
>
> #2 0xb7d24b44 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> from /lib/libc.so.6
> #3 0xb7f3f18e in ?? ()
> from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4 0xb7c28b05 in start_thread (arg=0xb7c0fb70) at pthread_create.c:297
> #5 0xb7d16d5e in clone () from /lib/libc.so.6
>
> So - I'm tempted to say:
>
> Result result;
> // Shifted to main thread to work around potential deadlocks (i112867):
> com::sun::star::awt::AsyncCallback::create(
>     connection_.getComponentContext())->addCallback(
>         new Callback(
>             disp, url,
>             css::uno::Sequence< css::beans::PropertyValue >(),
>             new Listener(&result)),
>         css::uno::Any());
> result.condition.wait();
> CPPUNIT_ASSERT(result.success);
>
> should be a timed wait - but only if we fail if the timeout is
> triggered (ie. not on the common path). I've committed that at 30
> seconds - possibly this needs tweaking to be infinite when under the
> debugger.
A timed wait is no solution here. (Timeouts in this kind of code pose
at least two problems. For one, a human coming back to a hung "make
check" after a while no longer gets a clue where it hung, as the build
has unhelpfully been forced to move forward. For another, what is
typically also needed is proper cleanup, like killing abandoned
sub-processes, so manual intervention is needed anyway.) The real
solution, instead, is to wait not only on the Result object, but also
on the OfficeConnection. Fixed as
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=c09b966f94f5a50fe537916398451339f008947d>.
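For illustration only (this is not the code from the commit above): one
way to wait on two things at once, without ever giving up on a deadline,
is to wake at short intervals and check whether the controlled process
is still there. The sketch uses standard C++ instead of osl, and Result,
Connection, isStillAlive, Outcome and waitForResult are all hypothetical
stand-ins, not the smoketest's real types.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the test's Result object.
struct Result {
    std::mutex mutex;
    std::condition_variable condition;
    bool done = false;
    bool success = false;

    // Called by the listener once the dispatch has finished.
    void set(bool ok) {
        {
            std::lock_guard<std::mutex> guard(mutex);
            done = true;
            success = ok;
        }
        condition.notify_all();
    }
};

// Hypothetical stand-in for OfficeConnection; only liveness is modelled.
struct Connection {
    std::atomic<bool> alive{true};
    bool isStillAlive() const { return alive.load(); }
};

enum class Outcome { Finished, PeerDied };

// No overall deadline: the loop only ends when the result arrives or
// the controlled process is found to be gone. The interval bounds how
// long a dead peer can go unnoticed, not how long the test may run.
Outcome waitForResult(
    Result & result, Connection const & connection,
    std::chrono::milliseconds interval = std::chrono::milliseconds(50))
{
    std::unique_lock<std::mutex> lock(result.mutex);
    for (;;) {
        if (result.condition.wait_for(
                lock, interval, [&result] { return result.done; }))
        {
            return Outcome::Finished;
        }
        if (!connection.isStillAlive()) {
            return Outcome::PeerDied;
        }
    }
}
```

A test would then fail on Outcome::PeerDied and check result.success
otherwise. While the office process is alive but stuck, the wait keeps
blocking, so a hung "make check" can still be inspected in place.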
-Stephan