[Libreoffice] subsequenttest hang ...

Stephan Bergmann sbergman at redhat.com
Mon Sep 26 08:33:20 PDT 2011


On 09/24/2011 12:48 PM, Michael Meeks wrote:
> I'm poking at an endless hang in the smoketest:
>
> #12  0xb7d24aec in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
> #3  0xb7f1b6c0 in osl_waitCondition ()
> from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4  0xb72db42a in osl::Condition::wait (this=0xbfffb8c4, pTimeout=0x0)
> at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/osl/conditn.hxx:84
> #5  0xb72d9024 in (anonymous namespace)::Test::test (this=0xb7c16008)
> at /data/opt/libreoffice/core/smoketestoo_native/smoketest.cxx:200
> #6  0xb72d9e2e in CppUnit::TestCaller<<unnamed>::Test>::runTest(void)
> (this=0xb73ac0a8)
> at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/cppunit/TestCaller.h:166
>
> 	If I were a betting man I'd say this is down to us waiting on a
> condition, and not spinning the main-loop; but (to be honest) this
> remote-control nonsense is somewhat opaque to me. I see no live
> soffice.bin process being controlled. I was slightly amazed to read:
>
> toolkit/source/awt/AsyncCallback::addCallback()
>
> 	which seems to do nothing / not fire an exception if
> Application::IsInMain() is not true - which is in itself odd.
>
> 	I have another quiescent thread:
>
> #2  0xb7d24b44 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> from /lib/libc.so.6
> #3  0xb7f3f18e in ?? ()
> from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4  0xb7c28b05 in start_thread (arg=0xb7c0fb70) at pthread_create.c:297
> #5  0xb7d16d5e in clone () from /lib/libc.so.6
>
> 	So - I'm tempted to say:
>
>      Result result;
>      // Shifted to main thread to work around potential deadlocks
>      // (i112867):
>      com::sun::star::awt::AsyncCallback::create(
>          connection_.getComponentContext())->addCallback(
>              new Callback(
>                  disp, url,
>                  css::uno::Sequence< css::beans::PropertyValue >(),
>                  new Listener(&result)),
>              css::uno::Any());
>      result.condition.wait();
>      CPPUNIT_ASSERT(result.success);
>
> 	should be a timed wait - but only if we fail if the timeout is
> triggered (ie. not on the common path). I've committed that at 30
> seconds - possibly this needs tweaking to be infinite when under the
> debugger.

A timed wait is no solution here.  (Timeouts in this kind of code pose
at least two problems.  For one, a human who comes back to a hung
"make check" after a while no longer gets a clue where it hung, as the
build has unhelpfully been forced to move forward.  For another, proper
cleanup, like killing abandoned sub-processes, is typically also needed,
so manual intervention is required anyway.)  The real solution, instead,
is to wait not only on the Result object, but also on the
OfficeConnection.  Fixed as
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=c09b966f94f5a50fe537916398451339f008947d>.
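
The idea of waiting on both objects can be sketched roughly as follows.
This is not the code from the commit above; it is a minimal illustration
in standard C++, and Result, WaitOutcome, waitForResultOrPeerDeath and the
peerIsAlive callback are all hypothetical names.  The callback stands in
for whatever liveness check the OfficeConnection offers.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical stand-in for the Result object in the quoted snippet.
struct Result {
    std::mutex mutex;
    std::condition_variable condition;
    bool done = false;
    bool success = false;

    // Called by the listener once the dispatch has finished.
    void set(bool ok) {
        {
            std::lock_guard<std::mutex> guard(mutex);
            done = true;
            success = ok;
        }
        condition.notify_all();
    }
};

enum class WaitOutcome { ResultArrived, PeerDied };

// Waits without any overall deadline.  The short poll interval exists only
// so that the liveness of the peer process can be re-checked; the loop ends
// when the result arrives or when the peer can no longer deliver one.
WaitOutcome waitForResultOrPeerDeath(
    Result& result,
    const std::function<bool()>& peerIsAlive,  // assumed liveness check
    std::chrono::milliseconds pollInterval = std::chrono::milliseconds(100))
{
    assert(pollInterval > std::chrono::milliseconds::zero());
    std::unique_lock<std::mutex> lock(result.mutex);
    for (;;) {
        if (result.condition.wait_for(
                lock, pollInterval, [&] { return result.done; })) {
            return WaitOutcome::ResultArrived;
        }
        // Do not hold the lock while running the caller-supplied check.
        lock.unlock();
        const bool alive = peerIsAlive();
        lock.lock();
        if (!alive) {
            return WaitOutcome::PeerDied;
        }
    }
}
```

A test would then fail on WaitOutcome::PeerDied and assert result.success
otherwise.  As long as the peer is alive the wait stays unbounded, so a
genuinely hung office process is still there to be inspected.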

-Stephan

