--headless broken with --enable-headless

Stephan Bergmann sbergman at redhat.com
Wed Dec 12 04:12:07 PST 2012


On 10/01/2012 07:40 PM, Riccardo Magliocchetti wrote:
> Il 01/10/2012 18:11, Michael Meeks ha scritto:
>> On Fri, 2012-09-28 at 20:26 +0200, Riccardo Magliocchetti wrote:
>>>> Here you go, you'll find the valgrind output sans all the zlib / python
>>>> related errors. Thanks a lot!
>>>
>>> update to git 7a3a8a254363801bf6893b23d67f86a7461f8f3b and more or less
>>> same valgrind errors. See attached trace.
>>
>>     It looks really interesting :-) it's hard to tell (from valgrind)
>> where
>> the threads were pre-empted [ actually logging that might be quite a
>> nice feature to add to for valgrind ;-], but there is already the
>> SolarMutex guard to try to stop this from happening.
>>
>>     I append the odd trace with some thoughts.
>>
>>     It is -possible- that we never get around to taking the solar
>> mutex in
>> the first instance in 'main' - such that by the time we come to release
>> and re-take it in vcl/headless/svpinst.cxx:
>>
>> void SvpSalInstance::Yield( bool bWait, bool bHandleAllCurrentEvents )
>>
>>     the count is zero - could you add some fprintf debugging to check
>> that
>> - just dump nAcquireCount. If that is the case this patch might help:
>>
>> --- a/vcl/headless/svpinst.cxx
>> +++ b/vcl/headless/svpinst.cxx
>> @@ -171,6 +171,9 @@ bool SvpSalInstance::CheckTimeout( bool
>> bExecuteTimers )
>>                   // timed out, update timeout
>>                   m_aTimeout = aTimeOfDay;
>>                   m_aTimeout += m_nTimeoutMS;
>> +
>> +                osl::SolarGuard aGuard( mpSalYieldMutex );
>> +
>>                   // notify
>>                   ImplSVData* pSVData = ImplGetSVData();
>>                   if( pSVData->mpSalTimer )
>>
>>     If it does - it's not the real fix ;-) but it'd be good to know.
>>
>>     Thanks for posting the trace - rather interesting :-)
>
> nAcquireCount is always 0 and the patch above let me run two unoconv
> testsuite runs in a row without valgrind logs.

It indeed looks like Application::Yield shall be routinely called with 
the solar mutex unlocked, and that timer callbacks called from within 
Yield shall be called with the solar mutex locked, so e.g. 
sal_gtk_timeout_dispatch (vcl/unx/gtk/app/gtkdata.cxx; when using the 
GTK VCL plug-in) explicitly locks the solar mutex  before calling the 
timer callback in a stack like

> #7  0x00007ffff32e940e in Timer::ImplTimerCallbackProc () at lo/core/vcl/source/app/timer.cxx:133
> #8  0x00007fffe64e370b in SalTimer::CallCallback (this=0x11001d0) at lo/core/vcl/inc/saltimer.hxx:57
> #9  0x00007fffe64e2d54 in sal_gtk_timeout_dispatch (pSource=0x12be2a0) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:844
> #10 0x0000003d8d647825 in g_main_dispatch (context=0x6597b0) at gmain.c:2539
> #11 g_main_context_dispatch (context=context at entry=0x6597b0) at gmain.c:3075
> #12 0x0000003d8d647b58 in g_main_context_iterate (context=context at entry=0x6597b0, block=block at entry=0, dispatch=dispatch at entry=1, self=<optimized out>) at gmain.c:3146
> #13 0x0000003d8d647c14 in g_main_context_iteration (context=0x6597b0, may_block=0) at gmain.c:3207
> #14 0x00007fffe64e207a in GtkData::Yield (this=0x6113a0, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:582
> #15 0x00007fffe64e5c38 in GtkInstance::Yield (this=0x611830, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkinst.cxx:563
> #16 0x00007ffff32debcc in ImplYield (i_bWait=true, i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:425
> #17 0x00007ffff32dafb7 in Application::Yield (i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:459
> #18 0x00007ffff32daf58 in Application::Execute () at lo/core/vcl/source/app/svapp.cxx:404
> #19 0x00007ffff7a8cda4 in desktop::Desktop::Main (this=0x7fffffffe430) at lo/core/desktop/source/app/app.cxx:1661
> #20 0x00007ffff32e644b in ImplSVMain () at lo/core/vcl/source/app/svmain.cxx:162
> #21 0x00007ffff32e6578 in SVMain () at lo/core/vcl/source/app/svmain.cxx:199
> #22 0x00007ffff7acddc7 in soffice_main () at lo/core/desktop/source/app/sofficemain.cxx:74
> #23 0x0000000000400948 in sal_main () at lo/core/desktop/source/app/main.c:48
> #24 0x0000000000400929 in main (argc=1, argv=0x7fffffffe618) at lo/core/desktop/source/app/main.c:47

That means that 
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=002f8b272bc343477c649c648c5ceb2ccf0fba89> 
"vcl/headless: [HACK] workaround solarMutexx splat with headless mode" 
looks like it is the right fix indeed (rather than a hack).

Now, the interesting thing is that the Linux-specific CreateSalInstance 
(called from ImplSVMain -> InitVCL before calling the application's Main 
in ImplSVMain) explicitly locks the solar mutex thanks to 
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=852574f46f686a936a1b267e5780ca17d0f0d5ab> 
"INTEGRATION: CWS ause0c2 (1.3.18); FILE MERGED: 2004/03/12 15:25:43 hjs 
1.3.18.1: #115868# anti freeze," while all the other variants of 
CreateSalInstance apparently do not (incl. the one in 
vcl/headless/headlessinst.cxx that is relevant when configured 
--enable-headless, as is the case for this mail thread).  That means 
that all of desktop::Desktop::Main -> Application::Execute -> 
Application::Yield runs with the solar mutex locked on Linux (sans 
--enable-headless configuration) and with the solar mutex unlocked 
everywhere else.

The commit message for 852574f46f686a936a1b267e5780ca17d0f0d5ab is 
unfortunately rather unhelpful, but temporarily reverting it and doing 
"make check" on a Linux box starts to produce crashes within 
desktop::Desktop::Main -> Application::Execute -> Application::Yield -> 
..., in code that apparently expects the solar mutex to be locked.

Stephan

>> ==28574== Thread 1:
>> ==28574== Invalid read of size 8
>> ==28574==    at 0x739C702: SfxStateCache::SetState_Impl(unsigned
>> short, SfxPoolItem const*, unsigned char) (statcach.cxx:454)
>> ==28574==    by 0x738042E:
>> SfxBindings::UpdateControllers_Impl(SfxInterface const*,
>> SfxFoundCache_Impl const*, SfxPoolItem const*, unsigned short)
>> (bindings.cxx:1500)
>> ==28574==    by 0x7383C5B: SfxBindings::Update_Impl(SfxStateCache*)
>> (bindings.cxx:437)
>> ==28574==    by 0x7383FD7: SfxBindings::NextJob_Impl(Timer*)
>> (bindings.cxx:1646)
>> ==28574==    by 0x92BF4AD: Timer::ImplTimerCallbackProc() (timer.cxx:142)
>>
>>     The YieldMutex -should- be held at this point excluding this other
>> thread (?).
>>
>> ==28574==    by 0x95D152D: SvpSalInstance::CheckTimeout(bool)
>> (saltimer.hxx:57)
>> ==28574==    by 0x95D17D7: SvpSalInstance::Yield(bool, bool)
>> (svpinst.cxx:288)
>> ==28574==    by 0x92B4530: Application::Yield(bool) (svapp.cxx:434)
>> ==28574==    by 0x92B45E6: Application::Execute() (svapp.cxx:413)
>> ==28574==    by 0x50AED58: desktop::Desktop::Main() (app.cxx:1711)
>> ==28574==    by 0x92BDFB8: ImplSVMain() (svmain.cxx:173)
>> ==28574==    by 0x92BE8E4: SVMain() (svmain.cxx:210)
>> ==28574==    by 0x50E1CF4: soffice_main (sofficemain.cxx:83)
>> ==28574==    by 0x40069A: main (main.c:25)
>>
>> ==28574==  Address 0x1466bdc8 is 56 bytes inside a block of size 72
>> free'd
>> ==28574==    at 0x4C27FF2: operator delete(void*)
>> (vg_replace_malloc.c:387)
>> ==28574==    by 0x737F479: SfxBindings::DeleteControllers_Impl()
>> (bindings.cxx:325)
>> ==28574==    by 0x7380572: SfxBindings::~SfxBindings() (bindings.cxx:264)
>> ==28574==    by 0x7380718: SfxBindings::~SfxBindings() (bindings.cxx:275)
>> ==28574==    by 0x758B91E: SfxFrame::DoClose_Impl() (frame.cxx:178)
>> ==28574==    by 0x75AAFC1: SfxBaseController::dispose()
>> (sfxbasecontroller.cxx:1041)
>>
>> sfx2/source/view/sfxbasecontroller.cxx:
>>
>>     This method has a live:
>>
>>             SolarMutexGuard aGuard;
>>
>> ==28574==    by 0x16B8E785:
>> framework::Frame::setComponent(com::sun::star::uno::Reference<com::sun::star::awt::XWindow>
>> const&,
>> com::sun::star::uno::Reference<com::sun::star::frame::XController>
>> const&) (frame.cxx:1380)
>> ==28574==    by 0x16B8E1CF: framework::Frame::close(unsigned char)
>> (frame.cxx:1633)
>> ==28574==    by 0x758BE87: SfxFrame::DoClose() (frame.cxx:140)
>> ==28574==    by 0x797F51C: SfxBroadcaster::Broadcast(SfxHint const&)
>> (brdcst.cxx:49)
>> ==28574==    by 0x7511211:
>> SfxModelListener_Impl::notifyClosing(com::sun::star::lang::EventObject
>> const&) (objxtor.cxx:173)
>> ==28574==    by 0x7534972: SfxBaseModel::close(unsigned char)
>> (sfxbasemodel.cxx:1491)
>> ==28574==    by 0x1E776909: SwXTextDocument::close(unsigned char)
>> (unotxdoc.cxx:578)
>> ==28574==    by 0x7537FB0: SfxBaseModel::dispose() (sfxbasemodel.cxx:788)
>> ==28574==    by 0x1211D6E7: gcc3::callVirtualMethod(void*, unsigned
>> int, void*, _typelib_TypeDescriptionReference*, bool, unsigned long*,
>> unsigned int, unsigned long*, unsigned int, double*, unsigned int)
>> (callvirtualmethod.cxx:128)
>> ==28574==    by 0x12120FEE:
>> cpp_call(bridges::cpp_uno::shared::UnoInterfaceProxy*,
>> bridges::cpp_uno::shared::VtableSlot,
>> _typelib_TypeDescriptionReference*, int, _typelib_MethodParameter*,
>> void*, void**, _uno_Any**) (uno2cpp.cxx:246)
>> ==28574==    by 0x12121ACD:
>> bridges::cpp_uno::shared::unoInterfaceProxyDispatch(_uno_Interface*,
>> _typelib_TypeDescription const*, void*, void**, _uno_Any**)
>> (uno2cpp.cxx:440)
>> ==28574==    by 0x15D5CE5E:
>> binaryurp::IncomingRequest::execute_throw(binaryurp::BinaryAny*,
>> std::vector<binaryurp::BinaryAny,
>> std::allocator<binaryurp::BinaryAny>  >*) const (incomingrequest.cxx:251)
>> ==28574==    by 0x15D5D8AB: binaryurp::IncomingRequest::execute()
>> const (incomingrequest.cxx:80)
>> ==28574==    by 0x15D61B4C: request (reader.cxx:89)
>> ==28574==    by 0x663A506: cppu_threadpool::JobQueue::enter(long,
>> unsigned char) (jobqueue.cxx:115)
>> ==28574==    by 0x663B00D: cppu_threadpool::ORequestThread::run()
>> (thread.cxx:179)
>> ==28574==    by 0x663B4E9: threadFunc (thread.hxx:196)
>> ==28574==    by 0x4E467AE: osl_thread_start_Impl (thread.c:261)
>> ==28574==    by 0x5E53EFB: start_thread (pthread_create.c:304)
>> ==28574==    by 0x5B8E89C: clone (clone.S:112)


More information about the LibreOffice mailing list