--headless broken with --enable-headless
Stephan Bergmann
sbergman at redhat.com
Wed Dec 12 04:12:07 PST 2012
On 10/01/2012 07:40 PM, Riccardo Magliocchetti wrote:
> Il 01/10/2012 18:11, Michael Meeks ha scritto:
>> On Fri, 2012-09-28 at 20:26 +0200, Riccardo Magliocchetti wrote:
>>>> Here you go, you'll find the valgrind output sans all the zlib / python
>>>> related errors. Thanks a lot!
>>>
>>> update to git 7a3a8a254363801bf6893b23d67f86a7461f8f3b and more or less
>>> same valgrind errors. See attached trace.
>>
>> It looks really interesting :-) it's hard to tell (from valgrind)
>> where
>> the threads were pre-empted [ actually logging that might be quite a
>> nice feature to add to for valgrind ;-], but there is already the
>> SolarMutex guard to try to stop this from happening.
>>
>> I append the odd trace with some thoughts.
>>
>> It is -possible- that we never get around to taking the solar
>> mutex in
>> the first instance in 'main' - such that by the time we come to release
>> and re-take it in vcl/headless/svpinst.cxx:
>>
>> void SvpSalInstance::Yield( bool bWait, bool bHandleAllCurrentEvents )
>>
>> the count is zero - could you add some fprintf debugging to check
>> that
>> - just dump nAcquireCount. If that is the case this patch might help:
>>
>> --- a/vcl/headless/svpinst.cxx
>> +++ b/vcl/headless/svpinst.cxx
>> @@ -171,6 +171,9 @@ bool SvpSalInstance::CheckTimeout( bool
>> bExecuteTimers )
>> // timed out, update timeout
>> m_aTimeout = aTimeOfDay;
>> m_aTimeout += m_nTimeoutMS;
>> +
>> + osl::SolarGuard aGuard( mpSalYieldMutex );
>> +
>> // notify
>> ImplSVData* pSVData = ImplGetSVData();
>> if( pSVData->mpSalTimer )
>>
>> If it does - it's not the real fix ;-) but it'd be good to know.
>>
>> Thanks for posting the trace - rather interesting :-)
>
> nAcquireCount is always 0 and the patch above let me run two unoconv
> testsuite runs in a row without valgrind logs.
It indeed looks like Application::Yield shall be routinely called with
the solar mutex unlocked, and that timer callbacks called from within
Yield shall be called with the solar mutex locked, so e.g.
sal_gtk_timeout_dispatch (vcl/unx/gtk/app/gtkdata.cxx; when using the
GTK VCL plug-in) explicitly locks the solar mutex before calling the
timer callback in a stack like
> #7 0x00007ffff32e940e in Timer::ImplTimerCallbackProc () at lo/core/vcl/source/app/timer.cxx:133
> #8 0x00007fffe64e370b in SalTimer::CallCallback (this=0x11001d0) at lo/core/vcl/inc/saltimer.hxx:57
> #9 0x00007fffe64e2d54 in sal_gtk_timeout_dispatch (pSource=0x12be2a0) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:844
> #10 0x0000003d8d647825 in g_main_dispatch (context=0x6597b0) at gmain.c:2539
> #11 g_main_context_dispatch (context=context at entry=0x6597b0) at gmain.c:3075
> #12 0x0000003d8d647b58 in g_main_context_iterate (context=context at entry=0x6597b0, block=block at entry=0, dispatch=dispatch at entry=1, self=<optimized out>) at gmain.c:3146
> #13 0x0000003d8d647c14 in g_main_context_iteration (context=0x6597b0, may_block=0) at gmain.c:3207
> #14 0x00007fffe64e207a in GtkData::Yield (this=0x6113a0, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:582
> #15 0x00007fffe64e5c38 in GtkInstance::Yield (this=0x611830, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkinst.cxx:563
> #16 0x00007ffff32debcc in ImplYield (i_bWait=true, i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:425
> #17 0x00007ffff32dafb7 in Application::Yield (i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:459
> #18 0x00007ffff32daf58 in Application::Execute () at lo/core/vcl/source/app/svapp.cxx:404
> #19 0x00007ffff7a8cda4 in desktop::Desktop::Main (this=0x7fffffffe430) at lo/core/desktop/source/app/app.cxx:1661
> #20 0x00007ffff32e644b in ImplSVMain () at lo/core/vcl/source/app/svmain.cxx:162
> #21 0x00007ffff32e6578 in SVMain () at lo/core/vcl/source/app/svmain.cxx:199
> #22 0x00007ffff7acddc7 in soffice_main () at lo/core/desktop/source/app/sofficemain.cxx:74
> #23 0x0000000000400948 in sal_main () at lo/core/desktop/source/app/main.c:48
> #24 0x0000000000400929 in main (argc=1, argv=0x7fffffffe618) at lo/core/desktop/source/app/main.c:47
That means that
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=002f8b272bc343477c649c648c5ceb2ccf0fba89>
"vcl/headless: [HACK] workaround solarMutexx splat with headless mode"
looks like it is the right fix indeed (rather than a hack).
Now, the interesting thing is that the Linux-specific CreateSalInstance
(called from ImplSVMain -> InitVCL before calling the application's Main
in ImplSVMain) explicitly locks the solar mutex thanks to
<http://cgit.freedesktop.org/libreoffice/core/commit/?id=852574f46f686a936a1b267e5780ca17d0f0d5ab>
"INTEGRATION: CWS ause0c2 (1.3.18); FILE MERGED: 2004/03/12 15:25:43 hjs
1.3.18.1: #115868# anti freeze," while all the other variants of
CreateSalInstance apparently do not (incl. the one in
vcl/headless/headlessinst.cxx that is relevant when configured
--enable-headless, as is the case for this mail thread). That means
that all of desktop::Desktop::Main -> Application::Execute ->
Application::Yield runs with the solar mutex locked on Linux (sans
--enable-headless configuration) and with the solar mutex unlocked
everywhere else.
The commit message for 852574f46f686a936a1b267e5780ca17d0f0d5ab is
unfortunately rather unhelpful, but temporarily reverting it and doing
"make check" on a Linux box starts to produce crashes within
desktop::Desktop::Main -> Application::Execute -> Application::Yield ->
..., in code that apparently expects the solar mutex to be locked.
Stephan
>> ==28574== Thread 1:
>> ==28574== Invalid read of size 8
>> ==28574== at 0x739C702: SfxStateCache::SetState_Impl(unsigned
>> short, SfxPoolItem const*, unsigned char) (statcach.cxx:454)
>> ==28574== by 0x738042E:
>> SfxBindings::UpdateControllers_Impl(SfxInterface const*,
>> SfxFoundCache_Impl const*, SfxPoolItem const*, unsigned short)
>> (bindings.cxx:1500)
>> ==28574== by 0x7383C5B: SfxBindings::Update_Impl(SfxStateCache*)
>> (bindings.cxx:437)
>> ==28574== by 0x7383FD7: SfxBindings::NextJob_Impl(Timer*)
>> (bindings.cxx:1646)
>> ==28574== by 0x92BF4AD: Timer::ImplTimerCallbackProc() (timer.cxx:142)
>>
>> The YieldMutex -should- be held at this point excluding this other
>> thread (?).
>>
>> ==28574== by 0x95D152D: SvpSalInstance::CheckTimeout(bool)
>> (saltimer.hxx:57)
>> ==28574== by 0x95D17D7: SvpSalInstance::Yield(bool, bool)
>> (svpinst.cxx:288)
>> ==28574== by 0x92B4530: Application::Yield(bool) (svapp.cxx:434)
>> ==28574== by 0x92B45E6: Application::Execute() (svapp.cxx:413)
>> ==28574== by 0x50AED58: desktop::Desktop::Main() (app.cxx:1711)
>> ==28574== by 0x92BDFB8: ImplSVMain() (svmain.cxx:173)
>> ==28574== by 0x92BE8E4: SVMain() (svmain.cxx:210)
>> ==28574== by 0x50E1CF4: soffice_main (sofficemain.cxx:83)
>> ==28574== by 0x40069A: main (main.c:25)
>>
>> ==28574== Address 0x1466bdc8 is 56 bytes inside a block of size 72
>> free'd
>> ==28574== at 0x4C27FF2: operator delete(void*)
>> (vg_replace_malloc.c:387)
>> ==28574== by 0x737F479: SfxBindings::DeleteControllers_Impl()
>> (bindings.cxx:325)
>> ==28574== by 0x7380572: SfxBindings::~SfxBindings() (bindings.cxx:264)
>> ==28574== by 0x7380718: SfxBindings::~SfxBindings() (bindings.cxx:275)
>> ==28574== by 0x758B91E: SfxFrame::DoClose_Impl() (frame.cxx:178)
>> ==28574== by 0x75AAFC1: SfxBaseController::dispose()
>> (sfxbasecontroller.cxx:1041)
>>
>> sfx2/source/view/sfxbasecontroller.cxx:
>>
>> This method has a live:
>>
>> SolarMutexGuard aGuard;
>>
>> ==28574== by 0x16B8E785:
>> framework::Frame::setComponent(com::sun::star::uno::Reference<com::sun::star::awt::XWindow>
>> const&,
>> com::sun::star::uno::Reference<com::sun::star::frame::XController>
>> const&) (frame.cxx:1380)
>> ==28574== by 0x16B8E1CF: framework::Frame::close(unsigned char)
>> (frame.cxx:1633)
>> ==28574== by 0x758BE87: SfxFrame::DoClose() (frame.cxx:140)
>> ==28574== by 0x797F51C: SfxBroadcaster::Broadcast(SfxHint const&)
>> (brdcst.cxx:49)
>> ==28574== by 0x7511211:
>> SfxModelListener_Impl::notifyClosing(com::sun::star::lang::EventObject
>> const&) (objxtor.cxx:173)
>> ==28574== by 0x7534972: SfxBaseModel::close(unsigned char)
>> (sfxbasemodel.cxx:1491)
>> ==28574== by 0x1E776909: SwXTextDocument::close(unsigned char)
>> (unotxdoc.cxx:578)
>> ==28574== by 0x7537FB0: SfxBaseModel::dispose() (sfxbasemodel.cxx:788)
>> ==28574== by 0x1211D6E7: gcc3::callVirtualMethod(void*, unsigned
>> int, void*, _typelib_TypeDescriptionReference*, bool, unsigned long*,
>> unsigned int, unsigned long*, unsigned int, double*, unsigned int)
>> (callvirtualmethod.cxx:128)
>> ==28574== by 0x12120FEE:
>> cpp_call(bridges::cpp_uno::shared::UnoInterfaceProxy*,
>> bridges::cpp_uno::shared::VtableSlot,
>> _typelib_TypeDescriptionReference*, int, _typelib_MethodParameter*,
>> void*, void**, _uno_Any**) (uno2cpp.cxx:246)
>> ==28574== by 0x12121ACD:
>> bridges::cpp_uno::shared::unoInterfaceProxyDispatch(_uno_Interface*,
>> _typelib_TypeDescription const*, void*, void**, _uno_Any**)
>> (uno2cpp.cxx:440)
>> ==28574== by 0x15D5CE5E:
>> binaryurp::IncomingRequest::execute_throw(binaryurp::BinaryAny*,
>> std::vector<binaryurp::BinaryAny,
>> std::allocator<binaryurp::BinaryAny> >*) const (incomingrequest.cxx:251)
>> ==28574== by 0x15D5D8AB: binaryurp::IncomingRequest::execute()
>> const (incomingrequest.cxx:80)
>> ==28574== by 0x15D61B4C: request (reader.cxx:89)
>> ==28574== by 0x663A506: cppu_threadpool::JobQueue::enter(long,
>> unsigned char) (jobqueue.cxx:115)
>> ==28574== by 0x663B00D: cppu_threadpool::ORequestThread::run()
>> (thread.cxx:179)
>> ==28574== by 0x663B4E9: threadFunc (thread.hxx:196)
>> ==28574== by 0x4E467AE: osl_thread_start_Impl (thread.c:261)
>> ==28574== by 0x5E53EFB: start_thread (pthread_create.c:304)
>> ==28574== by 0x5B8E89C: clone (clone.S:112)
More information about the LibreOffice
mailing list