vcl::Window::dispose deadlock

Michael Stahl mstahl at redhat.com
Fri Jun 26 14:22:06 PDT 2015


On 09.06.2015 11:36, Michael Stahl wrote:
> On 09.06.2015 10:02, Noel Grandin wrote:
>> On 2015-06-09 09:58 AM, Stephan Bergmann wrote:
>>> On Windows, with master as of last night, "make check" happened to run into a deadlock in soffice.bin as below.  The
>>> main thread is trying to re-acquire the SolarMutex (after a SolarMutexReleaser) from within the event loop, while an
>>> incoming URP thread (apparently holding the SolarMutex) does vcl::Window::dispose which, on Windows, blocks on sending a
>>> message into the event loop.
> 
> this started happening for me in sc_unoapi on a --enable-dbgutil build
> almost always since half a year or so; before that it was less likely
> for whatever reason but still did happen.  (the deadlock moved from
> ~Window to Window::dispose due to VclPtr refactoring.)
> 
> the deadlock does not appear easily fixable; the last slide of my
> threading talk last year was about it actually.

now something i read in the MSDN documentation of GetMessage gave me an
idea, and it turns out that...

i think i understand one quite surprising aspect of the Win32 SAL
implementation:
the SolarMutex is not *really* a mutex, or rather, it is not implemented
purely as a mutex on this platform.

non-main threads block on a osl::Mutex mpSalYieldMutex to acquire the
SolarMutex, but the main thread does a very peculiar dance in
ImplSalYieldMutexAcquireWithWait():

>     do
>     {
>         if ( pInst->mpSalYieldMutex->tryToAcquire() )
>             bAcquire = TRUE;
>         else
>         {
>             pInst->mpSalWaitMutex->acquire();
>             if ( pInst->mpSalYieldMutex->tryToAcquire() )
>             {
>                 bAcquire = TRUE;
>                 pInst->mpSalWaitMutex->release();
>             }
>             else
>             {
>                 pInst->mnYieldWaitCount++;
>                 pInst->mpSalWaitMutex->release();
>                 MSG aTmpMsg;
>                 GetMessageW( &aTmpMsg, pInst->mhComWnd, SAL_MSG_RELEASEWAITYIELD, SAL_MSG_RELEASEWAITYIELD );
>                 pInst->mnYieldWaitCount--;
>                 if ( pInst->mnYieldWaitCount )
>                     PostMessageW( pInst->mhComWnd, SAL_MSG_RELEASEWAITYIELD, 0, 0 );
>             }
>         }
>     }
>     while ( !bAcquire );

if the mpSalYieldMutex is locked by another thread, the main thread does
not block on the mpSalYieldMutex, but instead blocks in Win32 GetMessage().

the reason why it does that, is that GetMessage() will not only get you
the message with the type you pass in as parameter; in fact the first
thing GetMessage() does is to look if another thread has a pending
SendMessage() to a window on the current thread, and if so immediately
dispatch that message.

this means that the main thread can in fact handle the SendMessage()
calls from other threads to create/destroy Windows and DCs without
deadlocking, if it waits in ImplSalYieldMutexAcquireWithWait().

to prevent deadlocks on Windows, as soon as other threads are running,
the main thread *must* acquire SolarMutex only via
ImplSalYieldMutexAcquireWithWait().

unfortunately the Application::AcquireSolarMutex() does not do this but
blocks on locking the mpSalYieldMutex.

this is used by SolarMutexReleaser, and thus SolarMutexReleaser causes
deadlocks on Windows (whereas all the entry points from Win32 WndProcs
already do the right thing).

my hope is that the main thread only first-acquires the SolarMutex via
VCL internal code or via SolarMutexReleaser, and not with the ordinary
SolarMutexGuard that is used in UNO entry points.

commit 482c52e91fe41a52e68827e9bf64a9736427d517 should fix this deadlock
scenario on master, and thus there are no known unfixable deadlock
issues that would prevent running "make check" on Windows tinderboxes.





More information about the LibreOffice mailing list