help needed: hsqldb shutdown race condition, how to fix without deadlock

Wed Jun 3 06:01:39 PDT 2015

On 03.06.2015 14:47, Lionel Elie Mamane wrote:
> On Wed, Jun 03, 2015 at 02:24:25PM +0200, Lionel Elie Mamane wrote:
>> On Wed, Jun 03, 2015 at 02:19:04PM +0200, Lionel Elie Mamane wrote:
> 
>>> But let's take it from the other side...
> 
>>> connectivity::hsqldb::ODriverDelegator::preCommit does:
> 
>>> 654	Reference<XConnection> xConnection(i->first,UNO_QUERY);
>>> 655	if ( xConnection.is() )
>>> 656	{
>>> 657	      Reference< XStatement> xStmt = xConnection->createStatement();
> 
>>> Now, this goes through a whole rigmarole:
> 
>>> #6  0x00002abc1b6e90d3 in AffineBridge::v_callInto_v (this=0x2d29b50, pCallee=0x2abbf94eedbe <s_pull(va_list*)>, 
>>>     pParam=0x2abc0fd9bdb0)
>>>     at /home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:250
>> (...)
>>> #18 0x00002abc1b8f0ccc in Proxy::dispatch (this=this@entry=0x3577660, 
>>>     pReturnTypeRef=pReturnTypeRef@entry=0x3023920, pParams=pParams@entry=0x0, nParams=nParams@entry=0, 
>>>     pMemberType=pMemberType@entry=0x3436a40, pReturn=pReturn@entry=0x2abc0fd9c2f0, pArgs=0x2abc0fd9c2e0, 
>>>     ppException=0x2abc0fd9c390)
>>>     at /home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:445
>> (...)
>>> #21 0x00002abc0f3f2a6a in cpp_vtable_call (nFunctionIndex=<optimized out>, nVtableOffset=0, 
>>>     gpreg=0x2abc0fd9c710, fpreg=0x2abc0fd9c740, ovrflw=0x2abc0fd9c790, pRegisterReturn=0x2abc0fd9c6f0)
>>>     at /home/master/src/libreoffice/workdirs/libreoffice-5-1/bridges/source/cpp_uno/gcc3_linux_x86-64/cpp2uno.cxx:377
>>> #22 0x00002abc0f409c12 in privateSnippetExecutor ()
>>>    from /home/master/src/libreoffice/workdirs/libreoffice-5-1/instdir/program/libgcc3_uno.so
>>> #23 0x00002abc1b1a9fd8 in connectivity::hsqldb::ODriverDelegator::preCommit (this=0x2d24ef0, aEvent=...)
>>>     at /home/master/src/libreoffice/workdirs/libreoffice-5-1/connectivity/source/drivers/hsqldb/HDriver.cxx:657
>>> #24 0x00002abc18bb7081 in OStorage::BroadcastTransaction (this=this@entry=0x3578030, 
>>
>>> The problem is essentially the serializing that happens at
>>> AffineBridge::v_callInto_v.
>>
>>> Now, I guess that the whole privateSnippetExecutor / cpp_vtable_call /
>>> cpp2uno_call / s_Proxy_dispatch / ... / AffineBridge::v_callInto_v for
>>> a C++-to-C++ call is somehow linked to the fact that the calling code
>>> does not know the object that implements the XConnection
>>> interface. Can I avoid it in some way? For example, I *know* that
>>> m_xConnection is a connectivity::java_sql_connection.
>>
>>> If I use that fact, can I avoid going through AffineBridge::v_callInto_v ?
>>
>> I tried to do that in the attached patch, but as was explained to me
>> on IRC, because the jdbc driver is in an affine component, that does
>> not work.
> 
> Since the problem is essentially that the two threads take the same mutexes
> in different order, here is a dirty hack that forces the hsqldb thread
> to take the "affine bridge" mutex before taking the "HSQL driver"
> mutex.

i'm not entirely sure this is sensible...

a component is (in general) responsible for its own thread safety, and
should not make assumptions about how other components are implemented,
in particular wrt. locking.  this implies that if a component takes a
lock, it must release the lock before calling a method that could end up
calling into a different component, to avoid deadlock.
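
a minimal sketch of that rule, using plain C++ standard library types
rather than the actual osl/UNO ones.  the Component class, its listener
list and demo() are hypothetical names made up for illustration; nothing
here is taken from the hsqldb driver:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical component; none of these names exist in LibreOffice.
class Component
{
public:
    void addListener(std::function<void(int)> listener)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_listeners.push_back(std::move(listener));
    }

    // Update own state under the lock, then notify with the lock released.
    void setValue(int value)
    {
        std::vector<std::function<void(int)>> listeners;
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_value = value;
            listeners = m_listeners; // copy, so iterating needs no lock
        }
        // m_mutex is not held here, so a listener may call back into
        // this component (or wait for another thread that does)
        // without deadlocking on m_mutex.
        for (auto const & listener : listeners)
            listener(value);
    }

    int getValue() const
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_value;
    }

private:
    mutable std::mutex m_mutex;
    int m_value = 0;
    std::vector<std::function<void(int)>> m_listeners;
};

// Usage: the listener re-enters the component from inside the callback.
int demo()
{
    Component c;
    int seen = -1;
    c.addListener([&](int) { seen = c.getValue(); });
    c.setValue(42);
    assert(seen == 42);
    return seen;
}
```

if setValue() kept m_mutex locked across the listener calls, the
getValue() call in the listener would try to lock a non-recursive mutex
that its own thread already holds.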

secondly, those components which run inside the ":affine" purpose
environment can only be called from a single thread, so they do not need
a mutex of their own.  there may be multiple re-entrant calls into the
apartment, where the affine component calls out into a different
component (which is bridged to a thread that is different from both the
original calling thread and the affine thread), which calls back again
into the affine component, so a non-recursive mutex could deadlock but a
recursive mutex cannot.
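
to illustrate the last point, here is a simplified sketch of re-entrancy
on one thread.  it assumes standard C++ mutexes instead of the osl ones,
and the call-out is a direct callback rather than a bridged call via
another thread; ReentrantComponent, outer(), inner() and
demoReentrancy() are hypothetical names:

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical component that may be re-entered on the same thread.
class ReentrantComponent
{
public:
    // Calls out while still holding the lock; 'callout' stands in for
    // a call into some other component that may call back into us.
    int outer(std::function<int()> const & callout)
    {
        std::lock_guard<std::recursive_mutex> guard(m_mutex);
        ++m_depth;
        int const result = callout();
        --m_depth;
        return result;
    }

    // Re-entrant call: locks the same mutex again on the same thread.
    // With std::mutex this second lock would be undefined behaviour
    // (in practice usually a deadlock); std::recursive_mutex allows it.
    int inner()
    {
        std::lock_guard<std::recursive_mutex> guard(m_mutex);
        return m_depth;
    }

private:
    std::recursive_mutex m_mutex;
    int m_depth = 0;
};

// Usage: outer() is still running (depth 1) when inner() is entered.
int demoReentrancy()
{
    ReentrantComponent c;
    int const depth = c.outer([&c] { return c.inner(); });
    assert(depth == 1);
    return depth;
}
```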

> It is somewhat of a pity, but <shrug>. OTOH the fact that the jdbc
> library is thread-affine is itself an ugly "performance" hack :-|
> Here's the comment
> 
> <!-- Recent Java 6 VMs make calls to JNI Attach/DetachCurrentThread (which this
>      code does extensively) very expensive.  A follow-up JVM fix reduced the
>      overhead significantly again for all threads but the main thread.  So a
>      quick hack to improve performance of this component again is to confine it
>      in the affine apartment (where all code will run on a single, dedicated
>      thread that is guaranteed not to be the main thread).  However, a better fix
>      would still be to redesign the code so that it does not call
>      Attach/DetachCurrentThread so frequently:
> -->
> 
> The other solution would be that jdbc not be thread-affine. About
> that, could we, instead of confining it to one thread, just do
> something like that at each call to Attach/DetachCurrentThread:
> 
>  if(this_is_main_thread)
>  {
>     create thread, do stuff there
>  }
>  else
>  {
>     do stuff here
>  }
> 
> or maybe it would not be more expensive than the affine bridge to do
> "create thread, do stuff there" unconditionally?

that sounds like some custom overly complex work-around...

the affine bridge is at least a *generic* overly complex work-around :)
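
for reference, the quoted "create thread, do stuff there" idea could be
sketched roughly as below.  this is only an illustration in standard
C++, not proposed code for the jdbc driver: runOffThread() and
demoRunOffThread() are hypothetical names, the id of the thread to avoid
is passed in explicitly, and the cost of starting a thread per call is
not measured here:

```cpp
#include <cassert>
#include <thread>

// Run 'work' on the calling thread, unless the calling thread is the
// one identified by 'avoid'; in that case run it on a short-lived
// helper thread and wait for it.  Returns the id of the thread that
// actually executed 'work'.
template <typename F>
std::thread::id runOffThread(std::thread::id avoid, F && work)
{
    if (std::this_thread::get_id() == avoid)
    {
        std::thread::id ranOn;
        std::thread helper([&] {
            work();
            ranOn = std::this_thread::get_id();
        });
        helper.join(); // join() makes the write to ranOn visible here
        return ranOn;
    }
    work();
    return std::this_thread::get_id();
}

// Usage: first treat the calling thread as the one to avoid, then not.
void demoRunOffThread()
{
    std::thread::id const here = std::this_thread::get_id();
    int calls = 0;

    std::thread::id const first = runOffThread(here, [&] { ++calls; });
    assert(first != here); // ran on the helper thread

    // A default-constructed id matches no running thread.
    std::thread::id const second =
        runOffThread(std::thread::id(), [&] { ++calls; });
    assert(second == here); // ran in place
    assert(calls == 2);
}
```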

> Another, maybe ugly hack, idea: when the jdbc driver is loaded (or
> just always...), the event loop in the main thread forks itself into a
> new thread so that ... nothing at all is executed in the main thread?
> Too disruptive to the whole application just for the sake of the jdbc
> driver?

"on-demand" is not possible on Windows - but "always" might be; i'm not
sure what "magic" properties the "main" thread has.