help needed: hsqldb shutdown race condition, how to fix without deadlock
Michael Stahl
mstahl at redhat.com
Wed Jun 3 06:01:39 PDT 2015
On 03.06.2015 14:47, Lionel Elie Mamane wrote:
> On Wed, Jun 03, 2015 at 02:24:25PM +0200, Lionel Elie Mamane wrote:
>> On Wed, Jun 03, 2015 at 02:19:04PM +0200, Lionel Elie Mamane wrote:
>
>>> But let's take it from the other side...
>
>>> connectivity::hsqldb::ODriverDelegator::preCommit does:
>
>>> 654 Reference<XConnection> xConnection(i->first,UNO_QUERY);
>>> 655 if ( xConnection.is() )
>>> 656 {
>>> 657 Reference< XStatement> xStmt = xConnection->createStatement();
>
>>> Now, this goes through a whole rigmarole:
>
>>> #6 0x00002abc1b6e90d3 in AffineBridge::v_callInto_v (this=0x2d29b50, pCallee=0x2abbf94eedbe <s_pull(va_list*)>,
>>> pParam=0x2abc0fd9bdb0)
>>> at /home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:250
>> (...)
>>> #18 0x00002abc1b8f0ccc in Proxy::dispatch (this=this@entry=0x3577660,
>>> pReturnTypeRef=pReturnTypeRef@entry=0x3023920, pParams=pParams@entry=0x0, nParams=nParams@entry=0,
>>> pMemberType=pMemberType@entry=0x3436a40, pReturn=pReturn@entry=0x2abc0fd9c2f0, pArgs=0x2abc0fd9c2e0,
>>> ppException=0x2abc0fd9c390)
>>> at /home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:445
>> (...)
>>> #21 0x00002abc0f3f2a6a in cpp_vtable_call (nFunctionIndex=<optimized out>, nVtableOffset=0,
>>> gpreg=0x2abc0fd9c710, fpreg=0x2abc0fd9c740, ovrflw=0x2abc0fd9c790, pRegisterReturn=0x2abc0fd9c6f0)
>>> at /home/master/src/libreoffice/workdirs/libreoffice-5-1/bridges/source/cpp_uno/gcc3_linux_x86-64/cpp2uno.cxx:377
>>> #22 0x00002abc0f409c12 in privateSnippetExecutor ()
>>> from /home/master/src/libreoffice/workdirs/libreoffice-5-1/instdir/program/libgcc3_uno.so
>>> #23 0x00002abc1b1a9fd8 in connectivity::hsqldb::ODriverDelegator::preCommit (this=0x2d24ef0, aEvent=...)
>>> at /home/master/src/libreoffice/workdirs/libreoffice-5-1/connectivity/source/drivers/hsqldb/HDriver.cxx:657
>>> #24 0x00002abc18bb7081 in OStorage::BroadcastTransaction (this=this@entry=0x3578030,
>>
>>> The problem is essentially the serializing that happens at
>>> AffineBridge::v_callInto_v.
>>
>>> Now, I guess that the whole privateSnippetExecutor / cpp_vtable_call /
>>> cpp2uno_call / s_Proxy_dispatch / ... / AffineBridge::v_callInto_v for
>>> a C++-to-C++ call is somehow linked to the fact that the calling code
>>> does not know the object that implements the XConnection
>>> interface. Can I avoid it in some way? For example, I *know* that
>>> m_xConnection is a connectivity::java_sql_connection.
>>
>>> If I use that fact, can I avoid going through AffineBridge::v_callInto_v ?
>>
>> I tried to do that in the attached patch, but as was explained to me
>> on IRC, because the jdbc driver is in an affine component, that does
>> not work.
>
> Since the problem is essentially that the two threads take the same mutexes
> in different order, here is a dirty hack that forces the hsqldb thread
> to take the "affine bridge" mutex before taking the "HSQL driver"
> mutex.
i'm not entirely sure this is sensible...
a component is (in general) responsible for its own thread safety, and
should not make assumptions about how other components are implemented,
in particular wrt. locking. this implies that if a component takes a
lock, it must release the lock before calling a method that could end up
calling into a different component, to avoid deadlock.
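
to make the "release the lock before calling out" rule concrete, here is a minimal sketch. the Broadcaster class and its members are hypothetical names made up for illustration, not anything from the hsqldb driver or the storage code; the point is only that the lock protects the component's own state and is dropped before foreign code runs.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical component (illustrative only, not LibreOffice API).
class Broadcaster {
public:
    using Listener = std::function<void(int)>;

    void addListener(Listener listener) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_listeners.push_back(std::move(listener));
    }

    void notify(int event) {
        std::vector<Listener> snapshot;
        {
            // Hold the lock only while touching our own state.
            std::lock_guard<std::mutex> guard(m_mutex);
            snapshot = m_listeners;
        } // Lock released here, before any foreign code runs.

        // Listeners may take their own locks or call back into us;
        // neither can deadlock against m_mutex now.
        for (auto& listener : snapshot) {
            listener(event);
        }
    }

    std::size_t listenerCount() {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_listeners.size();
    }

private:
    std::mutex m_mutex;
    std::vector<Listener> m_listeners;
};
```

a listener that calls addListener() from inside notify() works here even though the mutex is non-recursive; with the lock held across the callback it would block forever.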
secondly, those components which run inside the ":affine" purpose
environment can only be called from a single thread, so they do not need
a mutex of their own. there may be multiple re-entrant calls into the
apartment, where the affine component calls out into a different
component (which is bridged to a thread that is different from both the
original calling thread and the affine thread), which calls back again
into the affine component, so a non-recursive mutex could deadlock but a
recursive mutex can not.
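
a small sketch of that re-entrancy case, under the assumption that the nested call arrives on the same thread that already holds the lock (which is what the affine apartment gives you). AffineComponent and callOut are hypothetical names for illustration; the bridge itself is not modelled.

```cpp
#include <functional>
#include <mutex>

// Hypothetical stand-in for a component living in the affine apartment.
class AffineComponent {
public:
    // callOut models a call into another component that may call back
    // into this one before returning.
    int call(int depth, const std::function<int(int)>& callOut) {
        // The same thread may acquire a recursive mutex repeatedly.
        // With a plain std::mutex the nested acquisition would be
        // undefined behaviour, in practice usually a self-deadlock.
        std::lock_guard<std::recursive_mutex> guard(m_mutex);
        ++m_entered;
        if (depth == 0) {
            return m_entered;
        }
        return callOut(depth - 1);
    }

private:
    std::recursive_mutex m_mutex;
    int m_entered = 0;
};
```

the lock is still held while calling out here, which is only tolerable because every call into the component is serialized onto one thread; in a free-threaded component the rule about releasing the lock first still applies.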
> It is somewhat of a pity, but <shrug>. OTOH the fact that the jdbc
> library is thread-affine is itself an ugly "performance" hack :-|
> Here's the comment
>
> <!-- Recent Java 6 VMs make calls to JNI Attach/DetachCurrentThread (which this
> code does extensively) very expensive. A follow-up JVM fix reduced the
> overhead significantly again for all threads but the main thread. So a
> quick hack to improve performance of this component again is to confine it
> in the affine apartment (where all code will run on a single, dedicated
> thread that is guaranteed not to be the main thread). However, a better fix
> would still be to redesign the code so that it does not call
> Attach/DetachCurrentThread so frequently:
> -->
>
> The other solution would be that jdbc not be thread-affine. About
> that, could we, instead of confining it to one thread, just do
> something like that at each call to Attach/DetachCurrentThread:
>
> if(this_is_main_thread)
> {
> create thread, do stuff there
> }
> else
> {
> do stuff here
> }
>
> or maybe it would not be more expensive than the affine bridge to do
> "create thread, do stuff there" unconditionally?
that sounds like some custom overly complex work-around...
the affine bridge is at least a *generic* overly complex work-around :)
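
for reference, the quoted pseudocode could be sketched roughly as below. this is only an illustration of the idea, not a proposal: runOffMainThread and the mainThreadId parameter are hypothetical, no JNI calls are made, and nothing here says whether spawning a thread per call would be cheaper than the affine bridge.

```cpp
#include <thread>
#include <utility>

// Hypothetical helper: run `work` on the calling thread, unless the
// calling thread is the one identified as the main thread, in which
// case run it on a freshly created thread and wait for it to finish.
// Assumption: the caller recorded the main thread's id at startup.
template <typename F>
void runOffMainThread(std::thread::id mainThreadId, F&& work) {
    if (std::this_thread::get_id() == mainThreadId) {
        // "create thread, do stuff there"
        std::thread worker(std::forward<F>(work));
        worker.join();
    } else {
        // "do stuff here"
        std::forward<F>(work)();
    }
}
```

note that the caller still blocks until the worker finishes, so any lock it holds across the call stays held while another thread does the work.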
> Another, maybe ugly hack, idea: when the jdbc driver is loaded (or
> just always...), the event loop in the main thread forks itself into a
> new thread so that ... nothing at all is executed in the main thread?
> Too disruptive to the whole application just for the sake of the jdbc
> driver?
"on-demand" not possible on Windows - but "always" might be; i'm not
sure what "magic" properties the "main" thread has.