[Libreoffice] Helgrind results for LO

Thu Mar 31 12:55:02 PDT 2011

On Thu, 2011-03-31 at 20:58 +0200, Julian Seward wrote:
> On Thursday, March 31, 2011, Caolán McNamara wrote:
> > The first one at least seems to be the common enough pattern we have
> > where we grab our global mutex when initializing singletons on first
> > use/creation
> > 
> > so we have loads of warnings along the lines of "the last time you
> > accessed that singleton you took a mutex, but this time you didn't!"
> 
> /me slightly confused: IIUC you're referring to the fact that accesses
> to aFoo.uninit aren't consistently protected by a lock.  But it's not
> complaining about that -- it's complaining about a bunch of lock
> acquisition ordering inconsistencies.

Yeah, but lets expand out the first one on the list a bit...

==5655== Thread #1: lock order "0xDFB4E40 before 0x5089BA0" violated

==5655== Observed (incorrect) order is: acquisition of lock at 0x5089BA0

where 0x5089BA0 is our GlobalMutex which is used as the inside lock of
the double-locked singletons.

CommandLineArgs* Desktop::GetCommandLineArgs()
{
    static CommandLineArgs* pArgs = 0;
    if ( !pArgs )
    {
        ::osl::MutexGuard aGuard( ::osl::Mutex::getGlobalMutex() );
        if ( !pArgs )
            pArgs = new CommandLineArgs;
    }

    return pArgs;
}

==5655==  followed by a later acquisition of lock at 0xDFB4E40

where 0xDFB4E40 is a class member mutex of OServiceManager

OServiceManager::queryServiceFactories(...
{
    Sequence< Reference< XInterface > > ret;

    MutexGuard aGuard( m_mutex );

==5655== Required order was established by acquisition of lock at
0xDFB4E40

void OServiceManager::insert(...
{
...

    MutexGuard aGuard( m_mutex );

==5655==  followed by a later acquisition of lock at 0x5089BA0

cppu_detail_getUnoType(::cppu::UnoSequenceType< T > const *)
{
...
    static typelib_TypeDescriptionReference * p = 0;
    if (p == 0)
    {
	.. bunch of stuff that expands eventually to another
        double-lock and 
        ::osl::MutexGuard aGuard( ::osl::Mutex::getGlobalMutex() );
	p = something or other
    }
    return ::cppu::detail::getTypeFromTypeDescriptionReference(&p);
}

So the lock its complaining about is the one used in the double-lock so
those double-locked singletons are going to trigger loads of these
because we get basically arbitrary acquisitions of the globalmutex as
singletons get sparked into life :-(

> (In parentheses, the above fragment is the double-checked locking idiom,
yeah,

> > I guess we might need to sprinkle that
> > VALGRIND_HELGRIND_DISABLE_CHECKING(&pInstance, sizeof pInstance);
> 
> That stops it complaining about races, but not about lock order problems.

Ah, hmm, got the wrong end of the stick on that hack then. Plan B might
be to move our double-locked singletons to all use the rtl::Static
pattern and rework that with an ifdef HELGRIND or something to not
actually double-lock manually using a globalmutex but to instead just
use normal local statics (relying on gcc default -fthreadsafe-statics to
do the right thing, and maybe that is sufficient these days anyway and
we can micro-optimize the gcc case to do away with double-locking there)
which would remove those warnings from helgrind runs.

C.