atomic message unref issues on ARM

Thiago Macieira thiago at kde.org
Fri Jul 17 01:21:58 PDT 2009


Jim Harvy wrote:
>(gdb) f 3
>#3  0xae82236a in dbus_message_cache_or_finalize (message=0x18f420) at
>external/dbus/dbus/dbus-message.c:576
>576      _dbus_assert (message->refcount.value == 0);
>(gdb) p message->refcount
>$30 = {value = 1}
>
>and in the unref we get:
>
>1395      old_refcount = _dbus_atomic_dec (&message->refcount);
>1396
>1397      _dbus_assert (old_refcount >= 0);
>1398
>1399      if (old_refcount == 1)
>1400        {
>1401          /* Calls application callbacks! */
>1402          dbus_message_cache_or_finalize (message);
>1403        }
>1404    }
>1405
>1406    /**
>(gdb) p old_refcount
>$32 = 1

Now that's a hairy case...

If old_refcount is 1, that means the decrement function saw a 1, which 
means no other thread had a pointer to this DBusMessage. That would mean 
the refcount cannot increase back to 1 after dropping to zero.

Since you proved with debugging that it was 1, we have to conclude that 
the atomic-decrement mechanism isn't working. Either it's not decrementing 
at all (which I find unlikely, unless the compiler is miscompiling) or it's 
not working atomically.

So, is it possible that the refcount was 2, two threads simultaneously 
decremented (non-atomically) and the result was 1? Yes, that is possible, 
if the decrement operation requires two or more steps.
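The lost-update interleaving described above can be sketched deterministically, without real threads. This is only a model: the `fake_thread` type, the step functions and `lost_update_demo` are hypothetical names made up for illustration, and the three steps stand in for the load, modify and store a non-atomic decrement would perform.

```c
#include <assert.h>

/* Hypothetical model of one thread doing a non-atomic decrement. */
typedef struct {
    int loaded;   /* value read from memory (the "old" refcount) */
    int reg;      /* working register */
} fake_thread;

typedef struct {
    int old_a;        /* old value seen by thread A */
    int old_b;        /* old value seen by thread B */
    int final_value;  /* refcount left in memory */
} lost_update;

static void step_load(fake_thread *t, const int *mem) { t->loaded = t->reg = *mem; }
static void step_modify(fake_thread *t)               { t->reg -= 1; }
static void step_store(const fake_thread *t, int *mem) { *mem = t->reg; }

/* Both threads load before either one stores, so one decrement is lost. */
lost_update lost_update_demo(int initial)
{
    int refcount = initial;
    fake_thread a, b;
    lost_update r;

    assert(initial >= 2);          /* two references are being dropped */

    step_load(&a, &refcount);
    step_load(&b, &refcount);
    step_modify(&a);
    step_modify(&b);
    step_store(&a, &refcount);
    step_store(&b, &refcount);

    r.old_a = a.loaded;
    r.old_b = b.loaded;
    r.final_value = refcount;
    return r;
}
```

With `lost_update_demo(2)`, memory ends at 1 instead of 0, but both threads saw an old value of 2, so neither would take the `old_refcount == 1` branch. That is why this interleaving alone does not match the gdb output.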

But here the case is even trickier: is it possible for two threads to 
decrement at the same time, and for one of them to see a 1, decrement, 
and afterwards still see a 1?

Now that's weird. On the ARM, a decrement has to be three instructions 
(read, modify, update). One thread can only get a 1 when reading if the 
other thread has already done its update cycle to 1. And for that other 
thread to write a 1, it has to have seen a 2.

This is not a simple Write-After-Write case. There is one point of 
synchronisation, which is that one of the threads has seen the result from 
the other. So I can't conclude that this is a memory bus fluke.

Could it happen with three threads? If we have three threads, then the 
refcount must have been 3 at the beginning, so I don't see how adding more 
threads to the mix could explain the situation we have.

I can't come up with an explanation under normal operating conditions. I 
can only come up with explanations if there's a bug in your code, Jim.

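This could easily happen if the refcount was 1, one thread incremented and 
the other decremented: the decrementing thread sees a 1 and goes on to 
finalize, while the increment brings the value back up to 1. A thread can 
only increment at that point if it is using a pointer it holds no reference 
to, so this is obviously a bug in the user code, not in libdbus.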
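A minimal sketch of that sequence, assuming C11 `<stdatomic.h>` and using hypothetical names (`unref_observation`, `stray_ref_scenario`) that are not part of libdbus. The operations are run one after another in a single thread to fix the ordering; every operation here is fully atomic, and the result still matches the gdb session.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    int old_refcount;     /* what the unref path saw */
    int value_at_assert;  /* what the finalize assertion would read */
} unref_observation;

unref_observation stray_ref_scenario(void)
{
    atomic_int refcount;
    unref_observation obs;

    atomic_init(&refcount, 1);     /* exactly one owner left */

    /* "Thread A": unref. Sees 1, so it takes the finalize path. */
    obs.old_refcount = atomic_fetch_sub(&refcount, 1);
    assert(obs.old_refcount >= 0);

    /* "Thread B": refs the message through a pointer it does not own.
     * This is the assumed application bug. */
    atomic_fetch_add(&refcount, 1);

    /* "Thread A": arrives at the refcount == 0 assertion. */
    obs.value_at_assert = atomic_load(&refcount);
    return obs;
}
```

The function returns `old_refcount == 1` and `value_at_assert == 1`, the same pair of values printed by gdb, so a correct atomic decrement is compatible with the reported state once a stray increment is allowed.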

>seems like the dbus_atomic_dec function is not really "atomic" (in the
> sense the value is not immediately updated).

See my explanation above. Even a non-atomic decrement could not generate 
the condition you reported. A bug in your code, however, can.

>and in the android dbus version we have the macro definition:
>
>#undef DBUS_USE_ATOMIC_INT_486
>
>#if (defined(__i386__) || defined(__x86_64__))
>#define DBUS_USE_ATOMIC_INT_486 1
>#endif
>
>i noticed this was changed in 1.2.4 to be a constant 1 (and thus use the
>"atomic_exchange_and_add" function).
>should this solve my problem?

No. You said ARM. The code above is only for x86 and x86-64.

But D-Bus 1.3.0 will have a better solution. We added the improvement 
yesterday.
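For illustration only, and not as a reproduction of the 1.3.0 change: an architecture-independent decrement that returns the old value, as the unref code above expects, could look roughly like this. It assumes a GCC-compatible compiler providing the `__sync` builtins; `sketch_atomic` and `sketch_atomic_dec` are hypothetical names, not libdbus API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter type, modelled on a struct with a single value. */
typedef struct {
    volatile int value;
} sketch_atomic;

/* Atomically subtract 1 and return the value held before the subtraction. */
int sketch_atomic_dec(sketch_atomic *counter)
{
    assert(counter != NULL);
    return __sync_fetch_and_sub(&counter->value, 1);
}
```

Starting from a value of 2, the first call returns 2 and leaves 1; the second returns 1 and leaves 0, which is the only case in which the caller should finalize.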

But I repeat: I don't think this is a bug in D-Bus. It would have been 
reported long ago if it were. I think it's a bug in your code.
-- 
  Thiago Macieira  -  thiago (AT) macieira.info - thiago (AT) kde.org
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358