atomic message unref issues on ARM
Thiago Macieira
thiago at kde.org
Fri Jul 17 01:21:58 PDT 2009
Jim Harvy wrote:
>(gdb) f 3
>#3 0xae82236a in dbus_message_cache_or_finalize (message=0x18f420) at
>external/dbus/dbus/dbus-message.c:576
>576 _dbus_assert (message->refcount.value == 0);
>(gdb) p message->refcount
>$30 = {value = 1}
>
>and in the unref we get:
>
>1395 old_refcount = _dbus_atomic_dec (&message->refcount);
>1396
>1397 _dbus_assert (old_refcount >= 0);
>1398
>1399 if (old_refcount == 1)
>1400 {
>1401 /* Calls application callbacks! */
>1402 dbus_message_cache_or_finalize (message);
>1403 }
>1404 }
>1405
>1406 /**
>(gdb) p old_refcount
>$32 = 1
Now that's a hairy case...
If old_refcount is 1, that means the decrement function saw a 1, which
means no other thread had a pointer to this DBusMessage. That would mean
the refcount cannot increase back to 1 after dropping to zero.
Since you proved with debugging that it was 1, we have to conclude that
the atomic-decrement mechanism isn't working. Either it's not decrementing
at all (which I find unlikely, unless the compiler is miscompiling) or it's
not working atomically.
So, is it possible that the refcount was 2, two threads simultaneously
decremented (non-atomically) and the result was 1? Yes, that is possible,
if the decrement operation requires two or more steps.
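
To make the lost-update case concrete, here is a minimal single-threaded sketch that replays one bad schedule of two non-atomic decrements. The names (thread_state, step_read, step_write, lost_update_final) are hypothetical and are not part of libdbus; the "threads" are simulated so that the interleaving is deterministic.

```c
/* Hypothetical model, not libdbus code: a non-atomic decrement split
 * into a read step and a write step (the modify is folded into the write). */
typedef struct {
    int loaded;   /* value this simulated thread read from memory */
} thread_state;

static void step_read(thread_state *t, const int *refcount)
{
    t->loaded = *refcount;
}

static void step_write(const thread_state *t, int *refcount)
{
    *refcount = t->loaded - 1;
}

/* Replays the schedule: A reads, B reads, A writes, B writes.
 * Returns the final refcount and reports what each thread read. */
int lost_update_final(int start, int *old_a, int *old_b)
{
    int refcount = start;
    thread_state a, b;

    step_read(&a, &refcount);
    step_read(&b, &refcount);
    step_write(&a, &refcount);
    step_write(&b, &refcount);

    *old_a = a.loaded;
    *old_b = b.loaded;
    return refcount;
}
```

Starting from 2, this schedule ends at 1 because one decrement is lost. Note that both simulated threads read 2 here, so neither would take the old_refcount == 1 branch: the message would leak rather than trip the assertion.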
But here the case is even trickier: is it possible for two threads to
decrement at the same time, one of them see a 1, decrement and afterwards
still see a 1?
Now that's weird. On the ARM, a decrement has to be three instructions
(read, modify, update). One thread can only get a 1 when reading if the
other thread has already done its update cycle to 1. And for that other
thread to write a 1, it has to have seen a 2.
This is not a simple Write-After-Write case. There is one point of
synchronisation, which is that one of the threads has seen the result from
the other. So I can't conclude that this is a memory bus fluke.
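
The argument above can be checked by brute force. The sketch below is a hypothetical model (none of these names exist in libdbus): it enumerates every interleaving of two three-step decrements and counts the schedules in which a thread that read 1 has finished its own write and the refcount is nevertheless 1. It assumes each step takes effect immediately and in order, which is a simplification of real hardware.

```c
/* Hypothetical model of one thread doing a three-step decrement. */
typedef struct {
    int step;   /* next step to run: 0 = read, 1 = modify, 2 = write */
    int old;    /* value read from memory */
    int tmp;    /* locally computed new value */
} dec_thread;

static void run_step(dec_thread *t, int *refcount)
{
    switch (t->step++) {
    case 0: t->old = *refcount;  break;
    case 1: t->tmp = t->old - 1; break;
    case 2: *refcount = t->tmp;  break;
    }
}

/* Tries all 20 interleavings of two decrements starting from `start`.
 * Returns how many schedules reach the reported state (a finished thread
 * saw old == 1 while refcount == 1). *lost_updates receives the number
 * of schedules whose final refcount is 1. */
int count_contradictions(int start, int *lost_updates)
{
    int contradictions = 0;
    *lost_updates = 0;

    for (unsigned mask = 0; mask < 64; mask++) {
        int bits = 0;
        for (int i = 0; i < 6; i++)
            bits += (mask >> i) & 1;
        if (bits != 3)
            continue;           /* each thread must run exactly 3 steps */

        int refcount = start;
        dec_thread t[2] = {{0, 0, 0}, {0, 0, 0}};
        int hit = 0;

        for (int i = 0; i < 6; i++) {
            /* bit i selects which thread runs the i-th step */
            run_step(&t[(mask >> i) & 1], &refcount);
            for (int k = 0; k < 2; k++)
                if (t[k].step == 3 && t[k].old == 1 && refcount == 1)
                    hit = 1;
        }
        contradictions += hit;
        if (refcount == 1)
            (*lost_updates)++;
    }
    return contradictions;
}
```

Starting from 2, this model finds 18 schedules that lose an update, but none in which the thread that read 1 is left looking at a refcount of 1. That matches the reasoning: non-atomicity alone does not produce the reported state.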
Could it happen with three threads? If we have three threads, then the
refcount must have been 3 at the beginning, so I don't see how adding more
threads to the mix could explain the situation we have.
I can't come up with an explanation under normal operating conditions. I
can only come up with explanations if there's a bug in your code, Jim.
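This could easily happen if the refcount was 1 and one thread incremented
it while the other decremented it. A thread may only increment a refcount
if it already holds a reference, so with two threads involved the count
should have been at least 2. This is obviously a bug in the user code, not
in libdbus.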
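
As an illustration of that scenario, the following sketch replays one schedule that reproduces both gdb observations even with fully atomic operations. It uses C11 <stdatomic.h> and hypothetical names (fake_message, fake_atomic_dec, fake_atomic_inc, replay_ref_unref_race); it is not the libdbus implementation, and the two "threads" are simulated sequentially so the order is deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted message. */
typedef struct {
    atomic_int refcount;
} fake_message;

/* Both helpers return the value held before the operation, like the
 * old_refcount in the quoted unref code. */
static int fake_atomic_dec(fake_message *m)
{
    return atomic_fetch_sub(&m->refcount, 1);
}

static int fake_atomic_inc(fake_message *m)
{
    return atomic_fetch_add(&m->refcount, 1);
}

/* Replays: thread B unrefs the last reference, then thread A refs a
 * message it never owned, then B checks the count inside finalize.
 * Returns true if B saw old_refcount == 1 yet finds a non-zero count. */
bool replay_ref_unref_race(int *old_refcount_out, int *value_at_finalize_out)
{
    fake_message m;
    atomic_init(&m.refcount, 1);             /* only B holds a reference */

    int old_refcount = fake_atomic_dec(&m);  /* B: unref, sees 1 */
    (void)fake_atomic_inc(&m);               /* A: ref without owning one */
    int value = atomic_load(&m.refcount);    /* B: check inside finalize */

    *old_refcount_out = old_refcount;
    *value_at_finalize_out = value;
    return old_refcount == 1 && value != 0;
}
```

In this schedule old_refcount is 1 and the value read at finalize time is also 1, which is the state shown in the debugger session. The decrement itself behaved correctly; the fault is the increment from a thread that held no reference.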
>seems like the dbus_atomic_dec function is not really "atomic" (in the
> sense the value is not immediately updated).
See my explanation above. Even a non-atomic decrement could not generate
the condition you reported. A bug in your code, however, can.
>and in the android dbus version we have the macro definition:
>
>#undef DBUS_USE_ATOMIC_INT_486
>
>#if (defined(__i386__) || defined(__x86_64__))
>#define DBUS_USE_ATOMIC_INT_486 1
>#endif
>
>i noticed this was changed in 1.2.4 to be a constant 1 (and thus use the
>"atomic_exchange_and_add" function).
>should this solve my problem?
No. You said ARM. The code above is only for x86 and x86-64.
But D-Bus 1.3.0 will have a better solution. We added the improvement
yesterday.
But I repeat: I don't think this is a bug in D-Bus. It would have been
reported long ago if it were. I think it's a bug in your code.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358