Phantom "Out of Memory" error

Havoc Pennington hp at redhat.com
Wed Jul 18 03:00:52 PDT 2007


Hi,

If you browse through the code you will find other cases where the 
function is often called _and_unlock(), where the idea is that at the 
very end of the function the connection is unlocked then the application 
callback is invoked without the lock.

This is how the function you quoted should really work.

There are two reasons:
  - calling an app callback with the lock isn't OK since the app would
    deadlock if it tried to use the DBusConnection
  - if you call an app callback without the lock, there are all sorts of
    potential weird bugs due to reentrancy, perhaps an example is the
    OOM thing you are encountering

By calling the app callback only at the very end, the reentrancy issues 
are avoided.

The hard thing is that the patch to call the callback at the end has to 
change every function in the call stack, more or less. Another headache 
here is that the connection lock has to be held until we stop using the 
connection->timeouts object, so maybe dbus_timeout_list_add_timeout is 
changed to return the app callback as an invokable closure or something 
of that nature, and then the closure is passed up the call stack until 
we can unlock and invoke it. The closure would store 
timeout_list->add_timeout_function and timeout_list->timeout_data, but 
could not have a pointer to the timeout_list itself since that is 
protected by the connection lock.

Without fixing that, we would have to fix the reentrancy bug, which also 
looks hard. timeouts is set to NULL since it's going to be accessed 
outside the lock; if it were non-NULL in the case you're encountering it 
would be accessed from two threads concurrently without locks. This is 
just broken really, connection->timeouts should not be accessed without 
holding the connection lock.

It's wrong really that the function returns FALSE in this case, since 
FALSE should mean out-of-memory only. But, there isn't really anything 
else to do; if we change to TRUE then the function silently does 
nothing, which isn't what we want.

A full fix for the reentrancy would probably involve adding a separate 
lock on the TimeoutsList or some nightmare like that.

The simplest quick fix I can think of is to make 
timeout_list_add_timeout and friends return the closure thingy, then 
where the current code does timeouts=NULL and drops the lock, don't do 
the timeouts=NULL part since we won't need the timeout list, just the 
closure.

Havoc




More information about the dbus mailing list