Phantom "Out of Memory" error
Havoc Pennington
hp at redhat.com
Wed Jul 18 03:00:52 PDT 2007
Hi,
If you browse through the code you will find other cases where the
function is often called _and_unlock(), where the idea is that at the
very end of the function the connection is unlocked then the application
callback is invoked without the lock.
This is how the function you quoted should really work.
There are two reasons:
- calling an app callback with the lock isn't OK since the app would
deadlock if it tried to use the DBusConnection
- if you call an app callback without the lock, there are all sorts of
potential weird bugs due to reentrancy, perhaps an example is the
OOM thing you are encountering
By calling the app callback only at the very end, the reentrancy issues
are avoided.
The hard thing is that the patch to call the callback at the end has to
change every function in the call stack, more or less. Another headache
here is that the connection lock has to be held until we stop using the
connection->timeouts object, so maybe dbus_timeout_list_add_timeout is
changed to return the app callback as an invokable closure or something
of that nature, and then the closure is passed up the call stack until
we can unlock and invoke it. The closure would store
timeout_list->add_timeout_function and timeout_list->timeout_data, but
could not have a pointer to the timeout_list itself since that is
protected by the connection lock.
Without fixing that, we would have to fix the reentrancy bug, which also
looks hard. timeouts is set to NULL since it's going to be accessed
outside the lock; if it were non-NULL in the case you're encountering it
would be accessed from two threads concurrently without locks. This is
just broken really, connection->timeouts should not be accessed without
holding the connection lock.
It's wrong really that the function returns FALSE in this case, since
FALSE should mean out-of-memory only. But, there isn't really anything
else to do; if we change to TRUE then the function silently does
nothing, which isn't what we want.
A full fix for the reentrancy would probably involve adding a separate
lock on the TimeoutsList or some nightmare like that.
The simplest quick fix I can think of is to make
timeout_list_add_timeout and friends return the closure thingy, then
where the current code does timeouts=NULL and drops the lock, don't do
the timeouts=NULL part since we won't need the timeout list, just the
closure.
Havoc
More information about the dbus
mailing list