Phantom "Out of Memory" error

Tue Jul 17 08:50:31 PDT 2007

I've been debugging a randomly occurring "Out of Memory" error in my
multithreaded program. Tracing what happens, I have narrowed the
problem down to the function below.

static dbus_bool_t
protected_change_timeout (DBusConnection           *connection,
                          DBusTimeout              *timeout,
                          DBusTimeoutAddFunction    add_function,
                          DBusTimeoutRemoveFunction remove_function,
                          DBusTimeoutToggleFunction toggle_function,
                          dbus_bool_t               enabled)
{
  DBusTimeoutList *timeouts;
  dbus_bool_t retval;

  HAVE_LOCK_CHECK (connection);

  /* This isn't really safe or reasonable; a better pattern is the "do
   * everything, then drop lock and call out" one; but it has to be
   * propagated up through all callers
   */

  timeouts = connection->timeouts;
  if (timeouts)
    {
      connection->timeouts = NULL;
      _dbus_connection_ref_unlocked (connection);
      CONNECTION_UNLOCK (connection);

      if (add_function)
        retval = (* add_function) (timeouts, timeout);
      else if (remove_function)
        {
          retval = TRUE;
          (* remove_function) (timeouts, timeout);
        }
      else
        {
          retval = TRUE;
          (* toggle_function) (timeouts, timeout, enabled);
        }

      CONNECTION_LOCK (connection);
      connection->timeouts = timeouts;
      _dbus_connection_unref_unlocked (connection);

      return retval;
    }
  else
    return FALSE;
}

Apparently DBus temporarily sets the connection's timeout list to NULL
and drops the connection lock while it changes the timeout. However, if
another thread acquires the lock and reaches this function in the
meantime, the function returns FALSE because connection->timeouts is
NULL. That failure is propagated back up through the callers and
eventually turns up as an "Out of Memory" abort in the glib bindings.
So I have two questions.
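
The interleaving described above can be modelled in a few lines. This is
only a single-threaded sketch: FakeConnection, FakeTimeoutList,
fake_change_timeout and the hook are made-up stand-ins, not real DBus
types or functions, and the locking is left out. The hook plays the part
of the second thread that gets in while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real DBusConnection or DBusTimeoutList. */
typedef struct { int dummy; } FakeTimeoutList;
typedef struct { FakeTimeoutList *timeouts; } FakeConnection;

/* Runs in the window where the real code has dropped the lock. */
typedef void (*WindowHook) (FakeConnection *connection, void *data);

/* Same shape as protected_change_timeout, with the locking left out. */
static bool
fake_change_timeout (FakeConnection *connection, WindowHook hook, void *data)
{
  FakeTimeoutList *timeouts;

  assert (connection != NULL);

  timeouts = connection->timeouts;
  if (timeouts == NULL)
    return false;                /* looks just like an allocation failure */

  connection->timeouts = NULL;   /* list is "checked out" */
  if (hook)
    hook (connection, data);     /* another thread could run here */
  connection->timeouts = timeouts;

  return true;
}

/* Plays the second thread: calls in while the list is checked out. */
static void
second_caller (FakeConnection *connection, void *data)
{
  bool *result = (bool *) data;

  *result = fake_change_timeout (connection, NULL, NULL);
}

/* Returns true if the nested call failed while the outer one succeeded. */
static bool
reproduce_phantom_failure (void)
{
  FakeTimeoutList list = { 0 };
  FakeConnection connection = { &list };
  bool inner = true;
  bool outer = fake_change_timeout (&connection, second_caller, &inner);

  return outer && !inner && connection.timeouts == &list;
}
```

In this model the second caller sees FALSE even though nothing ran out of
memory, which matches the symptom in the trace.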

First, does anyone know the intended purpose of this function, and what
would be a safe way to modify it so that this case doesn't happen? I
can infer some of its meaning, but I hesitate to hastily change a
locking scheme I am not completely familiar with.

Second, is there a good way to modify the error-return system so that
it reports what actually happened? We can't change the API, but maybe
FALSE could simply mean "error": whoever returns FALSE would call
dbus_set_current_error_string(char *), and the calling program could
print the result of dbus_get_current_error_string(char *).
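
A rough sketch of that second idea follows. None of these functions
exist in libdbus; the sketch_ names, the buffer size, the message text
and the parameterless getter are all assumptions made for illustration.
It assumes a C11 compiler for _Thread_local, so that each thread keeps
its own message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread error slot; nothing like this is in libdbus. */
static _Thread_local char current_error[256];

/* Record why the current thread is about to return FALSE. */
static void
sketch_set_current_error_string (const char *message)
{
  assert (message != NULL);
  /* snprintf truncates and always NUL-terminates */
  snprintf (current_error, sizeof current_error, "%s", message);
}

/* Fetch the message last recorded by the current thread. */
static const char *
sketch_get_current_error_string (void)
{
  return current_error;
}

/* Stand-in for a function that can only return TRUE (1) or FALSE (0). */
static int
sketch_change_timeout (int list_is_checked_out)
{
  if (list_is_checked_out)
    {
      sketch_set_current_error_string ("timeout list in use by another thread");
      return 0;
    }

  sketch_set_current_error_string ("");   /* clear any stale message */
  return 1;
}
```

A caller that gets 0 back could then print
sketch_get_current_error_string () instead of assuming the failure was
an allocation failure.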

Keith Preston