A libev based mainloop and lost signals

Thu Aug 19 02:40:58 PDT 2010

Hi,
Excerpts from Thiago Macieira's message of Wed Aug 11 18:58:43 +0200 2010:
> > Excerpts from Thiago Macieira's message of Tue Aug 10 14:30:30 +0200 2010:
> > > To make this work, you must not use _and_block. You must instead use
> > > send_with_reply and make sure that the other thread doesn't pull your
> > > reply before you do.
> > 
> > Well, send_with_reply is not an overly nice API for a multithreaded setup
> > since I have a race between setting the DBusPendingCall's callback and the
> > dispatching of the reply in the background thread.
> 
> That can easily be solved by taking a mutex lock around the D-Bus dispatcher. 
> The background thread will be woken up out of select(2) or poll(2), but will 
> block on trying to acquire the mutex.
> 
> When the foreground thread is done calling send_with_reply, it only needs to 
> unlock that mutex.

You are of course right that this is easy to do, and I've implemented it this
way now. However, it still feels a little bit like I'm misunderstanding
something.

So, here's what I do:

struct wait_info { 
    pthread_mutex_t nMutex;
    pthread_cond_t nCond;
};

struct wait_info* new_wait_info() { 
    struct wait_info* pRet = malloc(sizeof(struct wait_info));
    assert(pRet);
    pthread_mutex_init(&pRet->nMutex, NULL);
    pthread_cond_init(&pRet->nCond, NULL);
    return pRet;
}

void free_wait_info(struct wait_info* pInfo) { 
    pthread_mutex_destroy(&pInfo->nMutex);
    pthread_cond_destroy(&pInfo->nCond);
    free(pInfo);
}

void pending_call_notifier_function(DBusPendingCall *pending, void *user_data) { 
    struct wait_info* pCtx = user_data;

    pthread_mutex_lock(&pCtx->nMutex);
    pthread_cond_broadcast(&pCtx->nCond);
    pthread_mutex_unlock(&pCtx->nMutex);
}

extern pthread_mutex_t g_nDispatchMutex;

DBusMessage* my_send_with_reply_and_block(DBusConnection* pConn, DBusMessage* pRequest, int nTimeOutMS, DBusError* pError) {
    DBusPendingCall* pPendingCall;
    DBusMessage* pReply = NULL;

    // avoid race in dbus_pending_call_set_notify
    struct wait_info* pInfo = new_wait_info();

    pthread_mutex_lock(&g_nDispatchMutex);
    pthread_mutex_lock(&pInfo->nMutex);

    if(!dbus_connection_send_with_reply(pConn, pRequest, &pPendingCall, 10000)) { 
        fprintf(stderr, "Failed to call DBus method!\n");
        //TODO: Set DBusError!
        pthread_mutex_unlock(&g_nDispatchMutex);
        return NULL;
    }
    dbus_pending_call_set_notify(pPendingCall, pending_call_notifier_function, pInfo, free_wait_info);
    pthread_mutex_unlock(&g_nDispatchMutex);

    pthread_cond_wait(&pInfo->nCond, &pInfo->nMutex);
    pthread_mutex_unlock(&pInfo->nMutex);

    if(dbus_pending_call_get_completed(pPendingCall)) { 
        pReply = dbus_pending_call_steal_reply(pPendingCall);

        if(pReply != NULL &&  pError != NULL && dbus_message_get_type(pReply) == DBUS_MESSAGE_TYPE_ERROR) { 
            const char* message = "";
            dbus_message_get_args(pReply, pError, DBUS_TYPE_STRING, &message, DBUS_TYPE_INVALID);
            dbus_set_error(pError, dbus_message_get_error_name(pReply), message);
            dbus_message_unref(pReply);
            pReply = NULL;
        }

    } else { 
        fprintf(stderr, "pending call notified us but is not completed.\n");
    }

    pthread_mutex_unlock(&pInfo->nMutex);
    dbus_pending_call_unref(pPendingCall);
    return pReply;    
}

While this seems to be rather stable (to the point that I started noticing that
dbus_bus_add_match_rule also uses dbus_connection_send_with_reply_and_block and
thus cause occasional hangs when used in my code :)), I still think I've
essentially reimplemented what I thought dbus_pending_call_block would do here.

I have also tried to actually replace my custom condition wait code with
dbus_pending_call_block, but that seems to produce very similar results to
simply using dbus_connection_send_with_reply_and_block directly.

> Though I agree that libdbus-1 should be doing this on its own. A 
> send_with_reply operation should not be interruptible. And I think it does 
> that, in D-Bus 1.3.x.

Hrm, I'm using what Debian currently ships in experimental, which looks like 
a git snapshot from July: 1.3.2~git20100715.821f99c-1, so I presume that will
not affect my problems.

> Note, however, that each send_with_reply_and_block, like I said above, will 
> try to change states of timers and fd-watchers. Your event loop implementation 
> must support changing state of those from *any* thread, not just the mainloop 
> thread. This was the source of a number of bugs in QtDBus until I finally 
> nailed it down around Qt 4.5.

Hm, really? I have debug output in my add_watch_function, toggle_watch_function
and remove_watch_function, and they never seem to get called when I do a
dbus_connection_send_with_reply_and_block from the main thread.

Now that you gave me the idea, I think this is the actual problem here -
dbus_connection_send_with_reply_and_block will take over the socket, but not
tell the main loop that it does so?

> If this is all that your code is doing, it's fine.
> 
> But what I had understood is that you're making calls from one thread that 
> should be processed by the mainloop thread. That is, you're making calls to 
> yourself.

No, I'm not making calls to myself. I'm calling a DBus method on another process.
Sometimes, this method will also emit a signal that I'm AddMatched to on the client,
and my initial concern was that this signal sometimes seemed to be lost (inside 
lidbus, since I could see it arrive in the process via various means).

ciao,
-- 
[*Thomas  Themel*]  This signature is only valid if displayed
[extended contact]  ~~     in the correct color _red_.     ~~
[info provided in]  Help fight misconfigured news readers,
[*message header*]       use "X-Signature-Color: red"!