From erik at slagter.name Mon Jan 13 13:14:40 2025
From: erik at slagter.name (Erik Slagter)
Date: Mon, 13 Jan 2025 14:14:40 +0100
Subject: Please help with signals, some seem to get lost
Message-ID: <1fc958e8-1457-4da9-baa9-dff7001e7c29@slagter.name>

Hi there,

I'll start with a terse description of what I am facing, maybe that is
enough for you guys to point me in the right direction. If not, I'll
elaborate.

For my application I need to send signals to several listeners on the
system bus. The signals are composed of a string of about 300 bytes.
Sometimes some signals are sent in quick succession and that's where it
goes wrong.

I use this code (I'd like to keep things very simple, no main loops etc,
the handler runs in its own thread anyway):

    // blocks until a message is received, quit if negative result
    dbus_connection_read_write_dispatch(connection, -1)
    message = dbus_connection_pop_message(connection)

Now every now and then dbus_connection_read_write_dispatch returns from
blocking but dbus_connection_pop_message returns NULL.

Two interesting observations here:
- other processes that are listening to the same signal do get the
  message indeed, it's just one (random) of them that gets the NULL message
- if I apply rate limiting, the problem decreases and at a rate of 2
  signals per second it's gone.

So it looks like I am hitting a rate limiter somewhere. I really want to
go full throttle here, is there a way to change this parameter? Is there
another workaround for this?

Thanks in advance!
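One possible reading of the NULL pops described above, offered as a toy model and not as libdbus code: the libdbus API documentation describes dbus_connection_read_write_dispatch as dispatching a queued message if one is available and otherwise blocking for I/O. If that holds, a burst that arrives in a single read leaves several messages queued, and the next call dispatches (and thereby consumes) one of them before the pop can see it. Every name below (ToyConn, toy_read_write_dispatch, toy_pop, toy_receive_burst) is hypothetical, and the assumption that a whole burst arrives in one read is a simplification.

```c
#include <assert.h>

#define QUEUE_CAP 32

/* Toy stand-in for a connection. This is NOT libdbus.
 * 'wire' counts messages that arrived but were not read yet;
 * 'queue' holds messages already read, waiting for dispatch or pop. */
typedef struct {
    int wire;
    int queue[QUEUE_CAP];
    int head, tail;
    int next_id;
    int dispatched_unseen;  /* consumed by dispatch, never seen by pop */
} ToyConn;

static int toy_queue_len(const ToyConn *c) { return c->tail - c->head; }

/* Assumed behaviour of one read_write_dispatch call: if a message is
 * queued, dispatch exactly one (it leaves the queue); otherwise do one
 * read that moves everything waiting on the wire into the queue. */
static void toy_read_write_dispatch(ToyConn *c)
{
    if (toy_queue_len(c) > 0) {
        c->head++;
        c->dispatched_unseen++;
        return;
    }
    while (c->wire > 0) {
        assert(c->tail < QUEUE_CAP);
        c->queue[c->tail++] = c->next_id++;
        c->wire--;
    }
}

/* Models pop_message: returns a message id, or -1 for an empty queue. */
static int toy_pop(ToyConn *c)
{
    return toy_queue_len(c) > 0 ? c->queue[c->head++] : -1;
}

/* The pattern from the message: one read_write_dispatch, then one pop.
 * Returns how many messages the pop saw; *null_pops counts empty pops. */
static int toy_receive_burst(int burst, int *null_pops)
{
    ToyConn c = {0};
    int received = 0;

    assert(burst <= QUEUE_CAP);
    c.wire = burst;
    *null_pops = 0;

    while (c.wire > 0 || toy_queue_len(&c) > 0) {
        toy_read_write_dispatch(&c);
        if (toy_pop(&c) >= 0)
            received++;
        else
            (*null_pops)++;
    }
    return received;
}
```

In this model a single message is always received, while a burst of two yields one received message and one empty pop, which resembles the reported symptom. Real arrival timing decides how a burst is split across reads, so the exact counts on a live bus will differ.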
From thiago at kde.org Tue Jan 14 00:38:31 2025
From: thiago at kde.org (Thiago Macieira)
Date: Mon, 13 Jan 2025 16:38:31 -0800
Subject: Please help with signals, some seem to get lost
In-Reply-To: <1fc958e8-1457-4da9-baa9-dff7001e7c29@slagter.name>
References: <1fc958e8-1457-4da9-baa9-dff7001e7c29@slagter.name>
Message-ID: <2535448.XAFRqVoOGU@tjmaciei-mobl5>

On Monday 13 January 2025 05:14:40 Pacific Standard Time Erik Slagter wrote:
> Hi there,
>
> I'll start with a terse description of what I am facing, maybe that is
> enough for you guys to point me in the right direction. If not, I'll
> elaborate.
>
> For my application I need to send signals to several listeners on the
> system bus. The signals are composed of a string of about 300 bytes.
> Sometimes some signals are sent in quick succession and that's where it
> goes wrong.

Are they directed signals? That is, do they have a destination bus address
set?

> I use this code (I'd like to keep things very simple, no main loops etc,
> the handler runs in its own thread anyway):
>
> dbus_connection_read_write_dispatch(connection, -1) // blocks until a
> message is received, quit if negative result
> message = dbus_connection_pop_message(connection)
>
> Now every now and then dbus_connection_read_write_dispatch returns from
> blocking but dbus_connection_pop_message returns NULL.

That's normal. It dispatches one read OR write. If it did the latter, then
there is no message to be popped.

> Two interesting observations here:
> - other processes that are listening to the same signal do get the
> message indeed, it's just one (random) of them that gets the NULL message

It probably was sending something before that.
Call it again, because after all this function is meant to be used in a
loop, as its documentation shows:
https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga580d8766c23fe5f49418bc7d87b67dc6

> - if I apply rate limiting, the problem decreases and at a rate of 2
> signals per second it's gone.
>
> So it looks like I am hitting a rate limiter somewhere. I really want to
> go full throttle here, is there a way to change this parameter? Is there
> another workaround for this?

There isn't a rate limiter.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel DCAI Platform & System Engineering

From erik at slagter.name Tue Jan 14 09:20:51 2025
From: erik at slagter.name (Erik Slagter)
Date: Tue, 14 Jan 2025 10:20:51 +0100
Subject: Please help with signals, some seem to get lost
In-Reply-To: <2535448.XAFRqVoOGU@tjmaciei-mobl5>
References: <1fc958e8-1457-4da9-baa9-dff7001e7c29@slagter.name> <2535448.XAFRqVoOGU@tjmaciei-mobl5>
Message-ID:

Hi Thiago,

Thank you very much for your interest!

I will include a piece of dbus-monitor output here to hopefully answer
some of your questions. Please note that consistently all sent signals are
shown in the output, so I like to assume that the problem is on the
receiving side. I left out the actual payload data because it's not really
human readable.

-------------------------------- >-8 --------------------------
signal time=1736793922.320478 sender=:1.20990 -> destination=(null destination) serial=98 path=/name/slagter/erik/espproxy; interface=name.slagter.erik.espproxy.signal.esp8266.display.graphic; member=push_command
   string "..." (~400 bytes of opaque data)
signal time=1736793922.321147 sender=:1.20990 -> destination=(null destination) serial=99 path=/name/slagter/erik/espproxy; interface=name.slagter.erik.espproxy.signal.esp8266.display.graphic; member=push_command
   string "..."
(~400 bytes of opaque data)
signal time=1736793982.495803 sender=:1.20991 -> destination=(null destination) serial=95 path=/name/slagter/erik/espproxy; interface=name.slagter.erik.espproxy.signal.esp8266.display.graphic; member=push_command
   string "..." (~400 bytes of opaque data)
signal time=1736793982.496052 sender=:1.20991 -> destination=(null destination) serial=96 path=/name/slagter/erik/espproxy; interface=name.slagter.erik.espproxy.signal.esp8266.display.graphic; member=push_command
   string "..." (~400 bytes of opaque data)
signal time=1736793982.496299 sender=:1.20991 -> destination=(null destination) serial=97 path=/name/slagter/erik/espproxy; interface=name.slagter.erik.espproxy.signal.esp8266.display.graphic; member=push_command
   string "..." (~400 bytes of opaque data)
-------------------------------- >-8 --------------------------

ES> For my application I need to send signals to several listeners on the
ES> system bus. The signals are composed of a string of about 300 bytes.
ES> Sometimes some signals are sent in quick succession and that's where it
ES> goes wrong.

> Are they directed signals? That is, do they have a destination bus address
> set?

I just learned about this option yesterday, as a warning to avoid that
(apparently such messages are prone to be discarded), so I guess, no ;-).
The targeted "interface" is
"name.slagter.erik.espproxy.signal.esp8266.display.graphic"

ES> dbus_connection_read_write_dispatch(connection, -1)
ES> message = dbus_connection_pop_message(connection)
ES>
ES> Now every now and then dbus_connection_read_write_dispatch returns from
ES> blocking but dbus_connection_pop_message returns NULL.

> That's normal. It dispatches one read OR write. If it did the latter, then
> there is no message to be popped.

Okay, I had this hunch already. So I changed the code to repeatedly call
both methods until an actual message is received and report when the first
method returns but no message could be popped.
Now it gets interesting I think: the total number of times the first
method (dbus_connection_read_write_dispatch) returns is exactly the same
on all listening clients. It is just that, randomly distributed over the
listening clients, one or more of those returns do not yield a pop-able
message. Please see this debug output to see what I mean. Please also note
(see dbus-monitor output) I am sending 5 signals and only 4 are received
(this is consistent).

-------------------------------- >-8 --------------------------
jan 14 10:12:11 artemis espif[2789853]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
jan 14 10:12:11 artemis espif[2789853]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
jan 14 10:12:11 artemis espif[2789853]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
jan 14 10:12:11 artemis espif[2789853]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
***
jan 14 10:12:11 artemis espif[2789855]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
jan 14 10:12:11 artemis espif[2789855]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
jan 14 10:12:11 artemis espif[2789855]: no messages in queue, attempt: 0
jan 14 10:12:11 artemis espif[2789855]: signal received, interface: name.slagter.erik.espproxy.signal.esp8266.display.graphic, method: push_command
-------------------------------- >-8 --------------------------

ES> Two interesting observations here:
ES> - other processes that are listening to the same signal do get the
ES> message indeed, it's just one (random) of them that gets the NULL message

> It probably was sending something before that.
> Call it again, because after all this function is meant to be used in a
> loop, as its documentation shows:
> https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga580d8766c23fe5f49418bc7d87b67dc6

How do you mean exactly? "It" is the client I assume? The client is not
really sending things, it is only responding to one type of method call,
which works without any issue and is fully completed before the signals
are received. After sending the reply I do a dbus_connection_flush();
should I add a call to dbus_connection_read_write_dispatch there (before
or after)?

> There isn't a rate limiter.

So it's a race condition ;-) Now just find out where it is...

From erik at slagter.name Tue Jan 14 10:32:36 2025
From: erik at slagter.name (Erik Slagter)
Date: Tue, 14 Jan 2025 11:32:36 +0100
Subject: Please help with signals, some seem to get lost
In-Reply-To:
References: <1fc958e8-1457-4da9-baa9-dff7001e7c29@slagter.name> <2535448.XAFRqVoOGU@tjmaciei-mobl5>
Message-ID: <7c8afdd9-2ef9-464a-860b-0a5e07d40dc9@slagter.name>

Update!

Some experimentation led to this code which seems to work! The main
culprit seems to be that I called dbus_connection_read_write_dispatch and
then dbus_connection_pop_message, which apparently works for methods but
not (always) for signals. Looks like some signal messages were overwritten
before they could be processed.

Please comment on my current approach (code simplified for optimal
presentation).

Please also note that dbus_connection_pop_message now always succeeds at
the first occurrence (sometimes) or the second occurrence (mostly). The
code after the second occurrence
[while(dbus_connection_dispatch(connection) != DBUS_DISPATCH_COMPLETE)
etc.] doesn't seem to be necessary, at least up until now it has never
run. Apparently no "dispatch" is necessary. The dbus_connection_flush
statement may not be necessary either.

I now send 16 test signals in one go and all of them are received!

get_message(...)
{
    // a message may already be queued from an earlier read
    if((pending_message = dbus_connection_pop_message(connection)))
        goto done;

    dbus_connection_flush(connection);

    // block for I/O only, without dispatching
    if(!dbus_connection_read_write(connection, -1))
        throw("failed");

    if((pending_message = dbus_connection_pop_message(connection)))
        goto done;

    // fallback, has never run so far
    while(dbus_connection_dispatch(connection) != DBUS_DISPATCH_COMPLETE)
    {
        print "call dispatch";

        if((pending_message = dbus_connection_pop_message(connection)))
        {
            print "pop succeeded @3";
            goto done;
        }

        print "repeating dispatch\n";
    }

    if((pending_message = dbus_connection_pop_message(connection)))
    {
        print "pop succeeded @4";
        goto done;
    }

    throw("failed");
}
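The approach above (block for I/O without dispatching, then pop) can be sketched as a drain loop that pops until the queue is empty, so that a burst arriving in one read is handled completely. This is only a sketch under stated assumptions: MessageSource, drain_once and the FakeBus helpers are hypothetical names introduced so the loop can run without linking libdbus. In a real program read_write would wrap dbus_connection_read_write(connection, -1) and pop would wrap dbus_connection_pop_message(connection).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical seam around the two connection calls used in the thread. */
typedef struct {
    void *ctx;
    int   (*read_write)(void *ctx);          /* 0 means disconnected */
    void *(*pop)(void *ctx);                 /* NULL means queue empty */
    void  (*handle)(void *ctx, void *msg);   /* application callback */
} MessageSource;

/* Block for I/O once, then pop until the queue is empty. Returns the
 * number of messages handled, or -1 if the source reports a disconnect. */
static int drain_once(MessageSource *src)
{
    int handled = 0;
    void *msg;

    assert(src && src->read_write && src->pop && src->handle);

    if (!src->read_write(src->ctx))
        return -1;

    while ((msg = src->pop(src->ctx)) != NULL) {
        src->handle(src->ctx, msg);
        handled++;
    }
    return handled;
}

/* In-memory fake, used only to exercise the loop. */
typedef struct {
    int wire;     /* messages not yet read */
    int queued;   /* messages read but not yet popped */
    int handled;  /* messages passed to the callback */
    int token;    /* its address serves as a dummy message */
} FakeBus;

static int fake_read_write(void *ctx)
{
    FakeBus *b = (FakeBus *)ctx;
    if (b->wire == 0)
        return 0;             /* nothing left: treated as a disconnect */
    b->queued += b->wire;     /* the whole burst arrives in one read */
    b->wire = 0;
    return 1;
}

static void *fake_pop(void *ctx)
{
    FakeBus *b = (FakeBus *)ctx;
    if (b->queued == 0)
        return NULL;
    b->queued--;
    return &b->token;
}

static void fake_handle(void *ctx, void *msg)
{
    FakeBus *b = (FakeBus *)ctx;
    (void)msg;
    b->handled++;
}

/* Feed a burst of the given size through one drain_once call. */
static int fake_run(int burst)
{
    FakeBus bus = {0};
    MessageSource src = { &bus, fake_read_write, fake_pop, fake_handle };

    bus.wire = burst;
    return drain_once(&src);
}
```

With real libdbus messages the callback would also need to release each popped message with dbus_message_unref. The libdbus documentation notes that dbus_connection_pop_message bypasses registered handlers, so an alternative that stays closer to the documented usage is to register a filter with dbus_connection_add_filter and keep calling dbus_connection_read_write_dispatch in a loop.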