qmi-proxy lockup, transaction timeouts
Martin Maurer
martin.maurer at mmeacs.de
Wed Jul 31 16:36:05 UTC 2024
Hi,
could be similar to
https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/issues/112
Best regards,
Martin
Am 31.07.2024 um 18:01 schrieb Aleksander Morgado:
> Hey,
>
> On Wed, Jul 31, 2024 at 5:14 PM Eduard Strehlau <eduard at lionizers.com>
> wrote:
>
> Hi,
> I have observed some lockups with qmi-proxy and I think I narrowed
> down the issue.
>
> Replicatable setup:
> 1. Start an instance of qmi-proxy.
> 2. Run multiple concurrent qmicli commands with -p for example:
> "qmicli -p -d /dev/cdc-wdm0 --device-open-version-info
> --dms-noop & qmicli -p -d /dev/cdc-wdm0 --device-open-version-info
> --dms-noop & qmicli -p -d /dev/cdc-wdm0 --device-open-version-info
> --dms-noop"
> The following or similar error should appear:
> "error: couldn't create client for the 'dms' service: CID
> allocation failed in the CTL client: Transaction timed out"
> If the error does not appear kill the running qmi-proxy and repeat
> the steps.
>
> Probable cause:
> There is a race condition that allows the same qmi-devices to be
> open multiple times if proxy requests are made in quick succession.
> This race condition is known and handled:
> https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/blob/main/src/libqmi-glib/qmi-proxy.c#L371
>
> Sadly this handling does not seem to be enough, I have observed
> qmi responses to arrive on the transiently opened device.
>
>
> Oh, that is bad indeed, and needs to be handled.
>
> This causes breakdown of the transaction handling (all requests
> get a timeout) and eventual hangs of qmi-proxy
> where it does not respond to exit signals anymore (I am not sure
> how, since I am not familiar with the libqmi code).
>
> I have made an strace of qmi-proxy and attached a full version for
> further debugging, but the behaviour described above is visible:
> Abbreviated trace (...) first number is line number second PID:
> 1297 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 8
> 1433 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 10
> 1577 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 12
> 1676 5436 write(8, "\1\v\0\0\0\0\0\1!\0\0\0", 12) = 12
> 1697 5436 write(8, "\1\v\0\0\0\0\0\2!\0\0\0", 12) = 12
> 1760 5436 read(8, "\1\204\0\200\0\0\1\1!\0y" ... , 2048) = 133
> 1784 5436 read(12, "\1\204\0\200\0\0\1\2!\0y" ... , 2048) = 133
> 1830 5436 write(8, "\1\v\0\0\0\0\0\3!\0\0\0", 12) = 12
> 1850 5436 write(8, "\1\v\0\0\0\0\0\4!\0\0\0", 12) = 12
> 1880 5436 read(8, "\1\204\0\200\0\0\1\3!\0y" ... , 2048) = 133
> 1890 5436 write(1, ... "[30 Apr 2024, 12:05:52] [Debug]
> [/dev/cdc-wdm0] No transaction matched in received message\n,
> 4096) = 4096
>
> In my opinion this is a fairly severe bug since it defeats the
> entire existence of qmi-proxy, it cannot be reliably used to
> concurrently use a qmi device.
>
>
> The issue happens exclusively when the proxy is trying to open the
> control port for the first time, right? I don't think we have observed
> this in the wild too much, e.g. when using ModemManager, because in
> this case there is a single process trying to open the proxy
> connection initially, not more than one.
>
> A proper fix would probably prevent the same device being open
> multiple times in the first place.
> I would love to open a pull request, but since I am not a glib
> developer and it is not trivial to do I cannot justify it.
>
>
> I understand.
>
> I was going to ask you open an issue in gitlab, but I see you already
> did that at
> https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/issues/113
> Thanks for that.
>
> Let me see how we can solve this.
> Thanks!
>
> --
> Aleksander
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libqmi-devel/attachments/20240731/786fe874/attachment-0001.htm>
More information about the libqmi-devel
mailing list