qmi-proxy lockup, transaction timeouts
Aleksander Morgado
aleksandermj at chromium.org
Wed Jul 31 16:01:12 UTC 2024
Hey,
On Wed, Jul 31, 2024 at 5:14 PM Eduard Strehlau <eduard at lionizers.com>
wrote:
> Hi,
> I have observed some lockups with qmi-proxy and I think I narrowed down
> the issue.
>
> Replicatable setup:
> 1. Start an instance of qmi-proxy.
> 2. Run multiple concurrent qmicli commands with -p for example:
> "qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop &
> qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop & qmicli
> -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop"
> The following or similar error should appear:
> "error: couldn't create client for the 'dms' service: CID allocation
> failed in the CTL client: Transaction timed out"
> If the error does not appear kill the running qmi-proxy and repeat the
> steps.
>
> Probable cause:
> There is a race condition that allows the same qmi-devices to be open
> multiple times if proxy requests are made in quick succession.
> This race condition is known and handled:
>
> https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/blob/main/src/libqmi-glib/qmi-proxy.c#L371
>
> Sadly this handling does not seem to be enough, I have observed qmi
> responses to arrive on the transiently opened device.
>
Oh, that is bad indeed, and needs to be handled.
> This causes breakdown of the transaction handling (all requests get a
> timeout) and eventual hangs of qmi-proxy
> where it does not respond to exit signals anymore (I am not sure how,
> since I am not familiar with the libqmi code).
>
> I have made an strace of qmi-proxy and attached a full version for further
> debugging, but the behaviour described above is visible:
> Abbreviated trace (...) first number is line number second PID:
> 1297 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 8
> 1433 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 10
> 1577 5436 openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 12
> 1676 5436 write(8, "\1\v\0\0\0\0\0\1!\0\0\0", 12) = 12
> 1697 5436 write(8, "\1\v\0\0\0\0\0\2!\0\0\0", 12) = 12
> 1760 5436 read(8, "\1\204\0\200\0\0\1\1!\0y" ... , 2048) = 133
> 1784 5436 read(12, "\1\204\0\200\0\0\1\2!\0y" ... , 2048) = 133
> 1830 5436 write(8, "\1\v\0\0\0\0\0\3!\0\0\0", 12) = 12
> 1850 5436 write(8, "\1\v\0\0\0\0\0\4!\0\0\0", 12) = 12
> 1880 5436 read(8, "\1\204\0\200\0\0\1\3!\0y" ... , 2048) = 133
> 1890 5436 write(1, ... "[30 Apr 2024, 12:05:52] [Debug] [/dev/cdc-wdm0]
> No transaction matched in received message\n, 4096) = 4096
>
> In my opinion this is a fairly severe bug since it defeats the entire
> existence of qmi-proxy, it cannot be reliably used to concurrently use a
> qmi device.
>
The issue happens exclusively when the proxy is trying to open the control
port for the first time, right? I don't think we have observed this in the
wild too much, e.g. when using ModemManager, because in this case there is
a single process trying to open the proxy connection initially, not more
than one.
A proper fix would probably prevent the same device being open multiple
> times in the first place.
> I would love to open a pull request, but since I am not a glib developer
> and it is not trivial to do I cannot justify it.
>
>
I understand.
I was going to ask you open an issue in gitlab, but I see you already did
that at https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/issues/113
Thanks for that.
Let me see how we can solve this.
Thanks!
--
Aleksander
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libqmi-devel/attachments/20240731/d9713531/attachment.htm>
More information about the libqmi-devel
mailing list