qmi-proxy lockup, transaction timeouts

Aleksander Morgado aleksandermj at chromium.org
Wed Jul 31 16:01:12 UTC 2024


Hey,

On Wed, Jul 31, 2024 at 5:14 PM Eduard Strehlau <eduard at lionizers.com>
wrote:

> Hi,
> I have observed some lockups with qmi-proxy and I think I narrowed down
> the issue.
>
> Replicatable setup:
> 1. Start an instance of qmi-proxy.
> 2. Run multiple concurrent qmicli commands with -p for example:
>    "qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop &
> qmicli -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop & qmicli
> -p -d /dev/cdc-wdm0 --device-open-version-info --dms-noop"
> The following or similar error should appear:
>    "error: couldn't create client for the 'dms' service: CID allocation
> failed in the CTL client: Transaction timed out"
> If the error does not appear kill the running qmi-proxy and repeat the
> steps.
>
> Probable cause:
> There is a race condition that allows the same qmi-devices to be open
> multiple times if proxy requests are made in quick succession.
> This race condition is known and handled:
>
> https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/blob/main/src/libqmi-glib/qmi-proxy.c#L371
>
> Sadly this handling does not seem to be enough, I have observed qmi
> responses to arrive on the transiently opened device.
>

Oh, that is bad indeed, and needs to be handled.


> This causes breakdown of the transaction handling (all requests get a
> timeout) and eventual hangs of qmi-proxy
> where it does not respond to exit signals anymore (I am not sure how,
> since I am not familiar with the libqmi code).
>
> I have made an strace of qmi-proxy and attached a full version for further
> debugging, but the behaviour described above is visible:
> Abbreviated trace (...) first number is line number second PID:
>   1297  5436  openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 8
>   1433  5436  openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 10
>   1577  5436  openat(AT_FDCWD, "/dev/cdc-wdm0",
> O_RDWR|O_EXCL|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = 12
>   1676  5436  write(8, "\1\v\0\0\0\0\0\1!\0\0\0", 12) = 12
>   1697  5436  write(8, "\1\v\0\0\0\0\0\2!\0\0\0", 12) = 12
>   1760  5436  read(8, "\1\204\0\200\0\0\1\1!\0y" ... , 2048) = 133
>   1784  5436  read(12, "\1\204\0\200\0\0\1\2!\0y" ... , 2048) = 133
>   1830  5436  write(8, "\1\v\0\0\0\0\0\3!\0\0\0", 12) = 12
>   1850  5436  write(8, "\1\v\0\0\0\0\0\4!\0\0\0", 12) = 12
>   1880  5436  read(8,  "\1\204\0\200\0\0\1\3!\0y" ... , 2048) = 133
>   1890 5436  write(1, ... "[30 Apr 2024, 12:05:52] [Debug] [/dev/cdc-wdm0]
> No transaction matched in received message\n, 4096) = 4096
>
> In my opinion this is a fairly severe bug since it defeats the entire
> existence of qmi-proxy, it cannot be reliably used to concurrently use a
> qmi device.
>

The issue happens exclusively when the proxy is trying to open the control
port for the first time, right? I don't think we have observed this in the
wild too much, e.g. when using ModemManager, because in this case there is
a single process trying to open the proxy connection initially, not more
than one.

A proper fix would probably prevent the same device being open multiple
> times in the first place.
> I would love to open a pull request, but since I am not a glib developer
> and it is not trivial to do I cannot justify it.
>
>
I understand.

I was going to ask you open an issue in gitlab, but I see you already did
that at https://gitlab.freedesktop.org/mobile-broadband/libqmi/-/issues/113
Thanks for that.

Let me see how we can solve this.
Thanks!

-- 
Aleksander
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libqmi-devel/attachments/20240731/d9713531/attachment.htm>


More information about the libqmi-devel mailing list