MTU issues.

Amol Lad Amol.Lad at 4rf.com
Fri Feb 7 03:23:06 UTC 2020


Hi,

Is a similar change needed in cdc_mbim as well?

Amol


-----Original Message-----
From: libqmi-devel <libqmi-devel-bounces at lists.freedesktop.org> On Behalf Of Daniele Palmas
Sent: Thursday, 6 February 2020 5:45 PM
To: Paul Gildea <gildeap at tcd.ie>
Cc: libqmi (development) <libqmi-devel at lists.freedesktop.org>; Bjørn Mork <bjorn at mork.no>
Subject: Re: MTU issues.

Hi Bjørn and Paul,

On Thu, 6 Feb 2020 at 12:36, Paul Gildea <gildeap at tcd.ie> wrote:
>
> Hi,
>
> Thanks for that Bjørn, helped me understand! Dug deeper into this and found the issue and a solution.
> Packets are being rejected in the ring buffer used by the xHCI controller.
>
> Enabled traces for usbnet.c and xhci-ring.c file in debugfs as follows:
>
> mount -t debugfs none /sys/kernel/debug
> echo -n 'file usbnet.c +p' > /sys/kernel/debug/dynamic_debug/control
> echo -n 'file xhci-ring.c +p' > /sys/kernel/debug/dynamic_debug/control
>
> Saw a lot of errors indicating URB with len=0 when it was expecting len=1430. This was preceded by an URB of len 1024 bytes:
>
> xhci_hcd 0000:00:14.0: Babble error for slot 10 ep 12 on endpoint
> xhci_hcd 0000:00:14.0: Giveback URB ffff8802193be540, len = 1024, expected = 1430, status = -75
> xhci_hcd 0000:00:14.0: Transfer error for slot 10 ep 12 on endpoint /repeated
>
> Error -75 is EOVERFLOW, which makes sense as per
> include/uapi/asm-generic/errno.h
>
> #define EOVERFLOW 75 /* Value too large for defined data type */
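The two statuses that show up in the logs in this thread (-75 and -71) can be decoded with a small helper. This is only a sketch, assuming a Linux host where the userspace errno.h values match the kernel ones; urb_status_name() is a hypothetical name, not a kernel or libqmi function.

```c
#include <errno.h>

/* Hypothetical helper: map a negative URB status, as printed in the
 * xhci/usbnet debug logs, to the errno name discussed in this thread. */
static const char *urb_status_name(int status)
{
    switch (-status) {
    case EOVERFLOW:
        return "EOVERFLOW"; /* 75 on Linux: babble, frame larger than the buffer */
    case EPROTO:
        return "EPROTO";    /* 71 on Linux: protocol error on the bus */
    default:
        return "unknown";
    }
}
```

Calling urb_status_name(-75) on such a host should give the EOVERFLOW name reported for the babble error.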
>
> "Babble error": a packet larger than the MTU arrives in Linux from the modem and is discarded with an -EOVERFLOW error.
> This is seen on both USB 3.0 and USB 2.0 buses. It happens essentially because the MRU (maximum receive unit) is not a separate entity from the MTU (maximum transmission unit), yet received packets can be larger than those transmitted.
>
> "Endless input error": following the babble error we see an endless supply of zero-length URBs, which are rejected with -EPROTO (increasing the rx input error counter each time).
> This is only seen on USB 3.0. These continue to arrive ad infinitum until the modem is shut down.
>
> There appears to be a bug in the core USB handling code in Linux that doesn't deal well with network MTUs smaller than 1500 bytes. By default the dev->hard_mtu (the "real" MTU) is in lockstep with dev->rx_urb_size (essentially an MRU), and it's the latter that is causing trouble. This has nothing to do with the modems; the issue can be reproduced by getting a USB-Ethernet dongle, setting the MTU to 1430, and pinging with size greater than 1406.
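The lockstep described above can be illustrated with a userspace model. This is a simplified sketch, not the kernel code: struct usbnet_model and both functions are hypothetical stand-ins, a link-layer header length of 0 is assumed in the usage note, and everything else the real usbnet_change_mtu() does is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few usbnet fields discussed here. */
struct usbnet_model {
    size_t mtu;             /* interface MTU */
    size_t hard_header_len; /* link-layer header size */
    size_t hard_mtu;        /* mtu + hard_header_len */
    size_t rx_urb_size;     /* size of each receive URB buffer */
};

/* Model of the lockstep: rx_urb_size follows hard_mtu only if the two
 * were equal before the MTU change. */
static void model_change_mtu(struct usbnet_model *dev, size_t new_mtu)
{
    size_t old_hard_mtu = dev->hard_mtu;

    dev->mtu = new_mtu;
    dev->hard_mtu = new_mtu + dev->hard_header_len;
    if (dev->rx_urb_size == old_hard_mtu)
        dev->rx_urb_size = dev->hard_mtu;
}

/* A frame longer than the receive buffer is what the host controller
 * reports as babble (-EOVERFLOW). */
static bool model_frame_fits(const struct usbnet_model *dev, size_t frame_len)
{
    return frame_len <= dev->rx_urb_size;
}
```

In this model, a device that starts with rx_urb_size equal to hard_mtu (1500) and is then set to MTU 1430 ends up with a 1430-byte receive buffer, so a 1500-byte frame no longer fits. A device that starts with rx_urb_size at 1504 keeps that buffer across the same MTU change, so the 1500-byte frame still fits.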
>
> Will submit the below patch that solves the issue, if that is acceptable?
>
>
> +diff -Naur linux-4.14.73/drivers/net/usb/qmi_wwan.c linux-4.14.73-rx_size_fix/drivers/net/usb/qmi_wwan.c
> +--- linux-4.14.73/drivers/net/usb/qmi_wwan.c 2018-09-29 11:06:07.000000000 +0100
> ++++ linux-4.14.73-rx_size_fix/drivers/net/usb/qmi_wwan.c 2020-01-31 18:05:07.709008785 +0000
> +@@ -740,6 +740,14 @@
> +     }
> +     dev->net->netdev_ops = &qmi_wwan_netdev_ops;
> +     dev->net->sysfs_groups[0] = &qmi_wwan_sysfs_attr_group;
> ++
> ++    /* LTE Networks don't always respect their own MTU on receive side;
> ++     * e.g. AT&T pushes 1430 MTU but still allows 1500 byte packets from
> ++     * far-end network. Make receive buffer large enough to accommodate
> ++     * them, and add four bytes so MTU does not equal MRU on network
> ++     * with 1500 MTU otherwise usbnet_change_mtu() will change both.
> ++     * This is a sufficient max receive buffer as over 1500 MTU,
> ++     * USB driver issues are not seen.
> ++     */
> ++    dev->rx_urb_size = ETH_DATA_LEN + 4;
> + err:
> +     return status;
> + }
>

Could it make sense to make rx_urb_size configurable from userspace (e.g. via a sysfs file)?

This would also be useful when changing the downlink maximum packet size with QMI_WDA_SET_DATA_FORMAT, and it is required to get maximum throughput from high-category modems.

Regards,
Daniele

>
>
> Regards,
>
> --
> Paul
>
> On Mon, 27 Jan 2020 at 18:04, Bjørn Mork <bjorn at mork.no> wrote:
>>
>> I cannot explain the issues you are having, but I can try to explain
>> the reasons behind the RX_KILL stuff since I happened to write that.
>>
>> One of my modems would sometimes end up in a state where it flooded
>> the host with 0 length frames.  This happened at a high enough rate
>> to bring the host to a complete halt, since usbnet was desperately
>> trying to allocate new max sized skbs at the same rate...  So I
>> figured it was better to let the host have a break now and then when
>> the incoming error rate was above some arbitrary threshold.  This
>> allowed the host to e.g. reset the modem to bring it out of the faulty state.
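The error-rate pause described above can be sketched as a small userspace model. The struct, the function names and the two thresholds are illustrative assumptions, not values quoted in this thread; the real logic lives in usbnet.c and should be checked there.

```c
#include <stdbool.h>

/* Illustrative thresholds (assumptions, not taken from this thread). */
#define MODEL_WINDOW     30 /* completions per accounting window */
#define MODEL_MAX_ERRORS 20 /* errors tolerated within one window */

/* Hypothetical stand-in for the counters and the RX_KILL flag. */
struct rx_error_model {
    unsigned int pkt_cnt;
    unsigned int pkt_err;
    bool rx_kill;
};

/* Account for one completed receive URB; 'failed' marks an error
 * completion such as a zero-length frame rejected with -EPROTO. */
static void model_rx_complete(struct rx_error_model *m, bool failed)
{
    if (++m->pkt_cnt > MODEL_WINDOW) {
        /* Window is over: start counting afresh. */
        m->pkt_cnt = 0;
        m->pkt_err = 0;
    } else {
        if (failed)
            m->pkt_err++;
        if (m->pkt_err > MODEL_MAX_ERRORS)
            m->rx_kill = true; /* give the host a break */
    }
}

/* The deferred handler restarts RX again, matching the
 * clear_bit(EVENT_RX_KILL, ...) snippet quoted in this thread. */
static void model_bh(struct rx_error_model *m)
{
    m->rx_kill = false;
}
```

With these thresholds, the 21st consecutive failed completion sets rx_kill. If the underlying fault persists after model_bh() clears the flag, the next failed completion sets it again, which is the repeating pattern reported in the logs.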
>>
>> The underlying issue was of course a modem firmware bug, and there
>> wasn't much we could do about that. I wonder if that might be the
>> case for you too?  What do the frames triggering these rx errors look like?
>> Maybe you could snoop on the USB bus to see if there is something
>> obviously wrong?
>>
>>
>>
>> Bjørn
>>
>> Paul Gildea <gildeap at tcd.ie> writes:
>>
>> > Hi.
>> >
>> > I was mistaken about this ever happening with a 1500 MTU, it does not.
>> > Looking into this further it seems to be an issue with the usbnet
>> > driver that qmi_wwan is leveraging.
>> > Turning on debugfs and enabling debugging for usbnet.c shows the errors:
>> >
>> > mount -t debugfs none /sys/kernel/debug
>> > echo -n 'file usbnet.c +p' > /sys/kernel/debug/dynamic_debug/control
>> >
>> > Looking at the log you can see that the rxqlen counter rapidly
>> > increases until it is full and a throttle occurs, this throttle
>> > error then repeats forever (quite rapidly) until the modem is
>> > reset, even if the ping is
>> > stopped:
>> >
>> > Sep 23 10:16:16 T6 kernel: [  715.615456] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
>> > Sep 23 10:16:16 T6 kernel: [  715.743152] qmi_wwan 2-2:1.8 wwan0: rxqlen 281 --> 291
>> > Sep 23 10:16:16 T6 kernel: [  715.743307] qmi_wwan 2-2:1.8 wwan0: rxqlen 291 --> 301
>> > Sep 23 10:16:16 T6 kernel: [  715.743320] qmi_wwan 2-2:1.8 wwan0: rxqlen 301 --> 311
>> > Sep 23 10:16:16 T6 kernel: [  715.743331] qmi_wwan 2-2:1.8 wwan0: rxqlen 311 --> 318
>> > Sep 23 10:16:16 T6 kernel: [  715.744947] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
>> > Sep 23 10:16:16 T6 kernel: [  715.871077] qmi_wwan 2-2:1.8 wwan0: rxqlen 281 --> 291
>> > Sep 23 10:16:16 T6 kernel: [  715.871115] qmi_wwan 2-2:1.8 wwan0: rxqlen 291 --> 301
>> > Sep 23 10:16:16 T6 kernel: [  715.871128] qmi_wwan 2-2:1.8 wwan0: rxqlen 301 --> 311
>> > Sep 23 10:16:16 T6 kernel: [  715.871138] qmi_wwan 2-2:1.8 wwan0: rxqlen 311 --> 318
>> > Sep 23 10:16:16 T6 kernel: [  715.874448] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
>> > Sep 23 10:16:17 T6 kernel: [  716.007133] qmi_wwan 2-2:1.8 wwan0: rxqlen 280 --> 290
>> > Sep 23 10:16:17 T6 kernel: [  716.007172] qmi_wwan 2-2:1.8 wwan0: rxqlen 290 --> 300
>> > Sep 23 10:16:17 T6 kernel: [  716.007185] qmi_wwan 2-2:1.8 wwan0: rxqlen 300 --> 310
>> > Sep 23 10:16:17 T6 kernel: [  716.007197] qmi_wwan 2-2:1.8 wwan0: rxqlen 310 --> 318
>> >
>> >
>> > This happens on USB3 and not USB2 (unless I pick a small MTU like 500).
>> > With USB2 I generally don't get any of the above errors and can
>> > then go back to pinging with regular sized packets no problem.
>> > The error above, -71, refers to EPROTO:
>> >
>> > #define EPROTO 71 /* Protocol error */
>> >
>> > The rxqlen error is coming from usbnet_bh () which is described as
>> > *“tasklet (work deferred from completions, in_irq) or timer”* in the source code.
>> > This is too large to paste, but looks like the cleanup function to
>> > release the Socket Buffers (skb) and USB Request Blocks (urb).
>> > Interesting snippet of code from this function, after releasing the buffers:
>> >
>> >       /* restart RX again after disabling due to high error rate */
>> >       clear_bit(EVENT_RX_KILL, &dev->flags);
>> >
>> > Buffers are coming in with -EPROTO at a high rate so the usbnet
>> > driver disables RX. The buffers get released, which explains the
>> > packet loss, and Rx is restarted. Obviously whatever issue was
>> > underlying is still present and so Rx is disabled again and the dance restarts.
>> >
>> > There are a few scenarios in which it can occur, but the two easiest to
>> > reproduce are:
>> >
>> >    1. Send pings larger than the non-1500 MTU; the errors occur after
>> >    ~300 seconds.
>> >    2. Set a non-1500 MTU and reset the modem without sending anything; the
>> >    errors occur immediately afterwards.
>> >
>> >
>> > Still trying to figure it out,
>> >
>> > --
>> > Paul
>> >
>> > On Fri, 24 Jan 2020 at 17:26, Paul Gildea <gildeap at tcd.ie> wrote:
>> >
>> >> Hi guys, have done some more testing with this. Using Ubuntu 16.04
>> >> I installed libqmi (1.16) to test that also, on a fresh system (a laptop).
>> >> Set the MTU of the private network to 1430 and pinged with large
>> >> packets
>> >> (2000) and after a couple of hundred pings this behaviour repeated itself.
>> >> When I thought that the system was fine yesterday, it turns out it
>> >> just took more pings for it to fall over. At 1500 MTU and 300
>> >> pings the same thing occurred with the private network.
>> >> Massive amounts of input errors, modem becomes unusable until reset.
>> >>
>> >> I can see pings arriving at the back end, it's when they are
>> >> returning to the modem that there appears to be an issue.
>> >> This is happening with SW and Telit with varying modems and
>> >> multiple networks.
>> >> It seems to me to be a qmi_wwan driver issue, what do you think?
>> >>
>> >> Regards
>> >>
>> >> --
>> >> Paul
>> >>
>> >> On Thu, 23 Jan 2020 at 17:15, Paul Gildea <gildeap at tcd.ie> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> We are seeing MTU issues with the ATT network and were just
>> >>> wondering if you had heard of anything like this happening
>> >>> before? ATT push an MTU of 1430, we apply that to our Linux interface and everything works fine.
>> >>> However, when we try to ping with a large packet, the pings
>> >>> fail instead of fragmenting correctly, and we also see input errors on the Linux interface.
>> >>>
>> >>> In some scenarios the input errors never stop (getting thousands
>> >>> of them every second) after we stop pinging and the modem needs
>> >>> to be rebooted to pass any traffic again. Changed MTU values to a
>> >>> lot of different things and this always occurs.
>> >>>
>> >>> On a private network here (MTU 1404) and Vodafone (MTU 1500)
>> >>> doing the same thing with the network pushed MTU values causes no
>> >>> issues and everything works fine. I tested moving away from 1500
>> >>> on Vodafone to lower values and the issues occurred once again, the same as with ATT.
>> >>>
>> >>> Libqmi 1.24 and I have seen this behaviour with all modems tested so far:
>> >>> MC7455, EM7565, EM7511 and Telit LM960A18.
>> >>>
>> >>> Regards,
>> >>>
>> >>> --
>> >>> Paul
>> >>>
>> >>
>> > _______________________________________________
>> > libqmi-devel mailing list
>> > libqmi-devel at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/libqmi-devel
>