MTU issues.

Paul Gildea gildeap at tcd.ie
Mon Jan 27 16:57:14 UTC 2020


Hi.

I was mistaken about this ever happening with a 1500 MTU; it does not.
Looking into this further, it seems to be an issue in the usbnet driver
that qmi_wwan is built on.
Mounting debugfs and enabling dynamic debug for usbnet.c shows the errors:

mount -t debugfs none /sys/kernel/debug
echo -n 'file usbnet.c +p' > /sys/kernel/debug/dynamic_debug/control

Looking at the log, you can see that the rxqlen counter climbs rapidly
until the RX queue is full (318 here), then a throttle occurs. This cycle
repeats forever (roughly every 130 ms) until the modem is reset, even after
the ping is stopped:

Sep 23 10:16:16 T6 kernel: [  715.615456] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
Sep 23 10:16:16 T6 kernel: [  715.743152] qmi_wwan 2-2:1.8 wwan0: rxqlen 281 --> 291
Sep 23 10:16:16 T6 kernel: [  715.743307] qmi_wwan 2-2:1.8 wwan0: rxqlen 291 --> 301
Sep 23 10:16:16 T6 kernel: [  715.743320] qmi_wwan 2-2:1.8 wwan0: rxqlen 301 --> 311
Sep 23 10:16:16 T6 kernel: [  715.743331] qmi_wwan 2-2:1.8 wwan0: rxqlen 311 --> 318
Sep 23 10:16:16 T6 kernel: [  715.744947] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
Sep 23 10:16:16 T6 kernel: [  715.871077] qmi_wwan 2-2:1.8 wwan0: rxqlen 281 --> 291
Sep 23 10:16:16 T6 kernel: [  715.871115] qmi_wwan 2-2:1.8 wwan0: rxqlen 291 --> 301
Sep 23 10:16:16 T6 kernel: [  715.871128] qmi_wwan 2-2:1.8 wwan0: rxqlen 301 --> 311
Sep 23 10:16:16 T6 kernel: [  715.871138] qmi_wwan 2-2:1.8 wwan0: rxqlen 311 --> 318
Sep 23 10:16:16 T6 kernel: [  715.874448] qmi_wwan 2-2:1.8 wwan0: rx throttle -71
Sep 23 10:16:17 T6 kernel: [  716.007133] qmi_wwan 2-2:1.8 wwan0: rxqlen 280 --> 290
Sep 23 10:16:17 T6 kernel: [  716.007172] qmi_wwan 2-2:1.8 wwan0: rxqlen 290 --> 300
Sep 23 10:16:17 T6 kernel: [  716.007185] qmi_wwan 2-2:1.8 wwan0: rxqlen 300 --> 310
Sep 23 10:16:17 T6 kernel: [  716.007197] qmi_wwan 2-2:1.8 wwan0: rxqlen 310 --> 318
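The timestamps in this log also give the period of the cycle. The short C sketch below takes the three "rx throttle" lines and the first "rxqlen" line after each one, and computes the gap between them. The comparison value is an assumption from reading mainline usbnet.c and should be checked against the running kernel: when an RX URB completes with -EPROTO, rx_complete() appears to arm a delay timer of THROTTLE_JIFFIES (HZ/8, nominally 125 ms) and print "rx throttle", and usbnet_bh() only refills the queue once that timer has expired. The array and function names are hypothetical.

```c
#include <stddef.h>

/* Timestamps (seconds) copied from the log above: each "rx throttle -71"
 * line, paired with the first "rxqlen" refill line that follows it. */
static const double throttle_ts[] = { 715.615456, 715.744947, 715.874448 };
static const double refill_ts[]   = { 715.743152, 715.871077, 716.007133 };

#define N_PAIRS (sizeof(throttle_ts) / sizeof(throttle_ts[0]))

/* Gap between throttle i and the refill that follows it, in milliseconds. */
double throttle_gap_ms(size_t i)
{
    return (refill_ts[i] - throttle_ts[i]) * 1000.0;
}

/* Assumed throttle delay: THROTTLE_JIFFIES = HZ/8 in mainline usbnet.c,
 * nominally 1/8 s (slightly less for HZ values not divisible by 8). */
double assumed_throttle_ms(void)
{
    return 1000.0 / 8.0;
}
```

All three gaps come out between about 126 and 133 ms, which is consistent with (though does not prove) the refill being held off by that throttle timer.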


This happens on USB3 and not USB2 (unless I pick a small MTU such as 500).
With USB2 I generally don't get any of the above errors and can go back
to pinging with regular-sized packets without problems.
The error code above, -71, corresponds to:

#define EPROTO 71 /* Protocol error */
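The ceiling of 318 in the rxqlen lines, and part of the USB3/USB2 difference, can be reproduced with simple arithmetic. A minimal C sketch, under two assumptions to verify against the kernel in use: that usbnet sizes its RX queue as in mainline usbnet_update_max_qlen() (a budget of MAX_QUEUE_MEMORY = 60 * 1518 bytes, multiplied by 5 for SuperSpeed, divided by rx_urb_size), and that qmi_wwan's rx_urb_size follows the 1430-byte MTU because raw-IP mode adds no link-layer header. The enum and function names are hypothetical.

```c
/* Assumed constant from mainline usbnet.c. */
#define MAX_QUEUE_MEMORY (60 * 1518)

/* Hypothetical speed enum for this sketch (not the kernel's usb_device_speed). */
enum link_speed { SPEED_HIGH_USB2, SPEED_SUPER_USB3 };

/* RX queue depth as usbnet appears to compute it: a fixed memory budget
 * divided by the size of each RX URB buffer (integer division). */
int rx_qlen(enum link_speed speed, int rx_urb_size)
{
    if (rx_urb_size <= 0)
        return 4; /* mirrors usbnet's fallback queue length of 4 */
    if (speed == SPEED_SUPER_USB3)
        return 5 * MAX_QUEUE_MEMORY / rx_urb_size;
    return MAX_QUEUE_MEMORY / rx_urb_size;
}
```

Under those assumptions rx_qlen(SPEED_SUPER_USB3, 1430) is 318, matching the ceiling in the log, while the same MTU on a USB2 (high-speed) port gives a queue of only 63. This shows that the MTU feeds directly into the RX buffer size and queue depth; it does not by itself explain why the USB3 case fails.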

The rxqlen messages come from usbnet_bh(), which the source code describes
as *“tasklet (work deferred from completions, in_irq) or timer”*.
The function is too large to paste, but it looks like the cleanup function
that releases the socket buffers (skb) and USB Request Blocks (URBs) and then
refills the RX queue (the "rxqlen X --> Y" lines). An interesting snippet
from this function, after the buffers are released:

	/* restart RX again after disabling due to high error rate */
	clear_bit(EVENT_RX_KILL, &dev->flags);

Buffers are arriving with -EPROTO at a high rate, so the usbnet driver
disables RX. The buffers are released (which explains the packet loss) and
RX is restarted. Whatever the underlying issue is, it is still present, so
RX is disabled again and the cycle repeats.
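The disable/restart cycle described above can be modelled in a few lines. This is a simplified sketch, not the driver code: it assumes the error-rate check in mainline usbnet.c's rx_complete(), which, as far as I can tell, resets its counters every 30 completions and sets EVENT_RX_KILL once more than 20 of them were errors. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant usbnet state. */
struct rx_state {
    int pkt_cnt;   /* completions seen in the current window */
    int pkt_err;   /* errored completions in the current window */
    bool rx_kill;  /* models the EVENT_RX_KILL flag */
};

/* Called once per completed RX URB; 'error' is true for e.g. -EPROTO.
 * Window size 30 and threshold 20 are assumed from mainline usbnet.c. */
void rx_complete_model(struct rx_state *s, bool error)
{
    if (++s->pkt_cnt > 30) {
        /* window over: start counting afresh */
        s->pkt_cnt = 0;
        s->pkt_err = 0;
    } else {
        if (error)
            s->pkt_err++;
        if (s->pkt_err > 20)
            s->rx_kill = true; /* error rate too high: stop RX */
    }
}

/* Models the bottom half: after releasing buffers it clears the flag,
 * so RX restarts even if the underlying fault is still there. */
void usbnet_bh_model(struct rx_state *s)
{
    s->rx_kill = false;
}
```

Feeding 21 consecutive errors sets the kill flag; clearing it in the bottom half and then delivering one more error sets it again straight away, because the error count for the current window is still above 20. That is the repeating pattern seen in the log.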

There are a few scenarios in which it occurs; the two easiest to reproduce
are:

   1. Set a non-1500 MTU and send pings larger than it; the errors start
   after ~300 seconds.
   2. Set a non-1500 MTU and reset the modem without sending anything; the
   errors start immediately afterwards.


Still trying to figure it out,

--
Paul

On Fri, 24 Jan 2020 at 17:26, Paul Gildea <gildeap at tcd.ie> wrote:

> Hi guys, I have done some more testing with this. On a fresh system (a
> laptop) running Ubuntu 16.04, I installed libqmi 1.16 to test that version
> as well. I set the MTU for the private network to 1430 and pinged with
> large packets (2000 bytes), and after a couple of hundred pings this
> behaviour repeated itself. Yesterday I thought that system was fine, but it
> turns out it just took more pings for it to fall over. At 1500 MTU and 300
> pings the same thing occurred with the private network.
> Massive numbers of input errors; the modem becomes unusable until reset.
>
> I can see the pings arriving at the back end; it is when the replies are
> returning to the modem that there appears to be an issue.
> This is happening with SW and Telit, with various modems and multiple
> networks.
> It seems to me to be a qmi_wwan driver issue. What do you think?
>
> Regards
>
> --
> Paul
>
> On Thu, 23 Jan 2020 at 17:15, Paul Gildea <gildeap at tcd.ie> wrote:
>
>> Hi,
>>
>> We are seeing MTU issues on the ATT network and were wondering whether
>> you have heard of anything like this happening before. ATT pushes an MTU
>> of 1430; we apply that to our Linux interface and everything works fine.
>> However, when we ping with a large packet, the pings fail instead of
>> fragmenting correctly, and we also see input errors on the Linux interface.
>>
>> In some scenarios the input errors never stop after we stop pinging (we
>> get thousands of them every second), and the modem needs to be rebooted
>> before it will pass any traffic again. We have tried many different MTU
>> values and this always occurs.
>>
>> On a private network here (MTU 1404) and on Vodafone (MTU 1500), doing
>> the same thing with the network-pushed MTU values causes no issues and
>> everything works fine. I tested moving Vodafone from 1500 to lower values,
>> and the issues reoccurred, the same as with ATT.
>>
>> We are using libqmi 1.24, and I have seen this behaviour with every modem
>> tested so far: MC7455, EM7565, EM7511 and Telit LM960A18.
>>
>> Regards,
>>
>> --
>> Paul
>>
>


More information about the libqmi-devel mailing list