Disconnect on Data Transfer

John Marrett johnf at zioncluster.ca
Sun Apr 14 13:19:28 UTC 2019


I'm experiencing an issue with disconnect during downloads on the AT&T
network using Jasper managed IoT SIMs on a private APN. Our devices are
quite stable for long periods of time when running with normal operational
traffic for embedded devices. When we attempt to download a few megabytes
of data to a device it will sometimes stop receiving data from the mobile
network. Limiting the transfer rate reduces the chances of the issue
occurring. Disabling the modem (mmcli -m ? --disable) usually restores the
connection. We do not see this behaviour on other networks, including, we
believe, devices roaming on the AT&T network from other carriers. We are
working with AT&T to resolve the issue but so far they haven't been able to
identify the cause of the issue.

This morning I recreated the issue running qmicli 1.22.2 and mmcli 1.10.0
on top of a 3.14.26 kernel. Modems are Sierra Wireless MC7354 running
SWI9X15C_05.05.58.00 r27038 carmd-fwbuild1 2015/03/04 21:30:23. Initially
my connection was stable for about 100MB of transfer. I then cut power to
the modem to force a complete reconnection to the mobile network.
Immediately following this I was able to reproduce the issue on my first
10MB transfer attempt. This is representative of the intermittent behaviour
seen in the field. I know that the network serving these devices has some
form of redundant active active architecture and that devices will be
servced by one or the other when they connect to the network. I don't have
much more detail on the network architecture.

During each of these tests I took packet captures on the test client and an
Internet based server while downloading a file and simultaneously running a
ping. Based on this test I have a number of observations about the
behaviour. When the download stalls we stop receiving ping responses so the
issue isn't related to the specific TCP flow. The server continues to see
the ping requests from the client and send responses; the modem is properly
connected to the mobile network but either isn't receiving traffic or
successfully transmitting it to the OS/QMI stack. If I disable the modem it
immediately reconnects to the network and receives the same IP address;
this is different from most modem disables after issues as we ordinarily
receive a different IP. Radio network control captures (GTPv2) don't appear
to show any events during this type of reconnection, they do show bearer
tear down and build messages when a disable results in a new IP.

I don't believe that the modem is disconnected from the mobile network when
it experiences the issue, I think the problem may lie with communications
between the modem and the OS.

In this test the last packet we received was at 14:15:22.430650. Here are
some MM logs from around that time:

We see the mobile network go idle shortly after the disconnect:

Apr 14 14:15:59 hostname ModemManager[947]: [/dev/cdc-wdm0] received
generic indication (translated)... <<<<<< QMUX: <<<<<<   length  = 16
<<<<<<   flags   = 0x80 <<<<<<   service = "wds" <<<<<<   client  = 8
<<<<<< QMI: <<<<<<   flags       = "indication" <<<<<<   transaction = 0
<<<<<<   tlv_length  = 4 <<<<<<   message     = "Event Report" (0x0001)
<<<<<< TLV: <<<<<<   type       = "Dormancy Status" (0x18) <<<<<<
length     = 1 <<<<<<   value      = 01 <<<<<<   translated =
traffic-channel-dormant

We also see reports that seem to indicate we are connected to the network:

Apr 14 14:16:26 hostname ModemManager[947]: [/dev/cdc-wdm0] received
generic response (translated)... <<<<<< QMUX: <<<<<<   length  = 96
<<<<<<   flags   = 0x80 <<<<<<   service = "nas" <<<<<<   client  = 1
<<<<<< QMI: <<<<<<   flags       = "response" <<<<<<   transaction = 37
<<<<<<   tlv_length  = 84 <<<<<<   message     = "Get Serving System"
(0x0024) <<<<<< TLV: <<<<<<   type       = "Result" (0x02) <<<<<<
length     = 4 <<<<<<   value      = 00:00:00:00 <<<<<<   translated =
SUCCESS <<<<<< TLV: <<<<<<   type       = "MNC PCS Digit Include Status"
(0x27) <<<<<<   length     = 5 <<<<<<   value      = 36:01:9A:01:01
<<<<<<   translated = [ mcc = '310' mnc = '410' includes_pcs_digit = 'yes'
] <<<<<< TLV: <<<<<<   type       = "LTE TAC" (0x24) <<<<<<   length     =
2 <<<<<<   value      = 2A:71 <<<<<<   translated = 28970 <<<<<< TLV:
<<<<<<   type       = "Detailed Service Status" (0x21) <<<<<<   length
= 5 <<<<<<   value      = 02:03:00:01:00 <<<<<<   translated = [ status =
'available' capability = 'cs-

Data counters before and after the time of the disconnect show that "Rx
Bytes Ok" remains at the same value:

Apr 14 14:14:58 hostname ModemManager[947]: [/dev/cdc-wdm0] received
generic response (translated)... <<<<<< QMUX: <<<<<<   length  = 41
<<<<<<   flags   = 0x80 <<<<<<   service = "wds" <<<<<<   client  = 8
<<<<<< QMI: <<<<<<   flags       = "response" <<<<<<   transaction = 13
<<<<<<   tlv_length  = 29 <<<<<<   message     = "Get Packet Statistics"
(0x0024) <<<<<< TLV: <<<<<<   type       = "Result" (0x02) <<<<<<
length     = 4 <<<<<<   value      = 00:00:00:00 <<<<<<   translated =
SUCCESS <<<<<< TLV: <<<<<<   type       = "Rx Bytes Ok" (0x1a) <<<<<<
length     = 8 <<<<<<   value      = 8F:D5:52:00:00:00:00:00 <<<<<<
translated = 5428623 <<<<<< TLV: <<<<<<   type       = "Tx Bytes Ok" (0x19)
<<<<<<   length     = 8 <<<<<<   value      = CA:7B:01:00:00:00:00:00
<<<<<<   translated = 97226

Apr 14 14:15:28 hostname ModemManager[947]: [/dev/cdc-wdm0] received
generic response (translated)... <<<<<< QMUX: <<<<<<   length  = 41
<<<<<<   flags   = 0x80 <<<<<<   service = "wds" <<<<<<   client  = 8
<<<<<< QMI: <<<<<<   flags       = "response" <<<<<<   transaction = 14
<<<<<<   tlv_length  = 29 <<<<<<   message     = "Get Packet Statistics"
(0x0024) <<<<<< TLV: <<<<<<   type       = "Result" (0x02) <<<<<<
length     = 4 <<<<<<   value      = 00:00:00:00 <<<<<<   translated =
SUCCESS <<<<<< TLV: <<<<<<   type       = "Rx Bytes Ok" (0x1a) <<<<<<
length     = 8 <<<<<<   value      = 17:03:C7:00:00:00:00:00 <<<<<<
translated = 13042455 <<<<<< TLV: <<<<<<   type       = "Tx Bytes Ok"
(0x19) <<<<<<   length     = 8 <<<<<<   value      =
05:22:04:00:00:00:00:00 <<<<<<   translated = 270853

Apr 14 14:16:28 hostname ModemManager[947]: [/dev/cdc-wdm0] received
generic response (translated)... <<<<<< QMUX: <<<<<<   length  = 41
<<<<<<   flags   = 0x80 <<<<<<   service = "wds" <<<<<<   client  = 8
<<<<<< QMI: <<<<<<   flags       = "response" <<<<<<   transaction = 16
<<<<<<   tlv_length  = 29 <<<<<<   message     = "Get Packet Statistics"
(0x0024) <<<<<< TLV: <<<<<<   type       = "Result" (0x02) <<<<<<
length     = 4 <<<<<<   value      = 00:00:00:00 <<<<<<   translated =
SUCCESS <<<<<< TLV: <<<<<<   type       = "Rx Bytes Ok" (0x1a) <<<<<<
length     = 8 <<<<<<   value      = 17:03:C7:00:00:00:00:00 <<<<<<
translated = 13042455 <<<<<< TLV: <<<<<<   type       = "Tx Bytes Ok"
(0x19) <<<<<<   length     = 8 <<<<<<   value      =
8D:2A:04:00:00:00:00:00 <<<<<<   translated = 273037

I don't see anything else that appears interesting or relevant in the log
messages. I do have plenty of other messages including from the
reconnection and can look for anything particularly interesting. I'm ready
and willing to gather other info and perform other tests. I also can
request specific captures from the AT&T team supporting us.

-JohnF
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20190414/a2adb2fd/attachment.html>


More information about the ModemManager-devel mailing list