Infinite connection loop

Sven Schwermer sven at svenschwermer.de
Wed Oct 24 14:46:24 UTC 2018


It looks like I celebrated prematurely. As soon as I disabled the debug-level logging for NetworkManager and ModemManager, the issue started to appear again. This tells me that there is some sort of race condition. I’ll look into this more but I’d appreciate if one of the maintainers could explain the desired/expected series of events when the peer hangs up and pppd terminates.

Thank you!

Sven

> On 23 Oct 2018, at 17:18, Sven Schwermer <sven at svenschwermer.de> wrote:
> 
> Hi again,
> 
> I think, I have found a fix for this. With the following two patches applied to NetworkManager 1.12.2, the issue seems to be gone (final confirmation pending):
> 
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd <https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd>
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9 <https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9>
> 
> I suspect, the latter made the difference because pppd basically hangs when the bus method call doesn’t return.
> 
> Sven
> 
>> On 23 Oct 2018, at 11:37, Sven Schwermer <sven at svenschwermer.de <mailto:sven at svenschwermer.de>> wrote:
>> 
>> Hi Aleksander, Matthew,
>> 
>> I looked into this and the issue that Aleksander referred to in the other thread (https://bugzilla.gnome.org/show_bug.cgi?id=734347 <https://bugzilla.gnome.org/show_bug.cgi?id=734347>). In the issue, Aleksander mentions that MM needs to be informed about the broken connection after pppd has terminated so that the CLOCAL setting is restored when MM interacts with the TTY again. However, if I interpret my logs correctly, MM detects itself that the connection is broken based on the modem’s response to AT+CGACT? (+CGACT: 1,0). This then triggers the notification signal MM_PORT_CONNECTED (with connected=false) which makes port_connected try to acquire the serial port lock which then fails.
>> 
>> Sven
>> 
>>> On 22 Oct 2018, at 16:52, Matthew Starr <mstarr at hedonline.com <mailto:mstarr at hedonline.com>> wrote:
>>> 
>>> Aleksander,
>>> 
>>> This is the exact same issue I reported a couple months ago:
>>> https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html <https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html>
>>> 
>>> I am currently working around it by adding a NetworkManager dispatcher script that on ppp0 going down, I restart ModemManager.
>>> 
>>> I never did find a solution to the issue even after trying some modifications to the serial force close code (src/mm-port-serial.c) in ModemManager.  It looks like the force_closed boolean gets set and is never reset, which causes the serial port to never retry communications over the serial port again.
>>> 
>>> Regards,
>>> Matthew Starr
>>> 
>>>> Date: Mon, 22 Oct 2018 11:39:37 +0200
>>>> From: Sven Schwermer <sven at svenschwermer.de <mailto:sven at svenschwermer.de>>
>>>> To: Aleksander Morgado <aleksander at aleksander.es <mailto:aleksander at aleksander.es>>
>>>> Cc: modemmanager-devel at lists.freedesktop.org <mailto:modemmanager-devel at lists.freedesktop.org>
>>>> Subject: Re: Infinite connection loop
>>>> Message-ID: <41689DBE-1755-4F85-BFA7-
>>>> 97CF6E588F41 at svenschwermer.de <mailto:97CF6E588F41 at svenschwermer.de>>
>>>> Content-Type: text/plain; charset="utf-8"
>>>> 
>>>> I looked a little closer into this now.
>>>> 
>>>> Here’s what I think is happening:
>>>> 1. My cell operator terminates the LCP after some time of inactivity. This is
>>>> confirmed by my operator.
>>>> 2. pppd receives the termination request, acknowledges it and the modem
>>>> hangs up the connection.
>>>> 3. ModemManager fails to re-acquire serial port lock because of the hangup.
>>>> 4. ModemManager sets the connection state to ‘registered’.
>>>> 5. ModemManager force-closes the serial port.
>>>> 6. NetworkManager registers the lost connection and tries to re-establish it.
>>>> 7. The ModemManager fails to re-open serial port because it had been force-
>>>> closed.
>>>> 
>>>> pppd doesn’t appear to be re-launched. How is this all supposed to work?
>>>> I’m sure the software stack should be able to handle a terminated
>>>> connection and re-dial it?
>>>> 
>>>> Please find attached the debug-level log. The LCP termination happens at Oct
>>>> 21 14:53:51.
>>>> 
>>>> Best regards,
>>>> Sven
>>>> 
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: excerpt.long.gz
>>>> Type: application/x-gzip
>>>> Size: 53662 bytes
>>>> Desc: not available
>>>> URL: <https://lists.freedesktop.org/archives/modemmanager- <https://lists.freedesktop.org/archives/modemmanager->
>>>> devel/attachments/20181022/1e9e348d/attachment.bin>
>>>> -------------- next part --------------
>>>> 
>>>> 
>>>>> On 20 Oct 2018, at 16:26, Sven Schwermer <sven at svenschwermer.de <mailto:sven at svenschwermer.de>>
>>>> wrote:
>>>>> 
>>>>> Hi again,
>>>>> 
>>>>> I now managed to capture the event in the journal. I limited the auto
>>>> connect retried to 4 in order to keep the log size manageable. I didn’t have
>>>> debug-level logging active for NetworkManager at the time. I’ll enable that
>>>> now and try to reproduce. Please find the log attached. The error seems to
>>>> be happening at Oct 20 07:44:26.
>>>>> 
>>>>> Best regards,
>>>>> Sven
>>>>> 
>>>>> <ed6e56cf2ac5b6bdb2e3ca97276ebd6d.journal.log>
>>>>> 
>>>>>> On 20 Oct 2018, at 13:47, Aleksander Morgado
>>>> <aleksander at aleksander.es <mailto:aleksander at aleksander.es>> wrote:
>>>>>> 
>>>>>> Hey,
>>>>>> 
>>>>>>> I am running an embedded system with a u-blox SARA-U270 modem and
>>>> NetworkManager 1.12.2 / ModemManager 1.8.0. For some reason, the
>>>> modem connection seems to break over time. I am not quite sure why that’s
>>>> the case, but when I look into the journal, I see NetworkManager and
>>>> ModemManager spewing out log messages at a very high rate. This actually
>>>> causes significant system load (mostly by dbus-daemon). Here’s an excerpt
>>>> of the messages:
>>>>>>> 
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (4/8): Wait to get fully enabled
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (5/8): Register
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (6/8): Bearer
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (4/8): Wait to get fully enabled
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (5/8): Register
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (6/8): Bearer
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>> [1539780869.7776] modem-broadband[ttyACM0]: failed to connect modem:
>>>> Could not open serial device ttyACM0: it has been forced close
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.7778] device (ttyACM0): state change: prepare -> failed (reason
>>>> 'unknown', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.7807] manager: NetworkManager state is now
>>>> CONNECTED_LOCAL
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>> [1539780869.7840] device (ttyACM0): Activation: failed for connection
>>>> 'gprsconnection'
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.7860] device (ttyACM0): state change: failed -> disconnected
>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.7923] policy: auto-activating connection 'gprsconnection'
>>>> (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.8005] device (ttyACM0): Activation: starting connection
>>>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.8015] device (ttyACM0): state change: disconnected -> prepare
>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.8043] manager: NetworkManager state is now CONNECTING
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (4/8): Wait to get fully enabled
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (5/8): Register
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (6/8): Bearer
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (4/8): Wait to get fully enabled
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (5/8): Register
>>>>>>> Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
>>>> (6/8): Bearer
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>> [1539780869.9321] modem-broadband[ttyACM0]: failed to connect modem:
>>>> Could not open serial device ttyACM0: it has been forced close
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9323] device (ttyACM0): state change: prepare -> failed (reason
>>>> 'unknown', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9350] manager: NetworkManager state is now
>>>> CONNECTED_LOCAL
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
>>>> [1539780869.9399] device (ttyACM0): Activation: failed for connection
>>>> 'gprsconnection'
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9430] device (ttyACM0): state change: failed -> disconnected
>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9490] policy: auto-activating connection 'gprsconnection'
>>>> (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9569] device (ttyACM0): Activation: starting connection
>>>> 'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9585] device (ttyACM0): state change: disconnected -> prepare
>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>>> Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
>>>> [1539780869.9653] manager: NetworkManager state is now CONNECTING
>>>>>>> 
>>>>>>> It continues like that forever. My modem network connection config is
>>>> as follows:
>>>>>>> 
>>>>>>> [connection]
>>>>>>> id=gprsconnection
>>>>>>> type=gsm
>>>>>>> autoconnect=true
>>>>>>> autoconnect-retries=0
>>>>>>> [gsm]
>>>>>>> number=*99#
>>>>>>> [ipv4]
>>>>>>> dns-search=
>>>>>>> method=auto
>>>>>>> may-fail=false
>>>>>>> 
>>>>>>> I have been looking through the code and what I don’t understand is
>>>> why the ttyACM0 would ever be forcefully closed. What could be an
>>>> explanation for this?
>>>>>>> 
>>>>>>> Any insight is appreciated.
>>>>>>> 
>>>>>> 
>>>>>> Do you have a full log of when this issue happens? If you can
>>>>>> reproduce this more or less frequently, could you setup both
>>>>>> ModemManager and NetworkManager to dump debug logs and try to get
>>>> it
>>>>>> reproduced? This issue could be to a wrong synchronization between MM
>>>>>> and NM/pppd, in particular related to the CLOCAL settings in each case
>>>>>> (i.e. MM shouldn't get HUPs in the serial port unless for the cases
>>>>>> where the port is going away).
>>>>>> 
>>>>>> --
>>>>>> Aleksander
>>>>>> https://aleksander.es <https://aleksander.es/>
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/modemmanager-devel/attachments/20181024/e69c04e5/attachment-0001.html>


More information about the ModemManager-devel mailing list