[Networkmanager] Preventing unnecessary and unwanted WiFi roaming

Bjørn Mork bjorn at mork.no
Thu Nov 2 14:51:48 UTC 2023


Not exactly sure where to address this. Both wpa_supplicant and
NetworkManager will probably need changes if there is something to be
fixed.  Which I'm not 100% sure there is....

Let me describe what I see, and you can decide if there is a problem or
not.

My laptop will often roam back and forth between the two APs it can see
in my home, without any movement or other obvious reason for a change.
The signal from both APs is pretty good in most of the house.  But the
important factor is that they are almost identical in the spot where I
spend most time working.

This is a typical example of what happens:

Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Scan results matching the currently selected network
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: 0: 80:2a:a8:d2:3f:15 freq=5500 level=-63 snr=29 est_throughput=280821
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: 1: e6:38:83:e5:85:d5 freq=5180 level=-67 snr=25 est_throughput=248651
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Selecting BSS from priority group 0
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: 0: 80:2a:a8:d2:3f:15 ssid='Kjellerbod' wpa_ie_len=0 rsn_ie_len=20 caps=0x111 level=-63 freq=5500
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0:    selected based on RSN IE
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0:    selected BSS 80:2a:a8:d2:3f:15 ssid='Kjellerbod'
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Considering within-ESS reassociation
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Current BSS: e6:38:83:e5:85:d5 freq=5180 level=-67 snr=25 est_throughput=248651
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Selected BSS: 80:2a:a8:d2:3f:15 freq=5500 level=-63 snr=29 est_throughput=280821
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Using signal poll values for the current BSS: level=-64 snr=28 est_throughput=274981
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Allow reassociation - selected BSS has better estimated throughput
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Considering connect request: reassociate: 0  selected: 80:2a:a8:d2:3f:15  bssid: e6:38:83:e5:85:d5  pending: 00:00:00:00:00:00  wpa_state: COMPLETED  ss>
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Request association with 80:2a:a8:d2:3f:15
Nov 02 14:32:21 miraculix wpa_supplicant[704]: wlan0: Re-association to the same ESS


The scan results show that the two APs are pretty equal. And
wpa_supplicant estimates a reasonable throughput that's more than good
enough for both.  The throughput estimate is basically a table lookup
based on the SNR.  There is only a marginal SNR difference of 4 dB, so
we end up with a marginal throughput difference of ~30 Mbps.

This is obviously not significant compared to 250 Mbps, reflecting an
SNR difference well within the expected variance over time.

So why does wpa_supplicant decide to switch APs?  Turns out it has a
hard-coded 5 Mbps throughput limit:

https://w1.fi/cgit/hostap/tree/wpa_supplicant/events.c#n2183

Which IMHO is a little too trigger-happy, without considering other
factors like current throughput or historical handovers.
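
To make the numbers concrete, here is a rough sketch of that comparison
using the values from the log above.  This is not the wpa_supplicant
code: the function and constant names are made up for illustration, and
it ignores every other check made in that function.  Only the 5 Mbps
margin and the kbps values come from the text and log above.

```python
# Simplified illustration of the throughput check described above.
# Names are hypothetical; values are in kbps, as printed in the log.
ROAM_THROUGHPUT_MARGIN_KBPS = 5000  # the hard-coded 5 Mbps limit


def allows_roam(current_kbps: int, selected_kbps: int) -> bool:
    """Return True if the selected BSS beats the current one by the margin."""
    return selected_kbps > current_kbps + ROAM_THROUGHPUT_MARGIN_KBPS


current = 274981   # signal poll value for the current BSS e6:38:83:e5:85:d5
selected = 280821  # scan result for the selected BSS 80:2a:a8:d2:3f:15

diff = selected - current
print(f"difference: {diff} kbps")                      # difference: 5840 kbps
print(f"roam allowed: {allows_roam(current, selected)}")  # roam allowed: True
```

So after the signal poll update, the two estimates differ by under
6 Mbps out of roughly 275 Mbps, and that is still enough to trigger the
reassociation.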

The real problem here is of course that the 4 dB SNR difference is
completely arbitrary. The next measurement might show the exact opposite
result, without any changes to client or APs.  The result is ping-pong
roaming. This is obviously unwanted, since each handover takes some time
and comes with a risk of failure.  The overall user experience is
affected.  And it would have been significantly better if we could just
say that any estimated throughput over 200 Mbps (or whatever) is good
enough.
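
A minimal sketch of what such a "good enough" rule could look like.
Everything here is hypothetical: neither wpa_supplicant nor
NetworkManager has a setting like this today, and the 200 Mbps cutoff is
just the example figure from above.

```python
# Hypothetical "good enough" roaming policy; not an existing option.
GOOD_ENOUGH_KBPS = 200_000   # assumed cutoff: 200 Mbps
ROAM_MARGIN_KBPS = 5_000     # the existing 5 Mbps margin


def should_roam(current_kbps: int, selected_kbps: int,
                good_enough_kbps: int = GOOD_ENOUGH_KBPS) -> bool:
    """Only consider roaming when the current estimate is below the cutoff."""
    if current_kbps >= good_enough_kbps:
        # Current AP is already good enough: stay put, avoid ping-pong.
        return False
    return selected_kbps > current_kbps + ROAM_MARGIN_KBPS


# Values from the log: no roam, since 274981 kbps is above the cutoff.
print(should_roam(274981, 280821))
# A genuinely weak current AP would still be allowed to roam.
print(should_roam(100_000, 280821))
```

With a rule like this the handover in the log would not have happened,
while a client on a clearly weaker AP would still move.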

This is where NetworkManager could (and should?) take charge.  Currently
you set up static scanning policies based on mostly a simple AP count:

https://github.com/NetworkManager/NetworkManager/blob/main/src/core/supplicant/nm-supplicant-config.c#L595C13-L595C13

This sets up rather aggressive scanning if there is more than one AP
available, using the wpa_supplicant bgscan policy "simple:30:-65:300".

Doing this when the signal is OK makes the problem worse.  It would be
much better if the regular scan intervals were longer and the aggressive
mode was enabled only when the signal was getting "low".  This must of
course happen well before it gets "bad". But I believe there is plenty
of room for improvement here.
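
For reference, the wpa_supplicant.conf documentation describes the
"simple" bgscan parameters as short interval (seconds), signal strength
threshold (dBm) and long interval (seconds).  The sketch below just
decodes the policy string and shows which interval would apply at a
given signal level.  The helper names are made up, and the exact
behaviour at the threshold itself is an assumption.

```python
from typing import NamedTuple


class SimpleBgscan(NamedTuple):
    short_interval_s: int      # used while the signal is below the threshold
    signal_threshold_dbm: int
    long_interval_s: int       # used while the signal is above the threshold


def parse_simple_bgscan(policy: str) -> SimpleBgscan:
    """Decode a 'simple:<short>:<threshold>:<long>' bgscan string."""
    module, short, threshold, long_ = policy.split(":")
    if module != "simple":
        raise ValueError(f"not a 'simple' bgscan policy: {policy!r}")
    return SimpleBgscan(int(short), int(threshold), int(long_))


def scan_interval(policy: SimpleBgscan, signal_dbm: int) -> int:
    """Pick the scan interval (assumption: strictly below means 'low')."""
    if signal_dbm < policy.signal_threshold_dbm:
        return policy.short_interval_s
    return policy.long_interval_s


policy = parse_simple_bgscan("simple:30:-65:300")
for level in (-63, -67):  # the two levels seen in the log above
    print(level, "dBm ->", scan_interval(policy, level), "s")
```

Note that the levels in the log (-63 and -67) sit on either side of the
-65 threshold, so a perfectly usable signal can still end up in the
30 second mode.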

One could also imagine more advanced policies, considering signal
trends.  But setting a limit where the signal is "good enough" is a good
start.

Someone (right...) should probably look at the wpa_supplicant
interaction too. Shouldn't NetworkManager really be in charge of those
roaming decisions after all?  Why is that a hard coded policy deep
inside wpa_supplicant?

Well, enough ranting for today I guess :-)



Bjørn

