Remote desktop, keyboard auto-repeat, and network jitter

Tue Jan 7 00:50:13 UTC 2025

(Note: to try to avoid overloading the word "client", I refer to the
Wayland client as "the application" and explicitly say "the remote
desktop client" to refer to the machine the user is connecting from.)

The Wayland protocol currently delegates the handling of keyboard
auto-repeat to the application:

 * The compositor informs the application of the user's current repeat
settings via wl_keyboard::repeat_info.
 * The compositor sends one key-pressed event when the key is first
pressed, and a key-released event when it is ultimately released.
 * The application is responsible for applying the settings from
repeat_info to generate repeated characters for the held key, if
appropriate.
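To make the current model concrete, here is a rough Python sketch of
what an application does with repeat_info for a single held key. It
assumes rate is in characters per second and delay in milliseconds (as
wl_keyboard::repeat_info defines them), that the first repeat fires
exactly delay ms after the key-pressed event and later ones every
1000/rate ms, and that repeating stops when the key-released event
arrives. repeat_times is a hypothetical helper; real toolkits drive
this from a timer rather than computing it up front.

```python
def repeat_times(press_ms: int, release_ms: int, rate: int, delay: int) -> list[float]:
    """Times (ms) at which an application would emit repeats for one held key.

    Hypothetical helper. rate is in characters per second and delay in
    milliseconds, as carried by wl_keyboard::repeat_info.
    """
    if rate == 0:
        # A rate of zero disables repeating, regardless of delay.
        return []
    interval = 1000 / rate
    times = []
    k = 0
    while True:
        # Compute each time from k to avoid accumulating rounding error.
        t = press_ms + delay + k * interval
        if t >= release_ms:
            break  # the key-released event has arrived: stop repeating
        times.append(t)
        k += 1
    return times
```

For example, repeat_times(0, 700, 25, 600) gives repeats at 600, 640
and 680 ms, while a key released before 600 ms produces none.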

This works great locally, and gives the application the flexibility
to handle a held key in whatever way makes sense for the context.
Unfortunately, it can run into trouble in a remote desktop session.

When connecting to a remote machine over the internet, it's not
uncommon to experience a network hiccup of a second or more,
especially if connecting via a less-reliable medium (cellular, weak
wifi, satellite, et cetera). If such a hiccup delays a key-up event
relative to its key-down event, the application sees the key as held
for longer than it actually was and generates undesired, annoying (and
occasionally dangerous) key repeats.
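A quick back-of-the-envelope sketch shows how large the effect can be.
It assumes the application emits its first repeat delay ms after the
key-down and then one every 1000/rate ms, stopping when the key-up is
received; the helper name and the example settings are only for
illustration.

```python
def repeats_seen(held_ms: int, extra_delay_ms: int, rate: int, delay: int) -> int:
    """Number of repeats an application generates for one key press.

    held_ms: how long the user really held the key on the remote desktop client.
    extra_delay_ms: how much later the key-up arrives than the key-down
    did (0 if both see the same latency).
    Assumes repeats at delay + k * (1000 / rate) ms, strictly before the
    key-up event is received.
    """
    observed = held_ms + extra_delay_ms  # hold time as the application sees it
    if rate == 0 or observed <= delay:
        return 0
    span = (observed - delay) * rate
    return -(-span // 1000)  # ceil(span / 1000) in exact integer arithmetic
```

With delay=600 and rate=25, a 100 ms tap whose key-up arrives one
second late, repeats_seen(100, 1000, 25, 600), yields 13 repeats the
user never asked for, versus 0 without the hiccup.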

Currently, the only solution is to turn off key repeat. This is safer,
but obviously has a usability impact. The remote desktop tool might
choose to work around the usability impact by synthesizing a key-up +
key-down pair of events every time the key auto-repeats on the remote
desktop client, but then the application only ever sees a series of
separate presses, so it becomes impossible to actually hold down an
auto-repeating key.
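Here is a rough sketch of that workaround, assuming the remote desktop
client's own repeat fires delay ms after the press and then every
1000/rate ms. synthesize_taps and longest_hold are hypothetical
helpers; the point is that the application never observes one
continuous hold.

```python
def synthesize_taps(press_ms: int, release_ms: int, rate: int, delay: int) -> list[tuple[float, str]]:
    """Hypothetical workaround: every client-side repeat becomes key-up + key-down."""
    events: list[tuple[float, str]] = [(press_ms, "down")]
    if rate > 0:
        k = 0
        while True:
            t = press_ms + delay + k * 1000 / rate
            if t >= release_ms:
                break
            events.append((t, "up"))    # synthetic release...
            events.append((t, "down"))  # ...immediately followed by a new press
            k += 1
    events.append((release_ms, "up"))  # the real release
    return events


def longest_hold(events: list[tuple[float, str]]) -> float:
    """Longest continuous press the application observes in an event stream."""
    longest = 0.0
    down_at = 0.0
    for t, state in events:
        if state == "down":
            down_at = t
        else:
            longest = max(longest, t - down_at)
    return longest
```

Holding a key for two seconds with delay=600 and rate=25 turns into 36
separate presses, and the longest press the application ever sees is
the initial 600 ms.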

Ideally, it would be possible to guarantee that a key will be repeated
only once the user has actually held it down for a sufficient length
of time on the remote desktop client, regardless of network jitter.
Leaving aside remote desktop tool <-> compositor communication for the
moment (likely a topic for a different list), I can think of a couple
of different approaches to adjusting the compositor -> application
protocol to enable this.

Idea 1: Signal the application to start repeating the
most-recently-pressed key, if applicable.

Repeat would initially be disabled. The remote desktop tool would
inform the compositor when the user had been holding a key down for
longer than the delay threshold, as measured on the remote desktop
client (removing any network delay issues), at which point the
compositor would inform the application that it should commence
repeating.

This might be achieved by specifying that a new repeat_info event
should be applied immediately to the most-recently-pressed,
currently-held key, if any (though this is not how toolkits currently
handle repeat_info), or by specifying a new event type.
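A minimal sketch of the remote desktop client's side of Idea 1 might
look like the following. start_repeat is a hypothetical message name
standing in for either a repeat_info-based signal or a new event, and
the messages are computed after the fact for brevity (a real client
would send start_repeat when its own timer fires). What matters is that
the delay threshold is checked against timestamps taken on the remote
desktop client, so a late key-up can never trigger it.

```python
def idea1_messages(press_ms: int, release_ms: int, delay: int) -> list[tuple[int, str]]:
    """Messages the remote desktop client would send for one key under Idea 1.

    "start_repeat" is a hypothetical message. Timestamps are taken on the
    remote desktop client, so network delay cannot cause it to be sent.
    """
    messages = [(press_ms, "key_down")]
    if release_ms - press_ms > delay:
        # The user really held the key past the delay threshold.
        messages.append((press_ms + delay, "start_repeat"))
    messages.append((release_ms, "key_up"))
    return messages
```

A 100 ms tap produces only key_down and key_up, no matter how late
either reaches the compositor, while a 700 ms hold with delay=600 also
produces start_repeat at 600 ms.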

This approach will at least ensure that the user intended to start
repeating, but it could still result in more repeats than
intended. E.g., if the video stream freezes due to a network hiccup,
the user might immediately release the key to try to avoid too many
repetitions, but the application would keep generating repeats until
the key up event was received.
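To put a number on that, assume the application emits its first repeat
as soon as the start signal arrives and then one every 1000/rate ms
until the key-released event arrives. The hypothetical helper below
counts the repeats it would generate from those two arrival times.

```python
def idea1_repeats(start_arrival_ms: int, up_arrival_ms: int, rate: int) -> int:
    """Repeats an application generates under Idea 1 (assumed model).

    The first repeat is emitted when the start signal arrives, then one
    every 1000/rate ms until the key-up event arrives.
    """
    span = up_arrival_ms - start_arrival_ms
    if rate == 0 or span <= 0:
        return 0
    return -(-span * rate // 1000)  # ceil(span * rate / 1000)
```

With rate=25 and a 700 ms hold (delay=600), the user expects 3
repeats. If the start signal arrives with 50 ms of latency but the
key-up is held up by an extra second, the application instead
generates idea1_repeats(650, 1750, 25), i.e. 28.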

Idea 2: Send explicit individual repeat events to the application.

Both the delay and the repeat rate would be applied on the remote
desktop client, ensuring that the number of repeats is commensurate
with the actual length of time the user held down the key, regardless
of network
conditions. The remote desktop tool would generate a key-down event,
some number of key-repeat events, and a key-up event.
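A sketch of the event stream the remote desktop client would produce
under Idea 2, assuming the first repeat fires delay ms after the press
and later ones every 1000/rate ms. idea2_events is a hypothetical
helper, and "repeat" is the proposed (not existing) key state. Because
the timestamps come from the remote desktop client, delivering the
stream late or with jitter changes when the events arrive, but not how
many repeats there are.

```python
def idea2_events(press_ms: int, release_ms: int, rate: int, delay: int) -> list[tuple[float, str]]:
    """Event stream generated on the remote desktop client under Idea 2.

    Hypothetical helper: "pressed" and "released" correspond to today's key
    states, "repeat" to the proposed new one.
    """
    events: list[tuple[float, str]] = [(press_ms, "pressed")]
    if rate > 0:
        k = 0
        while True:
            # Both delay and rate are applied to client-side timestamps.
            t = press_ms + delay + k * 1000 / rate
            if t >= release_ms:
                break
            events.append((t, "repeat"))
            k += 1
    events.append((release_ms, "released"))
    return events
```

idea2_events(0, 700, 25, 600) yields pressed, three repeats (600, 640
and 680 ms) and released; a 100 ms tap yields just pressed and
released, however late it is delivered.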

Via the Wayland protocol, the application would receive a repeat_info
event signalling no auto-repeat (a rate of zero), or perhaps a special
value signalling that repeats will be provided by the compositor. It
would then receive a key-pressed event, some number of key events with
a new "repeat" key_state, followed by a key-released event. The
application would be free to process or ignore the repeat events
depending on the current context.
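On the application side, handling might look roughly like this Python
sketch. RELEASED and PRESSED mirror the existing wl_keyboard key_state
values (0 and 1); REPEAT = 2 is a hypothetical value for the proposed
state. The two handlers, simplified to take a character instead of a
keycode, illustrate the flexibility mentioned above: a text field
treats a repeat like another press, while a game that only tracks
which keys are held can ignore repeats entirely.

```python
from enum import IntEnum


class KeyState(IntEnum):
    RELEASED = 0  # existing wl_keyboard key_state value
    PRESSED = 1   # existing wl_keyboard key_state value
    REPEAT = 2    # hypothetical value for the proposed "repeat" state


class TextField:
    """Treats each repeat like another press of the same key."""

    def __init__(self) -> None:
        self.text = ""

    def key(self, char: str, state: KeyState) -> None:
        if state in (KeyState.PRESSED, KeyState.REPEAT):
            self.text += char


class GameInput:
    """Only tracks which keys are held, so repeats are ignored."""

    def __init__(self) -> None:
        self.held: set[str] = set()

    def key(self, char: str, state: KeyState) -> None:
        if state == KeyState.PRESSED:
            self.held.add(char)
        elif state == KeyState.RELEASED:
            self.held.discard(char)
        # KeyState.REPEAT falls through and is deliberately ignored.
```

Fed pressed, repeat, repeat, released for the same key, the text field
ends up with three characters, while the game simply sees the key go
down and come back up.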

I would lean toward this approach, barring some technical reason it
wouldn't work well in practice.

Would either of these approaches be tenable? Obviously, they both
require updates to toolkits, but I don't think that's avoidable. Older
applications would fall back to having repeat disabled when using a
remote desktop tool, which might be annoying until they are updated,
but would at least be safe. Is there any other solution I'm
overlooking?
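A rough sketch of the compositor side of that fallback, assuming
Idea 2. supports_repeat_events is a hypothetical stand-in for however
the compositor would learn that an application understands the new key
state (for example, a protocol version check). Every application is
told that the rate is zero, so none repeats on its own; only updated
applications are sent the explicit repeats.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    # Hypothetical stand-in for however the compositor learns that the
    # application understands the proposed "repeat" key state.
    supports_repeat_events: bool
    received: list = field(default_factory=list)


def forward(app: App, events: list[tuple[int, str]]) -> None:
    """Compositor-side sketch of forwarding a remote key stream under Idea 2."""
    # Rate 0 disables application-side repeating for every application.
    app.received.append(("repeat_info", 0, 0))
    for t, state in events:
        if state == "repeat" and not app.supports_repeat_events:
            continue  # older applications get no repeats at all: safe, if annoying
        app.received.append((t, state))
```

An older application thus sees only the press and the release, while
an updated one also receives each repeat.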

Thanks!
Erik