Improving input latency
Tiago Vignatti
vignatti at c3sl.ufpr.br
Tue Jul 29 18:50:22 PDT 2008
Hi,
I posted this summary on my blog but I'll send it here as well. The
feedback I receive on mailing lists is usually better than on the blog :)
Thank you,
Tiago
= Improving input latency =
GSoC summary #1 - July 29
The current implementation of the X Window System relies on a signal
scheme to manage input events coming from hardware devices. This scheme
frequently gets blocked when a lot of IO is occurring (for instance,
when the process is being swapped in or out). Getting blocked means,
for instance, a cursor that jumps across the screen, and in a GUI it is
always desirable to prioritize responsiveness for end users. The
human/computer interface should be smooth, and this is the most
user-visible aspect of a system.
Besides the need for better responsiveness, the current design of the
event stream has some oddities, probably due to historical reasons, such
as the cursor update being done in user space, or the long path taken
to draw the cursor instead of just connecting the mouse hardware
directly to the cursor position update in the kernel. Moreover, there is
no fundamental reason for input drivers to depend on the DDX part of
the X server. Therefore the design of the input subsystem must be
carefully redone to address these issues.
Our project tries to solve all these problems. In summary, the goal is
to get a path from hardware input event to client delivery that cannot
be blocked by rendering or IO operations, meaning we always have very
low latency on input events. Moreover, a redesign of this event stream
could improve the overall X graphics stack, which must be considered as
well.
So far three strategies have been explored to achieve the goal:
1. put the X input generation stage in a separate thread
2. put the X input generation and processing stages in other threads
3. shortcut the kernel input layer with drm to decrease the cursor
update latency
Basically, 1. and 2. try to solve the issue of blocking signals, and 3.
would be a complete redesign of the input infrastructure. Strategy 3
would affect 1. and 2., but these could be implemented in parallel with
it. The following sections detail each strategy.
== strategy #1 ==
Strategy 1 no longer uses a signal handler to wake up the event
generation code. It simply polls the device's socket, and given that
this code runs in a separate thread, this is a win for the CPUs.
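
To make the idea concrete, here is a minimal sketch of a thread that
polls a device fd instead of being woken by a signal handler. It is
written in Python only for brevity (the X server itself is C) and it is
not the code from my tree. The names input_thread and demo are made up,
and a socketpair stands in for the real input device.

```python
import queue
import selectors
import socket
import threading


def input_thread(device, events, stop):
    """Poll the device fd and queue whatever arrives; no signal handler."""
    sel = selectors.DefaultSelector()
    sel.register(device, selectors.EVENT_READ)
    while not stop.is_set():
        # Time out every 100 ms so the stop flag gets noticed.
        for _key, _mask in sel.select(timeout=0.1):
            data = device.recv(64)
            if data:
                events.put(data)
    sel.close()


def demo():
    # Assumption: a socketpair plays the role of the hardware device.
    fake_hw, device = socket.socketpair()
    events, stop = queue.Queue(), threading.Event()
    worker = threading.Thread(target=input_thread,
                              args=(device, events, stop))
    worker.start()
    fake_hw.send(b"motion dx=3 dy=-1")   # the "hardware" emits an event
    received = events.get(timeout=2)      # picked up by the input thread
    stop.set()
    worker.join()
    fake_hw.close()
    device.close()
    return received
```

The thread sleeps inside select() until the fd becomes readable, so it
costs nothing while the device is idle, and the main thread is never
interrupted to generate events.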
With the separate thread taking care only of the input code, it was
expected that the cursor footprint would always stay in resident memory
while the mouse is still moving. Unfortunately this was not true. For
some reason it gets swapped back to disk. Maybe some scheduler
adjustments would help here. A memory lock scheme was tried to keep the
cursor footprint always in physical memory, without success.
This strategy is basically what we did in the first GSoC. It is pretty
much implemented, and it would not take much trouble to push it to the
upstream X server. The code is here:
http://cgit.freedesktop.org/~vignatti/xserver/
== strategy #2 ==
This strategy can be thought of as an improvement on #1. It can be
split into two implementation models:
Model one:
thread #1 deals with
- injection and processing of input events
thread #2 deals with
- requests from known clients
- new clients that try to connect
It would be very, very nice to make both threads totally independent.
But we cannot. Event delivery depends on the window structure, and the
first thread must always wake up the second. Also, sometimes the
processing of events takes a while, and the injection of events gets
stuck in this model. So we came up with another one:
Model two:
thread #1 deals with
- injection of input events from devices
thread #2 deals with
- processing of input events to clients
thread #3 deals with
- requests from known clients
- new clients that try to connect
With this model the first and second threads are not so tightly
coupled, and given that we're using non-blocking fds to wake up each
thread (through a pipe), the CPU "enjoys" the effect of threads. For
instance, under heavy drawing primitives only thread #3 would wake up.
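
A rough sketch of model two follows, again in Python for brevity and
not taken from the actual proof-of-concept. Each thread sleeps on the
read end of its own non-blocking pipe and is woken by a single byte
written to the other end. WakePipe, worker and demo are hypothetical
names, and the "events" are just strings.

```python
import collections
import os
import selectors
import threading


class WakePipe:
    """Pipe with non-blocking ends, used only to wake a sleeping thread."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)
        os.set_blocking(self.wfd, False)

    def wake(self):
        try:
            os.write(self.wfd, b"x")
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    def drain(self):
        try:
            while os.read(self.rfd, 4096):
                pass
        except BlockingIOError:
            pass  # nothing left to read

    def close(self):
        os.close(self.rfd)
        os.close(self.wfd)


def worker(pipe, handle, stop):
    """Sleep on the pipe; run handle() each time somebody wakes us."""
    sel = selectors.DefaultSelector()
    sel.register(pipe.rfd, selectors.EVENT_READ)
    while not stop.is_set():
        if sel.select(timeout=0.1):
            pipe.drain()
            handle()
    sel.close()


def demo():
    stop, done = threading.Event(), threading.Event()
    lock = threading.Lock()            # the only mutex: guards the queue
    raw_events = collections.deque()
    delivered = []
    wakeups = {"processing": 0, "clients": 0}

    def process_events():              # body of thread #2
        wakeups["processing"] += 1
        with lock:
            batch = list(raw_events)
            raw_events.clear()
        delivered.extend("delivered:" + e for e in batch)
        if len(delivered) >= 3:
            done.set()

    def serve_clients():               # body of thread #3
        wakeups["clients"] += 1

    to_processing, to_clients = WakePipe(), WakePipe()

    def inject():                      # body of thread #1
        for name in ("motion", "button", "key"):
            with lock:
                raw_events.append(name)
            to_processing.wake()       # queue first, then wake thread #2

    threads = [
        threading.Thread(target=worker,
                         args=(to_processing, process_events, stop)),
        threading.Thread(target=worker,
                         args=(to_clients, serve_clients, stop)),
        threading.Thread(target=inject),
    ]
    for t in threads:
        t.start()
    done.wait(timeout=2)
    stop.set()
    for t in threads:
        t.join()
    to_processing.close()
    to_clients.close()
    return delivered, wakeups
```

In this run only input arrives, so the client thread never wakes up.
The mirror case is the one described above: under heavy drawing only
the client pipe would be written to, and only thread #3 would run.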
We have a proof-of-concept of this last model and it sort of works (we
occasionally see some segfaults, probably due to some critical regions
we forgot to lock; right now the only mutex that exists is inside the
server's event queue).
It's hard to imagine other threaded models, mainly because the way X
deals with clients is tightly tied into every piece of the server, and
it would require a lot of mutexes.
== strategy #3 ==
For sure this strategy is the most shocking one :) The idea is to
connect the mouse hardware directly to the cursor position update
function, all inside the kernel. We'd then rewrite the event stream from
the pointer device into an absolute position. Transforming relative
mouse motion into an absolute screen position does not seem that
complicated, but this strategy would involve acceleration and cursor
limits inside the kernel as well (the current implementation of
acceleration deals with floats, so we would have to adapt it to live in
the kernel).
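
To illustrate the relative-to-absolute step without floats, here is a
small sketch using integer fixed-point arithmetic. It is Python standing
in for what would be kernel C, and it is not the X server's acceleration
algorithm: the simple threshold/factor scheme, the 8-bit fraction and
the names Cursor and accelerate are all assumptions made for the
example.

```python
FIXED_SHIFT = 8              # 8 fractional bits: 256 represents 1.0
ONE = 1 << FIXED_SHIFT


def accelerate(delta, factor_fp, threshold):
    """Return delta in fixed point, scaled by factor_fp above threshold."""
    if abs(delta) <= threshold:
        return delta * ONE           # slow motion: no acceleration
    return delta * factor_fp         # fast motion: integer multiply only


def clamp(value, low, high):
    return max(low, min(value, high))


class Cursor:
    """Absolute cursor position kept in fixed point, limited to the screen."""

    def __init__(self, width, height):
        self.max_x_fp = (width - 1) * ONE
        self.max_y_fp = (height - 1) * ONE
        self.x_fp = (width // 2) * ONE
        self.y_fp = (height // 2) * ONE

    def position(self):
        # Drop the fractional bits to get whole pixels.
        return self.x_fp >> FIXED_SHIFT, self.y_fp >> FIXED_SHIFT

    def move(self, dx, dy, factor_fp=ONE + ONE // 2, threshold=4):
        """Apply one relative motion event; default factor is 1.5."""
        self.x_fp = clamp(self.x_fp + accelerate(dx, factor_fp, threshold),
                          0, self.max_x_fp)
        self.y_fp = clamp(self.y_fp + accelerate(dy, factor_fp, threshold),
                          0, self.max_y_fp)
        return self.position()
```

For example, on a 1024x768 screen the cursor starts at (512, 384), and
move(3, -2) gives (515, 382) because both deltas are below the
threshold. The sub-pixel remainder is kept between events, so two
consecutive move(5, 0) calls at factor 1.5 advance 15 pixels in total
rather than 14.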
It is a _very_ _large_ amount of coding. It would require changes to the
X server, the DDX driver and its corresponding kernel DRM driver, the
drm library and the kernel input drivers. A mini input driver *inside*
drm is also needed. We would add the complexity of the connection
between input device and output device to the kernel (in my
proof-of-concept implementation evdev depends on drm. Yeah, really weird
world). Moreover, we would somehow have to avoid having two different
sets of the exact same code in different contexts in the case of sw
cursors (think MPX). It's a complete redesign. Things would have to go
incrementally.
But why this strategy? Well, this would solve all the current issues
with input latency. For instance, with the current design of kernel
modesetting, which seems to be the future, the cursor jumps a lot, much
more than with the current implementation. Try running xrandr and
moving the mouse with kernel modesetting. xrandr will do DDC
communication, which blocks X in the kernel. With the handling and
update of the cursor inside the kernel, all would work fine (and my
proof-of-concept already showed this).
Moreover, I believe the current implementation has remained until now
for historical reasons. Ultrix systems placed the entire input subsystem
in the kernel. What is the problem with doing this in Linux (and others)
as well (besides the massive amount of coding)?
And non-DRI drivers? Should we forget them?
EOF
--
Tiago Vignatti
C3SL - Centro de Computação Científica e Software Livre
www.c3sl.ufpr.br