Input thread on EVoC
tiago.vignatti at nokia.com
Tue Jun 8 11:08:13 PDT 2010
Most of the time I spent on this was trying to understand the problems X was
facing, and in the end I didn't come up with any ready-to-fly implementation :/
Even so, I was quite happy because I could at least identify more clearly
the issues where we need improvement. I can mention some here:
1. general input performance.
The server's role is to render graphics and send them to the screen, deal with
clients, and control input devices. This is too much for a single event loop. So
the straightforward idea is to separate the input piece into another
process/thread, where the server could perform the other remaining tasks in
parallel.
The code of event generation (i.e., taking what comes from devices and creating
X events) is already quite decoupled from the rest - currently it is fired from
the SIGIO handler. This means it shouldn't introduce many critical regions and,
therefore, a natural performance improvement could be achieved there. The
latest implementation is probably this one:
2. cursor rendering.
The most apparent problem when delaying an input event happens with the cursor.
It's very easy to see cursors jumping around the screen instead of moving
smoothly when the system is lagged. There are two main reasons - CPU-bound and
I/O-bound - for that, depending on whether the cursor is rendered in sw or hw.
But the cool thing is that both would be improved by having an input thread.
3. input latency:
Take the case of a cursor-driving device. It basically follows this path: device
emits event -> kernel input drv -> X input drv -> X server -> X output drv ->
kernel drv -> screen updated. The latency is really huge and possibly we could
decrease it a bit.
One way to speed up the cursor, for instance, is to shortcut some code
inside the kernel. However, there are other problems that we can face when
doing this, for instance where to put the pointer acceleration (should it go
inside the kernel? but we don't have a float type there, and what about
the accuracy of the velocity?) and input transformation. Anyway, I did some
work here as well:
On Fri, Jun 04, 2010 at 05:28:51AM +0200, ext Fernando Carrijo wrote:
> Nevertheless, I wonder how many of Daniel's threefold "really" refer to the
> inherent complexity of threading event delivery, and how many of them concern
> the obviously huge amount of mechanical work needed to acquire and release the
> aforementioned mutex by certain of those routines which encompass the server
> dispatch tables. Any idea?
It's not that straightforward given that, as the guys already said, X event
dequeuing is very tied to clients, and the server may spend a considerable
amount of time just doing the locking/unlocking dance.
So let's focus on the input generation code first. And as I already said to you
in private, I really care about testing this new approach performance-wise and
stressing its efficiency. This is something I haven't done at all, and it may
consume a lot of your time if done properly.
There are also other points missing from the implementation I originally did.
For instance, it's hard to predict which process will get scheduled on the
CPU - precisely, whether the input process will get scheduled at the right
moment or not. I came up with one approach of locking the server's ELF segments
in memory, but maybe this is a cannon to kill a mosquito. We would have to
check.
> Deviating a little from the above: do you think that a multithreaded X server
> capable of servicing client requests concurrently is a realistic goal for the
> long run? In particular, do you foresee any possible devilish traps resulted by
> interactions between threaded event delivery and threaded request processing?
Hard to say. But definitely, starting to chop off parts and thread them is one
way to figure it out :)