Kernel scheduling algorithm and X.Org performance

Wed Aug 31 10:22:48 PDT 2005

I believe this is primarily a result of how the XAA drivers we currently
use are implemented.

They do very stupid things like busy waiting on hardware in user space
(using cycles gratuitously), and never releasing the CPU for other use.

The problem with this, beyond burning lots of cycles, is that it
completely confuses any sane OS scheduler into thinking this is a
compute-bound process. If the drivers were kind enough to let the CPU
do something else until the display hardware is free, the scheduler
would see the X server as a highly interactive process and give it the
appropriate scheduling boost.  And on the Linux 2.4 systems shipped by
many commercial Linux vendors, this was made worse by the sorts of
scheduler modifications many of those vendors shipped (Linux 2.6 does
much better).
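
A minimal sketch of the difference, in Python rather than driver code:
the Timer below is only a stand-in for the display hardware becoming
free, and the function names are made up for illustration. It compares
the CPU time a process burns while polling a flag with the CPU time it
burns while sleeping until woken.

```python
import threading
import time


def cpu_seconds_while_waiting(wait, delay=0.2):
    """Return the CPU seconds this process uses while waiting `delay` seconds."""
    ready = threading.Event()          # stand-in for "display hardware is free"
    timer = threading.Timer(delay, ready.set)
    start = time.process_time()        # CPU time, not wall-clock time
    timer.start()
    wait(ready)
    used = time.process_time() - start
    timer.join()
    return used


def busy_wait(ready):
    # Spin in user space until the flag flips, as a polling driver would.
    while not ready.is_set():
        pass


def blocking_wait(ready):
    # Sleep until woken; the CPU is free for other work in the meantime.
    ready.wait()


if __name__ == "__main__":
    print("busy wait CPU seconds:    ", cpu_seconds_while_waiting(busy_wait))
    print("blocking wait CPU seconds:", cpu_seconds_while_waiting(blocking_wait))
```

On a typical machine the polling variant should report most of the
delay as CPU time and the blocking variant close to none, which is the
kind of difference a scheduler's interactivity heuristics key on.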

Let's just say that on earlier UNIX systems, where we had kernel driver
support to enable the user-mode drivers to give the CPU back,
interactive response was *much* better, despite running on much slower
hardware than today's.  Many of those systems also had more
sophisticated schedulers; Linux 2.6 is the first Linux system to have a
decent process scheduler (IMHO).
				- Jim

On Wed, 2005-08-31 at 16:47 +0400, Dmitry Shatrov wrote:
> Good day, xorg list. If you think this belongs on another list, please
> redirect me.
> 
> I would like to talk about X-based desktop responsiveness and how
> closely it turns out to be related to the kernel's process scheduling
> algorithm.
> 
> Recently I had some time to play with my friend's Windows box, and one
> thing that I had never noticed amazed me. The Windows desktop looks
> perfectly responsive when I drag windows on top of each other. You do
> not notice that all the applications whose windows were obscured by
> the window you drag wake up and repaint themselves. It looks just as
> if the pictures were already there. Compare this to the common X-based
> desktop behavior: pick up any window and drag it over your desktop.
> Soon you'll notice that you're able to erase the contents of any
> window so that it looks filled with its background color, the desktop
> icons disappear, things flicker here and there. The very first thing I
> hear from my friends when I install Linux on their computers (happens
> from time to time), after they drag a window and see it working as an
> eraser, is: "Woo, it's slow". What can I answer then? Tell them that
> this thing is network-transparent? I know it is, but they don't get it
> and tell their friends that "that Linux thing is slow". Sad.
> 
> Still I realize that each component of the desktop in front of me is
> carefully designed by its developers and in theory should work pretty
> well. Pretending to be a hacker-minded person, I tried to understand
> why it doesn't work well in the real world.
> 
> First, how is the Windows (Microsoft Windows) rendering scheme
> different from what we have in X? Let's assume we are on a single
> physical machine, no networking. Then it turns out that there are not
> so many differences.
> 
> Actually, the only major point that I can mention is that the Windows
> graphics subsystem is 'in-kernel'. This only means that it runs in
> kernel space, because it is evidently well separated from the rest of
> the NT kernel. The graphics system is completely event-driven, so it
> really doesn't matter where its code is placed. The in-kernel solution
> means that:
> 
>    1. There is one less context switch (i.e. one) to transfer a request
> from a client to the graphics system.
>    2. There is one less context switch (i.e. zero) to transfer an input
> event from the kernel to the graphics system.
> 
> I guess we all know the general pros and cons of in-kernel vs
> user-space graphics, so I won't rehash them here. It is only relevant
> that the in-kernel approach is faster because of 1 and 2.
> 
> Do these additional context switches matter? My answer is no, they do
> not. X is able to be just as fast as Windows, especially if properly
> programmed.
> 
> But let's look from the other side. If a context switch is to happen,
> then _when_ should it happen? This question does not exist in MS
> Windows, which amounts to it being solved the perfect way. What about
> X?
> 
> Three players come up here.
> 
> The first player is the X server itself. It is a user process that is
> subject to scheduling. The second player is the client. The third is
> the window manager, which is also a separate process. The whole team
> of processes is driven by the kernel.
> 
> The first problem here is that at this point the inter-operation of
> simple parts becomes really complicated. The second problem is that
> the kernel treats the players as functionally equivalent entities
> (user processes) and applies a general-purpose scheduling algorithm
> which is itself quite complicated, gives priority bonuses to processes
> that are "interactive enough", etc. Linux 2.6's O(1) scheduler is not
> so difficult to understand (I'd recommend Robert Love's explanation in
> his "Linux Kernel Development", 2nd ed. - it's clean and compact), but
> it is really difficult to track in the case of X + WM + client
> interaction.
> 
> My guess is that to get the Windows-like picture of operation on X one
> should probably do the following:
> 1) renice Xorg process to -6
> 2) renice window manager's process to 1 (it's metacity in my case)
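
For anyone who wants to script the two renices above, a rough Python
sketch follows. The nice values come from the message; the helper names
and the idea of passing the PIDs in by hand (for example from "pidof
Xorg") are assumptions for illustration. Setting a negative nice value
normally requires root.

```python
import os

# Nice values suggested in the message; lower means higher priority.
XORG_NICE = -6
WM_NICE = 1


def renice_plan(xorg_pid, wm_pid):
    """Return (pid, nice) pairs in the order they would be applied."""
    return [(xorg_pid, XORG_NICE), (wm_pid, WM_NICE)]


def apply_plan(plan):
    """Apply each nice value; going below 0 normally needs root."""
    for pid, nice in plan:
        os.setpriority(os.PRIO_PROCESS, pid, nice)
```

Calling apply_plan(renice_plan(xorg_pid, wm_pid)) as root, with the two
PIDs looked up beforehand, would have the same effect as running
renice by hand.
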
> 
> Just try it - and - voila. The dragged window is not an eraser any
> more. It really feels like Windows. Let's look at what happens in this
> case:
> 
> 0. Our system is currently idle. There's nothing running in the
> background, we are dragging a window and there is a window of another
> application in the background (e.g. nautilus file manager's window).
> 
> 1. We press a mouse button on the window manager's title bar. The
> kernel delivers some data to X through a character device (say,
> /dev/mouse). The X server wakes up immediately, because it is the
> highest-priority process. It receives SIGIO, recognizes a mouse event
> and queues it, then it wakes up on Select() and transfers the event to
> the window manager.
> 
> At this point we have the whole system working only for the graphics
> system, nothing preempts us: we get a mouse interrupt, immediately
> handle it in a mouse interrupt handler, then immediately follows a
> softirq of the mouse driver that pushes data through /dev/mouse, then
> immediately schedule() is called and X server wakes up, and it pushes
> data to WM. I bet this is perfectly fast.
> 
> 2. X has nothing to do now, so WM wakes up. It reads data and sees
> that a window should be moved (let's assume that we're already
> dragging a window). It performs some geometry calculations and issues
> certain commands to X, flushes them out and goes to sleep. Everything
> is ok at this point too.
> 
> 3. X wakes up and reads WM's commands. They say to move a window, so X
> actually moves it, but part of the window in the background has to be
> repainted. X issues a repaint command to that client. It is also very
> likely that part of WM's decorations has to be redrawn as well (this
> point confuses me - when does WM actually paint its decorations?
> correct me please), so it also gets some events to dispatch. Done with
> that, X goes to sleep.
> 
> Notice that here we are likely to be fast enough not to receive new
> mouse events, because we did not do actual drawing - only some
> calculation and three context switches. BUT if a new mouse event
> comes, X will issue commands to the window manager again... And if WM
> is then scheduled before the client (whose window is in the
> background), it will again issue commands to the X server, and X will
> wake up because it is a high-priority process (if X is at the same
> priority then it is not that good either: we see that every day). This
> way, especially if repeated several times, the client misses its time
> to repaint its window and its content is erased. This is what users
> call "slow". This is why we set WM's priority to 1 (the lowest of the
> three): this way the client will wake up first.
> 
> 4. Now, the client wakes up. It repaints itself, but this is likely to
> take a long time (modern desktop programs are quite complicated), and
> if somewhere at this point we get a mouse event, the client will be
> preempted by the X server, and X will transfer an event to WM. But WM
> has the lowest priority, so our client will be scheduled back, and
> what it does will still be adequate: WM hasn't made any geometry
> recalculations yet. If WM has a higher priority then things may go the
> bad way, as I just described.
> 
> 5. Now the client is done with repainting and if a mouse event has
> already been processed by X, WM wakes up. The whole cycle repeats from
> the point "2". Or from "1", if the mouse hasn't moved yet.
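
The walkthrough in steps 1-5 can be approximated with a toy model. This
is only a sketch under loud assumptions: one process runs per tick, the
runnable process with the lowest nice value always wins, and there are
no timeslices or interactivity bonuses, so it is far simpler than the
real O(1) scheduler. The tick counts, the simulate() name and the "bad"
nice value of -1 for the WM are invented for illustration.

```python
def simulate(nice, ticks=200, mouse_period=4, repaint_cost=3):
    """Toy model: one process runs per tick; the lowest nice value wins.

    Returns (erased, wm_backlog): how many moves hit a window whose
    repaint was unfinished, and how many mouse events the WM has not
    handled by the end of the run.
    """
    x_input = 0       # mouse events waiting in the X server
    wm_events = 0     # events forwarded to the window manager
    x_cmds = 0        # move commands the WM has sent to X
    client_work = 0   # repaint ticks the background client still owes
    erased = 0

    for t in range(ticks):
        if t % mouse_period == 0:      # the mouse moves periodically
            x_input += 1

        runnable = []
        if x_input or x_cmds:
            runnable.append("x")
        if wm_events:
            runnable.append("wm")
        if client_work:
            runnable.append("client")
        if not runnable:
            continue
        who = min(runnable, key=lambda name: nice[name])

        if who == "x":
            if x_cmds:                 # step 3: move window, expose client
                x_cmds -= 1
                if client_work:
                    erased += 1        # previous repaint never finished
                client_work = repaint_cost
            else:                      # step 1: forward mouse event to WM
                x_input -= 1
                wm_events += 1
        elif who == "wm":              # step 2: turn the event into a move
            wm_events -= 1
            x_cmds += 1
        else:                          # step 4: client repaints one tick
            client_work -= 1

    return erased, wm_events


if __name__ == "__main__":
    print("WM below client:", simulate({"x": -6, "wm": 1, "client": 0}))
    print("WM above client:", simulate({"x": -6, "wm": -1, "client": 0}))
```

In this model the renice from the message never lets a move land on a
half-painted window, at the cost of a growing WM backlog; with the WM
above the client, repaints keep getting interrupted.
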
> 
> In fact, this picture is even more complicated, because the window
> manager's decorations are subject to being redrawn as well, and (just
> for an example) metacity is quite slow at this. Slow enough to see it
> visually. The box that I test this on is a slow one, a 700MHz Celeron
> + Matrox G400, but I think it allows me to see the real picture more
> clearly.
> 
> The most visible problem with these renices is that WM feels a bit
> laggy; the window movement is not smooth enough. But on the other
> hand, windows do not erase each other's contents and thus the overall
> impression of the system is better. The other nice point is that
> virtual desktops seem to switch faster with low WM priority, because
> applications redraw themselves before WM draws its decorations: users
> look at the applications first, and their windows swap faster.
> 
> Frankly speaking, I'd like to play with the beast by myself a bit more
> first and then show you what I've got, but the system looks so
> complicated that without your advice there are too many roads to go.
> 
> I've prepared two _really trivial_ patches; they are neither fixes nor
> features. They are intended to give a clearer understanding of the
> problem, and to show how trivial the necessary modifications could be.
> 
> The first is for Xorg's DIX Dispatch() function (it's against
> X11R6.8.1). My point there is that we should give priority to handling
> clients' commands first, and when we're done with that, handle the
> input. I don't base this on some deep research, just an abstract
> thought about order-of-service. This may not even make any sense; I
> wanted at least to make sure that the build system works and it's
> possible to make patches. Still, it seems to me that X performs better
> after the change than before. Anyway, it seems too simple to me to
> treat input events and client events as something equal. There could
> be a better way.
> 
> The second patch, against linux-2.6.12.5, makes the O(1) scheduler
> algorithm as simple as a stick, eliminating temporary priority boosts.
> It feels bad on a slightly-loaded system, but with 98% of time being
> idle it is ok, and the responsiveness of the desktop graphics is much
> easier to understand with a simpler scheduling algorithm.
> 
> Finally, what I'm suggesting to do:
> 
> My point is that it is time to figure out what scheduling algorithm is
> the best for X.Org and implement it in the Linux kernel. The X server
> and WM processes may even be considered special in the kernel and not
> handled like conventional processes. It may involve some tricks, maybe
> something beyond a trivial model (say, separate WM's decoration
> painting from geometry recalculation somehow, or whatever matters).
> 
> The whole concept of process priorities and process timeslices, and
> how a higher-priority process first takes its whole timeslice (500ms
> is a normal amount here) and only then all other processes receive
> control and we get a "ping - 0.5s hang - ping - 0.5s hang" system...
> this doesn't suit a desktop system well. And time doesn't make things
> better. I recently upgraded from the 2.6.5 kernel to 2.6.10 and then
> 2.6.12 (I have to patch the kernel for my project each time, so I do
> it rarely) and got degraded performance. On 2.6.12 my mouse freezes or
> slows down much more often than on 2.6.5. I even had to
> "iptables .. -j DROP" some of our LAN's IP addresses because they
> didn't let me work normally while stressing my Samba server. It is
> evident that an ordinary user will not end up doing diff -u
> linux-2.6.5/kernel/sched.c linux-2.6.12.5/kernel/sched.c, so the way
> to go should be well-defined or we'll lose it.
> 
> Addressing the problem of X vs WM vs clients scheduling may really
> improve visible X responsiveness. Each component of the system itself
> is already good enough. They just do not interoperate in the most
> efficient way. So what is the most efficient way? What do you think?
> 
> Best regards, Dmitry