[Spice-devel] synchronous io considered harmful

John A. Sullivan III jsullivan at opensourcedevel.com
Fri Jun 17 19:43:19 PDT 2011


On Fri, 2011-06-17 at 09:34 +0200, Gerd Hoffmann wrote:
> On 06/16/11 21:21, Alon Levy wrote:
> > Hi,
> >
> > I just had a short conversation with Marc-Andre about a bug in
> > spice/qxl and it turns out there are a couple of possible solutions
> > and I require some more input.
> >
> > The problem: spice was stalling in flush_display_commands caused by
> > QXL_UPDATE_AREA. Since the vcpu thread is holding the global qemu
> > mutex then qemu is unresponsive during this time to any other
> > clients, specifically a libvirt call via virt-manager
> > (virDomainGetXMLDesc) was blocked.
> 
> Yes, qemu doesn't respond to anything.  I've seen up to 50 milliseconds
> latency (spice client connected via loopback), which is enough to
> disturb EHCI emulation.
> 
> > There are a number of wrong things here, I'll start from what I think
> > we should fix but any of these would solve this specific problem:
> 
> > 1. QXL_IO_UPDATE_AREA is synchronous. If we made it async, by using an
> > interrupt to notify of its completion, then the io would complete
> > practically immediately, and the mutex would be relinquished.
> 
> QXL_IO_UPDATE_AREA isn't the only one, although it is the most 
> important.  And, yes, this is what we should do IMHO.  Make the I/O 
> complete immediately, then set a bit in a status register and raise an 
> IRQ when done.  This is how real hardware usually works.
> 
> > 2. flush_display_commands is waiting for the client if the PIPE is too
> > large. This is wrong - why should a slow client prevent the guest
> > from doing an update area? Presumably we do this to keep the pipe
> > length limited, which presumably limits the amount of memory. I don't
> > have any numbers, but this looks wrong to me.
> 
> Sounds like this should be done anyway.  The spice client shouldn't be 
> able to delay update area requests of the guests, even if they run 
> asynchronously.
> 
> > 3. we don't drop qemu's global mutex when dispatching a message. Since
> > an IO can happen from any vcpu we would need to add our own dispatcher
> > mutex in spice.
> 
> I've already looked briefly how to fix that best.
> 
> One option is to simply kick off a thread in qemu to run the update area 
> request, and also handle the locking inside qemu.  Advantage is that we 
> don't need libspice-server api changes.  It is a bit unclean though, 
> especially as the work gets offloaded to a thread inside libspice-server 
> anyway.
> 
> The other option is to extend the libspice-server API and add async
> versions of all synchronous requests with completion callbacks.
> 
> No matter how we fix it on the qemu/spice-server side we need driver 
> updates for the guests ...
> 
<snip>
Out of curiosity, and from an end-user perspective: could this be the
cause of the scrolling and lag problems we see?  For the most part, we
have been very impressed with SPICE in our three days of testing.  We do
not see the severe banding that we see with other protocols, but we do
see delay, almost as if the screen refresh is done all at once: instead
of banding, we get nothing for a moment and then the whole screen
appears as it is supposed to.

We have noticed scrolling is confusingly slow. For example, we tested
scrolling on a poorly designed web page with a huge background graphic.
Our testers might do three rolls of the mouse wheel and nothing would
happen.  Then, two or three seconds later, the page scrolls once.
Another two or three seconds and it scrolls again.  Another two or three
seconds and it scrolls the third time.  Any links on the page are
inaccessible until the scrolling stops.  Even scrolling through large
Open Office documents is distractingly delayed; one scrolls the mouse
and the scroll does not start for a fraction of a second.  It is only a
fraction of a second but it is different from the local experience and
so throws our testers.

Relatedly, we noticed that even keystrokes seem delayed when there is a
lot of screen activity.  We kicked off an Internet video and, while it
was running, tried to type in a new URL in the web browser.  The typing
was seriously delayed (as in seconds).

So, I was wondering if this discussion from a coding perspective is
about fixing these kinds of end user experiences.  Thanks - John
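
To make option 1 in the quoted message concrete (the I/O completes
immediately, then a status bit is set and an IRQ raised when the work is
done), here is a minimal single-threaded sketch in C.  Every name in it
(SketchDevice, STATUS_UPDATE_DONE, the sketch_* functions) is
hypothetical and is not the qxl or libspice-server API; the "worker
step" stands in for whatever thread would really do the rendering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the qxl or
 * libspice-server API, only a model of the register protocol. */
#define STATUS_UPDATE_DONE 0x1u

typedef struct {
    bool     request_pending; /* work queued by the guest's I/O write */
    uint32_t status;          /* status register the guest can read */
    int      irqs_raised;     /* stands in for raising a real interrupt */
} SketchDevice;

/* I/O write handler: records the request and returns at once, so the
 * caller does not hold any lock while the rendering happens. */
static void sketch_io_write_update_area(SketchDevice *d)
{
    d->status &= ~STATUS_UPDATE_DONE;
    d->request_pending = true;
}

/* Runs later, outside the I/O path (assumed to be a worker thread in a
 * real device).  Returns true if it completed a request. */
static bool sketch_worker_step(SketchDevice *d)
{
    if (!d->request_pending)
        return false;
    /* ... the slow rendering would happen here ... */
    d->request_pending = false;
    d->status |= STATUS_UPDATE_DONE; /* set the completion bit */
    d->irqs_raised++;                /* then notify the guest */
    return true;
}

/* Guest-side interrupt handler: acknowledge by clearing the bit.
 * Returns true if there was a completion to acknowledge. */
static bool sketch_guest_irq_handler(SketchDevice *d)
{
    if (!(d->status & STATUS_UPDATE_DONE))
        return false;
    d->status &= ~STATUS_UPDATE_DONE;
    return true;
}
```

The point of the split is that sketch_io_write_update_area() does no
waiting at all, so nothing else is blocked while the update runs; the
guest driver learns about completion from the interrupt instead.  This
is also why the guest drivers would need updating, as noted in the
quoted message.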


