[Spice-devel] synchronous io considered harmful

Gerd Hoffmann <kraxel@redhat.com>
Fri Jun 17 00:34:12 PDT 2011


On 06/16/11 21:21, Alon Levy wrote:
> Hi,
>
> I just had a short conversation with Marc-Andre about a bug in
> spice/qxl and it turns out there are a couple of possible solutions
> and I require some more input.
>
> The problem: spice was stalling in flush_display_commands, caused by
> QXL_IO_UPDATE_AREA. Since the vcpu thread is holding the global qemu
> mutex, qemu is unresponsive during this time to any other
> clients; specifically, a libvirt call via virt-manager
> (virDomainGetXMLDesc) was blocked.

Yes, qemu doesn't respond to anything.  I've seen up to 50 milliseconds 
of latency (spice client connected via loopback), which is enough to 
disturb EHCI emulation.

> There are a number of wrong things here, I'll start from what I think
> we should fix but any of these would solve this specific problem:

> 1. QXL_IO_UPDATE_AREA is synchronous. If we made it async, by using an
> interrupt to notify of its completion, then the I/O would complete
> practically immediately, and the mutex would be relinquished.

QXL_IO_UPDATE_AREA isn't the only one, although it is the most 
important.  And, yes, this is what we should do IMHO.  Make the I/O 
complete immediately, then set a bit in a status register and raise an 
IRQ when done.  This is how real hardware usually works.
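
To make that concrete, here is a minimal, self-contained sketch of the 
pattern using plain pthreads (all names are made up for illustration, 
this is not the actual qxl/qemu code): the "I/O write" only queues the 
request and returns, and a worker signals completion by setting a 
status bit and raising a (simulated) IRQ:

    #include <pthread.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <unistd.h>

    #define STATUS_IO_DONE 0x1          /* status register bit (made up) */

    static uint32_t status_reg;         /* guest-visible status register */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  kick = PTHREAD_COND_INITIALIZER;
    static int work_pending;

    static void raise_irq(void)         /* stand-in for the real IRQ line */
    {
        printf("IRQ raised, status=0x%" PRIx32 "\n", status_reg);
    }

    /* Worker thread: does the actual (slow) update area rendering. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!work_pending)
                pthread_cond_wait(&kick, &lock);
            work_pending = 0;
            pthread_mutex_unlock(&lock);

            usleep(50 * 1000);          /* pretend rendering takes 50 ms */

            pthread_mutex_lock(&lock);
            status_reg |= STATUS_IO_DONE;   /* set completion bit ...   */
            raise_irq();                    /* ... and notify the guest */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* The I/O port write handler: queue the request and return at once,
     * so the vcpu thread never blocks while holding the global mutex. */
    static void io_update_area(void)
    {
        pthread_mutex_lock(&lock);
        status_reg &= ~STATUS_IO_DONE;
        work_pending = 1;
        pthread_cond_signal(&kick);
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        io_update_area();               /* returns immediately */
        sleep(1);                       /* wait so we see the IRQ */
        return 0;
    }

The guest driver then either polls the status bit or sleeps until the 
interrupt arrives, just like with any other piece of real hardware.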

> 2. flush_display_commands is waiting for the client if the pipe is too
> large. This is wrong - why should a slow client prevent the guest
> from doing an update area? Presumably we do this to keep the pipe
> length limited, which in turn bounds the amount of memory used. I don't
> have any numbers, but this looks wrong to me.

Sounds like this should be fixed anyway.  The spice client shouldn't be 
able to delay update area requests from the guest, even if they run 
asynchronously.

> 3. we don't drop qemu's global mutex when dispatching a message. Since
> an I/O can happen from any vcpu, we would need to add our own dispatcher
> mutex in spice.

I've already had a brief look at how best to fix that.

One option is to simply kick off a thread in qemu to run the update area 
request, and also handle the locking inside qemu.  The advantage is that 
we don't need libspice-server API changes.  It is a bit unclean though, 
especially as the work gets offloaded to a thread inside libspice-server 
anyway.
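
Roughly, that wrapper could look like this (hypothetical helper names, 
not actual qemu code): the slow call runs on a helper thread, serialized 
by a qemu-owned dispatcher mutex, so I/O from any vcpu stays safe:

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t dispatcher_lock = PTHREAD_MUTEX_INITIALIZER;

    struct update_req {
        int surface_id;
        /* ... area, clip rects, ... */
    };

    /* stubs standing in for the real calls */
    static void do_update_area(struct update_req *req) { (void)req; }
    static void notify_guest_done(struct update_req *req) { (void)req; }

    static void *update_area_thread(void *arg)
    {
        struct update_req *req = arg;

        pthread_mutex_lock(&dispatcher_lock);
        do_update_area(req);        /* the blocking spice-server call */
        pthread_mutex_unlock(&dispatcher_lock);

        notify_guest_done(req);     /* status bit + IRQ, as above */
        free(req);
        return NULL;
    }

    static void io_update_area_async(int surface_id)
    {
        struct update_req *req = malloc(sizeof(*req));
        pthread_t tid;

        if (!req)
            return;
        req->surface_id = surface_id;
        pthread_create(&tid, NULL, update_area_thread, req);
        pthread_detach(tid);        /* fire and forget; vcpu returns */
    }

In practice one long-lived worker plus a request queue would be cheaper 
than a thread per I/O, but the locking story is the same.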

The other option is to extend the libspice-server API and add async 
versions of all synchronous requests, with completion callbacks.
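
Such an extension might look roughly like this (illustrative signature 
only; whatever names and calling conventions libspice-server actually 
grows may differ):

    /* Hypothetical async variant of the synchronous update_area request:
     * it returns immediately, and spice-server invokes the callback from
     * one of its own threads once the work is done.  qemu would then set
     * the status bit and raise the guest IRQ from the callback. */
    typedef void (*spice_async_complete_cb)(void *opaque);

    void spice_qxl_update_area_async(QXLInstance *sin,
                                     uint32_t surface_id,
                                     QXLRect area,
                                     spice_async_complete_cb cb,
                                     void *opaque);

The opaque pointer lets qemu carry the device state into the callback 
without any globals.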

No matter how we fix it on the qemu/spice-server side, we need driver 
updates for the guests ...

cheers,
   Gerd

