[Spice-devel] synchronous io considered harmful

Sun Jun 19 00:43:32 PDT 2011

Hi,
On 06/17/2011 07:12 PM, Alon Levy wrote:
> On Fri, Jun 17, 2011 at 09:34:12AM +0200, Gerd Hoffmann wrote:
>> On 06/16/11 21:21, Alon Levy wrote:
>>> Hi,
>>>
>>> I just had a short conversation with Marc-Andre about a bug in
>>> spice/qxl and it turns out there are a couple of possible solutions
>>> and I require some more input.
>>>
>>> The problem: spice was stalling in flush_display_commands caused by
>>> QXL_UPDATE_AREA. Since the vcpu thread is holding the global qemu
>>> mutex then qemu is unresponsive during this time to any other
>>> clients, specifically a libvirt call via virt-manager
>>> (virDomainGetXMLDesc) was blocked.
>>
>> Yes, qemu doesn't respond to anything.  I've seen up to 50
>> miliseconds latency (spice client connected via loopback), which is
>> enougth to disturb EHCI emulation.
>>
>>> There are a number of wrong things here, I'll start from what I think
>>> we should fix but any of these would solve this specific problem:
>>
>>> 1. QXL_IO_UPDATE_AREA is synchronous. If we made it async, by using an
>>> interrupt to notify of it's completion, then the io would complete
>>> practicaly immediately, and the mutex would be relinquished.
>>
>> QXL_IO_UPDATE_AREA isn't the only one, although it is the most
>> important.  And, yes, this is what we should do IMHO.  Make the I/O
>> complete immediately, then set a bit in a status register and raise
>> an IRQ when done.  This is how real hardware usually works.
>>
>>> 2. flush_display_commands is waiting for the client if the PIPE is too
>>> large. This is wrong - why should a slow client prevent the guest
>> >from doing an update area? Presumably we do this to keep the pipe
>>> length limited, which presumably limits the amount of memory. I don't
>>> have any numbers, but this looks wrong to me.
>>
>> Sounds like this should be done anyway.  The spice client shouldn't
>> be able to delay update area requests of the guests, even if they
>> run asynchronously.

Actually, all operations in the guest that push commands, can be delayed 
by the client (WaitForCmdRing) - since we don't process commands as long 
as the pipe size is to big. Which make sense, since we want to keep the 
rates of the producer and consumer balanced in order to keep the memory 
usage stable and not trigger QXL_IO_NOTIFY_OOM.
I agree that update_area is exceptional because it might wait for more 
than one pipe item to be sent.
First, as already been discussed previously, we probably need to limit 
the pipe according to the total size of elements it holds, and not by 
its own size. Second, in the update area case, maybe we can exceed the 
pipe size, but keep track of memory usage and in order to avoid 
succeeding OOMs we can free other drawables the worker holds (i.e., do 
what we actually do in OOM).
>>
>>> 3. we don't drop qemu's global mutex when dispatching a message. Since
>>> an IO can happen from any vcpu we would need to add our own dispatcher
>>> mutex in spice.
>>
>> I've already looked briefly how to fix that best.
>>
>> One option is to simply kick off a thread in qemu to run the update
>> area request, and also handle the locking inside qemu.  Advantage is
>> that we don't need libspice-server api changes.  It is a bit unclean
>> though, especially as the work gets offloaded to a thread inside
>> libspice-server anyway.
>>
>> The other option is to extend the libspice-server API and add async
>> versions of all syncronous requests with completion callbacks.
>>
>> No matter how we fix it on the qemu/spice-server side we need driver
>> updates for the guests ...
>
> Yes. My current thinking (and I'm continuing with UPDATE_AREA as the example,
> but like you noted this needs to be done for any of the other IO's that are
> causing possible delays, like DESTROY_PRIMARY, CREATE_PRIMARY) is to add
> UPDATE_AREA_ASYNC to not break current drivers and be able to do the change
> in steps (qemu update, win driver update, lin driver update). Regarding the linux
> driver, we would need to have one for this to work - currently we don't have an
> interrupt handler at all.
>
> One issue is if we want to handle multiple outstanding update area requests or
> not. Doing a single one would be just adding a field for surface_id in the
> QXLRam, and adding an interrupt value (we use two out of 32 bits, so we have
> quite a few left).
>
You mean that while the driver preforms update area, another driver 
thread attempts to call update area as well? Interesting if this really 
happens. I think succeeding calls to UpdateArea can be kept synchronous, 
i.e., we can use a mutex for this calls in the driver.
> If we want to handle multiple it becomes more difficult. It seems to me it would be
> worth adding a ring in such a case, since we can't reuse the existing command ring for
> update_area because the whole point of that request is to flush the command ring (or
> at least only those operations affecting that surface, but there is no way to know that
> without going over the whole ring). Otoh maybe we could reuse the command ring
> I guess, we could even convert these update_area calls to something like a
> fence? but I haven't thought about that yet.
>
> There is a problem with reusing the command ring - if it is full you can't push anything into
> it, including an QXLUpdateArea command..
>
> So to sum the points:
>   1. UPDATE_AREA_ASYNC, linux driver for X usage, on windows driver already has an interrupt
>   handler. Single outstanding request availble (so driver will have to serialize them)
>   2. Need to consider if we want to have multiple outstanding updates.
>
> In general I think we should be aiming to eliminate update_area usage as much as possible (because
> it requires a vmexit), so I guess maybe 1 will be enough in the long run.
>
>>
>> cheers,
>>    Gerd
>> _______________________________________________
>> Spice-devel mailing list
>> Spice-devel at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/spice-devel