[Spice-devel] synchronous io considered harmful
Alon Levy
alevy at redhat.com
Sun Jun 19 11:57:36 PDT 2011
On Sun, Jun 19, 2011 at 10:43:32AM +0300, Yonit Halperin wrote:
> Hi,
> On 06/17/2011 07:12 PM, Alon Levy wrote:
> >On Fri, Jun 17, 2011 at 09:34:12AM +0200, Gerd Hoffmann wrote:
> >>On 06/16/11 21:21, Alon Levy wrote:
> >>>Hi,
> >>>
> >>>I just had a short conversation with Marc-Andre about a bug in
> >>>spice/qxl and it turns out there are a couple of possible solutions
> >>>and I require some more input.
> >>>
> >>>The problem: spice was stalling in flush_display_commands caused by
> >>>QXL_UPDATE_AREA. Since the vcpu thread is holding the global qemu
> >>>mutex then qemu is unresponsive during this time to any other
> >>>clients, specifically a libvirt call via virt-manager
> >>>(virDomainGetXMLDesc) was blocked.
> >>
> >>Yes, qemu doesn't respond to anything. I've seen up to 50
> >>milliseconds of latency (spice client connected via loopback), which is
> >>enough to disturb EHCI emulation.
> >>
> >>>There are a number of wrong things here, I'll start from what I think
> >>>we should fix but any of these would solve this specific problem:
> >>
> >>>1. QXL_IO_UPDATE_AREA is synchronous. If we made it async, by using an
> >>>interrupt to notify of its completion, then the io would complete
> >>>practically immediately, and the mutex would be relinquished.
> >>
> >>QXL_IO_UPDATE_AREA isn't the only one, although it is the most
> >>important. And, yes, this is what we should do IMHO. Make the I/O
> >>complete immediately, then set a bit in a status register and raise
> >>an IRQ when done. This is how real hardware usually works.
> >>
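(To make the status-bit-plus-irq idea concrete, here is a rough sketch of what the
qemu side could look like - the function names, the io port and the interrupt bit
below are illustrative, not the real qxl code:)

    static void qxl_ioport_write(PCIQXLDevice *qxl, uint32_t io_port, uint32_t val)
    {
        switch (io_port) {
        case QXL_IO_UPDATE_AREA_ASYNC:
            /* queue the request for the worker and return immediately, so the
             * vcpu thread (and with it the global qemu mutex) is released */
            qxl_push_update_area(qxl, qxl->ram->update_surface,
                                 &qxl->ram->update_area);
            break;
        /* ... existing io ports ... */
        }
    }

    /* called later, from the worker thread, when the update has finished */
    static void qxl_update_area_done(PCIQXLDevice *qxl)
    {
        qxl->ram->int_pending |= QXL_INTERRUPT_UPDATE_DONE; /* new status bit */
        qxl_send_interrupt(qxl);                            /* raise the irq  */
    }
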
> >>>2. flush_display_commands is waiting for the client if the PIPE is too
> >>>large. This is wrong - why should a slow client prevent the guest
> >>>from doing an update area? Presumably we do this to keep the pipe
> >>>length limited, which in turn limits the amount of memory. I don't
> >>>have any numbers, but this looks wrong to me.
> >>
> >>Sounds like this should be done anyway. The spice client shouldn't
> >>be able to delay update area requests of the guests, even if they
> >>run asynchronously.
>
> Actually, all operations in the guest that push commands can be
> delayed by the client (WaitForCmdRing), since we don't process
> commands as long as the pipe size is too big. That makes sense, since
> we want to keep the rates of the producer and consumer balanced in
> order to keep the memory usage stable and not trigger
> QXL_IO_NOTIFY_OOM.
It makes sense to consume as fast as you produce, but it doesn't make sense
to let the client keep the guest waiting for that. The waiting is a consequence of
two things:
* we don't copy anything out of the PCI memory, we keep references into it - this
makes sense to reduce copies, but it means we can hit OOM if the guest produces
faster than the server consumes.
* our send-surfaces-on-demand implementation hardcodes the pipe length, so it
relies on the pipe being limited in advance.
I don't have concrete suggestions, just pointing this out. The second problem should
be fixable by, at the minimum, reallocating the surface_id array on demand (a slow
client could still push that towards OOM, but we can keep an artificial limit, just
a higher one).
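Something like this, very roughly (the field names and the cap are placeholders,
not the actual red_worker code):

    /* grow the surface tracking array on demand instead of relying on the
     * hardcoded pipe length; still capped, just at a higher limit */
    if (channel->num_pipe_surfaces == channel->surfaces_capacity &&
        channel->surfaces_capacity < SURFACES_HARD_CAP) {
        uint32_t new_cap = MIN(channel->surfaces_capacity * 2, SURFACES_HARD_CAP);
        channel->surface_ids = spice_realloc(channel->surface_ids,
                                             new_cap * sizeof(uint32_t));
        channel->surfaces_capacity = new_cap;
    }
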
> I agree that update_area is exceptional because it might wait for
> more than one pipe item to be sent.
> First, as has already been discussed, we probably need to
> limit the pipe according to the total size of the elements it holds,
> not just by the number of items. Second, in the update area case, maybe
> we can exceed the pipe size, but keep track of memory usage and, in order
> to avoid subsequent OOMs, free other drawables the worker holds
> (i.e., do what we actually do in OOM).
So track the memory used by the guest by following the commands we read from
the ring? Sounds like we should do that anyway for correct pipe accounting, like
you pointed out has already been discussed.
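Something along these lines, maybe (just a sketch, the field and function names
are made up):

    /* account the pipe in bytes held, not in number of items */
    static void display_pipe_add_drawable(DisplayChannel *channel,
                                          Drawable *drawable)
    {
        channel->pipe_bytes += drawable_guest_size(drawable); /* command + images */
        display_pipe_add(channel, &drawable->pipe_item);
    }

    static int display_pipe_is_congested(DisplayChannel *channel)
    {
        /* would replace the item-count checks in flush_display_commands and
         * the WaitForCmdRing path; MAX_PIPE_BYTES would need tuning */
        return channel->pipe_bytes > MAX_PIPE_BYTES;
    }
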
> >>
> >>>3. we don't drop qemu's global mutex when dispatching a message. Since
> >>>an IO can happen from any vcpu we would need to add our own dispatcher
> >>>mutex in spice.
> >>
> >>I've already looked briefly into how to fix that best.
> >>
> >>One option is to simply kick off a thread in qemu to run the update
> >>area request, and also handle the locking inside qemu. The advantage is
> >>that we don't need libspice-server API changes. It is a bit unclean
> >>though, especially as the work gets offloaded to a thread inside
> >>libspice-server anyway.
> >>
> >>The other option is to extend the libspice-server API and add async
> >>versions of all synchronous requests with completion callbacks.
> >>
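Api-wise the second option could look something like this (hypothetical
signatures, just to show the shape of it, not an existing libspice-server api):

    /* async variant of the existing synchronous call: returns immediately
     * and invokes the completion callback from the worker thread once the
     * update has actually finished */
    typedef void (*spice_qxl_async_complete_t)(void *opaque);

    void spice_qxl_update_area_async(QXLInstance *instance,
                                     uint32_t surface_id,
                                     const QXLRect *area,
                                     spice_qxl_async_complete_t complete,
                                     void *opaque);
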
> >>No matter how we fix it on the qemu/spice-server side we need driver
> >>updates for the guests ...
> >
> >Yes. My current thinking (and I'm continuing with UPDATE_AREA as the example,
> >but like you noted this needs to be done for any of the other IOs that are
> >causing possible delays, like DESTROY_PRIMARY, CREATE_PRIMARY) is to add
> >UPDATE_AREA_ASYNC to not break current drivers and be able to do the change
> >in steps (qemu update, win driver update, lin driver update). Regarding the Linux
> >driver, we would need an interrupt handler for this to work - currently we don't
> >have one at all.
> >
> >One issue is whether we want to handle multiple outstanding update area requests
> >or not. Supporting a single one would just mean adding a field for surface_id in
> >the QXLRam, and adding an interrupt value (we use two out of 32 bits, so we have
> >quite a few left).
> >
> You mean that while the driver performs update area, another driver
> thread attempts to call update area as well? Interesting if this
> really happens. I think subsequent calls to UpdateArea can be kept
> synchronous, i.e., we can use a mutex for these calls in the driver.
Agreed about the mutex. No idea if this happens in practice (but it should be easy
to try to "catch it in the wild").
> >If we want to handle multiple it becomes more difficult. It seems to me it would be
> >worth adding a ring in such a case, since we can't reuse the existing command ring for
> >update_area because the whole point of that request is to flush the command ring (or
> >at least only those operations affecting that surface, but there is no way to know that
> >without going over the whole ring). OTOH maybe we could reuse the command ring
> >after all - we could even convert these update_area calls into something like a
> >fence - but I haven't thought that through yet.
> >
> >There is a problem with reusing the command ring - if it is full you can't push anything into
> >it, including a QXLUpdateArea command.
> >
> >So to sum up the points:
> > 1. UPDATE_AREA_ASYNC, plus a Linux driver for X usage; on Windows the driver already has
> >    an interrupt handler. A single outstanding request is available (so the driver will
> >    have to serialize requests).
> > 2. Need to consider whether we want to have multiple outstanding updates.
> >
> >In general I think we should be aiming to eliminate update_area usage as much as possible (because
> >it requires a vmexit), so I guess maybe 1 will be enough in the long run.
> >
> >>
> >>cheers,
> >> Gerd
> >>_______________________________________________
> >>Spice-devel mailing list
> >>Spice-devel at lists.freedesktop.org
> >>http://lists.freedesktop.org/mailman/listinfo/spice-devel
>