[virglrenderer-devel] coherent memory access for virgl
Zach Reizner
zachr at google.com
Wed Sep 26 07:51:15 UTC 2018
On Wed, Sep 26, 2018 at 3:29 AM Gurchetan Singh <gurchetansingh at chromium.org>
wrote:
> On Tue, Sep 25, 2018 at 2:10 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> >
> > Hi,
> >
> > > > Who will do the actual allocations? I expect we need new
> > > > virglrenderer functions for that?
> > >
> > > The decision to back memory via iovecs or host memory is up to the
> > > VMM.
> >
> > What exactly do you mean by "via iovecs"? The current way to allocate
> > resources? They are guest-allocated and the iovecs passed to
> > virglrenderer point into guest memory. So that clearly is *not* in the
> > hands of the VMM. Or do you mean something else?
>
> I guess I'm just brainstorming about one-copy with virgl and how we
> want to implement it.
>
> The simplest case that can be improved with host memory is:
>
> (1) Guest app maps the buffer (glMapBufferRange -->
> virgl_buffer_transfer_map).
> (i) If the buffer is not marked clean (texture buffers, SSBOs)
> this will trigger a TRANSFER_FROM_HOST_3D (copies++)
> (2) The guest app copies the buffer data (unavoidable - copies++)
> (3) Guest unmaps the buffer, triggering a TRANSFER_TO_HOST (copies++).
>
> For host GL buffers, the copies are done in
> {vrend_renderer_transfer_write_iov, vrend_renderer_transfer_send_iov}.
> If there are N iovecs backing the guest resource, we will have N
> copies (see vrend_write_to_iovec, vrend_read_from_iovec).
>
> udmabuf could be helpful, since it bundles up the iovecs and it will
> make the N small copies into one big copy. udmabuf could also
> eliminate some copies for textures completely. Right now, for most
> textures, virglrenderer copies iovecs into a temporary buffer (see
> read_transfer_data), and then calls glTexSubImage2D*. Just mmapping
> the udmabuf and calling glTexSubImage2D* is a definite win.
>
> But making host memory guest visible will bring the worst-case buffer
> copies from 3 to 1. For textures, if we start counting when the GPU
> buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
> and 1 with host exposed memory.
>
> >
> >
> > To make sure we all are on the same page wrt. resource allocation, the
> > workflow we have now looks like this:
> >
> > (1) guest virtio-gpu driver allocates resource. Uses normal (guest)
> >     ram. Resources can be scattered.
> > (2) guest driver creates resources (RESOURCE_CREATE_*).
> > (3) qemu (virgl=off) or virglrenderer (virgl=on) creates host resource.
> > virglrenderer might use a different format (tiling, ...).
> > (4) guest sets up backing storage (RESOURCE_ATTACH_BACKING).
> > (5) qemu creates an iovec for the guest resource.
> > (6) guest writes data to resource.
> > (7) guest requests a transfer (TRANSFER_TO_HOST_*).
> > (8) qemu or virglrenderer copy data from guest resource to
> > host resource, possibly converting (again tiling, ...).
> > (9) guest can use the resource now ...
> >
> >
> > One thing I'm prototyping right now is zerocopy resources, the workflow
> > changes to look like this:
> >
> > (2) guest additionally sets a flag to request a zerocopy buffer.
> > (3) not needed (well, the bookkeeping part of it is still needed, but
> > it would *not* allocate a host resource).
> > (5) qemu additionally creates a host dma-buf for the guest resource
> > using the udmabuf driver.
> > (7+8) not needed.
> >
> > Right now I have (not tested yet) code to handle dumb buffers.
> > Interfacing to guest userspace (virtio-gpu driver ioctls) is not
> > there yet. Interfacing with virglrenderer isn't there yet either.
> >
> > I expect that doesn't solve the coherent mapping issue. The host gpu
> > could import the dma-buf of the resource, but as it has no control over
> > the allocation it might not be able to use it without copying.
> >
> >
> > I'm not sure what the API for coherent resources should look like.
> > One option I see is yet another resource flag, so the workflow would
> > change like this (with virgl=on only ...):
> >
> > (2) guest additionally sets a flag to request a coherent resource.
> > (3) virglrenderer would create a coherent host resource.
> > (4) guest finds some address space in the (new) pci bar and asks
> > for the resource to be mapped there (new command needed for
> > this).
> > (5) qemu maps the coherent resource into the pci bar.
> > (7+8) not needed.
> >
> > Probably works for the GL_MAP_COHERENT_BIT use case. Dunno about Vulkan.
> >
> > Interfaces to guest userspace and virglrenderer likewise need updates
> > to support this.
> >
> >
> > > A related question: are we going to also expose host memory to the
> > > guest for the non-{GL_MAP_COHERENT_BIT,
> > > VK_MEMORY_PROPERTY_HOST_COHERENT_BIT} cases?
> >
> > The guest should be able to do that, yes. In case both coherent and
> > zerocopy resources are supported by the host it can even pick.
> >
> > coherent resources will be limited though (pci bar size, also because
> > we don't want to allow guests to allocate unlimited host memory for
> > security reasons), so using them for everything is probably not a good idea.
>
> The 64-bit BAR should be enough, especially if it's managed
> intelligently. Vulkan may take some time and I don't think stacks
> with host GLES drivers support GL_MAP_COHERENT_BIT, so there will be
> cases when that space goes unused.
>
> Here's one possible flow:
>
> i) virtio_gpu_resource_create_coherent -- for strictly coherent needs
> (i.e., no unmap needed)
> ii) virtio_gpu_resource_create_3d -- may or may not be host backed
> (depends on the PCI bar size, platform-specific information -- guest
> doesn't need to know)
>
> The guest would still issue the transfer ioctls for the
> virtio_gpu_resource_create_3d resource, but the work performed would
> be pared down when backed by host memory.
>
> This will require increased VMM <--> virglrenderer inter-op. Maybe
> behind a flag that QEMU doesn't set, but crosvm will. WDYT?
>
I think that's a pretty good middle ground. The amount of inter-op between
the VMM and virglrenderer is already fairly high, especially with the work
crosvm does to ensure virtio_gpu resources are allocated such that they
can't be sent over wayland connections.
>
> >
> > cheers,
> > Gerd
> >