[virglrenderer-devel] coherent memory access for virgl
Zach Reizner
zachr at google.com
Wed Sep 26 07:51:15 UTC 2018
On Wed, Sep 26, 2018 at 3:29 AM Gurchetan Singh <gurchetansingh at chromium.org>
wrote:
> On Tue, Sep 25, 2018 at 2:10 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> >
> > Hi,
> >
> > > > Who will do the actual allocations? I expect we need new
> > > > virglrenderer functions for that?
> > >
> > > The decision to back memory via iovecs or host memory is up to the
> > > VMM.
> >
> > What exactly do you mean by "via iovecs"? The current way to allocate
> > resources? They are guest-allocated and the iovecs passed to
> > virglrenderer point into guest memory. So that clearly is *not* in the
> > hands of the VMM. Or do you mean something else?
>
> I guess I'm just brainstorming about one-copy with virgl and how we
> want to implement it.
>
> The simplest case that can be improved with host memory is:
>
> (1) Guest app maps the buffer (glMapBufferRange -->
> virgl_buffer_transfer_map).
> (i) If the buffer is not marked clean (texture buffers, SSBOs)
> this will trigger a TRANSFER_FROM_HOST_3D (copies++)
> (2) The guest app copies the buffer data (unavoidable - copies++)
> (3) Guest unmaps the buffer, triggering a TRANSFER_TO_HOST (copies++).
>
> For host GL buffers, the copies are done in
> {vrend_renderer_transfer_write_iov, vrend_renderer_transfer_send_iov}.
> If there are N iovecs backing the guest resource, we will have N
> copies (see vrend_write_to_iovec, vrend_read_from_iovec).
>
> udmabuf could be helpful, since it bundles up the iovecs and it will
> make the N small copies into one big copy. udmabuf could also
> eliminate some copies for textures completely. Right now, for most
> textures, virglrenderer copies iovecs into a temporary buffer (see
> read_transfer_data), and then calls glTexSubImage2D*. Just mmapping
> the udmabuf and calling glTexSubImage2D* is a definite win.
>
> But making host memory guest visible will bring the worst-case buffer
> copies from 3 to 1. For textures, if we start counting when the GPU
> buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
> and 1 with host exposed memory.
>
> >
> >
> > To make sure we all are on the same page wrt. resource allocation, the
> > workflow we have now looks like this:
> >
> > (1) guest virtio-gpu driver allocates resource. Uses normal (guest)
> >     ram. Resources can be scattered.
> > (2) guest driver creates resources (RESOURCE_CREATE_*).
> > (3) qemu (virgl=off) or virglrenderer (virgl=on) creates host resource.
> > virglrenderer might use a different format (tiling, ...).
> > (4) guest sets up backing storage (RESOURCE_ATTACH_BACKING).
> > (5) qemu creates an iovec for the guest resource.
> > (6) guest writes data to resource.
> > (7) guest requests a transfer (TRANSFER_TO_HOST_*).
> > (8) qemu or virglrenderer copy data from guest resource to
> > host resource, possibly converting (again tiling, ...).
> > (9) guest can use the resource now ...
> >
> >
> > One thing I'm prototyping right now is zerocopy resources, the workflow
> > changes to look like this:
> >
> > (2) guest additionally sets a flag to request a zerocopy buffer.
> > (3) not needed (well, the bookkeeping part of it is still needed, but
> > it would *not* allocate a host resource).
> > (5) qemu additionally creates a host dma-buf for the guest resource
> > using the udmabuf driver.
> > (7+8) not needed.
> >
> > Right now I have (not tested yet) code to handle dumb buffers.
> > Interfacing to guest userspace (virtio-gpu driver ioctls) is not
> > there yet. Interfacing with virglrenderer isn't there yet either.
> >
> > I expect that doesn't solve the coherent mapping issue. The host gpu
> > could import the dma-buf of the resource, but as it has no control over
> > the allocation it might not be able to use it without copying.
> >
> >
> > I'm not sure what the API for coherent resources should look like.
> > One option I see is yet another resource flag, so the workflow would
> > change like this (with virgl=on only ...):
> >
> > (2) guest additionally sets a flag to request a coherent resource.
> > (3) virglrenderer would create a coherent host resource.
> > (4) guest finds some address space in the (new) pci bar and asks
> > for the resource to be mapped there (new command needed for
> > this).
> > (5) qemu maps the coherent resource into the pci bar.
> > (7+8) not needed.
> >
> > Probably works for the GL_MAP_COHERENT_BIT use case. Dunno about Vulkan.
> >
> > Interfaces to guest userspace and virglrenderer likewise need updates
> > to support this.
> >
> >
> > > A related question: are we going to also expose host memory to the
> > > guest for the non-{GL_MAP_COHERENT_BIT,
> > > VK_MEMORY_PROPERTY_HOST_COHERENT_BIT} cases?
> >
> > The guest should be able to do that, yes. In case both coherent and
> > zerocopy resources are supported by the host it can even pick.
> >
> > coherent resources will be limited though (pci bar size, also because
> > we don't want to allow guests to allocate unlimited host memory for
> > security reasons), so using them for everything is probably not a good idea.
>
> The 64-bit BAR should be enough, especially if it's managed
> intelligently. Vulkan may take some time and I don't think stacks
> with host GLES drivers support GL_MAP_COHERENT_BIT, so there will be
> cases when that space goes unused.
>
> Here's one possible flow:
>
> i) virtio_gpu_resource_create_coherent -- for strictly coherent needs
> (i.e., no unmap needed)
> ii) virtio_gpu_resource_create_3d -- may or may not be host backed
> (depends on the PCI bar size, platform-specific information -- guest
> doesn't need to know)
>
> The guest would still issue the transfer ioctls for the
> virtio_gpu_resource_create_3d resource, but the work performed would
> be pared down when backed by host memory.
>
> This will require increased VMM <--> virglrenderer inter-op. Maybe
> behind a flag that QEMU doesn't set, but crosvm will. WDYT?
>
I think that's a pretty good middle ground. The amount of inter-op between
the VMM and virglrenderer is already fairly high, especially with the work
crosvm does to ensure virtio_gpu resources are allocated such that they
can't be sent over wayland connections.
>
> >
> > cheers,
> > Gerd
> >