[virglrenderer-devel] coherent memory access for virgl
Gurchetan Singh
gurchetansingh at chromium.org
Wed Sep 26 01:28:55 UTC 2018
On Tue, Sep 25, 2018 at 2:10 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>
> Hi,
>
> > > Who will do the actual allocations? I expect we need new virglrenderer
> > > functions for that?
> >
> > The decision to back memory via iovecs or host memory is up to the
> > VMM.
>
> What exactly do you mean by "via iovecs"?  The current way to allocate
> resources? They are guest-allocated and the iovecs passed to
> virglrenderer point into guest memory. So that clearly is *not* in the
> hands of the VMM. Or do you mean something else?
I guess I'm just brainstorming about one-copy with virgl and how we
want to implement it.
The simplest case that can be improved with host memory is:
(1) Guest app maps the buffer (glMapBufferRange --> virgl_buffer_transfer_map).
(i) If the buffer is not marked clean (texture buffers, SSBOs)
this will trigger a TRANSFER_FROM_HOST_3D (copies++)
(2) The guest app copies the buffer data (unavoidable - copies++)
(3) Guest unmaps the buffer, triggering a TRANSFER_TO_HOST (copies++).
For host GL buffers, the copies are done in
{vrend_renderer_transfer_write_iov, vrend_renderer_transfer_send_iov}.
If there are N iovecs backing the guest resource, we will have N
copies (see vrend_write_to_iovec, vrend_read_from_iovec).
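The per-iovec copy amounts to something like the following -- a
sketch, not the actual vrend_write_to_iovec code, but it shows why N
iovecs mean N small memcpys:

  #include <string.h>
  #include <sys/uio.h>

  /* Sketch: copy 'len' bytes at 'offset' of a linear host buffer into
   * a guest resource scattered across 'niov' iovecs.  The real code
   * is vrend_write_to_iovec()/vrend_read_from_iovec(). */
  static void write_to_iovecs(const struct iovec *iov, int niov,
                              size_t offset, const char *src, size_t len)
  {
      for (int i = 0; i < niov && len; i++) {
          if (offset >= iov[i].iov_len) {
              offset -= iov[i].iov_len;
              continue;
          }
          size_t chunk = iov[i].iov_len - offset;
          if (chunk > len)
              chunk = len;
          memcpy((char *)iov[i].iov_base + offset, src, chunk);
          src += chunk;
          len -= chunk;
          offset = 0;
      }
  }
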
udmabuf could be helpful, since it bundles up the iovecs, turning
the N small copies into one big copy. udmabuf could also completely
eliminate some copies for textures. Right now, for most textures,
virglrenderer copies the iovecs into a temporary buffer (see
read_transfer_data) and then calls glTexSubImage2D*. Just mmapping
the udmabuf and calling glTexSubImage2D* is a definite win.
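Concretely, the texture path with udmabuf would look roughly like
this on the host side (a sketch; it assumes the dma-buf covering the
whole resource already exists and the texture is a plain 2D RGBA one):

  #include <sys/mman.h>
  #include <epoxy/gl.h>

  /* Sketch: with a udmabuf covering the whole guest resource we can
   * mmap it once and feed the pixels straight to GL, instead of
   * linearizing N iovecs into a temporary buffer (read_transfer_data)
   * first and then uploading. */
  static void upload_from_udmabuf(int dmabuf_fd, size_t size,
                                  GLuint tex, int width, int height)
  {
      void *ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, dmabuf_fd, 0);
      if (ptr == MAP_FAILED)
          return;

      glBindTexture(GL_TEXTURE_2D, tex);
      glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                      GL_RGBA, GL_UNSIGNED_BYTE, ptr);

      munmap(ptr, size);
  }
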
But making host memory guest-visible will bring the worst-case
buffer copies down from 3 to 1. For textures, if we start counting
when the GPU buffer gets detiled, there are currently 5 copies, 3
with udmabuf, and 1 with host-exposed memory.
>
>
> To make sure we all are on the same page wrt. resource allocation, the
> workflow we have now looks like this:
>
> (1) guest virtio-gpu driver allocates resource. Uses normal (guest) ram.
> Resources can be scattered.
> (2) guest driver creates resources (RESOURCE_CREATE_*).
> (3) qemu (virgl=off) or virglrenderer (virgl=on) creates host resource.
> virglrenderer might use a different format (tiling, ...).
> (4) guest sets up backing storage (RESOURCE_ATTACH_BACKING).
> (5) qemu creates an iovec for the guest resource.
> (6) guest writes data to resource.
> (7) guest requests a transfer (TRANSFER_TO_HOST_*).
> (8) qemu or virglrenderer copy data from guest resource to
> host resource, possibly converting (again tiling, ...).
> (9) guest can use the resource now ...
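Yes, that matches my understanding. As a concrete reference for step
(2), the guest driver fills in a RESOURCE_CREATE_3D command roughly
like this (field values are illustrative and le32 conversion is
omitted, so treat it as a sketch rather than real driver code):

  #include <stdint.h>
  #include <string.h>
  #include <linux/virtio_gpu.h>

  /* Sketch of step (2): guest driver building the RESOURCE_CREATE_3D
   * command for a 2D texture.  Endian conversion, fencing and the
   * actual virtqueue submission are omitted. */
  static void fill_resource_create_3d(struct virtio_gpu_resource_create_3d *cmd,
                                      uint32_t resource_id,
                                      uint32_t width, uint32_t height)
  {
      memset(cmd, 0, sizeof(*cmd));
      cmd->hdr.type = VIRTIO_GPU_CMD_RESOURCE_CREATE_3D;
      cmd->resource_id = resource_id;
      cmd->target = 2;        /* PIPE_TEXTURE_2D */
      cmd->format = 1;        /* e.g. PIPE_FORMAT_B8G8R8A8_UNORM */
      cmd->width = width;
      cmd->height = height;
      cmd->depth = 1;
      cmd->array_size = 1;
  }
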
>
>
> One thing I'm prototyping right now is zerocopy resources, the workflow
> changes to look like this:
>
> (2) guest additionally sets a flag to request a zerocopy buffer.
> (3) not needed (well, the bookkeeping part of it is still needed, but
> it would *not* allocate a host resource).
> (5) qemu additionally creates a host dma-buf for the guest resource
> using the udmabuf driver.
> (7+8) not needed.
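For (5), creating the host dma-buf for a scattered guest resource
with the udmabuf driver would look roughly like this -- a sketch
against the udmabuf uapi, assuming guest RAM is memfd-backed as the
driver requires:

  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/udmabuf.h>

  /* Sketch: build one host dma-buf covering a scattered guest
   * resource.  Each (offset, size) pair in 'chunks' describes a
   * page-aligned piece of the memfd that backs guest RAM. */
  static int create_udmabuf(int memfd,
                            const struct udmabuf_create_item *chunks,
                            unsigned int nchunks)
  {
      struct udmabuf_create_list *list;
      int devfd, buffd;

      list = calloc(1, sizeof(*list) + nchunks * sizeof(list->list[0]));
      if (!list)
          return -1;
      list->count = nchunks;
      for (unsigned int i = 0; i < nchunks; i++) {
          list->list[i] = chunks[i];
          list->list[i].memfd = memfd;
      }

      devfd = open("/dev/udmabuf", O_RDWR);
      if (devfd < 0) {
          free(list);
          return -1;
      }
      buffd = ioctl(devfd, UDMABUF_CREATE_LIST, list);

      close(devfd);
      free(list);
      return buffd;   /* dma-buf fd, or -1 on error */
  }
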
>
> Right now I have (not tested yet) code to handle dumb buffers.
> Interfacing to guest userspace (virtio-gpu driver ioctls) is not
> there yet. Interfacing with virglrenderer isn't there yet either.
>
> I expect that doesn't solve the coherent mapping issue. The host gpu
> could import the dma-buf of the resource, but as it has no control over
> the allocation it might not be able to use it without copying.
>
>
> I'm not sure what the API for coherent resources should look like.
> One option I see is yet another resource flag, so the workflow would
> change like this (with virgl=on only ...):
>
> (2) guest additionally sets a flag to request a coherent resource.
> (3) virglrenderer would create a coherent host resource.
> (4) guest finds some address space in the (new) pci bar and asks
> for the resource being mapped there (new command needed for
> this).
> (5) qemu maps the coherent resource into the pci bar.
> (7+8) not needed.
>
> Probably works for the GL_MAP_COHERENT_BIT use case. Dunno about vulkan.
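On the virglrenderer side, (3) for a GL buffer would presumably boil
down to ARB_buffer_storage plus a persistent map, something like the
sketch below (assumes a desktop GL 4.4 host driver; error handling
omitted):

  #include <epoxy/gl.h>

  /* Sketch: allocate a host GL buffer that can stay persistently and
   * coherently mapped, and return the mapping so the VMM can place it
   * in the (new) pci bar. */
  static void *create_coherent_buffer(GLsizeiptr size, GLuint *buf_out)
  {
      const GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_WRITE_BIT |
                               GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
      GLuint buf;

      glGenBuffers(1, &buf);
      glBindBuffer(GL_ARRAY_BUFFER, buf);
      glBufferStorage(GL_ARRAY_BUFFER, size, NULL, flags);

      *buf_out = buf;
      return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
  }
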
>
> Interfaces to guest userspace and virglrenderer likewise need updates
> to support this.
>
>
> > A related question: are we going to also expose host memory to the
> > guest for the non-{GL_MAP_COHERENT_BIT,
> > VK_MEMORY_PROPERTY_HOST_COHERENT_BIT} cases?
>
> The guest should be able to do that, yes. In case both coherent and
> zerocopy resources are supported by the host it can even pick.
>
> coherent resources will be limited though (pci bar size, also because
> we don't want to allow guests to allocate unlimited host memory for security
> reasons), so using them for everything is probably not a good idea.
The 64-bit BAR should be enough, especially if it's managed
intelligently. Vulkan support may take some time, and I don't think
stacks with host GLES drivers support GL_MAP_COHERENT_BIT, so there
will be cases where that space goes unused.
Here's one possible flow:
i) virtio_gpu_resource_create_coherent -- for strictly coherent needs
(i.e., no unmap needed)
ii) virtio_gpu_resource_create_3d -- may or may not be host backed
(depends on the PCI bar size, platform-specific information -- guest
doesn't need to know)
The guest would still issue the transfer ioctls for the
virtio_gpu_resource_create_3d resource, but the work performed would
be pared down when backed by host memory.
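As a strawman for how guest userspace could request (i), the existing
resource-create ioctl could grow a flag; VIRTGPU_RES_FLAG_COHERENT
below is hypothetical and just stands in for whatever we end up
adding, while the kernel/VMM keeps the host-backed-or-not decision to
itself:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <libdrm/virtgpu_drm.h>

  #define VIRTGPU_RES_FLAG_COHERENT (1u << 0)   /* hypothetical flag */

  /* Sketch: create a virtio-gpu resource, optionally asking for a
   * coherent (host-backed, persistently mapped) one.  Whether the
   * non-coherent path ends up host-backed is not visible here. */
  static int create_resource(int drm_fd, uint32_t width, uint32_t height,
                             int want_coherent, uint32_t *res_handle)
  {
      struct drm_virtgpu_resource_create args = {
          .target = 2,          /* PIPE_TEXTURE_2D */
          .format = 1,          /* e.g. PIPE_FORMAT_B8G8R8A8_UNORM */
          .width = width,
          .height = height,
          .depth = 1,
          .array_size = 1,
          .flags = want_coherent ? VIRTGPU_RES_FLAG_COHERENT : 0,
      };
      int ret = ioctl(drm_fd, DRM_IOCTL_VIRTGPU_RESOURCE_CREATE, &args);

      if (ret == 0)
          *res_handle = args.res_handle;
      return ret;
  }
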
This will require increased VMM <--> virglrenderer interop. Maybe
put it behind a flag that QEMU doesn't set, but cros_vm will. WDYT?
>
> cheers,
> Gerd
>