[virglrenderer-devel] multiprocess model and GL

Gerd Hoffmann kraxel at redhat.com
Mon Feb 3 09:53:08 UTC 2020


On Fri, Jan 31, 2020 at 12:00:06PM -0800, Chia-I Wu wrote:
> On Fri, Jan 31, 2020 at 2:41 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> >
> >   Hi,
> >
> > memory-v4 branch pushed.
> >
> > Went with the single-ioctl approach.  Renamed back to CREATE as we don't
> > have separate "allocate resource id" and "initialize resource" steps any
> > more.
> >
> > So, virgl/vulkan resources would be created via execbuffer, get an
> > object id attached to them so they can be referenced, then we'll create
> > a resource from that.  The single ioctl which will generate multiple
> > virtio commands.
> Does it support cmd_size==0 and object_id!=0?  That is useful for
> cases where execbuffer and resource_create happen at different times.

Not sure yet.  At the end of the day it boils down to the question of
whether we want to allow allocation via the EXECBUFFER ioctl, then
create resources later via a separate CREATE_BLOB ioctl.

I'd tend to support one model: either two ioctls, or execbuffer
included in CREATE_BLOB.  Or would it be useful for userspace to have
both and pick one at runtime on a case-by-case basis?

> > Dumb resources will be created with the same ioctl, just with the DUMB
> > instead of the EXECBUFFER flag set.  The three execbuffer fields will be
> > unused.
> I think the three execbuffer fields can be in a union:
> 
> union {
>     struct {
>       the-three-fields;
>     } execbuffer;
> 
>     __u32 pads[16];
> };
> 
> The alloc type decides which of the fields, if any, is used.  This
> gives us some leeway when a future alloc type needs something else.

That also makes the interface clearer.
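
A rough sketch of how that could look in the uAPI header (the field
names inside the union are illustrative; cmd_size and object_id come
from this thread, the rest may differ from what memory-v4 actually has):

    #include <linux/types.h>

    /* Illustrative layout only: flags selects the alloc type, which in
     * turn decides which union member, if any, is valid. */
    struct drm_virtgpu_resource_create_blob {
            __u32 flags;            /* alloc type: DUMB, EXECBUFFER, ... */
            __u32 size;
            /* ... other common fields ... */
            union {
                    struct {
                            __u64 cmd;        /* userspace pointer to command stream */
                            __u32 cmd_size;   /* command stream size in bytes */
                            __u32 object_id;  /* host object to create the resource from */
                    } execbuffer;
                    __u32 pads[16];
            };
    };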

> > To be discussed:
> >
> > (1) Do we want/need both VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_ALLOW and
> >     VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_REQUIRE?
> The host always has direct access to the guest shmem.  I can see
> three cases where the host accesses the shmem:
> 
>  - transfers data into and out of the guest shmem
>  - direct access in CPU domain (CPU access or GPU access w/ userptr)
>  - direct access in device domain (GPU access w/ udmabuf)

Ok, some background is needed here I think:

Guest memory is backed by a memfd, and guest resources are scattered
across it.  This is where udmabuf comes into play: it lets the host
create a single dmabuf for those scattered pages.

The host can mmap() the dmabuf and get a linear mapping of the resource
for CPU access.  That allows operating on the resource directly;
virglrenderer can skip copying the iov into a linear buffer.
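
Concretely, the host-side flow looks roughly like this (error handling
trimmed; it assumes the iov entries have already been translated into
page-aligned offsets into the guest-memory memfd):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/uio.h>
    #include <linux/udmabuf.h>

    /* Build one dmabuf from the resource's scattered pages in the
     * guest-memory memfd, then mmap it for linear CPU access. */
    static void *map_resource_linear(int memfd, const struct iovec *iov,
                                     int niov, size_t total_size)
    {
            struct udmabuf_create_list *list;
            int devfd, dmabuf_fd, i;
            void *ptr;

            list = calloc(1, sizeof(*list) + niov * sizeof(list->list[0]));
            list->flags = UDMABUF_FLAGS_CLOEXEC;
            list->count = niov;
            for (i = 0; i < niov; i++) {
                    /* iov_base is assumed to hold the offset into the
                     * memfd here, not a raw host pointer. */
                    list->list[i].memfd  = memfd;
                    list->list[i].offset = (__u64)(uintptr_t)iov[i].iov_base;
                    list->list[i].size   = iov[i].iov_len;
            }

            devfd = open("/dev/udmabuf", O_RDWR);
            dmabuf_fd = ioctl(devfd, UDMABUF_CREATE_LIST, list);
            close(devfd);
            free(list);

            /* Linear view of the scattered resource, no shadow copy. */
            ptr = mmap(NULL, total_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, dmabuf_fd, 0);
            return ptr == MAP_FAILED ? NULL : ptr;
    }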

virglrenderer could also try to import the udmabuf so the GPU can
access it directly.

For the most part this is a host-side optimization.  The only reason
the guest has to care is that, with a udmabuf-based shared mapping, the
host might see guest changes without an explicit TRANSFER command,
which breaks Mesa.

So my original plan was that the guest can allow the host to use this
optimization (SHARED_ALLOW).  Then it's up to the host to figure out
whether it actually wants to create a udmabuf or not.  For small
resources the mmap() overhead might not pay off.  That is probably not
much of a problem for Vulkan thanks to memory pooling, but for OpenGL,
where every object has its own resource, we probably want the option to
*not* use a dmabuf.

In some cases it might be useful for the guest to force the host to use
a udmabuf; this is what SHARED_REQUIRE is for.

The question is: do we want/need them both?  We could drop
SHARED_ALLOW.  In that case the guest has to decide on the mmap-vs-copy
performance tradeoff and pick SHADOW or SHARED accordingly.
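
A minimal sketch of that guest-side decision, assuming the flag names
from this thread (their values here are placeholders, not uAPI) and an
arbitrary, unmeasured size threshold:

    #include <stdint.h>

    /* Placeholder values: these flags are proposals from this thread,
     * not part of any released uAPI header. */
    #define VIRTGPU_RESOURCE_FLAG_STORAGE_SHADOW          0x1
    #define VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_REQUIRE  0x2

    /* Hypothetical cutoff: below it, the host's mmap() setup cost
     * likely outweighs the saved transfer copies. */
    #define SHARED_MMAP_THRESHOLD (64u * 1024u)

    static uint32_t pick_storage_flag(uint64_t size)
    {
            return size >= SHARED_MMAP_THRESHOLD
                 ? VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_REQUIRE
                 : VIRTGPU_RESOURCE_FLAG_STORAGE_SHADOW;
    }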

> VIRTGPU_RESOURCE_FLAG_STORAGE_SHADOW says the host can access the
> shmem only in response to transfer commands.  It is not very useful
> and can probably be removed.

Well, Mesa breaks if the guest can see changes without an explicit
TRANSFER, so I think we will need that.

> VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_CPU says the host can and must
> access the shmem in CPU domain.  The kernel always maps the shmem
> cached and the userspace knows it is coherent.
> 
> VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_DEVICE says the host can and must
> access the shmem in device domain.  The userspace can ask the kernel
> to give it a coherent mapping or not.  For a coherent mapping, it can
> be wc or wb depending on the platform.  For an incoherent mapping, the
> userspace can use transfers to flush/invalidate the cpu cache.

On the host side both are essentially "create and use udmabuf".  So do
we need separate CPU/DEVICE flags here?

> > (2) How to integrate gbm/gralloc allocations best?  Have a
> >     VIRTGPU_RESOURCE_FLAG_ALLOC_GBM, then pass args in the execbuffer?
> >     Or better have a separate RESOURCE_CREATE_GBM ioctl/command and
> >     define everything we need in the virtio spec?
> Instead of RESOURCE_CREATE_GBM, I would replace the three execbuffer
> fields with a union, and add VIRTGPU_RESOURCE_FLAG_ALLOC_GBM and a new
> field to the union.

Yes, that would work too.

> If we were to pass args in the execbuffer, what would be wrong with a
> generic gpu context type that is allocation-only?

Well, if you want to use the gbm-allocated resources with virgl/vulkan
anyway, this introduces some overhead.  You would need two
/dev/dri/card0 handles, one in generic-gpu mode for allocation and one
in virgl/vulkan mode, then allocate with one and export + import into
the other ...
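
For illustration, that export + import dance would look roughly like
this with libdrm's PRIME helpers (the context-type setup is elided,
since the ioctl for that is exactly what is being discussed here):

    #include <stdint.h>
    #include <unistd.h>
    #include <xf86drm.h>

    /* Sketch of the two-handle overhead described above: one DRM fd in
     * an allocation-only context, a second in a virgl/vulkan context,
     * with a dmabuf fd ferrying the buffer between them. */
    int share_buffer(int alloc_fd, int render_fd, uint32_t alloc_handle,
                     uint32_t *render_handle)
    {
            int dmabuf_fd, ret;

            /* Export the GEM handle from the allocation context. */
            ret = drmPrimeHandleToFD(alloc_fd, alloc_handle, DRM_CLOEXEC,
                                     &dmabuf_fd);
            if (ret)
                    return ret;

            /* Import the dmabuf into the virgl/vulkan context. */
            ret = drmPrimeFDToHandle(render_fd, dmabuf_fd, render_handle);
            close(dmabuf_fd);
            return ret;
    }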

Also, I think GBM and DUMB resources are quite similar.

cheers,
  Gerd


