[virglrenderer-devel] multiprocess model and GL

Gurchetan Singh gurchetansingh at chromium.org
Tue Feb 4 03:32:27 UTC 2020


On Mon, Feb 3, 2020 at 2:17 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> Not sure about that.  Having one function per job which you can freely
> combine as needed tends to work better than a single does-everything
> function.

Now that we agree one ioctl, one kick is desirable, how to batch (either
in the kernel or in the hypervisor) is an implementation detail.

The current virglrenderer model is built around a single resource creation step.

With the new ioctl, we want to do the same, plus zero-copy mechanisms and
multiple allocators.  We just need the resource id and the opaque resource
creation metadata [again, it's not necessarily an execbuffer] to do the
allocation and set up the resource id -> struct resource mapping.

I think the resource map can be a separate step, since it requires the
resource to already be fully initialized.
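
To make that split concrete, here's a rough sketch of what the two steps
could look like from the guest driver side (names and layout are
illustrative only, not the memory-v4 structs):

    /* Rough sketch only -- names and layout are illustrative, not the
     * memory-v4 structs verbatim. */
    #include <linux/types.h>

    struct hyp_resource_create_blob {
        __u32 res_handle;   /* resource id */
        __u32 flags;        /* storage / allocator flags */
        __u64 size;
        __u32 cmd_size;     /* opaque resource creation metadata -- */
        __u32 pad;
        __u64 cmd;          /* not necessarily an execbuffer */
    };

    /* Mapping is the separate step; it assumes the resource above has
     * already been fully initialized on the host. */
    struct hyp_resource_map {
        __u32 res_handle;
        __u32 pad;
        __u64 offset;       /* returned: offset into the map region */
    };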

> Does it make sense to pass the scatter list at resource creation time
> (for SHADOW+SHARED objects)?

Yes, since we sometimes need to import into EGL/VK (which requires the
resource metadata and knowing which allocator was used).  AFAICT the
initial design had ATTACH_BACKING as an artifact of TTM, and there's a
lot of cruft associated with that.  Of course, for host-only resources
the number of entries in the sg-list will be zero.
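
If the scatter list is passed at creation time, the same sketch just
grows an entry count and a pointer (again, illustrative only):

    /* Illustrative only: the same creation request with the guest
     * scatter list attached.  nr_entries is 0 for host-only resources. */
    #include <linux/types.h>

    struct hyp_mem_entry {
        __u64 addr;
        __u32 length;
        __u32 pad;
    };

    struct hyp_resource_create_blob_v2 {
        __u32 res_handle;
        __u32 flags;
        __u64 size;
        __u32 cmd_size;
        __u32 nr_entries;   /* 0 for host-only resources */
        __u64 cmd;
        __u64 entries;      /* guest pointer to struct hyp_mem_entry[] */
    };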

[snip]
> (1) Do we want/need both VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_ALLOW and
>     VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_REQUIRE?

STORAGE_SHARED_REQUIRE is very useful ... userspace needs to know that
the hypervisor will indeed create the dma-buf.  SHARED_ALLOW doesn't
offer that same guarantee, which userspace needs in order to avoid
transfer ioctls.
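
Roughly, the guest userspace decision would look like this (only the
flag name is from memory-v4; the helpers are made-up placeholders):

    /* Sketch of the guest userspace decision.  Only the flag name is
     * from the memory-v4 discussion; use_direct_mapping() and
     * use_transfer_ioctls() are made-up placeholders. */
    if (flags & VIRTGPU_RESOURCE_FLAG_STORAGE_SHARED_REQUIRE) {
        /* The host dma-buf is guaranteed to exist, so the guest can
         * map it directly and skip TRANSFER_TO/FROM_HOST entirely. */
        use_direct_mapping(res);
    } else {
        /* SHARED_ALLOW only: sharing may not have happened, so the
         * explicit transfer path has to stay around as a fallback. */
        use_transfer_ioctls(res);
    }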

A hypervisor-created dma-buf -- whether from system RAM or a dedicated
heap -- is likely to work out of the box with free and open source GPU
drivers when the Vulkan HOST_COHERENT_BIT | HOST_CACHED_BIT flags are
set.  Even without HOST_CACHED_BIT, we can do cache management
operations before importing if that's desired.
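
For the cache management part, one existing mechanism is the dma-buf
sync ioctl; a minimal sketch of bracketing CPU writes with it (whether
we'd use exactly this path is an open question):

    /* Minimal sketch: bracket CPU writes on a dma-buf with the existing
     * DMA_BUF_IOCTL_SYNC uapi before handing it to a device that isn't
     * CPU-cache-coherent. */
    #include <linux/dma-buf.h>
    #include <sys/ioctl.h>

    static int cpu_write_begin(int dmabuf_fd)
    {
        struct dma_buf_sync sync = {
            .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
        };
        return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    }

    static int cpu_write_end(int dmabuf_fd)
    {
        struct dma_buf_sync sync = {
            .flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE,
        };
        return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    }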

Many GPUs (e.g. all recent Intel devices) only have a cache-coherent
memory type, and the decision to import will be based on GPU
capabilities.  Since udmabuf is a mainline kernel driver and QEMU will
have support for it, virtgpu should support it as well.
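
For reference, the mainline udmabuf uapi the host side would use is
tiny (error handling omitted):

    /* How a host process (e.g. QEMU) turns guest pages backed by a
     * sealed memfd into a dma-buf via /dev/udmabuf.  This is the
     * existing mainline uapi; error handling omitted. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/udmabuf.h>

    static int make_udmabuf(int memfd, __u64 offset, __u64 size)
    {
        struct udmabuf_create create = {
            .memfd  = memfd,                /* needs F_SEAL_SHRINK */
            .flags  = UDMABUF_FLAGS_CLOEXEC,
            .offset = offset,               /* page-aligned */
            .size   = size,                 /* page-aligned */
        };
        int devfd = open("/dev/udmabuf", O_RDWR);

        return ioctl(devfd, UDMABUF_CREATE, &create); /* dma-buf fd */
    }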

[snip]
> (2) How to integrate gbm/gralloc allocations best?  Have a
>     VIRTGPU_RESOURCE_FLAG_ALLOC_GBM, then pass args in the execbuffer?
>     Or better have a separate RESOURCE_CREATE_GBM ioctl/command and
>     define everything we need in the virtio spec?

With ALLOC_EXECBUFFER (or equivalent), no separate flag or ioctl is
needed.  Whether the resource is allocated by VK, GL, GBM, or liballoc
can be an internal renderer library detail, known only to host/guest
userspace.  That's the direction I want to go.

However, since EXECBUFFER is one part of the virglrenderer protocol
(see my reply to Chia), I would rename the flags in memory-v4 to:

VIRTGPU_RESOURCE_FLAG_ALLOC_DUMB ==> VIRTGPU_RESOURCE_FLAG_ALLOC_2D
VIRTGPU_RESOURCE_FLAG_ALLOC_EXECBUFFER ==> VIRTGPU_RESOURCE_FLAG_ALLOC_3D
__u64 command ==> __u64 3d_resource_create_cmd;

That way, all 3D resource creation requests will just need to specify
3d_resource_create_cmd, and then we're done.
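
Spelled out as a sketch (flag bit values are illustrative, and the
field name is adjusted to be a valid C identifier):

    /* Sketch of the rename; flag bit values are illustrative, and
     * cmd_3d_resource_create stands in for the proposed
     * 3d_resource_create_cmd, which isn't a valid C identifier. */
    #include <linux/types.h>

    #define VIRTGPU_RESOURCE_FLAG_ALLOC_2D  (1 << 0)  /* was ALLOC_DUMB */
    #define VIRTGPU_RESOURCE_FLAG_ALLOC_3D  (1 << 1)  /* was ALLOC_EXECBUFFER */

    struct hyp_resource_create_args {
        __u32 res_handle;
        __u32 flags;                    /* ALLOC_2D or ALLOC_3D */
        __u32 cmd_size;
        __u32 pad;
        __u64 cmd_3d_resource_create;   /* opaque 3D creation request */
    };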

[snip]
> I'd tend to support one model:  Either two ioctls, or execbuffer
> included in CREATE_BLOB.  Or would it be useful for userspace to have
> both, then pick one at runtime on a case-by-case base?

Have both.  There are good use cases for both -- the right decision,
as always, will be made by userspace.
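
Roughly, the two call shapes userspace would pick between (placeholder
names, not proposed uapi):

    /* Purely illustrative call shapes; HYP_IOCTL_RESOURCE_CREATE_BLOB
     * and the args structs are placeholders, not proposed uapi.  Only
     * DRM_IOCTL_VIRTGPU_EXECBUFFER is an existing ioctl. */

    /* (a) allocation command folded into the creation ioctl: one kick */
    ioctl(drm_fd, HYP_IOCTL_RESOURCE_CREATE_BLOB, &create_with_cmd);

    /* (b) two ioctls: submit the allocation command first, then create
     * the blob resource that refers to it */
    ioctl(drm_fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &exec_args);
    ioctl(drm_fd, HYP_IOCTL_RESOURCE_CREATE_BLOB, &create_no_cmd);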

