[virglrenderer-devel] vulkan + virgl ioctl vs command submission

Chia-I Wu olvaffe at gmail.com
Thu Feb 27 22:07:16 UTC 2020


On Thu, Feb 27, 2020 at 11:45 AM Dave Airlie <airlied at gmail.com> wrote:
>
> Realised you might not be reading the list, or I asked too hard a question :-P
Sorry that I missed this.
>
> On Tue, 25 Feb 2020 at 12:59, Dave Airlie <airlied at gmail.com> wrote:
> >
> > Okay I think I'm following along the multiprocess model, and the object
> > id stuff, and I'm mostly coming around to the ideas presented.
> >
> > One question I have is how do we envisage the userspace vulkan driver
> > using things.
> >
> > I kinda feel I'm missing the difference between APIs that access
> > things on the CPU side and command for accessing things on the GPU
> > side in the proposal. In the gallium world the "screen" allocates
> > resources (memory + properties) synchronously on the API being
> > accessed, the context is then for operating on GPU side things where
> > we batch up a command stream and it is processed async.
> >
> > From the Vulkan API POV the application API is multi-thread safe, and
> > we should avoid if we can taking too many locks under the covers, esp
> > in common paths. Vulkan applications are also encouraged to allocate
> > memory in large chunks and subdivide them between resources.
> >
> > I'm concerned that we are thinking of batching allocations in the
> > userspace driver (or in the kernel) and how to flush those to the host
> > side etc. If we have two threads in userspace allocate memory from the
> > vulkan API, and one then does a transfer into the memory, how do we
> > envisage that being flushed to the host side? Like if I allocate
> > memory in one thread, then create images from that memory in another,
> > how does that work out?
> >

The goal of encoding vkAllocateMemory in the execbuffer command stream
is not batching.  It is to reuse the same mechanism to send
API-specific opaque alloc commands to the host, and to allow
allocations without resources (e.g., non-shareable allocations from a
non-mappable heap do not need resources).

In the current (but outdated) code[1], there is a per-VkInstance
execbuffer command stream struct (struct vn_cs).  Encoding to that
vn_cs requires taking a per-instance lock.  There is also a
per-VkCommandBuffer vn_cs, and encoding to it requires no locking.
Multithreading is beneficial only when the app uses the
per-VkCommandBuffer streams to build its VkCommandBuffers.

But vkAllocateMemory can be changed to use a local vn_cs or a local
template to be lock-free.  It would look like

  mem->object_id = next_object_id();

  local_cmd_templ[ALLOCATION_SIZE] = info->allocationSize;
  local_cmd_templ[MEMORY_TYPE_INDEX] = info->memoryTypeIndex;
  local_cmd_templ[OBJECT_ID] = mem->object_id;

  // when a resource is needed; otherwise, use EXECBUFFER instead
  struct drm_virtgpu_resource_create_blob args = {
    .size = info->allocationSize,
    .flags = VIRTGPU_RESOURCE_FLAG_STORAGE_HOSTMEM,
    .cmd_size = sizeof(local_cmd_templ),
    .cmd = local_cmd_templ,
    .object_id = mem->object_id
  };
  drmIoctl(fd, DRM_IOCTL_VIRTIO_GPU_RESOURCE_CREATE_BLOB, &args);

  mem->resource_id = args.res_handle;
  mem->bo = args.bo_handle;

I think Gurchetan's proposal will look similar, except that the
command stream will be replaced by something more flexible such that
object id is optional.

In the current design (v2), the host will

 - allocate a VkDeviceMemory from the app's VkInstance
 - export an fd (or whatever keeps the underlying storage alive)
 - create a global resource struct to manage the fd

For comparison, it is possible to extend the existing mechanism (v1)
in a different way such that the host will

 - allocate a VkDeviceMemory from a global allocator VkInstance
 - create a resource struct to manage the VkDeviceMemory
 - attach the resource struct to the app's VkInstance
 - export an fd and import it as a VkDeviceMemory in the app's VkInstance

One (v2) exports the app's VkDeviceMemory as a global resource, while
the other (extended v1) imports a global resource as the app's
VkDeviceMemory.

[1] https://gitlab.freedesktop.org/olv/mesa/commits/venus-wsi

> > I don't see why having a command stream for allocating memory is a
> > good thing here, apart from batching up things that Vulkan really
> > wants the app to batch up itself anyway.
> >
> > Hopefully I'm missing something in the previous discussions.
> >
> > Dave.

