[virglrenderer-devel] vulkan + virgl ioctl vs command submission

Chia-I Wu olvaffe at gmail.com
Thu Feb 27 22:28:17 UTC 2020


On Thu, Feb 27, 2020 at 2:07 PM Chia-I Wu <olvaffe at gmail.com> wrote:
>
> On Thu, Feb 27, 2020 at 11:45 AM Dave Airlie <airlied at gmail.com> wrote:
> >
> > Realised you might not be reading the list, or I asked too hard a question :-P
> Sorry that I missed this.
> >
> > On Tue, 25 Feb 2020 at 12:59, Dave Airlie <airlied at gmail.com> wrote:
> > >
> > > Okay I think I'm following along the multiprocess model, and the object
> > > id stuff, and I'm mostly coming around to the ideas presented.
> > >
> > > One question I have is how we envisage the userspace Vulkan driver
> > > using these interfaces.
> > >
> > > I kinda feel I'm missing the difference between APIs that access
> > > things on the CPU side and commands for accessing things on the GPU
> > > side in the proposal. In the gallium world the "screen" allocates
> > > resources (memory + properties) synchronously on the API being
> > > accessed, the context is then for operating on GPU side things where
> > > we batch up a command stream and it is processed async.
> > >
> > > From the Vulkan API POV the application API is multi-thread safe, and
> > > we should avoid if we can taking too many locks under the covers, esp
> > > in common paths. Vulkan applications are also encouraged to allocate
> > > memory in large chunks and subdivide them between resources.
> > >
> > > I'm concerned that we are thinking of batching allocations in the
> > > userspace driver (or in the kernel) and how to flush those to the host
> > > side etc. If we have two threads in userspace allocate memory from the
> > > vulkan API, and one then does a transfer into the memory, how do we
> > > envisage that being flushed to the host side? Like if I allocate
> > > memory in one thread, then create images from that memory in another,
> > > how does that work out?
> > >
>
> The goal of encoding vkAllocateMemory in the execbuffer command stream
> is not for batching.  It is to reuse the mechanism to send
> API-specific opaque alloc command to the host, and to allow
> allocations without resources (e.g., non-shareable allocations from a
> non-mappable heap do not need resources).
>
> In the current (but outdated) code[1], there is a per-VkInstance
> execbuffer command stream struct (struct vn_cs).  Encoding to the
> vn_cs requires a per-instance lock to be taken.  There is also a
> per-VkCommandBuffer vn_cs.  Encoding to that vn_cs requires no
> locking.  Multi-threading is only beneficial when the app uses it
> to build its VkCommandBuffers.
>
> But vkAllocateMemory can be changed to use a local vn_cs or a local
> template to be lock-free.  It will be like
>
>   mem->object_id = next_object_id();
>
>   local_cmd_templ[ALLOCATION_SIZE] = info->allocationSize;
>   local_cmd_templ[MEMORY_TYPE_INDEX] = info->memoryTypeIndex;
>   local_cmd_templ[OBJECT_ID] = mem->object_id;
>
>   // when a resource is needed;  otherwise, use EXECBUFFER instead
>   struct drm_virtgpu_resource_create_blob args = {
>     .size = info->allocationSize,
>     .flags = VIRTGPU_RESOURCE_FLAG_STORAGE_HOSTMEM,
>     .cmd_size = sizeof(local_cmd_templ),
>     .cmd = local_cmd_templ,
>     .object_id = mem->object_id
>   };
>   drmIoctl(fd, DRM_IOCTL_VIRTIO_GPU_RESOURCE_CREATE_BLOB, &args);
>
>   mem->resource_id = args.res_handle;
>   mem->bo = args.bo_handle;
>
> I think Gurchetan's proposal will look similar, except that the
> command stream will be replaced by something more flexible such that
> object id is optional.
>
> In the current design (v2), the host will
>
>  - allocate a VkDeviceMemory from the app's VkInstance
>  - export an fd (or whatever keeps the underlying storage alive)
>  - create a global resource struct to manage the fd
>
> For comparison, it is possible to extend the existing mechanism (v1)
> in a different way such that the host will
>
>  - allocate a VkDeviceMemory from a global allocator VkInstance
>  - create a resource struct to manage the VkDeviceMemory
>  - attach the resource struct to the app's VkInstance
>  - export an fd and import it as a VkDeviceMemory in the app's VkInstance
>
> One (v2) exports app's VkDeviceMemory as a global resource while the
> other (extended v1) imports a global resource as app's VkDeviceMemory.
>
> [1] https://gitlab.freedesktop.org/olv/mesa/commits/venus-wsi
Those were my thoughts on vkAllocateMemory and multi-threading.  I
missed the question on images.

Assuming the vkAllocateMemory code above, vkCreateImage will be like

  img->object_id = next_object_id();
  lock_and_encode_to_per_instance_vn_cs(...,
    {VK_CREATE_IMAGE, ..., img->object_id});

vkBindImageMemory will be like

  lock_and_encode_to_per_instance_vn_cs(...,
    {VK_BIND_IMAGE_MEMORY, img->object_id,
     mem->object_id, ...});
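
To make the locking concrete, here is a minimal sketch of what a
per-instance command-stream encoder like the one above could look
like.  All names here (struct vn_cs, the command enum values,
lock_and_encode_to_per_instance_vn_cs) are illustrative assumptions,
not the actual venus code:

```c
/* Sketch of a per-instance command stream with a single lock.
 * Names and layout are assumptions for illustration only. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

enum vn_cmd {
   VK_CREATE_IMAGE = 1,
   VK_BIND_IMAGE_MEMORY = 2,
};

struct vn_cs {
   pthread_mutex_t mutex;  /* the per-instance lock */
   uint32_t buf[256];      /* encoded command dwords */
   size_t len;             /* dwords encoded so far */
};

static void
vn_cs_init(struct vn_cs *cs)
{
   pthread_mutex_init(&cs->mutex, NULL);
   cs->len = 0;
}

/* Every thread that encodes to the instance-level vn_cs must take the
 * per-instance lock; this is the contention that the local-template
 * path for vkAllocateMemory avoids. */
static void
lock_and_encode_to_per_instance_vn_cs(struct vn_cs *cs,
                                      const uint32_t *cmd, size_t count)
{
   pthread_mutex_lock(&cs->mutex);
   assert(cs->len + count <= sizeof(cs->buf) / sizeof(cs->buf[0]));
   memcpy(&cs->buf[cs->len], cmd, count * sizeof(*cmd));
   cs->len += count;
   pthread_mutex_unlock(&cs->mutex);
}
```

With that, the vkCreateImage call above amounts to filling a small
command array and appending it under the lock.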

There will be no lock contention between the allocating thread and
this main thread, but it is hard to avoid locking completely.
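
For contrast, the lock-free vkAllocateMemory path quoted earlier can
be sketched as below: each thread fills a command template on its own
stack, so the only shared state is an atomic object-id counter.  The
slot indices and counter are illustrative assumptions, not the real
venus layout:

```c
/* Sketch of the lock-free local-template path for vkAllocateMemory.
 * Slot indices and the ID counter are assumptions for illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum {
   ALLOCATION_SIZE = 0,
   MEMORY_TYPE_INDEX = 1,
   OBJECT_ID = 2,
   CMD_TEMPL_DWORDS = 3,
};

static atomic_uint_fast64_t object_id_counter;

static uint64_t
next_object_id(void)
{
   /* An atomic increment keeps object-id generation lock-free too. */
   return atomic_fetch_add(&object_id_counter, 1) + 1;
}

/* Fill a caller-owned template; nothing here touches shared state
 * except the atomic counter, so concurrent callers never contend. */
static uint64_t
fill_alloc_cmd(uint64_t local_cmd_templ[CMD_TEMPL_DWORDS],
               uint64_t allocation_size, uint32_t memory_type_index)
{
   const uint64_t object_id = next_object_id();
   local_cmd_templ[ALLOCATION_SIZE] = allocation_size;
   local_cmd_templ[MEMORY_TYPE_INDEX] = memory_type_index;
   local_cmd_templ[OBJECT_ID] = object_id;
   return object_id;
}
```

The filled template is then handed to the RESOURCE_CREATE_BLOB ioctl
(or EXECBUFFER) as in the earlier snippet, without ever taking the
per-instance lock.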

I hope this helps.

> > > I don't see why having a command stream for allocating memory is a
> > > good thing here, apart from batching up things that Vulkan really
> > > wants the app to batch up itself anyways.
> > >
> > > Hopefully I'm missing something in the previous discussions.
> > >
> > > Dave.

