[virglrenderer-devel] vulkan + virgl ioctl vs command submission

Chia-I Wu olvaffe at gmail.com
Fri Feb 28 19:07:17 UTC 2020


On Thu, Feb 27, 2020 at 5:37 PM Dave Airlie <airlied at gmail.com> wrote:
>
> On Fri, 28 Feb 2020 at 08:07, Chia-I Wu <olvaffe at gmail.com> wrote:
> >
> > On Thu, Feb 27, 2020 at 11:45 AM Dave Airlie <airlied at gmail.com> wrote:
> > >
> > > Realised you might not be reading the list, or I asked too hard a question :-P
> > Sorry that I missed this.
> > >
> > > On Tue, 25 Feb 2020 at 12:59, Dave Airlie <airlied at gmail.com> wrote:
> > > >
> > > > Okay, I think I'm following along with the multiprocess model and the
> > > > object id stuff, and I'm mostly coming around to the ideas presented.
> > > >
> > > > One question I have is how do we envisage the userspace vulkan driver
> > > > using things.
> > > >
> > > > I kinda feel I'm missing the difference between APIs that access
> > > > things on the CPU side and commands for accessing things on the GPU
> > > > side in the proposal. In the gallium world the "screen" allocates
> > > > resources (memory + properties) synchronously on the API being
> > > > accessed, the context is then for operating on GPU side things where
> > > > we batch up a command stream and it is processed async.
> > > >
> > > > From the Vulkan API POV the application API is multi-thread safe, and
> > > > we should avoid, if we can, taking too many locks under the covers,
> > > > especially in common paths. Vulkan applications are also encouraged to
> > > > allocate memory in large chunks and subdivide them between resources.
> > > >
> > > > I'm concerned that we are thinking of batching allocations in the
> > > > userspace driver (or in the kernel) and how to flush those to the host
> > > > side, etc. If two threads in userspace allocate memory from the
> > > > Vulkan API, and one then does a transfer into the memory, how do we
> > > > envisage that being flushed to the host side? Like if I allocate
> > > > memory in one thread, then create images from that memory in another,
> > > > how does that work out?
> > > >
> >
> > The goal of encoding vkAllocateMemory in the execbuffer command stream
> > is not batching.  It is to reuse the mechanism to send an
> > API-specific opaque alloc command to the host, and to allow
> > allocations without resources (e.g., non-shareable allocations from a
> > non-mappable heap do not need resources).
> >
> > In the current (but outdated) code[1], there is a per-VkInstance
> > execbuffer command stream struct (struct vn_cs).  Encoding to the
> > vn_cs requires a per-instance lock to be taken.  There is also a
> > per-VkCommandBuffer vn_cs.  Encoding to that vn_cs requires no
> > locking.  Multithreading is only beneficial when the app uses those
> > to build its VkCommandBuffers.
>
> Imma gonna stop you there :-P, multithreaded Vulkan apps are the normal
> use case, not a special case. We do not design any Vulkan things for
> GL application ideas; Vulkan is different, and multithreaded command
> buffer building is basic Vulkan.
That is how the current code looks.  It is very naive, and my focus at
the time was also on the vk.xml parser.  I don't know if anyone has
ever looked into the locking design (or command submission or sync
primitives) more seriously.  This can be a good chance to work out a
proper design.


>
> Having a per-instance lock is bad if it's being taken across multiple
> threads in normal use cases.
>
> Though it's quite likely, due to the VM design, that we have to take a
> lock at some point on those paths, it would be good to be explicit in
> the design about the impact of every lock. We will likely need locks
> in the kernel submission paths anyway.

The current design essentially looks at the first parameter (the
dispatchable object) of a function: if that object is not externally
synchronized and the function needs to be executed by the host, a cs
lock is taken to encode the function.  We can attach a cs to more
dispatchable objects, but I think we are really looking for ways to
handle (or batch) functions locally to minimize locking.
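
Roughly, as a sketch (vn_cs_for_object and the field names here are
made up, not what is in the current code):

  /* look up the cs associated with the first (dispatchable) parameter */
  struct vn_cs *cs = vn_cs_for_object(dispatchable_obj);

  /* lock only when the object is not externally synchronized,
   * e.g., when cs is the shared per-instance command stream
   */
  if (!cs->externally_synced)
     mtx_lock(&cs->mutex);
  vn_cs_encode_command(cs, cmd);
  if (!cs->externally_synced)
     mtx_unlock(&cs->mutex);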

One idea is this: say we are given the sequence

  { vkCreateImage, vkBindImageMemory, vkCmdCopyImage }

Instead of grabbing the per-instance (or per-device) lock twice to
encode the first two functions separately, we can encode those two
functions lock-free into per-image storage first and copy the contents
into the cs at the last minute.  vkCmdCopyImage is only shown as an
example: we need to make sure the host sees the first two functions
before it sees vkCmdCopyImage, but that does not mean vkCmdCopyImage
itself triggers the copying and flushing.
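
As a rough sketch of the idea (the helper names are made up):

  /* at call time: encode lock-free into storage private to the image */
  vn_encode_vkCreateImage(&img->local_cs, device, create_info);
  vn_encode_vkBindImageMemory(&img->local_cs, device, image, memory, 0);

  /* at the last minute: splice the deferred commands into the shared
   * cs under the lock, before anything that references the image
   * (such as vkCmdCopyImage) reaches the host
   */
  mtx_lock(&instance->cs_mutex);
  vn_cs_append(&instance->cs, &img->local_cs);
  mtx_unlock(&instance->cs_mutex);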

There are also cases where things can be handled entirely inside the
guest.  For example, when a VkDeviceMemory is backed by a guest shmem,
vkMapMemory can be guest-only.
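
A guest-only vkMapMemory could look roughly like this (struct
vn_device_memory and the helpers are made-up names):

  VkResult vn_MapMemory(VkDevice device, VkDeviceMemory memory,
                        VkDeviceSize offset, VkDeviceSize size,
                        VkMemoryMapFlags flags, void **ppData)
  {
     struct vn_device_memory *mem = vn_device_memory_from_handle(memory);

     /* backed by a guest shmem: no host command is needed at all */
     if (mem->shmem_ptr) {
        *ppData = (char *)mem->shmem_ptr + offset;
        return VK_SUCCESS;
     }

     /* otherwise mmap the virtio-gpu bo that backs the host memory */
     return vn_map_bo(mem, offset, size, ppData);
  }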


>
> > But vkAllocateMemory can be changed to use a local vn_cs or a local
> > template to be lock-free.  It would look something like:
> >
> >   // allocate the object id that will name the host VkDeviceMemory
> >   mem->object_id = next_object_id();
> >
> >   // patch the pre-encoded vkAllocateMemory command template
> >   local_cmd_templ[ALLOCATION_SIZE] = info->allocationSize;
> >   local_cmd_templ[MEMORY_TYPE_INDEX] = info->memoryTypeIndex;
> >   local_cmd_templ[OBJECT_ID] = mem->object_id;
> >
> >   // when a resource is needed;  otherwise, use EXECBUFFER instead
> >   struct drm_virtgpu_resource_create_blob args = {
> >     .size = info->allocationSize,
> >     .flags = VIRTGPU_RESOURCE_FLAG_STORAGE_HOSTMEM,
> >     .cmd_size = sizeof(local_cmd_templ),
> >     .cmd = local_cmd_templ,
> >     .object_id = mem->object_id
> >   };
> >   drmIoctl(fd, DRM_IOCTL_VIRTIO_GPU_RESOURCE_CREATE_BLOB, &args);
> >
> >   mem->resource_id = args.res_handle;
> >   mem->bo = args.bo_handle;
> >
> > I think Gurchetan's proposal will look similar, except that the
> > command stream will be replaced by something more flexible such that
> > the object id is optional.
> >
> > In the current design (v2), the host will
> >
> >  - allocate a VkDeviceMemory from the app's VkInstance
>
> VkDeviceMemory is tied to the VkDevice object, not VkInstance, though
> this makes sense either way.

Yeah, it is tied to VkDevice.  I had the one-instance-per-process model
in mind and wanted to show the export/import part.

>
> Okay I'm not entirely comfortable with this design yet, I probably
> need to look at the code that's been done so far to get a better
> feeling for it.
Is the concern over resource allocation or the userspace driver?  I
hope it is mostly the latter...

>
> With the instance_vn_cs, who flushes those to the host, and how is that decided?
The guest encodes functions in the order they are called (excluding
vkCmd*).  Flushes happen in vkGet*, vk*Wait*, vkAllocateMemory,
vkQueueSubmit, vkEndCommandBuffer, and maybe some more.  I don't think
the exact flush points are particularly meaningful though.
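
Mechanically, a flush would be something like this sketch (struct
vn_instance and its fields are made up; the EXECBUFFER ioctl is the
existing virtio-gpu one):

  static void vn_instance_flush(struct vn_instance *instance, int fd)
  {
     mtx_lock(&instance->cs_mutex);

     struct drm_virtgpu_execbuffer args = {
        .command = (uintptr_t)instance->cs.data,
        .size = instance->cs.size,
     };
     drmIoctl(fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &args);

     /* reset the cs for the next batch of commands */
     instance->cs.size = 0;

     mtx_unlock(&instance->cs_mutex);
  }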


>
> Dave.

