[virglrenderer-devel] vulkan + virgl ioctl vs command submission

Dave Airlie airlied at gmail.com
Fri Feb 28 01:37:13 UTC 2020


On Fri, 28 Feb 2020 at 08:07, Chia-I Wu <olvaffe at gmail.com> wrote:
>
> On Thu, Feb 27, 2020 at 11:45 AM Dave Airlie <airlied at gmail.com> wrote:
> >
> > Realised you might not be reading the list, or I asked too hard a question :-P
> Sorry that I missed this.
> >
> > On Tue, 25 Feb 2020 at 12:59, Dave Airlie <airlied at gmail.com> wrote:
> > >
> > > Okay I think I'm following along the multiprocess model, and the object
> > > id stuff, and I'm mostly coming around to the ideas presented.
> > >
> > > One question I have is how do we envisage the userspace vulkan driver
> > > using things.
> > >
> > > I kinda feel I'm missing the difference, in the proposal, between
> > > APIs that access things on the CPU side and commands for accessing
> > > things on the GPU side. In the gallium world the "screen" allocates
> > > resources (memory + properties) synchronously on the API being
> > > accessed; the context is then for operating on GPU-side things,
> > > where we batch up a command stream and it is processed asynchronously.
> > >
> > > From the Vulkan API POV the application API is multi-thread safe,
> > > and we should avoid taking too many locks under the covers if we
> > > can, especially in common paths. Vulkan applications are also
> > > encouraged to allocate memory in large chunks and subdivide it
> > > among resources.
> > >
> > > I'm concerned that we are thinking of batching allocations in the
> > > userspace driver (or in the kernel) and how to flush those to the host
> > > side, etc. If two threads in userspace allocate memory from the
> > > Vulkan API, and one then does a transfer into the memory, how do we
> > > envisage that being flushed to the host side? Like if I allocate
> > > memory in one thread, then create images from that memory in another,
> > > how does that work out?
> > >
>
> The goal of encoding vkAllocateMemory in the execbuffer command stream
> is not for batching.  It is to reuse the mechanism to send
> API-specific opaque alloc commands to the host, and to allow
> allocations without resources (e.g., non-shareable allocations from a
> non-mappable heap do not need resources).
>
> In the current (but outdated) code[1], there is a per-VkInstance
> execbuffer command stream struct (struct vn_cs).  Encoding to the
> vn_cs requires a per-instance lock to be taken.  There is also a
> per-VkCommandBuffer vn_cs.  Encoding to that vn_cs requires no
> locking.  Multithreading is only beneficial when the app uses that
> to build its VkCommandBuffers.

I'm gonna stop you there :-P. Multithreaded Vulkan apps are the normal
use case, not a special case. We should not design any Vulkan pieces
around GL application patterns; Vulkan is different, and multi-threaded
command buffer building is basic Vulkan.
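
To be concrete about the pattern I mean, here is a minimal sketch of
per-thread recording (device and queue_family are assumed to exist,
error handling is omitted). VkCommandPool is externally synchronized,
so each thread owns its own pool and records without any cross-thread
locking:

  #include <vulkan/vulkan.h>

  /* Per recording thread; "device" and "queue_family" are assumed. */
  VkCommandPoolCreateInfo pool_info = {
     .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
     .queueFamilyIndex = queue_family,
  };
  VkCommandPool pool;
  vkCreateCommandPool(device, &pool_info, NULL, &pool);

  VkCommandBufferAllocateInfo alloc_info = {
     .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
     .commandPool = pool,
     .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
     .commandBufferCount = 1,
  };
  VkCommandBuffer cmd;
  vkAllocateCommandBuffers(device, &alloc_info, &cmd);

  VkCommandBufferBeginInfo begin_info = {
     .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
  };
  vkBeginCommandBuffer(cmd, &begin_info);
  /* ... record per-thread work; no shared driver lock expected ... */
  vkEndCommandBuffer(cmd);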

Having a per-instance lock is bad if it's being taken across multiple
threads in normal use cases.

Though it's quite likely that, due to the VM design, we will have to
take a lock at some point on those paths, it would be good for the
design to be explicit about the impact of every lock. For example, we
will likely need locks in the kernel submission paths anyway.
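
To spell out the lock impact I'm worried about, here is a rough sketch
of the two encoding paths as I understand them from your description;
struct vn_cs is from your code, but vn_cs_emit and the two encode
helpers are names I made up for illustration:

  #include <stddef.h>
  #include <string.h>
  #include <threads.h>

  struct vn_cs { unsigned char *cur, *end; };

  static void vn_cs_emit(struct vn_cs *cs, const void *cmd, size_t size)
  {
     /* bounds checking and flushing omitted */
     memcpy(cs->cur, cmd, size);
     cs->cur += size;
  }

  struct vn_instance {
     mtx_t cs_mutex;   /* serializes every instance-level API call */
     struct vn_cs cs;
  };

  struct vn_command_buffer {
     struct vn_cs cs;  /* recorded by a single thread */
  };

  /* vkAllocateMemory and friends funnel through here and contend: */
  static void vn_instance_encode(struct vn_instance *inst,
                                 const void *cmd, size_t size)
  {
     mtx_lock(&inst->cs_mutex);
     vn_cs_emit(&inst->cs, cmd, size);
     mtx_unlock(&inst->cs_mutex);
  }

  /* vkCmd* recording takes no lock at all: */
  static void vn_cmd_encode(struct vn_command_buffer *cb,
                            const void *cmd, size_t size)
  {
     vn_cs_emit(&cb->cs, cmd, size);
  }

Every vkAllocateMemory from any thread hits that one mutex, which is
exactly the kind of hidden serialization I'd like the design to call
out explicitly.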


> But vkAllocateMemory can be changed to use a local vn_cs or a local
> template to be lock-free.  It will be like
>
>   mem->object_id = next_object_id();
>
>   local_cmd_templ[ALLOCATION_SIZE] = info->allocationSize;
>   local_cmd_templ[MEMORY_TYPE_INDEX] = info->memoryTypeIndex;
>   local_cmd_templ[OBJECT_ID] = mem->object_id;
>
>   // when a resource is needed;  otherwise, use EXECBUFFER instead
>   struct drm_virtgpu_resource_create_blob args = {
>     .size = info->allocationSize,
>     .flags = VIRTGPU_RESOURCE_FLAG_STORAGE_HOSTMEM,
>     .cmd_size = sizeof(local_cmd_templ),
>     .cmd = local_cmd_templ,
>     .object_id = mem->object_id
>   };
>   drmIoctl(fd, DRM_IOCTL_VIRTIO_GPU_RESOURCE_CREATE_BLOB, &args);
>
>   mem->resource_id = args.res_handle;
>   mem->bo = args.bo_handle;
>
> I think Gurchetan's proposal will look similar, except that the
> command stream will be replaced by something more flexible such that
> object id is optional.
>
> In the current design (v2), the host will
>
>  - allocate a VkDeviceMemory from the app's VkInstance

VkDeviceMemory is tied to a VkDevice object, not a VkInstance, though
this makes sense either way.
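
For reference, the allocation entry point is device-level:

  VkResult vkAllocateMemory(
      VkDevice device,
      const VkMemoryAllocateInfo *pAllocateInfo,
      const VkAllocationCallbacks *pAllocator,
      VkDeviceMemory *pMemory);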

Okay, I'm not entirely comfortable with this design yet; I probably
need to look at the code that's been done so far to get a better feel
for it.

With the per-instance vn_cs, who flushes the encoded commands to the
host, and how is that decided?

Dave.

