<div dir="ltr"><div dir="ltr"> </div> <div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 28, 2020 at 11:07 AM Chia-I Wu <<a href="mailto:olvaffe@gmail.com">olvaffe@gmail.com</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Feb 27, 2020 at 5:37 PM Dave Airlie <<a href="mailto:airlied@gmail.com" target="_blank">airlied@gmail.com</a>> wrote: > > On Fri, 28 Feb 2020 at 08:07, Chia-I Wu <<a href="mailto:olvaffe@gmail.com" target="_blank">olvaffe@gmail.com</a>> wrote: > > > > On Thu, Feb 27, 2020 at 11:45 AM Dave Airlie <<a href="mailto:airlied@gmail.com" target="_blank">airlied@gmail.com</a>> wrote: > > > > > > Realised you might not be reading the list, or I asked too hard a question :-P > > Sorry that I missed this. > > > > > > On Tue, 25 Feb 2020 at 12:59, Dave Airlie <<a href="mailto:airlied@gmail.com" target="_blank">airlied@gmail.com</a>> wrote: > > > > > > > > Okay I think I'm following along the multiprocess model, and the object > > > > id stuff, and I'm mostly coming around to the ideas presented. > > > > > > > > One question I have is how do we envisage the userspace vulkan driver > > > > using things. > > > > > > > > I kinda feel I'm missing the difference between APIs that access > > > > things on the CPU side and commands for accessing things on the GPU > > > > side in the proposal. In the gallium world the "screen" allocates > > > > resources (memory + properties) synchronously on the API being > > > > accessed, the context is then for operating on GPU side things where > > > > we batch up a command stream and it is processed async. > > > > > > > > From the Vulkan API POV the application API is multi-thread safe, and > > > > we should avoid if we can taking too many locks under the covers, esp > > > > in common paths. Vulkan applications are also encouraged to allocate > > > > memory in large chunks and subdivide them between resources. 
> > > > > > > > I'm concerned that we are thinking of batching allocations in the > > > > userspace driver (or in the kernel) and how to flush those to the host > > > > side etc. If we have two threads in userspace allocate memory from the > > > > vulkan API, and one then does a transfer into the memory, how do we > > > > envisage that being flushed to the host side? Like if I allocate > > > > memory in one thread, then create images from that memory in another, > > > > how does that work out? > > > > > > > > The goal of encoding vkAllocateMemory in the execbuffer command stream > > is not for batching. It is to reuse the mechanism to send an > > API-specific opaque alloc command to the host, and to allow > > allocations without resources (e.g., non-shareable allocations from a > > non-mappable heap do not need resources). > > > > In the current (but outdated) code[1], there is a per-VkInstance > > execbuffer command stream struct (struct vn_cs). Encoding to the > > vn_cs requires a per-instance lock to be taken. There is also a > > per-VkCommandBuffer vn_cs. Encoding to that vn_cs requires no > > locking. Multi-threading is only beneficial when the app uses that > > to build their VkCommandBuffers. > > Imma gonna stop you there :-P, multithread vulkan apps are the normal > use case, not a special case. We do not design any vulkan things for > GL application ideas, Vulkan is different, multi-threaded command > buffer building is basic vulkan. That is what the current code looks like. It is very naive and my focus was also a vk.xml parser. I don't know if anyone has ever looked into the locking design (or command submission or sync primitives) more seriously. This can be a good chance to work out a design. > > Having a per-instance lock is bad if it's being taken across multiple > threads in normal use cases. 
> > Though it's quite likely due to VM design we have to take a lock at > some point on those paths, it would be good to be explicit in the > design of the impacts of every lock. Like we will likely need locks in > the kernel submission paths anyways. The current design essentially looks at the first parameter (the dispatchable object) of a function, and if it is not externally synced and the function needs to be executed by the host, a cs lock is grabbed to encode the function. We can add cs to more dispatchable objects. But I think we are looking for ways to handle (or batch) functions locally to minimize locking. One idea is that, say given this sequence {vkCreateImage, vkBindImageMemory, vkCmdCopyImage } </blockquote><div> </div><div>Android Emulator Vulkan does something similar to this in certain cases, like translating guest vkCreateImage requests to APIs that extract requirements along with the image:</div><div> </div><div><a href="https://android.googlesource.com/platform/external/qemu/+/refs/heads/emu-master-dev/android/android-emugl/host/libs/libOpenglRender/vulkan-registry/xml/vk.xml#6351">https://android.googlesource.com/platform/external/qemu/+/refs/heads/emu-master-dev/android/android-emugl/host/libs/libOpenglRender/vulkan-registry/xml/vk.xml#6351</a></div><div> </div><div>However, this opens up the possibility of a lot of grungy manual work. The solution that I'm going for long term is to automatically optimize the command protocol itself via something similar to profile-guided optimization (PGO).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Instead of grabbing the per-instance (or per-device) lock twice to encode the first two functions separately, we can encode the first two functions lock-free to a per-image storage first, and copy the contents into the cs at the last minute. vkCmdCopyImage is only shown as an example. 
We need to make sure the host sees the first two functions before it sees vkCmdCopyImage. It does not mean that vkCmdCopyImage triggers the copying and flushing. There are also cases where things can be handled inside the guest. When a VkDeviceMemory has a guest shmem, vkMapMemory can be guest-only for example. > > > But vkAllocateMemory can be changed to use a local vn_cs or a local > > template to be lock-free. It will be like > > > > mem->object_id = next_object_id(); > > > > local_cmd_templ[ALLOCATION_SIZE] = info->allocationSize; > > local_cmd_templ[MEMORY_TYPE_INDEX] = info->memoryTypeIndex; > > local_cmd_templ[OBJECT_ID] = mem->object_id; > > > > // when a resource is needed; otherwise, use EXECBUFFER instead > > struct drm_virtgpu_resource_create_blob args = { > > .size = info->allocationSize, > > .flags = VIRTGPU_RESOURCE_FLAG_STORAGE_HOSTMEM, > > .cmd_size = sizeof(local_cmd_templ), > > .cmd = local_cmd_templ, > > .object_id = mem->object_id > > }; > > drmIoctl(fd, DRM_IOCTL_VIRTIO_GPU_RESOURCE_CREATE_BLOB, &args); > > > > mem->resource_id = args.res_handle; > > mem->bo = args.bo_handle; > > > > I think Gurchetan's proposal will look similar, except that the > > command stream will be replaced by something more flexible such that > > object id is optional. > > > > In the current design (v2), the host will > > > > - allocate a VkDeviceMemory from the app's VkInstance > > VkDeviceMemory is tied to VkDevice object not VkInstance. though this > makes sense either way. Yeah, it is tied to VkDevice. I had one-instance-per-process model in mind and wanted to show the export/import part. > > Okay I'm not entirely comfortable with this design yet, I probably > need to look at the code that's been done so far to get a better > feeling for it. Concern over resource allocation or the userspace driver? I hope it is mostly the latter... > > With the instance_vn_cs, who flushes those to the host, how is that decided? 
The guest encodes functions in the order they are called (excluding vkCmd*). Flushes happen in vkGet*, vk*Wait*, vkAllocateMemory, vkQueueSubmit, vkEndCommandBuffer, and maybe some more. I don't think they are meaningful though. > > Dave. _______________________________________________ virglrenderer-devel mailing list <a href="mailto:virglrenderer-devel@lists.freedesktop.org" target="_blank">virglrenderer-devel@lists.freedesktop.org</a> <a href="https://lists.freedesktop.org/mailman/listinfo/virglrenderer-devel" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/virglrenderer-devel</a> </blockquote></div></div>
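The per-image staging idea discussed in the thread (encode vkCreateImage and vkBindImageMemory lock-free into per-image storage, then copy into the shared cs under a single lock) might be sketched roughly as below. This is only an illustration under stated assumptions: `shared_cs`, `local_cs`, the opcode values, and the helper functions are all hypothetical names invented here, not the actual `vn_cs` code or the virtio-gpu protocol.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; the real wire encoding is not shown in the thread. */
enum { OP_CREATE_IMAGE = 1, OP_BIND_IMAGE_MEMORY = 2 };

/* Hypothetical stand-in for the shared per-instance command stream. */
struct shared_cs {
    pthread_mutex_t lock;
    uint32_t data[256];
    size_t count;               /* words currently encoded */
    unsigned lock_acquisitions; /* counted only to illustrate the saving */
};

/* Hypothetical per-image staging area, written only by the thread
 * that owns the image, so no lock is needed while encoding. */
struct local_cs {
    uint32_t data[16];
    size_t count;
};

static void shared_cs_init(struct shared_cs *cs)
{
    pthread_mutex_init(&cs->lock, NULL);
    cs->count = 0;
    cs->lock_acquisitions = 0;
}

/* Lock-free: append one (opcode, object id) pair to the local storage.
 * Returns 0 on success, -1 if the staging area is full. */
static int local_cs_encode(struct local_cs *l, uint32_t opcode,
                           uint32_t object_id)
{
    if (l->count + 2 > sizeof(l->data) / sizeof(l->data[0]))
        return -1;
    l->data[l->count++] = opcode;
    l->data[l->count++] = object_id;
    return 0;
}

/* Take the shared lock once and copy everything staged so far, so the
 * host sees the staged commands before anything encoded afterwards.
 * Returns 0 on success, -1 if the shared stream has no room. */
static int shared_cs_splice(struct shared_cs *cs, struct local_cs *l)
{
    int ret = -1;

    pthread_mutex_lock(&cs->lock);
    cs->lock_acquisitions++;
    if (cs->count + l->count <= sizeof(cs->data) / sizeof(cs->data[0])) {
        memcpy(&cs->data[cs->count], l->data,
               l->count * sizeof(l->data[0]));
        cs->count += l->count;
        l->count = 0;
        ret = 0;
    }
    pthread_mutex_unlock(&cs->lock);
    return ret;
}
```

In this sketch, two calls to `local_cs_encode` followed by one `shared_cs_splice` take the shared lock once rather than twice; when the splice should happen (for example, before a dependent command is flushed to the host) is left open, as it is in the thread.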