<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura <<a href="mailto:niranjana.vishwanathapura@intel.com">niranjana.vishwanathapura@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:<br>
> On 02/06/2022 23:35, Jason Ekstrand wrote:<br>
><br>
> On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura<br>
> <<a href="mailto:niranjana.vishwanathapura@intel.com" target="_blank">niranjana.vishwanathapura@intel.com</a>> wrote:<br>
><br>
> On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:<br>
> >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:<br>
> >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:<br>
> >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the<br>
> >> > +mapping in an async worker. The binding and unbinding will work like<br>
> >> > +a special GPU engine. The binding and unbinding operations are<br>
> >> > +serialized and will wait on specified input fences before the<br>
> >> > +operation and will signal the output fences upon the completion of<br>
> >> > +the operation. Due to serialization, completion of an operation<br>
> >> > +will also indicate that all previous operations are also complete.<br>
> >><br>
> >> I guess we should avoid saying "will immediately start<br>
> >> binding/unbinding" if there are fences involved.<br>
> >><br>
> >> And the fact that it's happening in an async worker seems to imply<br>
> >> it's not immediate.<br>
> >><br>
><br>
> Ok, will fix.<br>
> This was added because in the earlier design binding was deferred until<br>
> the next execbuff.<br>
> But now it is non-deferred (immediate in that sense). But yeah, this is<br>
> confusing and I will fix it.<br>
><br>
> >><br>
> >> I have a question on the behavior of the bind operation when no input<br>
> >> fence is provided. Let's say I do:<br>
> >><br>
> >> VM_BIND (out_fence=fence1)<br>
> >><br>
> >> VM_BIND (out_fence=fence2)<br>
> >><br>
> >> VM_BIND (out_fence=fence3)<br>
> >><br>
> >><br>
> >> In what order are the fences going to be signaled?<br>
> >><br>
> >> In the order of VM_BIND ioctls? Or out of order?<br>
> >><br>
> >> Because you wrote "serialized", I assume it's in order.<br>
> >><br>
><br>
> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind<br>
> will use the same queue and hence are ordered.<br>
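[Editor's note] As a sketch of this in-order guarantee (a minimal user-space model only, not i915 code; all names are illustrative), the out-fences in the scenario above signal strictly in ioctl order:

```c
#include <assert.h>

#define MAX_OPS 8

struct fence { int signaled; };

/* A single serialized bind/unbind queue, as described above. */
struct bind_queue {
    struct fence *out[MAX_OPS];
    int count;
};

/* VM_BIND: queue the op; its out-fence signals after all earlier ops. */
static void vm_bind(struct bind_queue *q, struct fence *out_fence)
{
    q->out[q->count++] = out_fence;
}

/* The async worker drains the queue serially, signaling in order. */
static void worker_run(struct bind_queue *q)
{
    for (int i = 0; i < q->count; i++) {
        /* ... perform the actual (un)bind here ... */
        q->out[i]->signaled = 1;
    }
}
```

So for the three VM_BIND calls above, fence1, fence2, and fence3 signal in exactly the order the ioctls were issued.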
><br>
> >><br>
> >> One thing I didn't realize is that because we only get one "VM_BIND"<br>
> >> engine, there is a disconnect from the Vulkan specification.<br>
> >><br>
> >> In Vulkan VM_BIND operations are serialized but per engine.<br>
> >><br>
> >> So you could have something like this :<br>
> >><br>
> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)<br>
> >><br>
> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)<br>
> >><br>
> >><br>
> >> fence1 is not signaled<br>
> >><br>
> >> fence3 is signaled<br>
> >><br>
> >> So the second VM_BIND will proceed before the first VM_BIND.<br>
> >><br>
> >><br>
> >> I guess we can deal with that scenario in userspace by doing the wait<br>
> >> ourselves in one thread per engine.<br>
> >><br>
> >> But then it makes the VM_BIND input fences useless.<br>
> >><br>
> >><br>
> >> Daniel: what do you think? Should we rework this or just deal with<br>
> >> wait fences in userspace?<br>
> >><br>
> ><br>
> >My opinion is to rework this, but make the ordering via an engine<br>
> >param optional.<br>
> ><br>
> >e.g. A VM can be configured so all binds are ordered within the VM<br>
> ><br>
> >e.g. A VM can be configured so all binds accept an engine argument<br>
> >(in the case of the i915 likely this is a gem context handle) and<br>
> >binds are ordered with respect to that engine.<br>
> ><br>
> >This gives UMDs options, as the latter likely consumes more KMD<br>
> >resources, so if a different UMD can live with binds being ordered<br>
> >within the VM, they can use a mode consuming fewer resources.<br>
> ><br>
><br>
> I think we need to be careful here if we are looking for some out of<br>
> (submission) order completion of vm_bind/unbind.<br>
> In-order completion means, in a batch of binds and unbinds to be<br>
> completed in-order, the user only needs to specify an in-fence for the<br>
> first bind/unbind call and the out-fence for the last bind/unbind<br>
> call. Also, the VA released by an unbind call can be re-used by<br>
> any subsequent bind call in that in-order batch.<br>
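[Editor's note] To make the batching point concrete, here is a minimal user-space model (not actual i915 code; names are illustrative) of a serialized queue where only the first op carries an in-fence and only the last carries an out-fence:

```c
#include <assert.h>
#include <stddef.h>

struct fence { int signaled; };

struct bind_op {
    struct fence *in_fence;   /* optional: wait before this op runs */
    struct fence *out_fence;  /* optional: signal once this op is done */
};

/*
 * Serialized worker: because every op implicitly orders after all
 * previous ones, an in-fence on the first op gates the whole batch,
 * and the out-fence on the last op implies the whole batch completed.
 */
static int run_batch(struct bind_op *ops, int n)
{
    for (int i = 0; i < n; i++) {
        if (ops[i].in_fence && !ops[i].in_fence->signaled)
            return -1;  /* a real worker would sleep here, not fail */
        /* ... bind/unbind; a VA released by an earlier unbind in
         * this batch is safe to reuse, since ops run in order ... */
        if (ops[i].out_fence)
            ops[i].out_fence->signaled = 1;
    }
    return 0;
}
```

This is exactly the property that out-of-order completion would break: the single out-fence would no longer imply that the whole batch is done.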
><br>
> These things will break if binding/unbinding were to be allowed to<br>
> go out of order (of submission), and the user needs to be extra careful<br>
> not to run into premature triggering of the out-fence, a bind failing<br>
> because the VA is still in use, etc.<br>
><br>
> Also, VM_BIND binds the provided mapping on the specified address space<br>
> (VM). So, the uapi is not engine/context specific.<br>
><br>
> We can however add a 'queue' to the uapi which can be one from the<br>
> pre-defined queues,<br>
> I915_VM_BIND_QUEUE_0<br>
> I915_VM_BIND_QUEUE_1<br>
> ...<br>
> I915_VM_BIND_QUEUE_(N-1)<br>
><br>
> KMD will spawn an async work queue for each queue which will only<br>
> bind the mappings on that queue in the order of submission.<br>
> The user can assign a queue per engine or anything like that.<br>
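[Editor's note] As a purely hypothetical sketch of what such a queue selector could look like in the uapi (field names, layout, and values are illustrative assumptions, not the actual i915 interface):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue indices; the real uapi may differ. */
#define I915_VM_BIND_QUEUE_0  0
#define I915_VM_BIND_QUEUE_1  1
/* ... up to I915_VM_BIND_QUEUE_(N-1) ... */

/* Illustrative ioctl argument struct; NOT the actual i915 uapi. */
struct hypothetical_vm_bind {
    uint32_t vm_id;       /* address space (VM) to bind into */
    uint32_t queue_idx;   /* which in-order bind queue to use */
    uint64_t start;       /* GPU virtual address of the mapping */
    uint64_t length;      /* size of the mapping in bytes */
    uint64_t obj_offset;  /* offset into the backing GEM object */
    uint32_t obj_handle;  /* GEM handle of the backing object */
    uint32_t flags;
    /* in-/out-fence fields omitted for brevity */
};
```

Each queue index would map to its own serialized async worker in the KMD, so ops on different queues can complete independently while ops on the same queue stay ordered.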
><br>
> But again here, the user needs to be careful not to deadlock these<br>
> queues with circular fence dependencies.<br>
><br>
> I prefer adding this later as an extension, based on whether it<br>
> really helps with the implementation.<br>
><br>
> I can tell you right now that having everything on a single in-order<br>
> queue will not get us the perf we want. What vulkan really wants is one<br>
> of two things:<br>
> 1. No implicit ordering of VM_BIND ops. They just happen in whatever<br>
> order their dependencies are resolved, and we ensure ordering ourselves<br>
> by having a syncobj in the VkQueue.<br>
> 2. The ability to create multiple VM_BIND queues. We need at least 2<br>
> but I don't see why there needs to be a limit besides the limits the<br>
> i915 API already has on the number of engines. Vulkan could expose<br>
> multiple sparse binding queues to the client if it's not arbitrarily<br>
> limited.<br>
<br>
Thanks Jason, Lionel.<br>
<br>
Jason, what are you referring to when you say "limits the i915 API already<br>
has on the number of engines"? I am not sure if there is such an uapi today.<br></blockquote><div><br></div><div>There's a limit of something like 64 total engines today based on the number of bits we can cram into the exec flags in execbuffer2. I think someone had an extended version that allowed more but I ripped it out because no one was using it. Of course, execbuffer3 might not have that problem at all.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
I am trying to see how many queues we need and don't want it to be arbitrarily<br>
large, unduly blowing up memory usage and complexity in the i915 driver.<br></blockquote><div><br></div><div>I expect a Vulkan driver to use at most 2 in the vast majority of cases. I could imagine a client wanting to create more than 1 sparse queue in which case, it'll be N+1 but that's unlikely. As far as complexity goes, once you allow two, I don't think the complexity is going up by allowing N. As for memory usage, creating more queues means more memory. That's a trade-off that userspace can make. Again, the expected number here is 1 or 2 in the vast majority of cases so I don't think you need to worry.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> Why? Because Vulkan has two basic kind of bind operations and we don't<br>
> want any dependencies between them:<br>
> 1. Immediate. These happen right after BO creation or maybe as part of<br>
> vkBindImageMemory() or vkBindBufferMemory(). These don't happen on a<br>
> queue and we don't want them serialized with anything. To synchronize<br>
> with submit, we'll have a syncobj in the VkDevice which is signaled by<br>
> all immediate bind operations and make submits wait on it.<br>
> 2. Queued (sparse): These happen on a VkQueue which may be the same as<br>
> a render/compute queue or may be its own queue. It's up to us what we<br>
> want to advertise. From the Vulkan API PoV, this is like any other<br>
> queue. Operations on it wait on and signal semaphores. If we have a<br>
> VM_BIND engine, we'd provide syncobjs to wait and signal just like we do<br>
> in execbuf().<br>
> The important thing is that we don't want one type of operation to block<br>
> on the other. If immediate binds are blocking on sparse binds, it's<br>
> going to cause over-synchronization issues.<br>
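[Editor's note] A tiny user-space model (again, not i915 code; names are illustrative) of the non-blocking property being asked for here: a sparse-queue op stuck behind an unsignaled fence must not stall an immediate bind on the same VM:

```c
#include <assert.h>
#include <stddef.h>

struct fence { int signaled; };

struct vm {
    struct fence *sparse_in;      /* gates the pending sparse bind */
    int sparse_done;
    struct fence immediate_done;  /* device-level "all immediate
                                     binds complete" fence */
};

/* Immediate path: runs regardless of the sparse queue's state. */
static void bind_immediate(struct vm *vm)
{
    /* ... take the VM lock, insert the mapping, drop the lock ... */
    vm->immediate_done.signaled = 1;
}

/* Sparse path: only runs once its in-fence has signaled. */
static void sparse_worker(struct vm *vm)
{
    if (vm->sparse_in && !vm->sparse_in->signaled)
        return;  /* still blocked, e.g. on a long-running compute job */
    vm->sparse_done = 1;
}
```

The brief serialization both paths see at the VM lock is fine; what the model forbids is the immediate path waiting on the sparse path's dma_fence dependencies.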
> In terms of the internal implementation, I know that there's going to be<br>
> a lock on the VM and that we can't actually do these things in<br>
> parallel. That's fine. Once the dma_fences have signaled and we're<br>
<br>
That's correct. It is like a single VM_BIND engine with multiple queues<br>
feeding to it.<br></blockquote><div><br></div><div>Right. As long as the queues themselves are independent and can block on dma_fences without holding up other queues, I think we're fine.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> unblocked to do the bind operation, I don't care if there's a bit of<br>
> synchronization due to locking. That's expected. What we can't afford<br>
> to have is an immediate bind operation suddenly blocking on a sparse<br>
> operation which is blocked on a compute job that's going to run for<br>
> another 5ms.<br>
<br>
As the VM_BIND queue is per VM, VM_BIND on one VM doesn't block the VM_BIND<br>
on other VMs. I am not sure about use cases here, but just wanted to clarify.<br></blockquote><div><br></div><div>Yes, that's what I would expect.</div><div><br></div><div>--Jason</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Niranjana<br>
<br>
> For reference, Windows solves this by allowing arbitrarily many paging<br>
> queues (what they call a VM_BIND engine/queue). That design works<br>
> pretty well and solves the problems in question. Again, we could just<br>
> make everything out-of-order and require using syncobjs to order things<br>
> as userspace wants. That'd be fine too.<br>
> One more note while I'm here: danvet said something on IRC about VM_BIND<br>
> queues waiting for syncobjs to materialize. We don't really want/need<br>
> this. We already have all the machinery in userspace to handle<br>
> wait-before-signal and waiting for syncobj fences to materialize and<br>
> that machinery is on by default. It would actually take MORE work in<br>
> Mesa to turn it off and take advantage of the kernel being able to wait<br>
> for syncobjs to materialize. Also, getting that right is ridiculously<br>
> hard and I really don't want to get it wrong in kernel space. When we<br>
> do memory fences, wait-before-signal will be a thing. We don't need to<br>
> try and make it a thing for syncobj.<br>
> --Jason<br>
><br>
> Thanks Jason,<br>
><br>
> I missed the bit in the Vulkan spec that we're allowed to have a sparse<br>
> queue that does not implement either graphics or compute operations:<br>
><br>
> "While some implementations may include VK_QUEUE_SPARSE_BINDING_BIT<br>
> support in queue families that also include<br>
><br>
> graphics and compute support, other implementations may only expose a<br>
> VK_QUEUE_SPARSE_BINDING_BIT-only queue<br>
><br>
> family."<br>
><br>
> So it can all be a vm_bind engine that just does bind/unbind<br>
> operations.<br>
><br>
> But yes we need another engine for the immediate/non-sparse operations.<br>
><br>
> -Lionel<br>
><br>
> <br>
><br>
> Daniel, any thoughts?<br>
><br>
> Niranjana<br>
><br>
> >Matt<br>
> ><br>
> >><br>
> >> Sorry I noticed this late.<br>
> >><br>
> >><br>
> >> -Lionel<br>
> >><br>
> >><br>
</blockquote></div></div>