<!DOCTYPE html><html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> Am 21.11.24 um 19:41 schrieb Matthew Brost: <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com">[SNIP] <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <pre class="moz-quote-pre" wrap="">+ ops->preempt_finished(pfence); </pre> </blockquote> <pre class="moz-quote-pre" wrap="">Why is that callback useful? </pre> </blockquote> <pre class="moz-quote-pre" wrap="">In Xe, this is where we kick the resume worker and drop a ref to the preemption object, which in Xe is an individual queue, and in AMD it is a VM, right? </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> Correct. The whole VM is preempted since we don't know which queue is using which BO. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> Right. In Xe we don't know which queue is using a BO either - we trigger all queues preempt fences attached to the VM effectively preempting the entire VM. Per VM preempt fence or per queue preempt fence is a driver choice (I can see arguments for both cases) is the point here and any base class shouldn't dictate what a driver wants to do.</pre> </blockquote> Oh, I think we need to unify that for drivers or at least find an interface which works for both cases. And yeah, I agree there are really good arguments for both directions. Essentially you want separate preemption fences for each queue, but only one restore worker. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <blockquote type="cite"> <blockquote type="cite"> <pre class="moz-quote-pre" wrap="">wrt preemption object, I've reasoned this should work for an either per queue or VM driver design of preempt fences. This part likely could be moved into the preempt_wait callback though but would get a little goofy in the error case if preempt_wait is not called as the driver side would still need to cleanup a ref. Maybe I don't even need a ref though - have to think that through - but for general safety we typically like to take refs whenever a fence references a different object. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> The tricky part is that at least for us we need to do this *before* the fence is signaled. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> Hmm, I'm a little confused by this. Do you think the code as is missing somehthing or opposed to keeping the preempt_finished vfunc?</pre> </blockquote> I think we first need a complete picture how all that is supposed to work. When we say we resume only on demand then this callback would make sense I think, but at least the AMD solution doesn't do that at the moment. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <blockquote type="cite"> <pre class="moz-quote-pre" wrap=""> This way we can do something like: retry: mutex_lock(&lock); if (dma_fence_is_signaled(preemept_fence)) { mutex_unlock(&lock); flush_work(resume_work); gotot retry; } </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> This snippet is from your convert user fence to dma fence IOCTL?</pre> </blockquote> Yes. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <pre class="moz-quote-pre" wrap="">I think this makes sense given your design of the conver user to dma fence IOCTL not actually doing a resume - I landed making that IOCTL basically another version of the resume worker for simplicity but that may change if we find locking the entire VM is too costly.</pre> </blockquote> Well we could also resume on demand (thinking more about it that's most likely the better approach) but I would implement it something like this then: <pre class="moz-quote-pre" wrap="">retry: mutex_lock(&lock); if (dma_fence_is_signaled(preemept_fence)) { mutex_unlock(&lock); schedule_work(resume_work); flush_work(resume_work); gotot retry; } </pre> This way we don't run into issues when multiple participants try to resume at the same time. E.g. multiple threads where each one tries to submit work to different queues at the same time. [SNIP] <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <pre class="moz-quote-pre" wrap="">+ fence = pfence->ops->preempt_delay(pfence); </pre> </blockquote> <pre class="moz-quote-pre" wrap="">Mhm, why is that useful? </pre> </blockquote> <pre class="moz-quote-pre" wrap="">This for attaching the preempt object's last exported fence which needs to be signaled before the preemption is issued. So for purely long running VM's, this function could be NULL. For VM's with user queues + dma fences, the driver returns the last fence from the convert user fence to dma-fence IOCTL. I realized my kernel doc doesn't explain this as well as it should, I have already made this more verbose locally and hopefully it clearly explains all of this. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> That part was actually obvious. But I would expected that to be push interface instead of a pull interface. E.g. the preemption fence would also provide something like a manager object which has a mutex, the last exported user fence and the necessary functionality to update this user fence. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> Hmm, I rather like the pull interface. In Xe this is dma-fence chain attached to the VM. It safe pull given our covert IOCTL takes the VM's dma-resv locks / notifier locks before publishing the user fence.</pre> </blockquote> I don't think that will work like this. Publishing the user fence must be serialized with signaling the preemption fence. And that serialization can't be done by the dma-resv lock nor the notifier lock because we can't let a dma_fence signaling depend on them. That this is a separate lock or similar mechanism is a must have. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <pre class="moz-quote-pre" wrap="">In your design couldn't you use spin lock in the last step of publishing a user fence which checks for sw signaling on the preempt fence, if it enabled restart the IOCTL waiting the resume worker? Then in this vfunc pull the fence under the spin lock?</pre> </blockquote> Yeah that could maybe work, but there is also a different challenge to keep in mind. See below. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <pre class="moz-quote-pre" wrap="">Not opposed to a push interface though if you really think this the way to go. Quite certain I could make that work for Xe too. </pre> </blockquote> A push interface is just easier to validate. Keep in mind that you can only update the user fence when you can guarantee that the preemption fence is not signaled nor in the process of signaling. So when you create a user fence the approach becomes: 1. kmalloc the fence structure, 2. Initialize the fence. 3. Push it into the premption manage of your queue, this makes sure that the queue is runable. 4. Publish the new fence in drm_sync, dma_resv etc... <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <pre class="moz-quote-pre" wrap="">This definitely works as is - I've tested this. If dma-fence's lock was embedded within the dma-fence, then ofc lockdep would complain without nesting. It isn't though - the spin lock is passed in as argument so lockdep can identify the locks for 'preempt fence lock' and 'last exported fence' as independent locks.</pre> </blockquote> Ah, good point. But exactly that passing in of the lock is what we try to get away from to allow dma_fences to surpass the module they issued. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <blockquote type="cite"> <blockquote type="cite"> <pre class="moz-quote-pre" wrap="">Lockdep does not explode in Xe but maybe can buy this is a little unsafe. We could always move preempt_delay to the worker, attach a CB, and rekick the worker upon the last fence signaling if you think that is safer. Of course we could always just directly wait on the returned last fence in the worker too. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> Yeah I that is basically what we do at the moment since you also need to make sure that no new user fence is installed while you wait for the latest to signal. </pre> </blockquote> <pre class="moz-quote-pre" wrap=""> After I typed this realized waiting on 'last fence' in the worker is a no go given we want to pipeline preemptions in Xe (e.g. issue all queues preemption commands to firmware in parallel as these are async operations which may be fast in cases and slow in others). I think having preempt vfunc done directly in a dma-fence CB is a must.</pre> </blockquote> At least with out current design that won't work because you need to somehow prevent installing new user fences while the preemption fence is signaling. Currently we do that by holding a mutex, but you can't hold a mutex and return from a worker and then drop the mutex again in a different worker (ok in theory you can, but that is so strongly disregarded that upstream would probably reject the code). Regards, Christian. <blockquote type="cite" cite="mid:Zz9+87ZUpmD5VGDL@lstrano-desk.jf.intel.com"> <pre class="moz-quote-pre" wrap=""> Matt </pre> </blockquote> </body> </html>