[Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences

Tue Apr 4 19:02:23 UTC 2023

On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote:
> 
> On 4/4/23 15:10, Christian König wrote:
> > Am 04.04.23 um 14:54 schrieb Thomas Hellström:
> > > Hi, Christian,
> > > 
> > > On 4/4/23 11:09, Christian König wrote:
> > > > Am 04.04.23 um 02:22 schrieb Matthew Brost:
> > > > > From: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > > > 
> > > > > For long-running workloads, drivers either need to open-code
> > > > > completion
> > > > > waits, invent their own synchronization primitives or internally use
> > > > > dma-fences that do not obey the cross-driver dma-fence protocol, but
> > > > > without any lockdep annotation all these approaches are error prone.
> > > > > 
> > > > > So since for example the drm scheduler uses dma-fences it is
> > > > > desirable for
> > > > > a driver to be able to use it for throttling and error
> > > > > handling also with
> > > > > internal dma-fences tha do not obey the cros-driver
> > > > > dma-fence protocol.
> > > > > 
> > > > > Introduce long-running completion fences in form of
> > > > > dma-fences, and add
> > > > > lockdep annotation for them. In particular:
> > > > > 
> > > > > * Do not allow waiting under any memory management locks.
> > > > > * Do not allow to attach them to a dma-resv object.
> > > > > * Introduce a new interface for adding callbacks making the
> > > > > helper adding
> > > > >    a callback sign off on that it is aware that the dma-fence may not
> > > > >    complete anytime soon. Typically this will be the
> > > > > scheduler chaining
> > > > >    a new long-running fence on another one.
> > > > 
> > > > Well that's pretty much what I tried before:
> > > > https://lwn.net/Articles/893704/
> > > > 

I don't think this quite the same, this explictly enforces that we don't
break the dma-fence rules (in path of memory allocations, exported in
any way), essentially this just SW sync point reusing dma-fence the
infrastructure for signaling / callbacks. I believe your series tried to
export these fences to user space (admittedly I haven't fully read your
series).

In this use case we essentially just want to flow control the ring via
the dma-scheduler + maintain a list of pending jobs so the TDR can be
used for cleanup if LR entity encounters an error. To me this seems
perfectly reasonable but I know dma-femce rules are akin to a holy war.

If we return NULL in run_job, now we have to be able to sink all jobs
in the backend regardless on ring space, maintain a list of jobs pending
for cleanup after errors, and write a different cleanup path as now the
TDR doesn't work. Seems very, very silly to duplicate all of this code
when the DRM scheduler provides all of this for us. Also if we go this
route, now all drivers are going to invent ways to handle LR jobs /w the
DRM scheduler.

This solution is pretty clear, mark the scheduler as LR, and don't
export any fences from the scheduler. If you try to export these fences
a blow up happens.

> > > > And the reasons why it was rejected haven't changed.
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > Yes, TBH this was mostly to get discussion going how we'd best
> > > tackle this problem while being able to reuse the scheduler for
> > > long-running workloads.
> > > 
> > > I couldn't see any clear decision on your series, though, but one
> > > main difference I see is that this is intended for driver-internal
> > > use only. (I'm counting using the drm_scheduler as a helper for
> > > driver-private use). This is by no means a way to try tackle the
> > > indefinite fence problem.
> > 
> > Well this was just my latest try to tackle this, but essentially the
> > problems are the same as with your approach: When we express such
> > operations as dma_fence there is always the change that we leak that
> > somewhere.
> > 
> > My approach of adding a flag noting that this operation is dangerous and
> > can't be synced with something memory management depends on tried to
> > contain this as much as possible, but Daniel still pretty clearly
> > rejected it (for good reasons I think).
> > 
> > > 
> > > We could ofc invent a completely different data-type that abstracts
> > > the synchronization the scheduler needs in the long-running case, or
> > > each driver could hack something up, like sleeping in the
> > > prepare_job() or run_job() callback for throttling, but those waits
> > > should still be annotated in one way or annotated one way or another
> > > (and probably in a similar way across drivers) to make sure we don't
> > > do anything bad.
> > > 
> > >  So any suggestions as to what would be the better solution here
> > > would be appreciated.
> > 
> > Mhm, do we really the the GPU scheduler for that?
> > 

I think we need to solve this within the DRM scheduler one way or
another.

> > I mean in the 1 to 1 case  you basically just need a component which
> > collects the dependencies as dma_fence and if all of them are fulfilled
> > schedules a work item.
> > 
> > As long as the work item itself doesn't produce a dma_fence it can then
> > still just wait for other none dma_fence dependencies.
> > 
> > Then the work function could submit the work and wait for the result.
> > 
> > The work item would then pretty much represent what you want, you can
> > wait for it to finish and pass it along as long running dependency.
> > 
> > Maybe give it a funky name and wrap it up in a structure, but that's
> > basically it.
> > 
> This very much sounds like a i915_sw_fence for the dependency tracking and
> dma_fence_work for the actual work although it's completion fence is a
> dma_fence.
>

Agree this does sound to i915ish as stated below one of mandates in Xe
was to use the DRM scheduler. Beyond that as someone who a submission
backend in the i915 and Xe, I love how the DRM scheduler works (single
entry point), it makes everything so much easier.

Matt

> Although that goes against the whole idea of a condition for merging the xe
> driver would be that we implement some sort of minimal scaffolding for
> long-running workloads in the drm scheduler, and the thinking behind that is
> to avoid implementing intel-specific solutions like those...
> 
> Thanks,
> 
> Thomas
> 
> 
> 
> > Regards,
> > Christian.
> > 
> > > 
> > > Thanks,
> > > 
> > > Thomas
> > > 
> > > 
> > > 
> > > 
> > >