[Intel-xe] [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences

Thomas Hellström thomas.hellstrom at linux.intel.com
Tue Apr 4 12:54:50 UTC 2023


Hi, Christian,

On 4/4/23 11:09, Christian König wrote:
> Am 04.04.23 um 02:22 schrieb Matthew Brost:
>> From: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>
>> For long-running workloads, drivers either need to open-code completion
>> waits, invent their own synchronization primitives or internally use
>> dma-fences that do not obey the cross-driver dma-fence protocol, but
>> without any lockdep annotation all these approaches are error prone.
>>
>> So since for example the drm scheduler uses dma-fences it is desirable for
>> a driver to be able to use it for throttling and error handling also with
>> internal dma-fences that do not obey the cross-driver dma-fence protocol.
>>
>> Introduce long-running completion fences in form of dma-fences, and add
>> lockdep annotation for them. In particular:
>>
>> * Do not allow waiting under any memory management locks.
>> * Do not allow to attach them to a dma-resv object.
>> * Introduce a new interface for adding callbacks making the helper adding
>>    a callback sign off on that it is aware that the dma-fence may not
>>    complete anytime soon. Typically this will be the scheduler chaining
>>    a new long-running fence on another one.
>
> Well that's pretty much what I tried before: 
> https://lwn.net/Articles/893704/
>
> And the reasons why it was rejected haven't changed.
>
> Regards,
> Christian.
>
Yes, TBH this was mostly to get a discussion going on how we'd best 
tackle this problem while being able to reuse the scheduler for 
long-running workloads.

I couldn't see any clear decision on your series, though. One main 
difference I see is that this is intended for driver-internal use only. 
(I'm counting using the drm_scheduler as a helper for driver-private 
use). This is by no means a way to try to tackle the indefinite fence 
problem.

We could ofc invent a completely different data-type that abstracts the 
synchronization the scheduler needs in the long-running case, or each 
driver could hack something up, like sleeping in the prepare_job() or 
run_job() callback for throttling, but those waits should still be 
annotated one way or another (and probably in a similar way across 
drivers) to make sure we don't do anything bad.
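
To make the "sleep in run_job() for throttling" alternative concrete, 
here is a minimal user-space sketch of that idea. It is only a model 
under stated assumptions: struct lr_throttle, lr_run_job() and 
lr_job_done() are hypothetical names, it uses pthreads rather than any 
kernel or drm_sched interface, and it does not model the lockdep 
annotation that such a wait would still need.

```c
/* User-space model only: NOT kernel code and not the drm_sched API. */
#include <assert.h>
#include <pthread.h>

/* Hypothetical throttle state: at most `limit` jobs in flight at once. */
struct lr_throttle {
	pthread_mutex_t lock;
	pthread_cond_t space;	/* signalled whenever a job completes */
	unsigned int in_flight;
	unsigned int limit;
};

static void lr_throttle_init(struct lr_throttle *t, unsigned int limit)
{
	assert(limit > 0);
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->space, NULL);
	t->in_flight = 0;
	t->limit = limit;
}

/* Stand-in for a run_job()-style hook: sleeps until a slot is free. */
static void lr_run_job(struct lr_throttle *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->in_flight >= t->limit)
		pthread_cond_wait(&t->space, &t->lock);
	t->in_flight++;
	pthread_mutex_unlock(&t->lock);
}

/* Stand-in for job completion: frees one slot and wakes one waiter. */
static void lr_job_done(struct lr_throttle *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->in_flight > 0);
	t->in_flight--;
	pthread_cond_signal(&t->space);
	pthread_mutex_unlock(&t->lock);
}
```

The point of the sketch is that the wait inside lr_run_job() may last 
arbitrarily long for a long-running workload, which is exactly the kind 
of wait that must not happen under memory management locks.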

So any suggestions as to what would be the better solution here would 
be appreciated.

Thanks,

Thomas






