[Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

Wed Jul 22 06:45:45 UTC 2020

On 2020-07-22 00:45, Dave Airlie wrote:
> On Tue, 21 Jul 2020 at 18:47, Thomas Hellström (Intel)
> <thomas_os at shipmail.org> wrote:
>>
>> On 7/21/20 9:45 AM, Christian König wrote:
>>> Am 21.07.20 um 09:41 schrieb Daniel Vetter:
>>>> On Mon, Jul 20, 2020 at 01:15:17PM +0200, Thomas Hellström (Intel)
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> On 7/9/20 2:33 PM, Daniel Vetter wrote:
>>>>>> Comes up every few years, gets somewhat tedious to discuss, let's
>>>>>> write this down once and for all.
>>>>>>
>>>>>> What I'm not sure about is whether the text should be more explicit in
>>>>>> flat out mandating the amdkfd eviction fences for long running compute
>>>>>> workloads or workloads where userspace fencing is allowed.
>>>>> Although (in my humble opinion) it might be possible to completely
>>>>> untangle
>>>>> kernel-introduced fences for resource management and dma-fences used
>>>>> for
>>>>> completion- and dependency tracking and lift a lot of restrictions
>>>>> for the
>>>>> dma-fences, including prohibiting infinite ones, I think this makes
>>>>> sense
>>>>> describing the current state.
>>>> Yeah I think a future patch needs to type up how we want to make that
>>>> happen (for some cross driver consistency) and what needs to be
>>>> considered. Some of the necessary parts are already there (with like the
>>>> preemption fences amdkfd has as an example), but I think some clear docs
>>>> on what's required from both hw, drivers and userspace would be really
>>>> good.
>>> I'm currently writing that up, but probably still need a few days for
>>> this.
>> Great! I put down some (very) initial thoughts a couple of weeks ago
>> building on eviction fences for various hardware complexity levels here:
>>
>> https://gitlab.freedesktop.org/thomash/docs/-/blob/master/Untangling%20dma-fence%20and%20memory%20allocation.odt
> We are seeing HW that has recoverable GPU page faults but only for
> compute tasks, and scheduler without semaphores hw for graphics.
>
> So a single driver may have to expose both models to userspace and
> also introduces the problem of how to interoperate between the two
> models on one card.
>
> Dave.

Hmm, yes. To begin with, it's important to note that this is not a
replacement for new programming models or APIs. It is something that
takes place internally in drivers to mitigate many of the restrictions
that are currently imposed on dma-fence and documented in this and
previous series. It's basically the driver-private narrow completions
Jason suggested in the lockdep patch discussions, implemented the same
way as eviction fences.

The memory fence API would be local to helpers and middle layers like
TTM, and to the corresponding drivers. The only cross-driver-like
visibility would be that the dma-buf move_notify() callback would not be
allowed to wait on dma-fences or on anything that depends on a dma-fence.
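
As a rough illustration of that last rule, here is a small userspace
model in C. It is not kernel code: the toy_fence type, the toy_kind
enum and all three helpers are hypothetical names invented for this
sketch, and the single depends_on pointer is a simplifying assumption
(real dependency graphs can fan out). It only shows the shape of the
check: a wait from move_notify() is acceptable only if no dma-fence
appears anywhere in the dependency chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
enum toy_kind { TOY_MEMORY_FENCE, TOY_DMA_FENCE };

struct toy_fence {
	enum toy_kind kind;
	const struct toy_fence *depends_on; /* NULL if no dependency */
};

/* True if the fence is a dma-fence or transitively depends on one. */
static bool toy_depends_on_dma_fence(const struct toy_fence *f)
{
	for (; f; f = f->depends_on)
		if (f->kind == TOY_DMA_FENCE)
			return true;
	return false;
}

/* The rule: move_notify() may only wait on dma-fence-free chains. */
static bool toy_move_notify_may_wait_on(const struct toy_fence *f)
{
	return !toy_depends_on_dma_fence(f);
}

/* Debug-style check a driver-private wait helper might perform. */
static void toy_move_notify_wait(const struct toy_fence *f)
{
	assert(toy_move_notify_may_wait_on(f));
	/* A real implementation would block here; the model does not. */
}
```

In this model a driver-private memory fence with no dependencies passes
the check, while a memory fence chained behind a dma-fence does not,
which is the "something that depends on a dma-fence" case.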

So with that in mind, I don't foresee engines with different 
capabilities on the same card being a problem.

/Thomas