[Intel-gfx] [PATCH v3 2/2] drm/i915: add syncobj timeline support

Lionel Landwerlin lionel.g.landwerlin at intel.com
Thu Aug 1 09:37:23 UTC 2019


On 01/08/2019 12:22, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-08-01 10:01:44)
>> On 01/08/2019 11:08, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-08-01 08:43:24)
>>>> On 31/07/2019 23:03, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2019-07-31 15:07:33)
>>>>> I think I have convinced myself that with the split between wait before,
>>>>> signal after combined with the rule that seqno point along the syncobj
>>>>> are monotonic, you should not be able to generate an AB-BA deadlock
>>>>> between concurrent clients.
>>>>>
>>>>> Except that we need to fixup drm_syncobj_add_point() to actually apply
>>>>> that rule. Throwing an error and leaving the client stuck is less than
>>>>> ideal, we can't even recover with a GPU reset, and as they are native
>>>>> fences we don't employ a failsafe timer.
>>>> Unfortunately we're not the only user of add_point() and amdgpu doesn't
>>>> want it to fail.
>>> It's dangerously exposing the deadlock from incorrect seqno ordering to
>>> userspace. We should be able to hit that DRM_ERROR() in testing, and be
>>> able to detect the out-of-sequence timeline.
>>>
>>>> Best I can come up with is take the syncobj lock when signaling in
>>>> get_timeline_fence_array() do the check there, unlock in __free_fence_array.
>>> I think the intermediate step is a "safe" version of add_point that
>>> doesn't leave the timeline broken. That still leaves a glitch visible to
>>> userspace, but it should not deadlock.
>>
>> Right, sounds doable. What happens in execbuf after add_point() fails?
>>
>> We've already handed the request to the scheduler and potentially it
>> might be running already.
> Right, at present we've already submitted the request, so the batch will
> be run and fence will be signaled. The failure to add to it the
> timeline means that someone else already submitted a later fence and
> some third party is using that syncpt instead of ours for their
> dependency. So that third party will be delayed, but so long as they kept
> their dependencies in order it should just be a delay.


But should we return an error to userspace?


-Lionel


>
> The problem comes into play if we go ahead and insert our fence into the
> dma_fence_chain out of order, breaking all the monotonic guarantees and
> flipping the order of the fences. (The equivalent to taking mutexes out
> of order.)
>
>>> The way I was looking at it was to reserve a placeholder syncpt before
>>> building the request and allow replacing the placeholder afterwards.
>>
>> That sounds fairly tricky to get right :(
> Invasive, yeah, turns out one needs to start walking the fence chain
> more carefully. The actual placeholder fence is pretty run of the mill
> as far as dma-fence*.c goes, which is say...
>   
>> The fence stuff is already full of magic.
> It's full of magic.
> -Chris
>



More information about the Intel-gfx mailing list