[Intel-gfx] [PATCH v3 2/2] drm/i915: add syncobj timeline support

Chris Wilson chris at chris-wilson.co.uk
Thu Aug 1 08:08:04 UTC 2019


Quoting Lionel Landwerlin (2019-08-01 08:43:24)
> On 31/07/2019 23:03, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-07-31 15:07:33)
> > I think I have convinced myself that with the split between wait before,
> > signal after combined with the rule that seqno point along the syncobj
> > are monotonic, you should not be able to generate an AB-BA deadlock
> > between concurrent clients.
> >
> > Except that we need to fixup drm_syncobj_add_point() to actually apply
> > that rule. Throwing an error and leaving the client stuck is less than
> > ideal, we can't even recover with a GPU reset, and as they are native
> > fences we don't employ a failsafe timer.
> 
> 
> Unfortunately we're not the only user of add_point() and amdgpu doesn't 
> want it to fail.

It's dangerously exposing the deadlock from incorrect seqno ordering to
userspace. We should be able to hit that DRM_ERROR() in testing, and be
able to detect the out-of-sequence timeline.

> Best I can come up with is take the syncobj lock when signaling in 
> get_timeline_fence_array() do the check there, unlock in __free_fence_array.

I think the intermediate step is a "safe" version of add_point that
doesn't leave the timeline broken. That still leaves a glitch visible to
userspace, but it should not deadlock.

The way I was looking at it was to reserve a placeholder syncpt before
building the request and allow replacing the placeholder afterwards.

If we abort the submission part way, we leave the placeholder in the
timeline to be replaced later, or subsumed by a later syncpt.

> > Doesn't WAIT_FOR_SUBMIT throw a spanner in the works and allow for
> > deadlocks?
> 
> 
> Hm... I can't see how.
> 
> Unless you mean clients could deadlock them themselves the same way it 
> would using 2 pthread_mutex and having 2 threads trying to lock both 
> mutexes in opposite order.

Yes.
 
> Is it that the kernel's problem?

It becomes a problem for us as it ties up kernel resources that we can
not reap currently. Userspace is not meant to be able to break the
kernel on a whim.  Even futexes are robust. ;)
-Chris


More information about the Intel-gfx mailing list