[Intel-gfx] [PATCH] drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline

Matthew Brost matthew.brost at intel.com
Thu Jan 6 16:52:31 UTC 2022


On Thu, Jan 06, 2022 at 04:51:21PM +0000, Tvrtko Ursulin wrote:
> 
> On 06/01/2022 16:18, Matthew Brost wrote:
> > On Thu, Jan 06, 2022 at 09:56:03AM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 05/01/2022 16:24, Matthew Brost wrote:
> > > > On Wed, Jan 05, 2022 at 09:35:44AM +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 04/01/2022 23:30, Matthew Brost wrote:
> > > > > > Don't use the interruptable version of the timeline mutex lock in the
> > > > > 
> > > > > interruptible
> > > > > 
> > > > > > error path of eb_pin_timeline as the cleanup must always happen.
> > > > > > 
> > > > > > v2:
> > > > > >     (John Harrison)
> > > > > >      - Don't check for interrupt during mutex lock
> > > > > > 
> > > > > > Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
> > > > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
> > > > > >     1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > index e9541244027a..e96e133cbb1f 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > @@ -2516,9 +2516,9 @@ static int eb_pin_timeline(struct i915_execbuffer *eb, struct intel_context *ce,
> > > > > >     				      timeout) < 0) {
> > > > > >     			i915_request_put(rq);
> > > > > > -			tl = intel_context_timeline_lock(ce);
> > > > > > +			mutex_lock(&ce->timeline->mutex);
> > > > > 
> > > > > On the other hand it is more user friendly to handle signals (which maybe
> > > > > does not matter in this case, I am not sure any longer how long the hold
> > > > > time can be), but there is also a question of consistency within the very
> > > > > function you are changing.
> > > > > 
> > > > > Apart from consistency, what about the parent-child magic that
> > > > > intel_context_timeline_lock does and which you wouldn't have here?
> > > > > 
> > > > > And what about the very existence of intel_context_timeline_lock as a
> > > > > component boundary separation API, if it is used inconsistently throughout
> > > > > i915_gem_execbuffer.c?
> > > > 
> > > > intel_context_timeline_lock does 2 things:
> > > > 
> > > > 1. Handles lockdep nesting of timeline locks for parent-child contexts,
> > > > ensuring locks are acquired from the parent through to the last child and
> > > > released from the last child back to the parent
> > > > 2. Allows the mutex lock to be interrupted
> > > > 
> > > > This helper should be used in setup steps where the user can abort via a
> > > > signal (context pinning time + request creation time); by 'should be' I
> > > > mean this is how it was done before I extended the execbuf IOCTL for
> > > > multiple BBs. Slightly confusing, but this is what was in place so I
> > > > stuck with it.
> > > > 
> > > > This code here is an error path that holds at most 1 timeline lock
> > > > (no nesting required) and it must always be executed as it is a
> > > > cleanup step (it is not allowed to be interrupted by the user;
> > > > intel_context_exit must be called or we have dangling engine PM refs).
> > > > 
> > > > Make sense? I probably should update the commit message to explain this
> > > > a bit better, as it took me a while to understand how this locking
> > > > worked.
> > > 
> > > The part which does not make sense is this:
> > > 
> > 
> > I'll try to explain this again...
> > 
> > > eb_pin_timeline()
> > > {
> > > ...
> > > 	tl = intel_context_timeline_lock(ce);
> > > 	if (IS_ERR(tl))
> > > 		return PTR_ERR(tl);
> > 
> > This part is allowed to be aborted by the user.
> > 
> > > 
> > > ... do some throttling, and if it fails:
> > > 			mutex_lock(&ce->timeline->mutex);
> > 
> > This part is not.
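> > 
> > Roughly, the full cleanup under that lock is (simplified sketch, not the
> > exact code):
> > 
> > 	mutex_lock(&ce->timeline->mutex);
> > 	intel_context_exit(ce);	/* must run, or we leak an engine PM ref */
> > 	mutex_unlock(&ce->timeline->mutex);
> > 
> > Skipping intel_context_exit on a signal is not an option, hence the plain
> > (non-interruptible) mutex_lock.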
> 
> Pfft got it this time round.
> 
> I suggest a comment above the mutex to this effect. And maybe still consider
> the explicit error path, which may be more readable due to the single request
> put, but it's up to you.
> 

I'll add a comment and see how an explicit error path looks in the code.
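
Something along these lines, as a rough sketch (exact comment wording and
the final structure are TBD for the next rev):

	i915_request_put(rq);

	/*
	 * This is a cleanup step and must always run: intel_context_exit()
	 * has to be called or we are left with a dangling engine PM ref, so
	 * take the timeline mutex directly rather than going through the
	 * interruptible intel_context_timeline_lock() helper, which can bail
	 * out on a pending signal.
	 */
	mutex_lock(&ce->timeline->mutex);
	intel_context_exit(ce);
	mutex_unlock(&ce->timeline->mutex);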

Matt

> Regards,
> 
> Tvrtko

