[Intel-gfx] [PATCH 3/9] drm/i915/gt: Unlock engine-pm after queuing the kernel context switch
Chris Wilson
chris at chris-wilson.co.uk
Wed Nov 20 12:07:37 UTC 2019
Quoting Tvrtko Ursulin (2019-11-20 11:58:27)
>
> On 20/11/2019 09:32, Chris Wilson wrote:
> > In commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to
> > the backend"), I erroneously concluded that we last modify the engine
> > inside __i915_request_commit() meaning that we could enable concurrent
> > submission for userspace as we enqueued this request. However, this
> > falls into a trap with other users of the engine->kernel_context waking
> > up and submitting their request before the idle-switch is queued, with
> > the result that the kernel_context is executed out-of-sequence most
> > likely upsetting the GPU and certainly ourselves when we try to retire
> > the out-of-sequence requests.
> >
> > As such we need to hold onto the effective engine->kernel_context mutex
> > lock (via the engine pm mutex proxy) until we have finish queuing the
> > request to the engine.
> >
> > v2: Serialise against concurrent intel_gt_retire_requests()
> > v3: Describe the hairy locking scheme with intel_gt_retire_requests()
> > for future reference.
> > v4: Combine timeline->lock and engine pm release; it's hairy.
> >
> > Fixes: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 47 +++++++++++++++++++----
> > 1 file changed, 40 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 3c0f490ff2c7..1f517357a268 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -73,8 +73,25 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
> >
> > #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
> >
> > +static void
> > +__intel_timeline_enter_and_pm_release(struct intel_timeline *tl,
> > + struct intel_engine_cs *engine)
> > +{
> > + struct intel_gt_timelines *timelines = &engine->gt->timelines;
> > +
> > + spin_lock(&timelines->lock);
> > +
> > + if (!atomic_fetch_inc(&tl->active_count))
> > + list_add_tail(&tl->link, &timelines->active_list);
>
> Hmm with these new part it maybe matches/answers my question from
> "drm/i915/gt: Close race between engine_park and
> intel_gt_retire_requests". I think at least. Since it now adds a second
> place timeline can enter the active_list.
>
> But no, where does the intel_timeline_enter race come from? Can't be
> userspace submission since they are blocked on wf->lock.
The race is not just with intel_timeline_enter, but with
intel_gt_retire_requests.
As soon as we are on that list, we may be retired. If we are retired
before adjusting the engine->wakeref.count, we are b0rked.
As soon as we adjust the engine->wakeref.count, another submission may
call intel_timeline_enter, and again may even retire this request. The
enter itself is perfectly fine, but we need to serialise against the
retires.
-Chris
More information about the Intel-gfx
mailing list