[Intel-gfx] [PATCH 3/9] drm/i915/gt: Unlock engine-pm after queuing the kernel context switch

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Nov 20 12:40:13 UTC 2019


On 20/11/2019 12:07, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-20 11:58:27)
>>
>> On 20/11/2019 09:32, Chris Wilson wrote:
>>> In commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to
>>> the backend"), I erroneously concluded that we last modify the engine
>>> inside __i915_request_commit(), meaning that we could enable concurrent
>>> submission for userspace as we enqueued this request. However, this
>>> falls into a trap with other users of the engine->kernel_context waking
>>> up and submitting their request before the idle-switch is queued, with
>>> the result that the kernel_context is executed out-of-sequence, most
>>> likely upsetting the GPU and certainly ourselves when we try to retire
>>> the out-of-sequence requests.
>>>
>>> As such we need to hold onto the effective engine->kernel_context mutex
>>> lock (via the engine pm mutex proxy) until we have finished queuing the
>>> request to the engine.
>>>
>>> v2: Serialise against concurrent intel_gt_retire_requests()
>>> v3: Describe the hairy locking scheme with intel_gt_retire_requests()
>>> for future reference.
>>> v4: Combine timeline->lock and engine pm release; it's hairy.
>>>
>>> Fixes: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c | 47 +++++++++++++++++++----
>>>    1 file changed, 40 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> index 3c0f490ff2c7..1f517357a268 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> @@ -73,8 +73,25 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
>>>    
>>>    #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
>>>    
>>> +static void
>>> +__intel_timeline_enter_and_pm_release(struct intel_timeline *tl,
>>> +                                   struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_gt_timelines *timelines = &engine->gt->timelines;
>>> +
>>> +     spin_lock(&timelines->lock);
>>> +
>>> +     if (!atomic_fetch_inc(&tl->active_count))
>>> +             list_add_tail(&tl->link, &timelines->active_list);
>>
>> Hmm, this new part maybe matches/answers my question from
>> "drm/i915/gt: Close race between engine_park and
>> intel_gt_retire_requests", I think, since it now adds a second
>> place a timeline can enter the active_list.
>>
>> But no, where does the intel_timeline_enter race come from? It can't
>> be userspace submission, since that is blocked on wf->lock.
> 
> The race is not just with intel_timeline_enter, but with
> intel_gt_retire_requests.
> 
> As soon as we are on that list, we may be retired. If we are retired
> before adjusting the engine->wakeref.count, we are b0rked.
> 
> As soon as we adjust the engine->wakeref.count, another submission may
> call intel_timeline_enter, and again may even retire this request. The
> enter itself is perfectly fine, but we need to serialise against the
> retires.
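
Right, so if I follow, the net effect is roughly the sketch below (my
own simplification, not the exact patch; the __timeline_mark_lock dance
and error paths are omitted):

        /*
         * Keep timelines->lock held across both steps: once the
         * timeline is on the active_list, a concurrent
         * intel_gt_retire_requests() could otherwise retire the
         * barrier before the engine wakeref hand-off has happened.
         */
        spin_lock(&timelines->lock);

        if (!atomic_fetch_inc(&tl->active_count))
                list_add_tail(&tl->link, &timelines->active_list);

        /* hand over the engine pm reference while still serialised */
        __intel_wakeref_defer_park(&engine->wakeref);

        spin_unlock(&timelines->lock);

The retire side takes the same timelines->lock before walking the
active_list, so it cannot observe this timeline until both steps are
done.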

I think the two patches combined work; I am just not sure whether two 
atomic_fetch_inc()->list_add() sequences are needed now that you have 
re-ordered __i915_requests_queue and __intel_wakeref_defer_park - that 
is the part which confuses me. But it also does no harm...
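
For my own notes, the reason the duplicate list_add does no harm is
that active_count makes list membership idempotent: only the 0 -> 1
transition adds the timeline, and only the final 1 -> 0 transition on
the exit/retire side removes it. Roughly the counterpart (again my
simplified sketch; I believe the real intel_timeline_exit() also has a
lockless atomic_add_unless() fast path before taking the lock):

        spin_lock(&timelines->lock);

        /* last active reference takes the timeline off the list */
        if (atomic_dec_and_test(&tl->active_count))
                list_del(&tl->link);

        spin_unlock(&timelines->lock);

So whichever of intel_timeline_enter() or the engine park path gets
there first performs the list_add, and the other just bumps the count.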

Regards,

Tvrtko

