[Intel-gfx] [PATCH 03/19] drm/i915/gt: Close race between engine_park and intel_gt_retire_requests
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Nov 19 14:15:49 UTC 2019
On 18/11/2019 23:02, Chris Wilson wrote:
> The general concept was that intel_timeline.active_count was locked by
> the intel_timeline.mutex. The exception was for power management, where
> the engine->kernel_context->timeline could be manipulated under the
> global wakeref.mutex.
>
> This was quite solid, as we always manipulated the timeline only while
> we held an engine wakeref.
>
> And then we started retiring requests outside of struct_mutex, only
> using the timelines.active_list and the timeline->mutex. There we
> started manipulating intel_timeline.active_count outside of an engine
> wakeref, and so introduced a race between __engine_park() and
> intel_gt_retire_requests(), a race that could result in the
> engine->kernel_context not being added to the active timelines and so
> losing requests, which caused us to keep the system permanently powered
> up [and unloadable].
>
> The race would be easy to close if we could take the engine wakeref for
> the timeline before we retire -- except timelines are not bound to any
> engine and so we would need to keep all active engines awake. The
> alternative is to guard intel_timeline_enter/intel_timeline_exit for use
> outside of the timeline->mutex.
>
> Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_gt_requests.c | 8 ++---
> drivers/gpu/drm/i915/gt/intel_timeline.c | 34 +++++++++++++++----
> .../gpu/drm/i915/gt/intel_timeline_types.h | 2 +-
> 3 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index a79e6efb31a2..7559d6373f49 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -49,8 +49,8 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> continue;
>
> intel_timeline_get(tl);
> - GEM_BUG_ON(!tl->active_count);
> - tl->active_count++; /* pin the list element */
> + GEM_BUG_ON(!atomic_read(&tl->active_count));
> + atomic_inc(&tl->active_count); /* pin the list element */
> spin_unlock_irqrestore(&timelines->lock, flags);
>
> if (timeout > 0) {
> @@ -71,14 +71,14 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>
> /* Resume iteration after dropping lock */
> list_safe_reset_next(tl, tn, link);
> - if (!--tl->active_count)
> + if (atomic_dec_and_test(&tl->active_count))
> list_del(&tl->link);
>
> mutex_unlock(&tl->mutex);
>
> /* Defer the final release to after the spinlock */
> if (refcount_dec_and_test(&tl->kref.refcount)) {
> - GEM_BUG_ON(tl->active_count);
> + GEM_BUG_ON(atomic_read(&tl->active_count));
> list_add(&tl->link, &free);
> }
> }
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index 16a9e88d93de..4f914f0d5eab 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -334,15 +334,33 @@ void intel_timeline_enter(struct intel_timeline *tl)
> struct intel_gt_timelines *timelines = &tl->gt->timelines;
> unsigned long flags;
>
> + /*
> + * Pretend we are serialised by the timeline->mutex.
> + *
> + * While generally true, there are a few exceptions to the rule
> + * for the engine->kernel_context being used to manage power
> + * transitions. As the engine_park may be called from under any
> + * timeline, it uses the power mutex as a global serialisation
> + * lock to prevent any other request entering its timeline.
> + *
> + * The rule is generally tl->mutex, otherwise engine->wakeref.mutex.
> + *
> + * However, intel_gt_retire_requests() does not know which engine
> + * it is retiring along and so cannot partake in the engine-pm
> + * barrier, and there we use the tl->active_count as a means to
> + * pin the timeline in the active_list while the locks are dropped.
> + * Ergo, as that is outside of the engine-pm barrier, we need to
> + * use atomic to manipulate tl->active_count.
> + */
> lockdep_assert_held(&tl->mutex);
> -
> GEM_BUG_ON(!atomic_read(&tl->pin_count));
> - if (tl->active_count++)
> +
> + if (atomic_add_unless(&tl->active_count, 1, 0))
> return;
> - GEM_BUG_ON(!tl->active_count); /* overflow? */
>
> spin_lock_irqsave(&timelines->lock, flags);
> - list_add(&tl->link, &timelines->active_list);
> + if (!atomic_fetch_inc(&tl->active_count))
> + list_add(&tl->link, &timelines->active_list);
So retirement raced with this and had already elevated the active_count? In
that case the atomic_add_unless above succeeds and we return early. But
retirement does not add the timeline to the list, so we exit here without
it on the active_list.
> spin_unlock_irqrestore(&timelines->lock, flags);
> }
>
> @@ -351,14 +369,16 @@ void intel_timeline_exit(struct intel_timeline *tl)
> struct intel_gt_timelines *timelines = &tl->gt->timelines;
> unsigned long flags;
>
> + /* See intel_timeline_enter() */
> lockdep_assert_held(&tl->mutex);
>
> - GEM_BUG_ON(!tl->active_count);
> - if (--tl->active_count)
> + GEM_BUG_ON(!atomic_read(&tl->active_count));
> + if (atomic_add_unless(&tl->active_count, -1, 1))
> return;
>
> spin_lock_irqsave(&timelines->lock, flags);
> - list_del(&tl->link);
> + if (atomic_dec_and_test(&tl->active_count))
> + list_del(&tl->link);
This one I can understand because retirement would have unlinked it.
> spin_unlock_irqrestore(&timelines->lock, flags);
>
> /*
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> index 98d9ee166379..5244615ed1cb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> @@ -42,7 +42,7 @@ struct intel_timeline {
> * from the intel_context caller plus internal atomicity.
> */
> atomic_t pin_count;
> - unsigned int active_count;
> + atomic_t active_count;
>
> const u32 *hwsp_seqno;
> struct i915_vma *hwsp_ggtt;
>
Regards,
Tvrtko