[Intel-gfx] [PATCH 5/5] drm/i915/gt: Only transfer the virtual context to the new engine if active
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Jul 16 15:37:25 UTC 2020
On 16/07/2020 12:33, Chris Wilson wrote:
> One more complication of preempt-to-busy with respect to the virtual
> engine is that we may have retired the last request along the virtual
> engine at the same time as preparing to submit the completed request to
> a new engine. That submit will be shortcircuited, but not before we have
> updated the context with the new register offsets and marked the virtual
> engine as bound to the new engine (by calling swap on ve->siblings[]).
> As we may have just retired the completed request, we may also be in the
> middle of calling intel_context_exit() to turn off the power management
virtual_context_exit
> associated with the virtual engine, and that in turn walks the
> ve->siblings[]. If we happen to call swap() on the array as we walk, we
> will call intel_engine_pm_put() twice on the same engine.
>
> In this patch, we prevent this by only updating the bound engine after a
> successful submission which weeds out the already completed requests.
>
> Alternatively, we could walk a non-volatile array for the pm, such as
> using the engine->mask. The small advantage to performing the update
> after the submit is that we then only have to do a swap for active
> requests.
>
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")

> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana at intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_lrc.c | 65 ++++++++++++++++++-----------
> 1 file changed, 40 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 88a5c155154d..5e8278e8ac79 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1805,6 +1805,33 @@ static bool virtual_matches(const struct virtual_engine *ve,
> return true;
> }
>
> +static void virtual_xfer_context(struct virtual_engine *ve,
> + struct intel_engine_cs *engine)
> +{
> + unsigned int n;
> +
> + if (likely(engine == ve->siblings[0]))
> + return;
> +
> + GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> + if (!intel_engine_has_relative_mmio(engine))
> + virtual_update_register_offsets(ve->context.lrc_reg_state,
> + engine);
> +
> + /*
> + * Move the bound engine to the top of the list for
> + * future execution. We then kick this tasklet first
> + * before checking others, so that we preferentially
> + * reuse this set of bound registers.
> + */
> + for (n = 1; n < ve->num_siblings; n++) {
> + if (ve->siblings[n] == engine) {
> + swap(ve->siblings[n], ve->siblings[0]);
> + break;
> + }
> + }
> +}
> +
> #define for_each_waiter(p__, rq__) \
> list_for_each_entry_lockless(p__, \
> &(rq__)->sched.waiters_list, \
> @@ -2253,35 +2280,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> WRITE_ONCE(rq->engine, engine);
>
> - if (engine != ve->siblings[0]) {
> - u32 *regs = ve->context.lrc_reg_state;
> - unsigned int n;
> -
> - GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> -
> - if (!intel_engine_has_relative_mmio(engine))
> - virtual_update_register_offsets(regs,
> - engine);
> -
> + if (__i915_request_submit(rq)) {
> /*
> - * Move the bound engine to the top of the list
> - * for future execution. We then kick this
> - * tasklet first before checking others, so that
> - * we preferentially reuse this set of bound
> - * registers.
> + * Only after we confirm that we will submit
> + * this request (i.e. it has not already
> + * completed), do we want to update the context.
> + *
> + * This serves two purposes. It avoids
> + * unnecessary work if we are resubmitting an
> + * already completed request after timeslicing.
> + * But more importantly, it prevents us altering
> + * ve->siblings[] on an idle context, where
> + * we may be using ve->siblings[] in
> + * virtual_context_enter / virtual_context_exit.
> */
> - for (n = 1; n < ve->num_siblings; n++) {
> - if (ve->siblings[n] == engine) {
> - swap(ve->siblings[n],
> - ve->siblings[0]);
> - break;
> - }
> - }
> -
> + virtual_xfer_context(ve, engine);
> GEM_BUG_ON(ve->siblings[0] != engine);
> - }
>
> - if (__i915_request_submit(rq)) {
> submit = true;
> last = rq;
> }
>
This was nasty indeed. How did you manage to find this?
I think it is okay to do the update once we know submit is going ahead,
but in the light of this "gotcha" I think it would also be good to start
using for_each_engine_masked in virtual_context_enter/exit.
Regardless:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko