[Intel-gfx] [PATCH 5/5] drm/i915: Prevent bonded requests from overtaking each other on preemption

Mon Sep 23 13:48:44 UTC 2019

On 21/09/2019 10:55, Chris Wilson wrote:
> Force bonded requests to run on distinct engines so that they cannot be
> shuffled onto the same engine where timeslicing will reverse the order.
> A bonded request will often wait on a semaphore signaled by its master,
> creating an implicit dependency -- if we ignore that implicit dependency
> and allow the bonded request to run on the same engine and before its
> master, we will cause a GPU hang. [Whether it will hang the GPU is
> debatable, we should keep on timeslicing and each timeslice should be
> "accidentally" counted as forward progress, in which case it should run
> but at one-half to one-third speed.]
> 
> We can prevent this inversion by restricting which engines we allow
> ourselves to jump to upon preemption, i.e. baking in the arrangement
> established at first execution. (We should also consider capturing the
> implicit dependency using i915_sched_add_dependency(), but first we need
> to think about the constraints that requires on the execution/retirement
> ordering.)
> 
> Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
> References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
> Testcase: igt/gem_exec_balancer/bonded-slice
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 20 ++++++++++++--------
>   1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 3eadd294bcd7..9872bb4c99fc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -3846,18 +3846,22 @@ static void
>   virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
>   {
>   	struct virtual_engine *ve = to_virtual_engine(rq->engine);
> +	intel_engine_mask_t allowed, exec;
>   	struct ve_bond *bond;
>   
> +	allowed = ~to_request(signal)->engine->mask;
> +
>   	bond = virtual_find_bond(ve, to_request(signal)->engine);
> -	if (bond) {
> -		intel_engine_mask_t old, new, cmp;
> +	if (bond)
> +		allowed &= bond->sibling_mask;
>   
> -		cmp = READ_ONCE(rq->execution_mask);
> -		do {
> -			old = cmp;
> -			new = cmp & bond->sibling_mask;
> -		} while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
> -	}
> +	/* Restrict the bonded request to run on only the available engines */
> +	exec = READ_ONCE(rq->execution_mask);
> +	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> +		;
> +
> +	/* Prevent the master from being re-run on the bonded engines */
> +	to_request(signal)->execution_mask &= ~allowed;

I had an open here whether we should also fix the slave. At re-submit 
time what is preventing the slave being put on the same engine as the 
master?

Regards,

Tvrtko

>   }
>   
>   struct intel_context *
>