[Intel-gfx] [PATCH 5/5] drm/i915: Prevent bonded requests from overtaking each other on preemption
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 23 13:53:21 UTC 2019
Quoting Tvrtko Ursulin (2019-09-23 14:48:44)
>
> On 21/09/2019 10:55, Chris Wilson wrote:
> > Force bonded requests to run on distinct engines so that they cannot be
> > shuffled onto the same engine where timeslicing will reverse the order.
> > A bonded request will often wait on a semaphore signaled by its master,
> > creating an implicit dependency -- if we ignore that implicit dependency
> > and allow the bonded request to run on the same engine and before its
> > master, we will cause a GPU hang. [Whether it will hang the GPU is
> > debatable, we should keep on timeslicing and each timeslice should be
> > "accidentally" counted as forward progress, in which case it should run
> > but at one-half to one-third speed.]
> >
> > We can prevent this inversion by restricting which engines we allow
> > ourselves to jump to upon preemption, i.e. baking in the arrangement
> > established at first execution. (We should also consider capturing the
> > implicit dependency using i915_sched_add_dependency(), but first we need
> > to think about the constraints that requires on the execution/retirement
> > ordering.)
> >
> > Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
> > References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
> > Testcase: igt/gem_exec_balancer/bonded-slice
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 20 ++++++++++++--------
> > 1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 3eadd294bcd7..9872bb4c99fc 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -3846,18 +3846,22 @@ static void
> > virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > {
> > struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > + intel_engine_mask_t allowed, exec;
> > struct ve_bond *bond;
> >
> > + allowed = ~to_request(signal)->engine->mask;
> > +
> > bond = virtual_find_bond(ve, to_request(signal)->engine);
> > - if (bond) {
> > - intel_engine_mask_t old, new, cmp;
> > + if (bond)
> > + allowed &= bond->sibling_mask;
> >
> > - cmp = READ_ONCE(rq->execution_mask);
> > - do {
> > - old = cmp;
> > - new = cmp & bond->sibling_mask;
> > - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
> > - }
> > + /* Restrict the bonded request to run on only the available engines */
> > + exec = READ_ONCE(rq->execution_mask);
> > + while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > + ;
> > +
> > + /* Prevent the master from being re-run on the bonded engines */
> > + to_request(signal)->execution_mask &= ~allowed;
>
> I had an open here whether we should also fix the slave. At re-submit
> time what is preventing the slave being put on the same engine as the
> master?
We removed the master engine from the list of allowed engines for the
slave, and we never reset the execution_mask on unsubmit (and we don't
re-execute the bonding on resubmit). That's the hard bit I left open for
the reader.
-Chris
More information about the Intel-gfx
mailing list