[Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption
Chris Wilson
chris at chris-wilson.co.uk
Fri Sep 20 15:11:54 UTC 2019
Quoting Tvrtko Ursulin (2019-09-20 15:51:35)
>
> On 20/09/2019 13:42, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-09-20 13:24:47)
> >>
> >> On 20/09/2019 09:36, Chris Wilson wrote:
> >>> Force bonded requests to run on distinct engines so that they cannot be
> >>> shuffled onto the same engine where timeslicing will reverse the order.
> >>> A bonded request will often wait on a semaphore signaled by its master,
> >>> creating an implicit dependency -- if we ignore that implicit dependency
> >>> and allow the bonded request to run on the same engine and before its
> >>> master, we will cause a GPU hang.
> >>>
> >>> We can prevent this inversion by restricting which engines we allow
> >>> ourselves to jump to upon preemption, i.e. baking in the arrangement
> >>> established at first execution. (We should also consider capturing the
> >>> implicit dependency using i915_sched_add_dependency(), but first we need
> >>> to think about the constraints that requires on the execution/retirement
> >>> ordering.)
> >>>
> >>> Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
> >>> References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> ---
> >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
> >>> 1 file changed, 11 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index a99166a2d2eb..7920649e4d87 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -3755,18 +3755,21 @@ static void
> >>> virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> >>> {
> >>> struct virtual_engine *ve = to_virtual_engine(rq->engine);
> >>> + intel_engine_mask_t allowed, exec;
> >>> struct ve_bond *bond;
> >>>
> >>> bond = virtual_find_bond(ve, to_request(signal)->engine);
> >>> - if (bond) {
> >>> - intel_engine_mask_t old, new, cmp;
> >>> + if (!bond)
> >>> + return;
> >>>
> >>> - cmp = READ_ONCE(rq->execution_mask);
> >>> - do {
> >>> - old = cmp;
> >>> - new = cmp & bond->sibling_mask;
> >>> - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
> >>> - }
> >>> + /* Restrict the bonded request to run on only the slaved engines */
> >>> + allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;
> >>
> >> Hmm.. isn't it a miss on the uapi level that we allow master to be
> >> mentioned in the list of bonds? That's the only scenario where this line
> >> does something I think. So should we just forbid this setup on the uapi
> >> level?
> >
> > That's just a lot of digging!
>
> It's simple, in set_engines__bond in the bonds loop we just do "if
> (master == bond) oh_no_you_wont". Am I missing something?
There doesn't even need to be a bond, which is something I forgot
to handle. So I'm looking at something more like
static void
virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
{
struct virtual_engine *ve = to_virtual_engine(rq->engine);
intel_engine_mask_t allowed, exec;
struct ve_bond *bond;
allowed = ~to_request(signal)->engine->mask;
bond = virtual_find_bond(ve, to_request(signal)->engine);
if (bond)
allowed &= bond->sibling_mask;
/* Restrict the bonded request to run on only the available engines */
exec = READ_ONCE(rq->execution_mask);
while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
;
/* Prevent the master from being re-run on the bonded engines */
to_request(signal)->execution_mask &= ~allowed;
}
(The joy of trying to write a test case.)
-Chris
More information about the Intel-gfx
mailing list