[Intel-gfx] [PATCH] drm/i915: Special handling for bonded requests

Wed May 27 10:05:16 UTC 2020

Quoting Tvrtko Ursulin (2020-05-27 10:36:31)
> 
> On 27/05/2020 10:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-05-27 09:53:22)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Our generic mechanism for specifying aligned request start,
> >> I915_EXEC_FENCE_SUBMIT, has a bit too relaxed rules when it comes to the
> >> actual use case of media scalability on Tigerlake.
> >>
> >> While the submit fence can provide the aligned start, the  actual media
> >> pipeline expects that execution remains in lockstep from then onwards.
> >> Otherwise the hardware cannot handle well one of the pair getting
> >> preempted leading to GPU hangs and corrupted media playback.
> >>
> >> This puts us in a difficult position where the only visible solution,
> >> which does not involve adding new uapi, is to give more meaning to the
> >> generic submit fence. At least when used between the video engines.
> >>
> >> The special semantics this patch applies in that case are two fold. First
> >> is that both of the aligned pairs will be marked as non-preemptable and
> >> second is ensuring both are not sharing ELSP ports with any other context.
> >>
> >> Non-preemptable property will ensure that once the start has been aligned
> >> the requests remain executing until completion.
> >>
> >> Single port ELSP property will ensure there are no possible inversions
> >> between different independent pairs of aligned requests.
> > 
> > Nak.
> > 
> >> Going forward we can think of introducing new uapi to request this
> >> behaviour as a separate, more explicit flag, and at that point retire this
> >> semi-generic special handling.
> > 
> > As CI will say, such behaviour will already need a new flag. Forcing
> > no-preemption should be a privileged flag, so I would expect some
> > acknowledge that this is a HW problem that we have to workaround.
> 
> I don't know how hw exactly works so from my side it's only empirical.
> 
> I agree new flag is needed but we also need a fix ASAP for TGL. And I 
> don't think we can add new uapi in a reasonable time frame here. We 
> would need the ctx set_parallel extension with a dont-preempt flag and 
> multi-batch execbuf. And a lot of work in media-driver which will not be 
> ready for TGL. I don't think a flag at the I915_EXEC_FENCE_SUBMIT level 
> is a solution.

I'm going to say the starting point is to remove the
I915_ENGINE_HAS_PREEMPTION flag from vcs*. That way the tests that
require preemption will skip. Sadly that's the level of our current API.

> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> Suggested-by: Xiaogang Li <xiaogang.li at intel.com>
> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >> ---
> >>   drivers/gpu/drm/i915/gt/intel_context.h |  2 +-
> >>   drivers/gpu/drm/i915/gt/intel_lrc.c     | 46 +++++++++++++++++++++----
> >>   2 files changed, 41 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> >> index 07be021882cc..576d11c0cdd9 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> >> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> >> @@ -212,7 +212,7 @@ intel_context_force_single_submission(const struct intel_context *ce)
> >>   static inline void
> >>   intel_context_set_single_submission(struct intel_context *ce)
> >>   {
> >> -       __set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
> >> +       set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
> >>   }
> >>   
> >>   static inline bool
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index 3214a4ecc31a..88cf20fd92c8 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -1734,8 +1734,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
> >>   
> >>   static bool ctx_single_port_submission(const struct intel_context *ce)
> >>   {
> >> -       return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> >> -               intel_context_force_single_submission(ce));
> >> +       return intel_context_force_single_submission(ce);
> >>   }
> >>   
> >>   static bool can_merge_ctx(const struct intel_context *prev,
> >> @@ -4929,9 +4928,41 @@ static void execlists_park(struct intel_engine_cs *engine)
> >>          cancel_timer(&engine->execlists.preempt);
> >>   }
> >>   
> >> +static void
> >> +mark_bonded_pair(struct i915_request *rq, struct i915_request *signal)
> >> +{
> >> +       /*
> >> +        * Give (temporary) special meaning to a pair requests with requested
> >> +        * aligned start along the video engines.
> >> +        *
> >> +        * They should be non-preemptable and have all ELSP ports to themselves
> >> +        * to avoid any deadlocks caused by inversions.
> > 
> > This sentence needs expanding, because you are implying that this is an
> > issue we need to address in the code. If there is a deadlock resulting
> > from incorrect submission ordering, that is a bug in the code.
> 
> If we add no-preempt I think we have to have single elsp because there 
> is no ordering between unrelated pairs and then elsp inversion certainly 
> sounds like a deadlock.

You don't have a deadlock between unrelated pairs :)

> 
> ctx-A: vcs0 + vcs1
> ctx-B: vcs0 + vcs1
> 
> There is no ordering between A and B but we'd have to pick same port for 
> both pairs of A and B respectively. I don't see that we can do that 
> since all four submissions are async.
> 
> Regards,
> 
> Tvrtko
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>