[Intel-gfx] [PATCH] drm/i915: Special handling for bonded requests
Chris Wilson
chris at chris-wilson.co.uk
Wed May 27 10:05:16 UTC 2020
Quoting Tvrtko Ursulin (2020-05-27 10:36:31)
>
> On 27/05/2020 10:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-05-27 09:53:22)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Our generic mechanism for specifying aligned request start,
> >> I915_EXEC_FENCE_SUBMIT, has a bit too relaxed rules when it comes to the
> >> actual use case of media scalability on Tigerlake.
> >>
> >> While the submit fence can provide the aligned start, the actual media
> >> pipeline expects that execution remains in lockstep from then onwards.
> >> Otherwise the hardware cannot handle well one of the pair getting
> >> preempted leading to GPU hangs and corrupted media playback.
> >>
> >> This puts us in a difficult position where the only visible solution,
> >> which does not involve adding new uapi, is to give more meaning to the
> >> generic submit fence. At least when used between the video engines.
> >>
> >> The special semantics this patch applies in that case are two fold. First
> >> is that both of the aligned pairs will be marked as non-preemptable and
> >> second is ensuring both are not sharing ELSP ports with any other context.
> >>
> >> Non-preemptable property will ensure that once the start has been aligned
> >> the requests remain executing until completion.
> >>
> >> Single port ELSP property will ensure there are no possible inversions
> >> between different independent pairs of aligned requests.
> >
> > Nak.
> >
> >> Going forward we can think of introducing new uapi to request this
> >> behaviour as a separate, more explicit flag, and at that point retire this
> >> semi-generic special handling.
> >
> > As CI will say, such behaviour will already need a new flag. Forcing
> > no-preemption should be a privileged flag, so I would expect some
> > acknowledge that this is a HW problem that we have to workaround.
>
> I don't know how hw exactly works so from my side it's only empirical.
>
> I agree new flag is needed but we also need a fix ASAP for TGL. And I
> don't think we can add new uapi in a reasonable time frame here. We
> would need the ctx set_parallel extension with a dont-preempt flag and
> multi-batch execbuf. And a lot of work in media-driver which will not be
> ready for TGL. I don't think a flag at the I915_EXEC_FENCE_SUBMIT level
> is a solution.
I'm going to say the starting point is to remove the
I915_ENGINE_HAS_PREEMPTION flag from vcs*. That way the tests that
require preemption will skip. Sadly that's the level of our current API.
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> Suggested-by: Xiaogang Li <xiaogang.li at intel.com>
> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >> ---
> >> drivers/gpu/drm/i915/gt/intel_context.h | 2 +-
> >> drivers/gpu/drm/i915/gt/intel_lrc.c | 46 +++++++++++++++++++++----
> >> 2 files changed, 41 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> >> index 07be021882cc..576d11c0cdd9 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> >> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> >> @@ -212,7 +212,7 @@ intel_context_force_single_submission(const struct intel_context *ce)
> >> static inline void
> >> intel_context_set_single_submission(struct intel_context *ce)
> >> {
> >> - __set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
> >> + set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags);
> >> }
> >>
> >> static inline bool
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index 3214a4ecc31a..88cf20fd92c8 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -1734,8 +1734,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
> >>
> >> static bool ctx_single_port_submission(const struct intel_context *ce)
> >> {
> >> - return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> >> - intel_context_force_single_submission(ce));
> >> + return intel_context_force_single_submission(ce);
> >> }
> >>
> >> static bool can_merge_ctx(const struct intel_context *prev,
> >> @@ -4929,9 +4928,41 @@ static void execlists_park(struct intel_engine_cs *engine)
> >> cancel_timer(&engine->execlists.preempt);
> >> }
> >>
> >> +static void
> >> +mark_bonded_pair(struct i915_request *rq, struct i915_request *signal)
> >> +{
> >> + /*
> >> + * Give (temporary) special meaning to a pair requests with requested
> >> + * aligned start along the video engines.
> >> + *
> >> + * They should be non-preemptable and have all ELSP ports to themselves
> >> + * to avoid any deadlocks caused by inversions.
> >
> > This sentence needs expanding, because you are implying that this is an
> > issue we need to address in the code. If there is a deadlock resulting
> > from incorrect submission ordering, that is a bug in the code.
>
> If we add no-preempt I think we have to have single elsp because there
> is no ordering between unrelated pairs and then elsp inversion certainly
> sounds like a deadlock.
You don't have a deadlock between unrelated pairs :)
>
> ctx-A: vcs0 + vcs1
> ctx-B: vcs0 + vcs1
>
> There is no ordering between A and B but we'd have to pick same port for
> both pairs of A and B respectively. I don't see that we can do that
> since all four submissions are async.
>
> Regards,
>
> Tvrtko
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
More information about the Intel-gfx
mailing list