[Intel-gfx] [PATCH] drm/i915/gt: prepare reset based on reset domain
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Feb 11 23:00:31 UTC 2022
On Thu, Dec 16, 2021 at 09:52:26AM +0000, Tvrtko Ursulin wrote:
>
>
> On 09/12/2021 12:01, Tejas Upadhyay wrote:
> > Most code paths does full reset with preparing all
> > engines for reset except below two :
> >
> > 1. Single engine reset needs to prepare engines for
> > reset based on its reset domain. In __intel_engine
> > _reset_bh is a place needs loop over to do engine
> > prepare for all engines which are in same reset
> > domain before triggering reset.
> >
> > 2. enable_error_interrupt() in drivers/gpu/drm/i915/
> > gt/intel_execlists_submission.c needs similar change.
> >
> > whenever there is full reset done, engine prepare for
> > all engines are already being called right now before
> > actual reset triggered, except above two scenario
> > seeking single engine reset.
> >
> > Note: Requirement of this change is occurred recently
> > because whenever engine does reset, all engines in
> > same reset domain gets reset and in case engine goes
> > for reset before stopping CS or applying required W/A,
> > there are high chances of hang/crash. reset_prepare_
> > engine takes care of it.
>
> On the trivial level, please wrap your commit messages to standard 75 wide.
> See Documentation/process/submitting-patches.rst.
>
> >
> > Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 9 +++++++++
> > drivers/gpu/drm/i915/gt/intel_reset.c | 12 ++++++++++--
> > drivers/gpu/drm/i915/gt/intel_reset.h | 1 +
> > 3 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index a69df5e9e77a..668e7ba5b254 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -2806,6 +2806,15 @@ static void enable_error_interrupt(struct intel_engine_cs *engine)
> > drm_err(&engine->i915->drm,
> > "engine '%s' resumed still in error: %08x\n",
> > engine->name, status);
> > + if (engine->reset_domain) {
> > + struct intel_engine_cs *nengine;
> > + enum intel_engine_id id;
> > +
> > + for_each_engine(nengine, engine->gt, id)
> > + if (nengine->reset_domain ==
> > + engine->reset_domain)
> > + reset_prepare_engine(nengine);
> > + }
> > __intel_gt_reset(engine->gt, engine->mask);
> > }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 63199f0550e6..454d6ab1d9f4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -705,7 +705,7 @@ int intel_reset_guc(struct intel_gt *gt)
> > * Ensure irq handler finishes, and not run again.
> > * Also return the active request so that we only search for it once.
> > */
> > -static void reset_prepare_engine(struct intel_engine_cs *engine)
> > +void reset_prepare_engine(struct intel_engine_cs *engine)
> > {
> > /*
> > * During the reset sequence, we must prevent the engine from
> > @@ -1167,7 +1167,15 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> > if (!intel_engine_pm_get_if_awake(engine))
> > return 0;
> > - reset_prepare_engine(engine);
> > + if (engine->reset_domain) {
>
> Can it be zero?
>
> > + struct intel_engine_cs *nengine;
> > + enum intel_engine_id id;
> > +
> > + for_each_engine(nengine, gt, id)
> > + if (nengine->reset_domain ==
> > + engine->reset_domain)
> > + reset_prepare_engine(nengine);
>
> Having glanced over the discussion about the workaround you are trying to
> implement a few times, this is what I think.
>
> First of all I think you are definitely missing the engine resume side of
> things. Consider this entry point:
>
> execlists_submission_tasklet
> execlists_reset
> intel_engine_reset
> __intel_engine_reset_bh
>
> AFAICT you can halt more than one engine but you will not resume them all.
> What I mean here is that __intel_engine_reset, intel_engine_resume,
> reset_finish_engine and even the engine pm management end up being called
> for only one engine. Also absence of locking via I915_RESET_ENGINE in gt
> reset flags.
>
> Secondly, the question on whether it is acceptable to corrupt the state for
> the other engine seems not to have a 100% clear answer. Can this be
> confirmed? Because in the ideal world you would really need to preempt to
> idle the other engines from the engine group and only then would be able to
> proceed.
we shouldn't do the reset without checking if it is idle or preempting to idle.
and handle the FORCE_WAKE case as well...
>
> If that is considered a problem for later, with the first step being adding
> of a simpler workaround to prevent a more serious lockup, then it may be
> acceptable with some tweaks.
>
> I suspect there isn't a proper IGT for this otherwise you would have noticed
> the failure to resume the coupled engines? So I think next step is to write
> that IGT and then make sure code is robust.
>
> Sufficient testing will probably drive the implementation.
>
> You may for instance need to lock all engines for reset starting from
> execlists_reset() and you may end up uncovering locking/ordering challenges
> as you go.
and everything else that Tvrtko said... we cannot go with this as is without
ensuring that we have a good flow with the scheduler.
>
> Regards,
>
> Tvrtko
>
> > + }
> > if (msg)
> > drm_notice(&engine->i915->drm,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
> > index adc734e67387..7abd5d49f0e5 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.h
> > @@ -28,6 +28,7 @@ void intel_gt_handle_error(struct intel_gt *gt,
> > const char *fmt, ...);
> > #define I915_ERROR_CAPTURE BIT(0)
> > +void reset_prepare_engine(struct intel_engine_cs *engine);
> > void intel_gt_reset(struct intel_gt *gt,
> > intel_engine_mask_t stalled_mask,
> > const char *reason);
> >
More information about the Intel-gfx
mailing list