[Intel-gfx] [PATCH v2 2/4] drm/i915: Verify the engine workarounds stick on application

Chris Wilson chris at chris-wilson.co.uk
Tue Apr 16 10:49:58 UTC 2019


Quoting Tvrtko Ursulin (2019-04-16 11:36:12)
> 
> On 16/04/2019 10:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-04-16 10:43:43)
> >>
> >> On 16/04/2019 09:10, Chris Wilson wrote:
> >>> Read the engine workarounds back using the GPU after loading the initial
> >>> context state to verify that we are setting them correctly, and bail if
> >>> it fails.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_gem.c               |   6 +
> >>>    drivers/gpu/drm/i915/intel_workarounds.c      | 120 ++++++++++++++++++
> >>>    drivers/gpu/drm/i915/intel_workarounds.h      |   2 +
> >>>    .../drm/i915/selftests/intel_workarounds.c    |  53 +-------
> >>>    4 files changed, 134 insertions(+), 47 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>> index 0a818a60ad31..95ae69753e91 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>> @@ -4717,6 +4717,12 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
> >>>                i915_request_add(rq);
> >>>                if (err)
> >>>                        goto err_active;
> >>> +
> >>> +             if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) &&
> >>> +                 intel_engine_verify_workarounds(engine, "load")) {
> >>> +                     err = -EIO;
> >>> +                     goto err_active;
> >>
> >> The two submission use different contexts timelines so strictly speaking
> >> should be somehow explicitly serialized.
> > 
> > Yes, we are reading from the kernel context. Neither of these contexts
> > will overwrite the existing register state on first load, and there is
> > nothing to prevent fifo here. But it doesn't matter which order the read
> > is done, as the w/a are set by mmio, not engine->init_context().
> > Although I've been contemplating using init_context in case LRI work
> > better.
> 
> Using init_context for what, engine wa/?

Yeah, just thinking of other ways to try and hammer icl-2/3 (or
whichever fail today) into submission.

> > It could be in a separate loop beforehand, the only catch is that we
> > expect to leave this function with the GT idle, so it's best done before
> > we park.
> 
> Yep since the placement in the ->init_context loop confused me I think a 
> separate loop would indeed be better.
> 
> __i915_gem_restart_engines looks like the place, to follow 
> engine->init_hw which applies the engine workarounds.

Hmm. That may restart with a full engine and a probe take an
unnecessarily, if not dangerously, long time.

load_power_context() perhaps. Bleugh, even on resume, we may have
restarted persistent batches in the near future.

So far I've talked myself into that we only know the system is idle at
init, and when we park the engines. We could squeeze a sanity check into
the parking.

> >>> +     i915_request_add(rq);
> >>> +     if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
> >>
> >> Wouldn't:
> >>
> >> err = i915_request_wait(...
> >> if (err < 0 || !i915_request_completed())
> >>
> >> be more correct?
> > 
> > It can't return for any other reason as it's an uninterruptible wait, so
> > it times out, or the request is completed.
> > 
> > If you want to consider that hangcheck could have kicked in the 0.2ms
> > and cancelled the request, then we could need to check
> > request->fence.error.
> 
> I was going by the comment for i915_request_wait which says "Returns the 
> remaining time (in jiffies) if the request completed, which may be zero 
> or -ETIME if the request is unfinished after the timeout expires.". So 
> two timeout conditions. Anyways, I know it is highly hypothetical if that.

Hence why we check for < 0, though it could just be -ETIME.

Hmm, the sentence slightly runs on, it's the remaining time that may be
zero if the request completed just in the nick of time. -ETIME is always
the response if the request is not completed at timeout.
-Chris


More information about the Intel-gfx mailing list