[Intel-gfx] [PATCH 2/4] drm/i915: Verify the engine workarounds stick on application

Mon Apr 15 10:45:47 UTC 2019

Quoting Tvrtko Ursulin (2019-04-15 11:41:43)
> 
> On 13/04/2019 13:58, Chris Wilson wrote:
> > Read the engine workarounds back using the GPU after loading the initial
> > context state to verify that we are setting them correctly, and bail if
> > it fails.
> 
> Aren't the context w/a the ones we expect to see saved in the context? As
> such, what difference do you expect to see between mmio and srm
> verification? Should both even be done?

I was following a hunch that these might be saved in the power context,
in which case we would need an active engine to be able to read them
back -- just ruling out the possibility that forcewake alone wasn't enough.
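
For illustration only, and not the code in the patch: a minimal user-space
sketch of the kind of masked comparison a readback check performs, whether
the values come from mmio reads or from results the GPU stored with SRM.
The names wa_entry, wa_entry_ok and wa_list_count_lost are hypothetical,
and the register offsets in the usage note are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one workaround: the register offset, the bits
 * the workaround owns, and the value those bits are expected to hold. */
struct wa_entry {
	uint32_t reg;
	uint32_t mask;
	uint32_t val;
};

/* A workaround has stuck if the readback matches the expected value in
 * every bit covered by the mask; bits outside the mask are ignored. */
static bool wa_entry_ok(const struct wa_entry *wa, uint32_t readback)
{
	return ((readback ^ wa->val) & wa->mask) == 0;
}

/* Check a list of workarounds against readback results (one result per
 * entry, in the same order) and return how many were lost. */
static int wa_list_count_lost(const struct wa_entry *wa,
			      const uint32_t *results,
			      size_t count, const char *from)
{
	int lost = 0;

	for (size_t i = 0; i < count; i++) {
		if (wa_entry_ok(&wa[i], results[i]))
			continue;

		fprintf(stderr,
			"workaround lost on %s: reg 0x%x expected 0x%x, found 0x%x (mask 0x%x)\n",
			from, (unsigned int)wa[i].reg,
			(unsigned int)(wa[i].val & wa[i].mask),
			(unsigned int)(results[i] & wa[i].mask),
			(unsigned int)wa[i].mask);
		lost++;
	}

	return lost;
}
```

A caller would fill the results array from whichever read path is under
test and treat a non-zero return as a failure, e.g.
wa_list_count_lost(list, results, count, "load").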

> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c               |   6 +
> >   drivers/gpu/drm/i915/intel_workarounds.c      | 120 ++++++++++++++++++
> >   drivers/gpu/drm/i915/intel_workarounds.h      |   2 +
> >   .../drm/i915/selftests/intel_workarounds.c    |  53 +-------
> >   4 files changed, 134 insertions(+), 47 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0a818a60ad31..95ae69753e91 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4717,6 +4717,12 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
> >               i915_request_add(rq);
> >               if (err)
> >                       goto err_active;
> > +
> > +             if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) &&
> > +                 intel_engine_verify_workarounds(engine, "load")) {
> > +                     err = -EIO;
> > +                     goto err_active;
> 
> I am not sure that wedging is required even if only in debug builds. Is 
> DRM_ERROR not enough to flag failures?

My thinking was that if we can't apply the workarounds for known HW
issues, any further testing of that HW is suspect.

Make the error as obnoxious as possible so that we have no choice but to
fix it before carrying on with testing.
-Chris