[RFC PATCH] i915/gt: Reapply workarounds in case the previous attempt failed.
Matt Roper
matthew.d.roper at intel.com
Mon Dec 16 21:27:51 UTC 2024
On Thu, Dec 12, 2024 at 03:51:12PM +0100, Andi Shyti wrote:
> Hi Rodrigo,
>
> On Fri, Dec 06, 2024 at 10:38:24AM -0500, Rodrigo Vivi wrote:
> > On Thu, Dec 05, 2024 at 03:47:35PM +0000, Sebastian Brzezinka wrote:
> > > `wa_verify`sporadically detects lost workaround on application; this
> > > is unusual behavior since wa are applied at `intel_gt_init_hw` and
> > > verified right away by `intel_gt_verify_workarounds`, and `wa_verify`
> > > doesn't fail on initialization as one might suspect would happen.
> > >
> > > One approach that may be somewhat beneficial is to reapply workarounds
> > > in the event of failure, or even get rid of verify on application,
> > > since it's redundant to `intel_gt_verify_workarounds`.
> > >
> > > This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668
> >
> > It should be:
> >
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668
>
> aapart from the formatting issues this was suggested by me. We
> have observed some sporadic vailures in applying the specific
> workaround added by Ville (now cc'ed to the thread) in commit
> 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL").
>
> Because it's sporadic, we could give it one more chance and try
> to re-apply it.
That sounds like it's just papering over the issue without really
figuring out what's truly going on.
Looking at the current implementation, it looks like at least one
possible problem is that it was implemented in rcs_engine_wa_init, but
the CACHE_MODE_0 register itself is part of the LRC (according to bspec
18907). So we want to move it to icl_ctx_workarounds_init() instead to
make sure it gets recorded in the golden context image. Our
initialization and reset handling for workarounds touching registers in
the context are different from those that aren't.
BTW, I'm a bit surprised to see us needing to implement this workaround
in the kernel at all. CACHE_MODE_0 is a register that's under userspace
control (according to bspec 14181), so we usually expect the userspace
drivers to own implementing any workarounds dealing with the registers
they control. Indeed, it looks like Mesa's Iris driver already has an
implementation of this workaround in iris_state.c:
if (devinfo->disable_ccs_repack) {
iris_emit_reg(batch, GENX(CACHE_MODE_0), reg) {
reg.DisableRepackingforCompression = true;
reg.DisableRepackingforCompressionMask = true;
}
}
and that workaround was added back in mid-2019 so it should be in all
recent Mesa releases.
Matt
>
> > >
> > > Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka at intel.com>
> > > ---
> > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > index 570c91878189..4ee623448223 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > @@ -1761,6 +1761,17 @@ static void wa_list_apply(const struct i915_wa_list *wal)
> > > intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) :
> > > intel_uncore_read_fw(uncore, wa->reg);
> > >
> > > + if ((val ^ wa->set) & wa->read) {
> > > + if (wa->is_mcr)
> > > + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val);
> > > + else
> > > + intel_uncore_write_fw(uncore, wa->reg, val);
> > > + }
> >
> > instead of duplicating the code you should extract that to an aux function
> > to write it...
>
> a for loop can decrease the amount of duplicated code.
>
> > > +
> > > + val = wa->is_mcr ?
> > > + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) :
> > > + intel_uncore_read_fw(uncore, wa->reg);
> >
> > and another one to read it...
>
> this, indeed it's just reading, but we are trying to re-write. If
> we wrote the unwanted value, we will keep reading the unwanted
> value.
>
> > > +
> > > wa_verify(gt, wa, val, wal->name, "application");
> >
> > However my biggest concern with this patch is the brute force solution
> > and only on CONFIG_DRM_I915_DEBUG_GEM case...
>
> this is a good point, indeed, I don't understand why the
> confirmation should be within the DEBUG section.
>
> Andi
>
> > and as duplication because I see that the second write attempt is
> > already happening above if (val != old || !wa->clr)
> >
> > So, something is not quite right in here and this deserves another alternative..
> >
> >
> > > }
> > > }
> > > --
> > > 2.34.1
> > >
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
More information about the Intel-gfx
mailing list