[Intel-gfx] [PATCH 1/4] drm/i915/gt: Remove platform comments from workarounds

Matt Roper matthew.d.roper at intel.com
Tue Jan 3 23:03:37 UTC 2023


On Fri, Dec 23, 2022 at 10:28:07AM -0800, Lucas De Marchi wrote:
> On Fri, Dec 23, 2022 at 09:02:35AM +0000, Tvrtko Ursulin wrote:
> > 
> > On 22/12/2022 15:55, Lucas De Marchi wrote:
> > > On Thu, Dec 22, 2022 at 10:27:00AM +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 22/12/2022 08:25, Lucas De Marchi wrote:
> > > > > The comments are redundant to the checks being done to apply the
> > > > > workarounds and very often get outdated as workarounds need to be
> > > > > extended to new platforms or steppings.  Remove them altogether with
> > > > > the following matches (platforms extracted from intel_workarounds.c):
> > > > > 
> > > > >     find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
> > > > > 's/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*/\1/'
> > > > >     find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
> > > > > 's/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*\*\//\1
> > > > > 
> > > > > Same things was executed in the gem directory, omitted here for brevity.
> > > > 
> > > > > There were a few false positives that included the workaround
> > > > > description. Those were manually patched.
> > > > 
> > > > sed -E 's/(Wa[a-zA-Z0-9_]+)[:,]([a-zA-Z0-9,-_\+\[]{2,})/\1/'
> > > 
> > > then there are false negatives. We have Was in the form
> > > "Wa_xxx:tgl,dg2, mtl". False positives we can fixup, false negatives
> > > we simply don't see. After running that in gt/:
> > > 
> > > $ git grep ": mtl" -- drivers/gpu/drm/i915/
> > > drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
> > > drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
> > > drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
> > > drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
> > > drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c:       * Wa_14017073508: mtl
> > > drivers/gpu/drm/i915/i915_reg.h:/* Wa_14017210380: mtl */
> > > 
> > > I was going with the platform names to avoid the false
> > > negatives and because I was entertaining the idea of only doing this for
> > > latest platforms where we do have the "Wa_[[:number:]]" form
> > > 
> > > > 
> > > > Maybe..
> > > > 
> > > > Matt recently said he has this worked planned, but more
> > > > importantly - I gather then that the WA lookup tool definitely
> > > > does not output these strings?
> > > 
> > > Whatever it does it's true only at the time it's called. It simply
> > > tells what
> > > are the platforms and steppings the Wa applies to. We can change the
> > > output to whatever we want, but that is not the point.
> > > Those comments get stale and bring no real value as they match 1:1
> > > what the code is supposed to be doing. Several times a patch has to
> > > update just that comment to "extend a workaround" to a next platform.
> > > This is not always done, so we get a comment that doesn't match what is
> > > supposed to be there.
> > 
> > Tl;dr; version - lets park this until January and discuss once everyone
> > is back.
> 
> I'll leave my comment here since I will be out until mid January.
> 
> > 
> > Longer version. I've been trying to get us talking about this a couple
> > times before and I'd really like to close with an explicit consensus,
> > discussion points addressed instead of skipped and just moving ahead
> > with patches.
> > 
> > References:
> >  3fcf71b9-337f-6186-7b00-27cbfd116743 at linux.intel.com
> >  Y5j0b/bykbitCa4Q at mdroper-desk1.amr.corp.intel.com
> > 
> > So point I wanted to discuss is whether these comments are truly useless
> > or maybe they can help during review. If the tool can actually output
> > them then I am leaning towards that they can be.
> 
> I consider "can the tool output xyz?" asking the wrong question.
> "The tool", which is our own small python script querying a database can
> output anything like that if we want to. The database has information of
> what are the platforms/steppings for each the WA is known to be applied
> *today*. And that can change and do change often, particularly for early
> steppings and recent platforms.
> 
> > Thought is, when a patch comes for review adding a new platform,
> > stepping, whatever, to an existing if condition, if it contains the
> > comments reviewer can more easily spot a hyphotetical logic inversion
> > error or similar. It is also trivial to check that both condition and
> > comment have been updated. (So lets not be rash and remove something
> > maybe useful just because it can go stale *only if* reviewers are not
> > giving sufficient attention that changes are made in tandem.)
> 
> I can argue to the other side too. We don't have comments in the kernel
> like
> 
> 	/* Add 1 to i */
> 	i += 1;
> 
> This is exactly what these comments are doing. And they are misleading

Yeah, this is exactly my feeling for why the platform/steppings should
just be removed from the comments.  Keeping the workaround numbers is
useful for quickly jumping to the implementation(s) of the workaround in
the code with grep, but everything else is (at best) just re-stating
exactly what the next lines of code should say.  During review, the code
itself is what really matters, and comments (which might become stale or
incorrect) just make it easier for mistakes to get overlooked.

I haven't reviewed the patch in detail, but from my point of view,

Acked-by: Matt Roper <matthew.d.roper at intel.com>

on the general direction here.


Matt

> and may introduce bugs rather than helping reviewing:
> 
> 	Wa_12345:tgl[a0,c0)
> 	if (IS_TGL_GRAPHICS_STEP(STEP_A0, STEP_B0)
> 
> One might read the comment, skipping over the condition and thinking
> "ok, we already extended this WA to B* steppings, which doesn't match
> the code.
> 
> 
> > From a slightly different angle - do we expect anyone reviewing
> > workaround patches will cross-check against the tool? Would it be
> > simpler and more efficient that they could just cross-check against the
> > comment output from the tool and put into the patch by the author?
> 
> see above. Someone cross-checking the comment is cross-checking the
> wrong thing. As I said, it happens more on early enabling of a platform.
> 
> > And point here to stress out is that accidental logic errors (missed
> > workarounds) can be super expensive to debug in the field. Sometimes it
> > can literally take _months_ for sporadic and hard to reproduce issues to
> > get debugged, handed over between the teams, etc. So any way in which we
> > can influence the likelyhood of that happening is something to weigh
> > carefully.
> 
> yes, that's why I want to remove the comments: from my experience they
> are more a source of bugs rather than helping.
> 
> > Secondary but also important - if i915 is end of line then an extra why
> > we want to rip out this for ancient platforms. Is the cost/benefit
> > positive there?
> 
> yep, here I agree and was my argument about using the platform names
> rather than a more "catch all" regex. I think doing this only for tgl+
> platforms or even dg2+ would be ok.
> 
> > As a side note, and going back to the question of what the tool can
> > output. Long time ago I had an idea where we could improve all this by
> > making it completely data-driven. Have the WA database inspecting tool
> > output a table which could be directly pasted into code and interpreted
> > by i915.
> > 
> > For reference look at intel_workarounds_table.h in
> > https://patchwork.freedesktop.org/patch/399377/?series=83580&rev=3 and
> > see what you thing. That was just a sketch of the idea, not complete,
> > and yes, i915 end of life point makes it moot.
> 
> now that xe is announced I can talk about this part... this was more
> or less what I implemented in xe: it's a table with
> "register + condition + action". There are the most common condition
> checks builtin + a function hook for the more advanced ones. During
> binding the driver walks the table and coalesces the entries creating
> a per-register value that can be used at the proper times, depending if
> they are gt, engine, context workarounds.
> 
> Lucas De Marchi
> 
> > 
> > Regards,
> > 
> > Tvrtko

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation


More information about the Intel-gfx mailing list