[PATCH 1/4] drm/i915/gt: Remove platform comments from workarounds

Lucas De Marchi lucas.demarchi at intel.com
Fri Dec 23 18:28:07 UTC 2022


On Fri, Dec 23, 2022 at 09:02:35AM +0000, Tvrtko Ursulin wrote:
>
>On 22/12/2022 15:55, Lucas De Marchi wrote:
>>On Thu, Dec 22, 2022 at 10:27:00AM +0000, Tvrtko Ursulin wrote:
>>>
>>>On 22/12/2022 08:25, Lucas De Marchi wrote:
>>>>The comments are redundant to the checks being done to apply the
>>>>workarounds and very often get outdated as workarounds need to be
>>>>extended to new platforms or steppings.  Remove them altogether with
>>>>the following matches (platforms extracted from intel_workarounds.c):
>>>>
>>>>    find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
>>>>'s/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*/\1/'
>>>>    find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
>>>>'s/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*\*\//\1
>>>>
>>>>Same things was executed in the gem directory, omitted here for brevity.
>>>
>>>>There were a few false positives that included the workaround
>>>>description. Those were manually patched.
>>>
>>>sed -E 's/(Wa[a-zA-Z0-9_]+)[:,]([a-zA-Z0-9,-_\+\[]{2,})/\1/'
>>
>>then there are false negatives. We have Was in the form
>>"Wa_xxx:tgl,dg2, mtl". False positives we can fixup, false negatives
>>we simply don't see. After running that in gt/:
>>
>>$ git grep ": mtl" -- drivers/gpu/drm/i915/
>>drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
>>drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
>>drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
>>drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
>>drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c:       * Wa_14017073508: mtl
>>drivers/gpu/drm/i915/i915_reg.h:/* Wa_14017210380: mtl */
>>
>>I was going with the platform names to avoid the false
>>negatives and because I was entertaining the idea of only doing this for
>>latest platforms where we do have the "Wa_[[:number:]]" form
>>
>>>
>>>Maybe..
>>>
>>>Matt recently said he has this worked planned, but more 
>>>importantly - I gather then that the WA lookup tool definitely 
>>>does not output these strings?
>>
>>Whatever it does it's true only at the time it's called. It simply 
>>tells what
>>are the platforms and steppings the Wa applies to. We can change the
>>output to whatever we want, but that is not the point.
>>Those comments get stale and bring no real value as they match 1:1
>>what the code is supposed to be doing. Several times a patch has to
>>update just that comment to "extend a workaround" to a next platform.
>>This is not always done, so we get a comment that doesn't match what is
>>supposed to be there.
>
>Tl;dr; version - lets park this until January and discuss once 
>everyone is back.

I'll leave my comment here since I will be out until mid January.

>
>Longer version. I've been trying to get us talking about this a couple 
>times before and I'd really like to close with an explicit consensus, 
>discussion points addressed instead of skipped and just moving ahead 
>with patches.
>
>References:
>  3fcf71b9-337f-6186-7b00-27cbfd116743 at linux.intel.com
>  Y5j0b/bykbitCa4Q at mdroper-desk1.amr.corp.intel.com
>
>So point I wanted to discuss is whether these comments are truly 
>useless or maybe they can help during review. If the tool can actually 
>output them then I am leaning towards that they can be.

I consider "can the tool output xyz?" asking the wrong question.
"The tool", which is our own small python script querying a database can
output anything like that if we want to. The database has information of
what are the platforms/steppings for each the WA is known to be applied
*today*. And that can change and do change often, particularly for early
steppings and recent platforms.

>Thought is, when a patch comes for review adding a new platform, 
>stepping, whatever, to an existing if condition, if it contains the 
>comments reviewer can more easily spot a hyphotetical logic inversion 
>error or similar. It is also trivial to check that both condition and 
>comment have been updated. (So lets not be rash and remove something 
>maybe useful just because it can go stale *only if* reviewers are not 
>giving sufficient attention that changes are made in tandem.)

I can argue to the other side too. We don't have comments in the kernel
like

	/* Add 1 to i */
	i += 1;

This is exactly what these comments are doing. And they are misleading
and may introduce bugs rather than helping reviewing:

	Wa_12345:tgl[a0,c0)
	if (IS_TGL_GRAPHICS_STEP(STEP_A0, STEP_B0)

One might read the comment, skipping over the condition and thinking
"ok, we already extended this WA to B* steppings, which doesn't match
the code.


>From a slightly different angle - do we expect anyone reviewing 
>workaround patches will cross-check against the tool? Would it be 
>simpler and more efficient that they could just cross-check against 
>the comment output from the tool and put into the patch by the author?

see above. Someone cross-checking the comment is cross-checking the
wrong thing. As I said, it happens more on early enabling of a platform.

>And point here to stress out is that accidental logic errors (missed 
>workarounds) can be super expensive to debug in the field. Sometimes 
>it can literally take _months_ for sporadic and hard to reproduce 
>issues to get debugged, handed over between the teams, etc. So any way 
>in which we can influence the likelyhood of that happening is 
>something to weigh carefully.

yes, that's why I want to remove the comments: from my experience they
are more a source of bugs rather than helping.

>Secondary but also important - if i915 is end of line then an extra 
>why we want to rip out this for ancient platforms. Is the cost/benefit 
>positive there?

yep, here I agree and was my argument about using the platform names
rather than a more "catch all" regex. I think doing this only for tgl+
platforms or even dg2+ would be ok.

>As a side note, and going back to the question of what the tool can 
>output. Long time ago I had an idea where we could improve all this by 
>making it completely data-driven. Have the WA database inspecting tool 
>output a table which could be directly pasted into code and 
>interpreted by i915.
>
>For reference look at intel_workarounds_table.h in 
>https://patchwork.freedesktop.org/patch/399377/?series=83580&rev=3 and 
>see what you thing. That was just a sketch of the idea, not complete, 
>and yes, i915 end of life point makes it moot.

now that xe is announced I can talk about this part... this was more
or less what I implemented in xe: it's a table with
"register + condition + action". There are the most common condition
checks builtin + a function hook for the more advanced ones. During
binding the driver walks the table and coalesces the entries creating
a per-register value that can be used at the proper times, depending if
they are gt, engine, context workarounds.

Lucas De Marchi

>
>Regards,
>
>Tvrtko


More information about the dri-devel mailing list