[Intel-gfx] [PATCH 02/21] drm/i915/tgl: Fix macros for TGL SOC based WA

Tue Nov 24 13:14:01 UTC 2020

On Mon, Nov 23, 2020 at 05:32:22PM -0800, Aditya Swarup wrote:
>On 11/18/20 1:18 AM, Jani Nikula wrote:
>> On Tue, 17 Nov 2020, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>>> On Tue, Nov 17, 2020 at 10:50:10AM -0800, Aditya Swarup wrote:
>>>> @@ -1579,9 +1579,9 @@ static inline const struct i915_rev_steppings *
>>>> tgl_revids_get(struct drm_i915_private *dev_priv)
>>>> {
>>>> 	if (IS_TGL_U(dev_priv) || IS_TGL_Y(dev_priv))
>>>> -		return tgl_uy_revids;
>>>> +		return tgl_uy_revids + INTEL_REVID(dev_priv);
>>>
>>> oohh, no. You have to at least check you are not accessing out of
>>> bounds. New HW running on old kernel should not access create invalid
>>> accesses like this.
>>
>> And this is just one reason why exposing arrays directly as an interface
>> to the rest of the driver is a bad idea. Basically I look at *all*
>> externs in the driver with suspicion, and they're all exceptions that
>> should not be repeated. The revid arrays are a direct invitation to keep
>> adding more and more extern arrays. And more ways to go out of bounds.
>
>We definitely need an array table for the SOC -> Display, GT stepping mapping.

the mapping could be very well in the define iff you don't have
different mappings per sku as is the case with TGL. Example:

#define ADLS_REVID_A0		0
#define ADLS_REVID_A1		5

#define ADLS_DISP_REVID_A0	0
#define ADLS_DISP_REVID_B0	5

The actual value is actually the *SoC* revid, regardless the name of the
macro. Since we already have to use a different macro -
IS_DISP_REVID() - I don't think this is much worse and would allow us to
get rid of the table *for ADL-S*, at the expense of having to pass as
argument the ADLS_DISP_REVID_*.  However this doesn't apply to TGL as TGL
has a different mapping per sku.

>SOC steppings were usually the same as display steppings/GT steppings until TGL and therefore
>didn't require special mapping cases. But from TGL onwards, we have different combinations of
>Disp and GT steppings per SOC stepping. Alderlake-S makes this direct mapping even more difficult
>without the array requiring more macros to deal with SOC -> DISP/GT stepping differences.
>
>Will fix the array bound checks but the possibility of SOC revision id from drm struct going
>out of bounds is minimal. Can only happen if we don't have support for latest SOC -> Disp/GT table

this is very common. It's just a matter of trying to run a slightly old
kernel in a slightly newer rev of the hardware.

>for TGL from Bspec and if we are picking up wrong revision id from drm struct that means the platform
>information obtained itself is wrong which will be a general platform problem unrelated to Gfx driver.

Nothing else should really be a problem. We don't really use the revid
much, mostly for WAs. And if other parts of the kernel are trying to use
the SoC revid, then they are reading that info themselves, not using
something we read.

We are simply reading the revid from hardware and using that value
without checking and that needs to change.

>
>>
>> I'd rather we seek for ways to either nuke the revid arrays altogether,
>> or encapsulate them within a .c file with static scope.
>
>I don't think we should nuke the revid arrays but I agree with finding a more appropriate place to
>parse the gt/display stepping info. This should be an exercise for a later patch that takes
>care of kbl,tgl and adl-s mappings.
>
>>
>> And for that .c file... the arrays are now in gt/intel_workarounds.c
>> which is a really weird place for stuff that's used for generic stepping
>> info, and particularly for *display* stepping info.
>
>I agree and we can change the approach with a different patch later.
>
>>
>> BR,
>> Jani.
>>
>>
>