[Intel-gfx] [PATCH 02/21] drm/i915/tgl: Fix macros for TGL SOC based WA

Wed Nov 25 08:36:19 UTC 2020

On Tue, 24 Nov 2020, Aditya Swarup <aditya.swarup at intel.com> wrote:
> On 11/24/20 12:11 PM, Lucas De Marchi wrote:
>> +Matt Roper, see question in item (3) below
>> 
>> On Tue, Nov 24, 2020 at 04:20:40PM +0200, Jani Nikula wrote:
>>> On Tue, 24 Nov 2020, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>>>> On Mon, Nov 23, 2020 at 05:32:22PM -0800, Aditya Swarup wrote:
>>>>> On 11/18/20 1:18 AM, Jani Nikula wrote:
>>>>>> On Tue, 17 Nov 2020, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>>>>>>> On Tue, Nov 17, 2020 at 10:50:10AM -0800, Aditya Swarup wrote:
>>>>>>>> @@ -1579,9 +1579,9 @@ static inline const struct i915_rev_steppings *
>>>>>>>> tgl_revids_get(struct drm_i915_private *dev_priv)
>>>>>>>> {
>>>>>>>>     if (IS_TGL_U(dev_priv) || IS_TGL_Y(dev_priv))
>>>>>>>> -        return tgl_uy_revids;
>>>>>>>> +        return tgl_uy_revids + INTEL_REVID(dev_priv);
>>>>>>>
>>>>>>> oohh, no. You have to at least check you are not accessing out of
>>>>>>> bounds. New HW running on old kernel should not access create invalid
>>>>>>> accesses like this.
>>>>>>
>>>>>> And this is just one reason why exposing arrays directly as an interface
>>>>>> to the rest of the driver is a bad idea. Basically I look at *all*
>>>>>> externs in the driver with suspicion, and they're all exceptions that
>>>>>> should not be repeated. The revid arrays are a direct invitation to keep
>>>>>> adding more and more extern arrays. And more ways to go out of bounds.
>>>>>
>>>>> We definitely need an array table for the SOC -> Display, GT stepping mapping.
>>>>
>>>> the mapping could be very well in the define iff you don't have
>>>> different mappings per sku as is the case with TGL. Example:
>>>>
>>>> #define ADLS_REVID_A0        0
>>>> #define ADLS_REVID_A1        5
>>>>
>>>> #define ADLS_DISP_REVID_A0    0
>>>> #define ADLS_DISP_REVID_B0    5
>>>>
>>>> The actual value is actually the *SoC* revid, regardless the name of the
>>>> macro. Since we already have to use a different macro -
>>>> IS_DISP_REVID() - I don't think this is much worse and would allow us to
>>>> get rid of the table *for ADL-S*, at the expense of having to pass as
>>>> argument the ADLS_DISP_REVID_*.  However this doesn't apply to TGL as TGL
>>>> has a different mapping per sku.
>>>>
>>>>
>>>>> SOC steppings were usually the same as display steppings/GT steppings until TGL and therefore
>>>>> didn't require special mapping cases. But from TGL onwards, we have different combinations of
>>>>> Disp and GT steppings per SOC stepping. Alderlake-S makes this direct mapping even more difficult
>>>>> without the array requiring more macros to deal with SOC -> DISP/GT stepping differences.
>>>>>
>>>>> Will fix the array bound checks but the possibility of SOC revision id from drm struct going
>>>>> out of bounds is minimal. Can only happen if we don't have support for latest SOC -> Disp/GT table
>>>>
>>>> this is very common. It's just a matter of trying to run a slightly old
>>>> kernel in a slightly newer rev of the hardware.
>>>
>>> Indeed. All kernels released with the arrays are simply bust for any new
>>> hardware revisions. They'll need a minimal Cc: stable fix.
>>>
>>> Here's something I drafted [1] to fix the situation more
>>> generally. There are still some issues to overcome, though they exist
>>> already in the current code.
>>>
>>> This could be followed up with converting *all* platforms to the scheme,
>>> making it universal, regardless of whether the revids in the hardware
>>> are consecutive or not.
>>>
>>> BR,
>>> Jani.
>>>
>>>
>>> [1] https://cgit.freedesktop.org/~jani/drm/log/?h=revid-stepping-scheme
>> 
>> That is looking good.  Some feedback I can give before this series being
>> sent for review:
>
> I like this approach as well and we were discussing this in the ADLS
> rev ID thread. With the tables it makes it simpler to manage rather
> than worrying about individual macros. Jani do you want me to rebase
> ADLS changes on top of your patches and resubmit your patches for
> review or you will be submitting this series yourself?

Please proceed with ADL as you were, I'll do the refactoring afterwards
on top. It'll take a while anyway, we don't want to delay ADL with
this. Just add the required bounds checks to the ADL patches.

>> 
>> 1) You need to call the init function from somewhere

Yeah, that's a FIXME in a commit message. Draft patches and all that. ;)
I'll need to figure out where and how early we need to set this up, as
it needs to be set up before any users, obviously.

>> 2) For the FIXMEs:
>> 
>> +    /*
>> +     * FIXME: We should be able to take into account new revids not
>> +     * recognized by this kernel version.
>> +     */
>> 
>> +    /*
>> +     * FIXME: We should be able to handle gaps in revid arrays gracefully,
>> +     * and in a way that works sensibly for the range checks. This is true
>> +     * for the existing revid range checks; it's fine if a new id pops up in
>> +     * the middle.
>> +     *
>> +     * It's okay for the display stepping to be zero, though in an array all
>> +     * or none should be set to non-zero, not a mix.
>> +     */
>> 
>> Maybe consider that gt_stepping will never be 0 and in the case it is (or
>> size > ARRAY_SIZE), just backtrack to use the first one we find with
>> gt_stepping != 0?  then we probably should add a warning that we are not
>> actually using the correct one, but it's the best we can do.

Yeah, it'll probably need to be something like that. I'll figure
something out.

>> 
>> 3) REVID_BXT_B_LAST
>
> I couldn't spot REVID_NONE as well in the patches.

For now REVID_NONE was intended as a placeholder to ensure all valid
symbolic revids are non-zero, for array initialization purposes.

>> 
>> what is that? The only thing that comes to mind is for "matching all B
>> steps". Matt Roper had a patch to change the way we interpret the WA
>> ranges so the bounds are [lower, upper) rather than [lower, upper].
>> Matt, any problem you faced with that patch? It makes  more sense
>> because we know the stepping in which it's fixed, but we may have
>> additional revids before that
>> 
>> But I don't see any trace of REVID_BXT_B_LAST in the tree, so not sure
>> what's this about.

I just indiscriminately scooped up all revid macros from i915_drv.h for
starters. We have BXT_REVID_B_LAST to identify the last pre-pro bxt
before C0. Needs cleanup, agreed.

>> 
>> 4)
>> 
>> Lastly, I'd still like the simple fix for TGL without all the noise and
>> without the refactor.  It's the simplest fix we can do for the 5.10
>> timeframe.

Agreed.

> I just submitted the fix.

Thanks. Let's get that done first, then the ADL enabling, then I'll do
the refactoring afterwards.

BR,
Jani.

>
> Aditya 
>
>> 
>> 
>> Lucas De Marchi
>> 
>>>
>>>
>>>
>>>
>>>>
>>>>> for TGL from Bspec and if we are picking up wrong revision id from drm struct that means the platform
>>>>> information obtained itself is wrong which will be a general platform problem unrelated to Gfx driver.
>>>>
>>>> Nothing else should really be a problem. We don't really use the revid
>>>> much, mostly for WAs. And if other parts of the kernel are trying to use
>>>> the SoC revid, then they are reading that info themselves, not using
>>>> something we read.
>>>>
>>>> We are simply reading the revid from hardware and using that value
>>>> without checking and that needs to change.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> I'd rather we seek for ways to either nuke the revid arrays altogether,
>>>>>> or encapsulate them within a .c file with static scope.
>>>>>
>>>>> I don't think we should nuke the revid arrays but I agree with finding a more appropriate place to
>>>>> parse the gt/display stepping info. This should be an exercise for a later patch that takes
>>>>> care of kbl,tgl and adl-s mappings.
>>>>>
>>>>>>
>>>>>> And for that .c file... the arrays are now in gt/intel_workarounds.c
>>>>>> which is a really weird place for stuff that's used for generic stepping
>>>>>> info, and particularly for *display* stepping info.
>>>>>
>>>>> I agree and we can change the approach with a different patch later.
>>>>>
>>>>>>
>>>>>> BR,
>>>>>> Jani.
>>>>>>
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Intel-gfx mailing list
>>>> Intel-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>
>>> -- 
>>> Jani Nikula, Intel Open Source Graphics Center
>

-- 
Jani Nikula, Intel Open Source Graphics Center