[Intel-xe] [RFC 0/2] Dump + OOB workarounds

Lucas De Marchi lucas.demarchi at intel.com
Fri Apr 14 15:03:02 UTC 2023


On Fri, Apr 14, 2023 at 10:36:53AM -0400, Rodrigo Vivi wrote:
>On Wed, Apr 12, 2023 at 01:20:52AM -0700, Lucas De Marchi wrote:
>> This is 2 RFCs in one, since they are more or less related.
>>
>> PATCH #1) Dump the "driver workaround database" to check which
>> workarounds are implemented. This is useful when checking the driver
>> against the spec to see which workarounds it implements. This can later
>> be extended to mark which workarounds are "active" on the current
>> platform. I think this second scenario is more useful for the debugfs
>> usage.
>
>It is actually useful for the w/a assessment to know if the wa is
>already known and implemented by the driver.
>
>But for this use case in particular, this information without the
>platform on which it is 'active' is kind of useless...

yeah... I think I will drop the "all WAs known by the driver" from
debugfs. Printing just the active ones is what's useful for "live"
checks.

>I would continue to use git-grep for that while we don't show the
>platform info.
>
>But thinking about other cases I also believe this is not that useful
>without the active platform information.

agreed. My idea now would be to:

a) move engine/gt/lrc WAs to special sections:
".rodata.xe_wa_{gt,engine,lrc}".  This makes it easier for external
tools to parse the .ko.

b) Change this implementation so the condition is an rtp rule

	XE_WA(xe, <name>, <rtp-rules>, <action-statements>);
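For idea (a), a minimal userspace sketch of how entries can be collected in a dedicated ELF section, assuming GCC or Clang with GNU ld or lld. The struct `xe_wa_entry`, the macro `XE_WA_GT_ENTRY()` and the section name `xe_wa_gt` are made up for illustration. The name deliberately has no dots: the linker only synthesizes `__start_<sec>`/`__stop_<sec>` symbols for section names that are valid C identifiers, so a dotted name like `.rodata.xe_wa_gt` would need those symbols defined in a linker script instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout; any external tool must agree on it. */
struct xe_wa_entry {
	const char *name;	/* workaround identifier */
	const char *desc;	/* free-form description */
};

/*
 * Place each entry in the "xe_wa_gt" section. "used" keeps the otherwise
 * unreferenced static object; the explicit alignment stops the compiler
 * from over-aligning entries, so the section behaves like a packed array.
 */
#define XE_WA_GT_ENTRY(id, n, d)					\
	static const struct xe_wa_entry xe_wa_gt_##id			\
	__attribute__((used, section("xe_wa_gt"),			\
		       aligned(__alignof__(struct xe_wa_entry)))) =	\
		{ .name = (n), .desc = (d) }

XE_WA_GT_ENTRY(1, "Wa_example_1", "hypothetical GT workaround");
XE_WA_GT_ENTRY(2, "Wa_example_2", "another hypothetical GT workaround");

/* Provided by the linker because "xe_wa_gt" is a valid C identifier. */
extern const struct xe_wa_entry __start_xe_wa_gt[];
extern const struct xe_wa_entry __stop_xe_wa_gt[];

static size_t xe_wa_gt_count(void)
{
	return (size_t)(__stop_xe_wa_gt - __start_xe_wa_gt);
}

static const struct xe_wa_entry *xe_wa_gt_find(const char *name)
{
	const struct xe_wa_entry *e;

	assert(name != NULL);
	for (e = __start_xe_wa_gt; e < __stop_xe_wa_gt; e++)
		if (strcmp(e->name, name) == 0)
			return e;
	return NULL;
}
```

Entry order within the section is not guaranteed, only membership. Also note that pointer fields in a relocatable object such as a .ko are only resolved through relocations, so a parser-friendly layout might prefer fixed-size inline strings.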

I'm leaning towards evaluating the rtp-rules at the call site rather
than on initialization, since it's simpler. Once we have all of them
converted, switching to evaluation at probe time only would be easy.
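To make call-site evaluation concrete, here is a minimal sketch of such an `XE_WA()` in plain C. Everything in it is hypothetical: the reduced `struct xe_device`, the `RULE_*()` helpers (stand-ins for rtp rules, not the xe_rtp API) and the evaluation counter, which only exists to show that the rules run every time the code path is crossed.

```c
#include <assert.h>

enum xe_platform { XE_PLATFORM_FOO, XE_PLATFORM_BAR };

/* Hypothetical, heavily reduced device description. */
struct xe_device {
	enum xe_platform platform;
	unsigned int graphics_verx100;	/* e.g. 1270 means version 12.70 */
};

/* Hypothetical rule helpers; they test the device bound to xe_ in XE_WA(). */
#define RULE_PLATFORM(p)		(xe_->platform == (p))
#define RULE_GRAPHICS_VERSION(v)	(xe_->graphics_verx100 == (v))

/* Counts how often any XE_WA() rule was evaluated (illustration only). */
static unsigned int xe_wa_evaluations;

/*
 * The rules are evaluated each time execution reaches this point and the
 * action statements run only if they match. The name is merely evaluated
 * here; a fuller version would also register it somewhere.
 */
#define XE_WA(xe, name, rules, ...)				\
	do {							\
		const struct xe_device *xe_ = (xe);		\
		(void)(name);					\
		xe_wa_evaluations++;				\
		if (rules) {					\
			__VA_ARGS__;				\
		}						\
	} while (0)

/* Hypothetical caller, in the spirit of the xe_guc.c conversion. */
static unsigned int example_init_flags(const struct xe_device *xe)
{
	unsigned int flags = 0;

	XE_WA(xe, "Wa_example_1",
	      RULE_PLATFORM(XE_PLATFORM_BAR) && RULE_GRAPHICS_VERSION(1270),
	      flags |= 0x1);

	return flags;
}
```

Because the rules live only at the call site, nothing else in the driver can tell whether the WA is active until that path has run, which is the reporting limitation of this approach.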

The advantage of evaluating them at probe time is that we can then
more easily report which WAs are active, since we don't need to rely
on having gone through that code path.
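A minimal sketch of the probe-time variant, assuming a hypothetical table of (name, rule predicate) pairs: every rule is evaluated once at probe and the result is stored, so listing the active WAs (e.g. for debugfs) no longer depends on which code paths have already run. The names, predicates and `XE_OOB_WA_COUNT` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define XE_OOB_WA_COUNT 3	/* number of entries in the hypothetical table */

struct xe_device {
	unsigned int graphics_verx100;		/* e.g. 1270 means 12.70 */
	bool oob_active[XE_OOB_WA_COUNT];	/* filled in once at probe */
};

/* One entry of the hypothetical OOB table: a name plus its rule. */
struct xe_oob_wa {
	const char *name;
	bool (*match)(const struct xe_device *xe);
};

static bool ver_12(const struct xe_device *xe)
{
	return xe->graphics_verx100 / 100 == 12;
}

static bool ver_1270(const struct xe_device *xe)
{
	return xe->graphics_verx100 == 1270;
}

static bool ver_ge_20(const struct xe_device *xe)
{
	return xe->graphics_verx100 >= 2000;
}

static const struct xe_oob_wa xe_oob_was[XE_OOB_WA_COUNT] = {
	{ "Wa_example_1", ver_12 },
	{ "Wa_example_2", ver_1270 },
	{ "Wa_example_3", ver_ge_20 },
};

/* Probe: evaluate every rule exactly once and remember the result. */
static void xe_oob_wa_probe(struct xe_device *xe)
{
	for (size_t i = 0; i < XE_OOB_WA_COUNT; i++)
		xe->oob_active[i] = xe_oob_was[i].match(xe);
}

/*
 * Report the active WAs, one per line, using only the stored results; no
 * driver code path needs to have run. Returns the length of the complete
 * lines written.
 */
static size_t xe_oob_wa_report(const struct xe_device *xe, char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < XE_OOB_WA_COUNT; i++) {
		int n;

		if (!xe->oob_active[i])
			continue;
		n = snprintf(buf + off, len - off, "%s\n", xe_oob_was[i].name);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* out of space: output is truncated */
		off += (size_t)n;
	}
	return off;
}
```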

>
>>
>> Other alternatives I have considered:
>> a) Reuse the kunit infra; with this it's possible to easily fake a
>> platform and get which WAs are considered "active" for that platform.
>> This is nice because it doesn't need to run on real hardware. However,
>> this may be abusing what kunit is for.
>
>At first I also thought this was an abuse... but if you think about
>validation, and we have some way to cross-check an expected table against
>the implemented/active one, then you might build a case.

After implementing intel-gfx-fw-info, I think we have a more viable
path: just move the WAs to special sections as above and have a
dedicated tool for that. The sync issue between the script and the
kernel regarding the struct layout can easily be solved by keeping the
tool itself in the kernel repo.
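As a rough idea of the first step such a dedicated tool has to take, here is a sketch that locates a named section in a 64-bit ELF file using the structures from `<elf.h>` (a .ko is an ET_REL object, but its section header table is read the same way). It assumes the file has the same byte order as the host; the function name is made up, and decoding the entries and applying relocations are left out.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find section 'secname' in the 64-bit ELF file at 'path'. On success
 * return 0 and store the file offset and size of the section contents.
 */
static int elf_find_section(const char *path, const char *secname,
			    unsigned long *off, unsigned long *size)
{
	FILE *f;
	Elf64_Ehdr eh;
	Elf64_Shdr *sh = NULL;
	const Elf64_Shdr *strsec;
	char *names = NULL;
	int ret = -1;

	assert(path && secname && off && size);
	f = fopen(path, "rb");
	if (!f)
		return -1;

	/* Validate the ELF header: magic, 64-bit class, sane shdr layout. */
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shstrndx >= eh.e_shnum)
		goto out;

	/* Load the whole section header table. */
	sh = calloc(eh.e_shnum, sizeof(*sh));
	if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
	    fread(sh, sizeof(*sh), eh.e_shnum, f) != eh.e_shnum)
		goto out;

	/* Load the section-name string table, NUL-terminated for safety. */
	strsec = &sh[eh.e_shstrndx];
	names = malloc(strsec->sh_size + 1);
	if (!names || fseek(f, (long)strsec->sh_offset, SEEK_SET) != 0 ||
	    fread(names, 1, strsec->sh_size, f) != strsec->sh_size)
		goto out;
	names[strsec->sh_size] = '\0';

	/* Compare each section's name against the one we are looking for. */
	for (unsigned int i = 0; i < eh.e_shnum; i++) {
		if (sh[i].sh_name >= strsec->sh_size)
			continue;
		if (strcmp(names + sh[i].sh_name, secname) == 0) {
			*off = sh[i].sh_offset;
			*size = sh[i].sh_size;
			ret = 0;
			break;
		}
	}
out:
	free(names);
	free(sh);
	fclose(f);
	return ret;
}
```

The tool would then decode the bytes at `off` with its own copy of the entry struct, which is exactly the layout that keeping the tool in the kernel tree keeps in sync.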

>
>>
>> b) Just write a tool in C or python that parses the section from
>> the .ko and outputs the info needed. The drawback is keeping the
>> declarations from xe_rtp in sync with this additional tool, and it
>> would probably also need to process the rules by itself.
>
>why from the .ko and not a grep in the code then?

With grep you can't have a tool reason about the WA; you will only know
"is WA x implemented?". And you have to deal with false positives like
"TODO: remove this when Wa_xxxxx is implemented".

>
>>
>> PATCH #2) "out-of-band" workarounds, i.e. those that are sprinkled
>> around the driver. In the implementation here I tried to keep the caller
>> similar to what it was before. With the addition of XE_WA() the caller
>> "registers" the workaround so we can access it later. As an example, I
>> converted one place in xe_guc.c to use it.
>
>I liked that XE_WA declaration.
>
>We could even have a tool that also checks whether all the XE_WA
>declarations indeed use the lineage number and not some random hsd number.

yep. My *current* plan (may change next week ;)) is implementing that
additional tool.
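One possible shape for that check, as a sketch: given the IDs registered through XE_WA() (however the tool extracts them) and a sorted list of known lineage IDs (assumed here to come from the workaround database; the IDs in any example are made up), report every registered ID that is not a known lineage. The same cross-check also covers comparing an expected table against the implemented one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* bsearch() comparator for arrays of C strings. */
static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Report every registered workaround ID that does not appear in the list
 * of known lineage IDs. 'known' must be sorted in strcmp() order. Messages
 * go to 'out' if it is non-NULL. Returns the number of unknown IDs.
 */
static size_t check_wa_ids(const char *const *registered, size_t nreg,
			   const char *const *known, size_t nknown, FILE *out)
{
	size_t bad = 0;

	assert(nknown == 0 || known != NULL);
	for (size_t i = 0; i < nreg; i++) {
		if (bsearch(&registered[i], known, nknown, sizeof(*known), cmp_str))
			continue;
		if (out)
			fprintf(out, "unknown workaround id: %s\n", registered[i]);
		bad++;
	}
	return bad;
}
```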

>
>>
>> Note: The "condition" is checked each time it goes through that code
>> path. I was debating whether to just reuse XE_RTP_RULE() instead,
>> since it appears all the conditions can be translated to rules.
>> Consider this alternative (a).
>>
>> After writing the commit message for that patch I thought: but if we can
>> transform those conditions into rules, we could very well keep them in an
>> additional table in xe_wa.c, which is the alternative below:
>>
>> b) keep an additional table in xe_wa.c whose action is to set a
>> bit/byte(?) in xe->active_oob_workarounds[]. During probe
>> the rules are processed and the active ones are marked in that array.
>> The extra section created by XE_WA() is then used just to map
>> the WA to its index. Then the callers would only need something
>> like `if (XE_WA(xe, "XXXXXX")) { ... }` since the condition would be
>> evaluated once at init. However, I'm thinking that this may not scale to
>> all the workarounds we may have.
>
>I like this idea and I believe that it does scale... It also helps
>to push back on big, ugly workarounds that don't fit in this.

Thanks, it seems we are converging on a final solution...
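A minimal sketch of alternative (b) as described in the thread, with one deliberate substitution: instead of mapping a string like "XXXXXX" to an index through the extra section, it pastes a bare numeric ID onto a hypothetical enum, so the index is resolved at build time. The IDs, rules and helper names are made up; in-kernel code would use `DECLARE_BITMAP()`, `set_bit()` and `test_bit()` rather than the local helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WA_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define WA_BITS_TO_LONGS(n)	(((n) + WA_BITS_PER_LONG - 1) / WA_BITS_PER_LONG)

/* Hypothetical OOB workaround IDs; each enumerator is a bit index. */
enum xe_oob_wa_id {
	XE_OOB_WA_1000001,
	XE_OOB_WA_1000002,
	XE_OOB_WA_COUNT
};

struct xe_device {
	unsigned int graphics_verx100;
	unsigned long oob_active[WA_BITS_TO_LONGS(XE_OOB_WA_COUNT)];
};

typedef bool (*xe_oob_rule_fn)(const struct xe_device *xe);

static bool match_ver_12(const struct xe_device *xe)
{
	return xe->graphics_verx100 / 100 == 12;
}

static bool match_ver_1270(const struct xe_device *xe)
{
	return xe->graphics_verx100 == 1270;
}

/* The additional rule table, indexed by enum xe_oob_wa_id. */
static const xe_oob_rule_fn xe_oob_rules[XE_OOB_WA_COUNT] = {
	[XE_OOB_WA_1000001] = match_ver_12,
	[XE_OOB_WA_1000002] = match_ver_1270,
};

static void wa_set_bit(size_t nr, unsigned long *map)
{
	map[nr / WA_BITS_PER_LONG] |= 1UL << (nr % WA_BITS_PER_LONG);
}

static bool wa_test_bit(size_t nr, const unsigned long *map)
{
	return (map[nr / WA_BITS_PER_LONG] >> (nr % WA_BITS_PER_LONG)) & 1UL;
}

/* Probe: run every rule once and record the active WAs in the bitmap. */
static void xe_oob_wa_probe(struct xe_device *xe)
{
	for (size_t i = 0; i < XE_OOB_WA_COUNT; i++) {
		assert(xe_oob_rules[i] != NULL);
		if (xe_oob_rules[i](xe))
			wa_set_bit(i, xe->oob_active);
	}
}

/* Call sites only test a bit, e.g. if (XE_WA(xe, 1000001)) { ... } */
#define XE_WA(xe, id)	wa_test_bit(XE_OOB_WA_##id, (xe)->oob_active)
```

With this shape a mistyped ID becomes a build error (unknown enumerator) and the probe loop is the single place that evaluates rules; the cost is that every OOB workaround must be expressible as an entry in the table.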

Lucas De Marchi

>
>>
>> Thoughts?
>>
>> thanks
>> Lucas De Marchi
>>
>> Lucas De Marchi (2):
>>   drm/xe: Add debugfs to dump all known workarounds
>>   drm/xe: Register OOB workarounds
>>
>>  drivers/gpu/drm/xe/Makefile       |  2 ++
>>  drivers/gpu/drm/xe/xe.lds         |  8 +++++++
>>  drivers/gpu/drm/xe/xe_debugfs.c   | 12 ++++++++++
>>  drivers/gpu/drm/xe/xe_guc.c       |  6 ++---
>>  drivers/gpu/drm/xe/xe_rtp_types.h |  8 +++++++
>>  drivers/gpu/drm/xe/xe_wa.c        | 38 +++++++++++++++++++++++++++++--
>>  drivers/gpu/drm/xe/xe_wa.h        | 14 ++++++++++++
>>  7 files changed, 83 insertions(+), 5 deletions(-)
>>  create mode 100644 drivers/gpu/drm/xe/xe.lds
>>
>> --
>> 2.39.0
>>

