[PATCH v4 0/2] Support/debug for slow GuC loads

John Harrison john.c.harrison at intel.com
Tue Apr 9 00:09:11 UTC 2024


On 4/4/2024 11:25, Lucas De Marchi wrote:
> On Tue, Feb 27, 2024 at 05:09:54PM -0800, John.C.Harrison at Intel.com 
> wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> Sometimes the GuC load is slower than it should be. For end users,
>> that usually means some kind of thermal throttling issue. Internally,
>> there can be any number of bugs that cause it. So don't completely
>> fail to load, just cope with it and report the problem.
>>
>> v2: Revert include order (review feedback from Lucas)
>> v3: Remove '_sysfs' from throttle file names and keep limit query in
>> the same file rather than moving elsewhere (review feedback from
>> Rodrigo). Fix the reporting of requested vs granted frequencies
>> (review feedback from Badal).
>> v4: Manually code the loop timeout/condition checking because helper
>> functions are not allowed (review feedback from Lucas/Rodrigo)
>
> Wrong reason. It's not that helper functions are not allowed. Rather,
> *this* particular helper was considered bad and counterproductive.
>
> For similar reasons as e.g. Linus commented recently on bcachefs moving
> some functions to be shared:
>
> https://lore.kernel.org/all/CAHk-=wg3djFJMeN3L_zx3P-6eN978Y1JTssxy81RhAbxB==L8Q@mail.gmail.com/ 
>
Not seeing how this compares. Linus' complaint is about some algorithmic 
decisions that he disagrees with. It sounds like quite a large chunk of 
code that is doing fundamentally wrong (or at least unnecessary) things.

Whereas this is simply abstracting timeout functionality for a generic 
wait. I have no problems with wanting to have a more specific helper for 
99% of use cases that are a specific but common pattern. But for those 
few cases that do not fit that specific pattern, having a more generic 
wait helper is hardly creating 'disgusting and completely nonsensical 
interfaces'. Certainly the comment 'But the main dealbreaker is the 
insane math.' does not apply to a simple wait helper.

>
> We'd need to spend much more time cleaning it up and making it a good
> interface rather than copying what we have in i915 and stuffing it in a
Not exactly sure what needs large amounts of time to clean up? It would 
simply be the existing xe_mmio_wait32 function but with the "read = 
xe_mmio_read(reg); if(read == val) break;" replaced with a callback. 
Indeed the xe_mmio_wait32 function itself would just be a wrapper around 
the generic wait helper that passes in the read/if as the callback. 
Everything else is identical to what we already have and apparently 
consider clean and a good interface.
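
As a rough illustration of that shape, here is a minimal userspace sketch, 
not the actual Xe code. Every name in it (generic_wait, wait_cond_fn, 
mmio_wait32, reg_match, poll_counter) is hypothetical, the "register" is 
just a pointer to a variable, and the starting delay and backoff are 
assumptions. It only shows a generic wait taking a callback, with the 
register wait reduced to a thin wrapper that passes the read/compare in as 
that callback.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* clock_gettime(), nanosleep() */
#endif

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: returns true once the awaited condition holds. */
typedef bool (*wait_cond_fn)(void *data);

static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static void sleep_us(uint64_t us)
{
	struct timespec ts = {
		.tv_sec = (time_t)(us / 1000000u),
		.tv_nsec = (long)(us % 1000000u) * 1000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Generic wait: poll cond() with a doubling delay until it returns true or
 * timeout_us has elapsed. cond() is always checked at least once and is
 * re-checked after every sleep, including the final (clamped) one.
 */
static int generic_wait(wait_cond_fn cond, void *data, uint32_t timeout_us)
{
	uint64_t end = now_us() + timeout_us;
	uint64_t wait = 10; /* assumed starting delay, in microseconds */

	for (;;) {
		uint64_t now;

		if (cond(data))
			return 0;

		now = now_us();
		if (now >= end)
			return -ETIMEDOUT;

		if (wait > end - now)
			wait = end - now; /* never sleep past the deadline */
		sleep_us(wait);
		wait <<= 1;
	}
}

/* The common "masked register equals value" case, expressed as a callback. */
struct reg_match {
	volatile uint32_t *reg; /* stand-in for a real MMIO read */
	uint32_t mask, val, last;
};

static bool reg_matches(void *data)
{
	struct reg_match *m = data;

	m->last = *m->reg;
	return (m->last & m->mask) == m->val;
}

/* Thin wrapper: the specific helper built on top of the generic one. */
static int mmio_wait32(volatile uint32_t *reg, uint32_t mask, uint32_t val,
		       uint32_t timeout_us, uint32_t *out_val)
{
	struct reg_match m = { .reg = reg, .mask = mask, .val = val };
	int ret = generic_wait(reg_matches, &m, timeout_us);

	if (out_val)
		*out_val = m.last;
	return ret;
}

/* A condition that does not fit the register pattern: ready after N polls. */
struct poll_counter {
	int polls, ready_after;
};

static bool ready_after_n_polls(void *data)
{
	struct poll_counter *c = data;

	return ++c->polls >= c->ready_after;
}
```

A caller with an unusual condition passes its own callback to 
generic_wait(), while the common case keeps using mmio_wait32() unchanged. 
The sketch deliberately leaves out the atomic (non-sleeping) variant.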

Apart from the atomic part. Which is apparently hideous and broken 
according to earlier comments. But still made it into the Xe re-write 
anyway. And that is the underlying wait helper part, not related to any 
interfaces around the test itself.

> *utils.[hc]. In the past it turned out there were not real good reasons
> for abstracting it and making it generic for all the contexts the caller
> may be on.
That is a failing of the usage, not the helper.

With great power...

John.

>
> Lucas De Marchi
>
>>
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>
>>
>> John Harrison (2):
>>  drm/xe: Make read_perf_limit_reasons globally accessible
>>  drm/xe/guc: Port over the slow GuC loading support from i915
>>
>> drivers/gpu/drm/xe/Makefile                   |   2 +-
>> drivers/gpu/drm/xe/abi/guc_errors_abi.h       |  26 +-
>> drivers/gpu/drm/xe/regs/xe_guc_regs.h         |   2 +
>> drivers/gpu/drm/xe/xe_gt_freq.c               |   4 +-
>> ...e_gt_throttle_sysfs.c => xe_gt_throttle.c} |  26 +-
>> drivers/gpu/drm/xe/xe_gt_throttle.h           |  17 ++
>> drivers/gpu/drm/xe/xe_gt_throttle_sysfs.h     |  16 --
>> drivers/gpu/drm/xe/xe_guc.c                   | 226 ++++++++++++++----
>> drivers/gpu/drm/xe/xe_mmio.c                  |  61 +++++
>> drivers/gpu/drm/xe/xe_mmio.h                  |   2 +
>> 10 files changed, 307 insertions(+), 75 deletions(-)
>> rename drivers/gpu/drm/xe/{xe_gt_throttle_sysfs.c => 
>> xe_gt_throttle.c} (86%)
>> create mode 100644 drivers/gpu/drm/xe/xe_gt_throttle.h
>> delete mode 100644 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.h
>>
>> -- 
>> 2.43.0
>>


