[Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output

John Harrison john.c.harrison at intel.com
Mon Nov 7 19:14:20 UTC 2022


On 11/7/2022 08:17, Tvrtko Ursulin wrote:
> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>> On 11/4/2022 10:25 AM, John.C.Harrison at Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>
>>>> When trying to analyse bug reports from CI, customers, etc. it can be
>>>> difficult to work out exactly what is happening on which GT in a
>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>> used instead of the drm_ equivalents, you get the same output but with
>>>> a GT# prefix on it.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>
>>> The only downside to this is that we'll print "GT0: " even on 
>>> single-GT devices. We could introduce a gt->info.name and print 
>>> that, so we could have it different per-platform, but IMO it's not 
>>> worth the effort.
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>
>>> I think it might be worth getting an ack from one of the maintainers 
>>> to make sure we're all aligned on transitioning to these new logging 
>>> macro for gt code.
>>
>> Idea is I think a very good one. First I would suggest standardising 
>> to lowercase GT in logs because:
>>
>> $ grep "GT%" i915/ -r
>> $ grep "gt%" i915/ -r
>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>> sysfs root\n", gt->info.id);
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u RC6 sysfs files (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u RC6p sysfs files (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u RPS sysfs files (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u punit_req_freq_mhz sysfs (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u throttle sysfs files (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u media_perf_power_attrs sysfs (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add gt%u 
>> rps defaults (%pe)\n",
>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>> failed %d\n", id, ret);
>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>

Just because there are 11 existing instances of one form doesn't mean 
that the 275 instances that are waiting to be converted should be done 
incorrectly. GT is an acronym and should be capitalised.

Besides:
grep -r "GT " i915 | grep '"'
i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout disabling 
GT waking\n");
i915/vlv_suspend.c:                     "timeout waiting for GT wells to 
go %s\n",
i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while GT 
waking disabled\n");
i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
str_yes_no(gt->awake));
i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
engine->name);
i915/intel_uncore.c:                  "GT thread status wait timed out\n");
i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT failed 
to idle: %d\n", ret);
i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
%d\n", ret);
i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
%d\n", ret);
i915/gt/intel_gt_mcr.c: * Some GT registers are designed as "multicast" 
or "replicated" registers:
i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d C0 
cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of %uKHz\n",
i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
wedged!\n", engine->name);
i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
changed, was %uHz, now %uHz!\n",
i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
before test\n");
i915/gt/selftest_engine_pm.c:                           pr_err("GT 
failed to idle\n");
i915/i915_sysfs.c:                       "failed to register GT sysfs 
directory\n");
i915/intel_uncore.h:     * of the basic non-engine GT registers 
(referred to as "GSI" on
i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
platforms)?  If so, we'll



>> Then there is a question of naming. Are we okay with GT_XXX or, do we 
>> want intel_gt_, or something completely different. I don't have a 
>> strong opinion at the moment so I'll add some more folks to Cc.
>
You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
prefer just gt_err("msg") to keep it as close to the official drm_* 
versions as possible. Print lines tend to be excessively long already. 
Taking a 'gt' parameter instead of a '&gt->i915->drm' parameter does 
help with that but it seems like calling the wrapper intel_gt_* is 
shooting ourselves in the foot on that one. And GT_ERR vs gt_err just 
comes down to the fact that it is a macro wrapper and therefore is 
required to be in upper case.

> There was a maintainer level mini-discussion on this topic which I 
> will try to summarise.
>
> Main contention point was the maintenance cost and generally an 
> undesirable pattern of needing to add many 
> subsystem/component/directory specific macros. Which then typically 
> need extra flavours and so on. But over verbosity of the 
How many versions are you expecting to add? Beyond the tile instance, 
what further addressing requirements are there? The card instance is 
already printed as part of the PCI address. The only other reason to add 
per component wrappers would be to wrap the mechanism for getting from 
some random per component object back to the intel_gt structure. But 
that is hardware a new issue being added by this wrapper. It is also not 
a requirement. Much of the code has a gt pointer already. For the parts 
that don't, some of it would be a trivial engine->gt type dereference, 
some of it is a more complex container_of type construction. But for 
those, the given file will already have multiple instances of that 
already (usually as the first or second line of the function - 'intel_gt 
*gt = fancy_access_method(my_obj)' so adding one or two more of those as 
necessary is not making the code harder to read.

> code is obviously also bad, so one compromise idea was to add a macro 
> which builds the GT string and use drm logging helpers directly. This 
> would be something like:
>
>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>  drm_info(GT_LOG(...same...));
Seriously? As above, some of these lines are already way too long, this 
version makes them even longer with no obvious benefit. Worse, it makes 
it harder to read what is going on. It is much less intuitive to read 
than just replacing the drm_err itself. And having two sets of 
parenthesis with some parameters inside the first and some only inside 
the second is really horrid! Also, putting the 'gt' parameter in the 
middle just confuses it with the rest of the printf arguments even 
though there is no %d in the string for it. So now a quick glances tells 
you that your code is wrong because you have three format specifiers but 
four parameters.

Whereas, just replacing drm_err with gt_err (or GT_ERR or intel_gt_err) 
keeps everything else consistent. The first parameter changes from 'drm' 
to 'gt' but is still the master object parameter and it matches the 
function/macro prefix so inherently looks correct. Then you have your 
message plus parameters. No confusing orders, no confusing parenthesis, 
no excessive macro levels, no confusion at all. Just nice simple, easy 
to read, easy to maintain code.

>
> Whether or not to put the gt as parameter to the helper macro or 
> outside wasn't really decided upon. Anyway the macro would be adding 
> the magic "gt%u: " prefix, drm device and all.
>
> Also the name GT_LOG (or case) is just for illustration, that part 
> wasn't really discussed.
>
> If agreeable this pattern could then be used to consolidate some other 
> macros that we have. Although apart from CT_DEBUG/ERROR I don't know 
> if we have any others.
By consolidating as what? I hope you aren't meaning something like 
'drm_err(CT_DEBUG(GT_LOG("msg")))?'

John.

>
> I hope I have transferred the idea correctly. Please shout if I have not.
>
> Regards,
>
> Tvrtko
>
>
>> What I'd would like to see tried is to converting all of i915/gt 
>> within one kernel release so we don't have a mish-mash of log formats.
If each subsystem's team would like to convert their own section...

John.

>>
>> Regards,
>>
>> Tvrtko
>>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
>>>>   1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> index e0365d5562484..1e016fb0117a4 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> @@ -13,6 +13,21 @@
>>>>   struct drm_i915_private;
>>>>   struct drm_printer;
>>>> +#define GT_ERR(_gt, _fmt, ...) \
>>>> +    drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_WARN(_gt, _fmt, ...) \
>>>> +    drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_NOTICE(_gt, _fmt, ...) \
>>>> +    drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_INFO(_gt, _fmt, ...) \
>>>> +    drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_DBG(_gt, _fmt, ...) \
>>>> +    drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>>   #define GT_TRACE(gt, fmt, ...) do {                    \
>>>>       const struct intel_gt *gt__ __maybe_unused = (gt);        \
>>>>       GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),        \
>>>



More information about the dri-devel mailing list