[Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Nov 10 09:43:19 UTC 2022


On 09/11/2022 17:46, John Harrison wrote:
> On 11/9/2022 03:05, Tvrtko Ursulin wrote:
>> On 08/11/2022 20:15, John Harrison wrote:
>>> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>>>> On 07/11/2022 19:14, John Harrison wrote:
>>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison at Intel.com wrote:
>>>>>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>>
>>>>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>>>>> can be
>>>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>>>> multi-GT system. So add GT oriented debug/error message 
>>>>>>>>> wrappers. If
>>>>>>>>> used instead of the drm_ equivalents, you get the same output 
>>>>>>>>> but with
>>>>>>>>> a GT# prefix on it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>
>>>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>>>>> that, so we could have it different per-platform, but IMO it's 
>>>>>>>> not worth the effort.
>>>>>>>>
>>>>>>>> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>>>> <daniele.ceraolospurio at intel.com>
>>>>>>>>
>>>>>>>> I think it might be worth getting an ack from one of the 
>>>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>>>> these new logging macro for gt code.
>>>>>>>
>>>>>>> Idea is I think a very good one. First I would suggest 
>>>>>>> standardising to lowercase GT in logs because:
>>>>>>>
>>>>>>> $ grep "GT%" i915/ -r
>>>>>>> $ grep "gt%" i915/ -r
>>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize 
>>>>>>> gt%d sysfs root\n", gt->info.id);
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u RC6 sysfs files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>>>> files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u RPS sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle 
>>>>>>> sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>>>>> gt%u rps defaults (%pe)\n",
>>>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: 
>>>>>>> intel_pcode_init failed %d\n", id, ret);
>>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, sizeof(ddat_gt->name), 
>>>>>>> "i915_gt%u", i);
>>>>>>>
>>>>>
>>>>> Just because there are 11 existing instances of one form doesn't 
>>>>> mean that the 275 instances that are waiting to be converted should 
>>>>> be done incorrectly. GT is an acronym and should be capitalised.
>>>>
>>>> Okay just make it consistent then.
>>>>
>>>>> Besides:
>>>>> grep -r "GT " i915 | grep '"'
>>>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>>>>> disabling GT waking\n");
>>>>> i915/vlv_suspend.c:                     "timeout waiting for GT 
>>>>> wells to go %s\n",
>>>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access 
>>>>> while GT waking disabled\n");
>>>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>>>> str_yes_no(gt->awake));
>>>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on 
>>>>> %s)", engine->name);
>>>>> i915/intel_uncore.c:                  "GT thread status wait timed 
>>>>> out\n");
>>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>>>> failed to idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>> idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>> idle: %d\n", ret);
>>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>>>> "multicast" or "replicated" registers:
>>>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>>>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency 
>>>>> of %uKHz\n",
>>>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>>>>> wedged!\n", engine->name);
>>>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>>>>> changed, was %uHz, now %uHz!\n",
>>>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT 
>>>>> pm before test\n");
>>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>>>> i915/i915_sysfs.c:                       "failed to register GT 
>>>>> sysfs directory\n");
>>>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>>>> (referred to as "GSI" on
>>>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>>>>> platforms)?  If so, we'll
>>>>>
>>>>>
>>>>>
>>>>>>> Then there is a question of naming. Are we okay with GT_XXX or, 
>>>>>>> do we want intel_gt_, or something completely different. I don't 
>>>>>>> have a strong opinion at the moment so I'll add some more folks 
>>>>>>> to Cc.
>>>>>>
>>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>>>>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>>>>> versions as possible. Print lines tend to be excessively long 
>>>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>>>>> parameter does help with that but it seems like calling the wrapper 
>>>>> intel_gt_* is shooting ourselves in the foot on that one. And 
>>>>> GT_ERR vs gt_err just comes down to the fact that it is a macro 
>>>>> wrapper and therefore is required to be in upper case.
>>>>>
>>>>>> There was a maintainer level mini-discussion on this topic which I 
>>>>>> will try to summarise.
>>>>>>
>>>>>> Main contention point was the maintenance cost and generally an 
>>>>>> undesirable pattern of needing to add many 
>>>>>> subsystem/component/directory specific macros. Which then 
>>>>>> typically need extra flavours and so on. But over verbosity of the 
>>>>> How many versions are you expecting to add? Beyond the tile 
>>>>> instance, what further addressing requirements are there? The card 
>>>>> instance is already printed as part of the PCI address. The only 
>>>>> other reason to add per component wrappers would be to wrap the 
>>>>> mechanism for getting from some random per component object back to 
>>>>> the intel_gt structure. But that is hardware a new issue being 
>>>>> added by this wrapper. It is also not a requirement. Much of the 
>>>>> code has a gt pointer already. For the parts that don't, some of it 
>>>>> would be a trivial engine->gt type dereference, some of it is a 
>>>>> more complex container_of type construction. But for those, the 
>>>>> given file will already have multiple instances of that already 
>>>>> (usually as the first or second line of the function - 'intel_gt 
>>>>> *gt = fancy_access_method(my_obj)' so adding one or two more of 
>>>>> those as necessary is not making the code harder to read.
>>>>>
>>>>>> code is obviously also bad, so one compromise idea was to add a 
>>>>>> macro which builds the GT string and use drm logging helpers 
>>>>>> directly. This would be something like:
>>>>>>
>>>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>>>  drm_info(GT_LOG(...same...));
>>>>> Seriously? As above, some of these lines are already way too long, 
>>>>> this version makes them even longer with no obvious benefit. Worse, 
>>>>> it makes it harder to read what is going on. It is much less 
>>>>> intuitive to read than just replacing the drm_err itself. And 
>>>>> having two sets of parenthesis with some parameters inside the 
>>>>> first and some only inside the second is really horrid! Also, 
>>>>> putting the 'gt' parameter in the middle just confuses it with the 
>>>>> rest of the printf arguments even though there is no %d in the 
>>>>> string for it. So now a quick glances tells you that your code is 
>>>>> wrong because you have three format specifiers but four parameters.
>>>>>
>>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>>>> intel_gt_err) keeps everything else consistent. The first parameter 
>>>>> changes from 'drm' to 'gt' but is still the master object parameter 
>>>>> and it matches the function/macro prefix so inherently looks 
>>>>> correct. Then you have your message plus parameters. No confusing 
>>>>> orders, no confusing parenthesis, no excessive macro levels, no 
>>>>> confusion at all. Just nice simple, easy to read, easy to maintain 
>>>>> code.
>>>>
>>>> I am personally okay with gt_err/GT_ERR some other folks might 
>>>> object though. And I can also understand the argument why it is 
>>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice, 
>>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only 
>>>> one macro.
>>> A small set of trivial macro definitions vs a complicated and 
>>> unreadable construct on every single print? Erm, isn't that the very 
>>> definition of abstracting to helpers as generally required by every 
>>> code review ever?
>>>
>>> And what 'other folks might object'? People already CC'd? People 
>>> outside of i915?
>>>
>>>
>>>>
>>>> Because of that I was passing on to you the compromise option.
>>>>
>>>> It maybe still has net space savings since we wouldn't have to be 
>>>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>>>
>>>> You are free to try the most compact one and see how hard those 
>>>> objections will be.
>>> Um. I already did. This patch. And you are the only person to have 
>>> objected in any manner at all.
>>
>> Where I have objected?
> Only in everything you have written. Or are you saying that actually I 

???

> could have just taken the r-b from Daniele and merged it while 
> completely ignoring everything you have said because you didn't say 
> 'NACK' at the top of your email?

No, but because we don't do that while there is an ongoing discussion.

>> I was a) asking to convert all gt/ within one kernel release, b) 
>> transferring the maintainer discussion from IRC to this email chain to 
>> outlay one alternative, for which I said I could see the pros and cons 
>> of both, and c) raised the naming question early since that can 
>> usually become a churn point later on when we have large scale code 
>> transformations.
>>
>> As said, FWIW you have my ack for GT_XXX naming and approach, but 
>> please do convert the whole of gt/ so we don't ship with a mish-mash 
>> of log messages.
>>
> That sounds like a quite a lot of objections to me - don't do it that 
> way, do it this way; fine do it that way but expect lots of complaints 
> when you do; oh all right, do it that way but re-write the entire driver 
> all at once.

No, I'll repeat what I said one last time.

I said it's good idea, raised the topic of naming, since it is good to 
get consensus before starting a wide renaming exercise. I also chatted 
with other maintainers and passed on here one alternative proposed, with 
its pros and cons. Finally I said your initial approach has my ack at 
least, even though not everyone liked it. And I asked to convert all 
gt/, since I think it is a good idea, and frankly it's not a very big 
ask. It is just how it works.

> I really don't get the 'we must not ship with a mish-mash of log 
> messages' argument. It's hardly like the current total mish-mash of 
> DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely 
> consistent. It's also not like the patch is massively changing the 
> format of the output messages. So some prints get a useful extra three 
> characters in advance of others. Does anyone care? Would anyone except 
> us even notice?
> 
> Is it really a problem to merge this patch now to get the process 
> started? And other sub-components get updated as and when people get the 
> time to do them? You could maybe even help rather than posting 

It can be a problem. You opened up a topic which leads to an overall 
improvement and if we merge churn before we decide on the final 
direction, then we just get more churn later.

> completely conflicting patch sets that basically duplicate all the 
> effort for no actual benefit.

Completely conflicting?

Yours:

  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 25 +++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  9 +-
  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 50 ++++------
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  9 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     | 17 ++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 49 +++++-----
  drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c     |  3 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++++++------
  drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 20 ++--
  drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 84 ++++++++---------
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 91 +++++++++----------
  drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 36 ++++----
  .../drm/i915/gt/uc/selftest_guc_hangcheck.c   | 22 ++---
  .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 10 +-
  15 files changed, 228 insertions(+), 259 deletions(-)

Mine:

  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  2 +-
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 23 ++++++----
  .../drm/i915/gt/intel_execlists_submission.c  | 13 +++---
  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  4 +-
  drivers/gpu/drm/i915/gt/intel_gt.c            |  4 +-
  drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  8 ++--
  drivers/gpu/drm/i915/gt/intel_rps.c           |  6 ++-
  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 43 +++++++++++--------
  .../gpu/drm/i915/gt/intel_workarounds_types.h |  4 ++
  .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
  drivers/gpu/drm/i915/i915_debugfs.c           |  4 +-
  drivers/gpu/drm/i915/i915_gem.c               |  2 +-
  drivers/gpu/drm/i915/i915_getparam.c          |  2 +-
  drivers/gpu/drm/i915/i915_irq.c               | 12 +++---
  drivers/gpu/drm/i915/i915_perf.c              | 14 +++---
  drivers/gpu/drm/i915/i915_query.c             | 12 +++---
  drivers/gpu/drm/i915/i915_sysfs.c             |  3 +-
  drivers/gpu/drm/i915/i915_vma.c               | 16 ++++---
  drivers/gpu/drm/i915/intel_uncore.c           | 21 +++++----
  19 files changed, 116 insertions(+), 81 deletions(-)

I deliberately did not touch gt/uc so they don't touch a single same file!

Unless you are referring to the fact gt logging will need to touch some 
of the same lines my patch touches. Well yes, some, not all, and I 
reckoned it will be much quicker since there aren't really any 
contentious points with DRM_XXX -> drm_xxx. Plus I added struct intel_gt 
or whatever or it wasn't before so it would make your patch easier/smaller.

Regards,

Tvrtko


More information about the Intel-gfx mailing list