[Intel-gfx] [PATCH 1/2] drm/i915/guc: Add error-capture init warnings when needed

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Oct 18 08:00:59 UTC 2022


On 17/10/2022 18:46, Teres Alexis, Alan Previn wrote:
> On Mon, 2022-10-17 at 09:42 +0100, Tvrtko Ursulin wrote:
>> On 15/10/2022 04:59, Alan Previn wrote:
>>> If GuC is being used and we initialized GuC-error-capture,
>>> we need to be warning if we don't provide an error-capture
>>> register list in the firmware ADS, for valid GT engines.
>>> A warning makes sense as this would impact debugability
>>> without realizing why a reglist wasn't retrieved and reported
>>> by GuC.>
>>> However, depending on the platform, we might have certain
>>> engines that have a register list for engine instance error state
>>> but not for engine class. Thus, add a check only to warn if the
>>> register list was non existent vs an empty list (use the
>>> empty lists to skip the warning).
>>>
>>> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
>>> ---
>>>    .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 55 ++++++++++++++++++-
>>>    1 file changed, 53 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>> index 8f1165146013..290c1e1343dd 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>> @@ -419,6 +419,44 @@ guc_capture_get_device_reglist(struct intel_guc *guc)
>>>    	return default_lists;
>>>    }
>>>    
>>> +static const char *
>>> +__stringify_type(u32 type)
>>> +{
>>> +	switch (type) {
>>> +	case GUC_CAPTURE_LIST_TYPE_GLOBAL:
>>> +		return "Global";
>>> +	case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS:
>>> +		return "Class";
>>> +	case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE:
>>> +		return "Instance";
>>> +	default:
>>> +		return "unknown";
>>> +	}
>>> +
>>> +	return "";
>>
>> What's the point of these returns? Gcc complains about not returning a type from const char * return function?
>>
> Sorry i am not sure I saw any complain for Gcc. If you are referring to "" then i can re-rev without the default and
> just let the path outside return the unknown. Is that what you are referring to?

Yes, it is an unreachable path, handled by default switch branch already.

>>> +}
>>> +
>>> +static const char *
>>> +__stringify_engclass(u32 class)
>>> +{
>>> +	switch (class) {
>>> +	case GUC_RENDER_CLASS:
>>> +		return "Render";
>>> +	case GUC_VIDEO_CLASS:
>>> +		return "Video";
>>> +	case GUC_VIDEOENHANCE_CLASS:
>>> +		return "VideoEnhance";
>>> +	case GUC_BLITTER_CLASS:
>>> +		return "Blitter";
>>> +	case GUC_COMPUTE_CLASS:
>>> +		return "Compute";
>>> +	default:
>>> +		return "unknown";
>>> +	}
>>> +
>>> +	return "";
>>> +}
>>> +
>>>    static int
>>>    guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
>>>    		      struct guc_mmio_reg *ptr, u16 num_entries)
>>> @@ -487,19 +525,32 @@ intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 cl
>>>    			      size_t *size)
>>>    {
>>>    	struct intel_guc_state_capture *gc = guc->capture;
>>> +	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>>>    	struct __guc_capture_ads_cache *cache = &gc->ads_cache[owner][type][classid];
>>>    	int num_regs;
>>>    
>>> -	if (!gc->reglists)
>>> +	if (!gc->reglists) {
>>> +		drm_warn(&i915->drm, "GuC-capture: No reglist on this device\n");
>>>    		return -ENODEV;
>>> +	}
>>>    
>>>    	if (cache->is_valid) {
>>>    		*size = cache->size;
>>>    		return cache->status;
>>>    	}
>>>    
>>> +	if (!guc_capture_get_one_list(gc->reglists, owner, type, classid)) {
>>> +		if (owner == GUC_CAPTURE_LIST_INDEX_PF && type == GUC_CAPTURE_LIST_TYPE_GLOBAL)
>>> +			drm_warn(&i915->drm, "GuC-capture: missing reglist type-Global\n");
>>> +		if (owner == GUC_CAPTURE_LIST_INDEX_PF)
>>
>> GUC_CAPTURE_LIST_INDEX_PF could be made once on the enclosing if statement?
> Sure - will do.
>>
>> Btw what's with the PF and VF (cover letter) references while SRIOV does not exists upstream?
> To maintain a scalable code flow across both the ADS code and guc-error-capture code, we do have to skip over this enum
> else we'll encounter lots of warnings about missing VF-reglist support (which we cant check for since we dont even have
> support - i.e we dont even have a "is not supported" check) but the GuC firmware ADS buffer allocation includes an entry
> for VFs so we have to skip over it. This is the cleanest way i can think of without impacting other code areas and also
> while adding the ability to warn when its important.
>>> +			drm_warn(&i915->drm, "GuC-capture: missing regiist type(%d)-%s : "
>>
>> reglist
> thanks - will fix
>>
>>> +				 "%s(%d)-Engine\n", type, __stringify_type(type),
>>> +				 __stringify_engclass(classid), classid);
>>
>> One details to consider from Documentation/process/coding-style.rst
>> """
>> However, never break user-visible strings such as printk messages because that breaks the ability to grep for them.
>> """
>>
> I totally agree with you but i cant find a way to keep totally grep-able way without creating a whole set of error
> strings for the various list-types, list-owners and class-types. However i did ensure the first part of the message
> is grep-able : "GuC-capture: missing reglist type". Do you an alternative proposal?

Yeah it is not very greppable being largely constructed at runtime, but 
just don't break the string. IMO no gain to diverge from coding style here.

Or maybe with one level of indentation less as discussed, and maybe 
remove "GuC-capture" if it can be implied (are there other reglists not 
relating to error capture?), maybe it becomes short enough?

"Missing GuC reglist %s(%u):%s(%u)!", ...

?

Type will never be unknown I suspect since it should always be added 
very early during development. So type and engine suffixes may be 
redundant? Or keep it verbose if that fits better with existing GuC 
error capture logging, I don't know.

> 
>> Also commit message you can aim to wrap at 75 chars as per submitting-patches.rst.
>>
>>> +		return -ENODATA;
>>
>> Is this a new exit condition or the thing would exit on the !num_regs check below anyway? Just wondering if the series is only about logging changes or there is more to it.
> Its no different from previous behavior - and yes its about logging the missing reg lists for hw that needs it as we are
> missing this for DG2 we we didn't notice (we did a previous revert to remove these warnings but that time the warnings
> was too verbose - even complaining for the intentional empty lists and for VF cases that isnt supported).

Okay think I get it, thanks. If the "positive match" logging of empty 
list is more future proof than "negative - don't log these" you will 
know better.

Regards,

Tvrtko

>>
>>> +	}
>>> +
>>>    	num_regs = guc_cap_list_num_regs(gc, owner, type, classid);
>>> -	if (!num_regs)
>>> +	if (!num_regs) /* intentional empty lists can exist depending on hw config */
>>>    		return -ENODATA;
>>>    
>>>    	*size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
>>
>> Regards,
>>
>> Tvrtko
> 


More information about the Intel-gfx mailing list