[PATCH] drm/i915: Add debug print about hw config table size

John Harrison john.c.harrison at intel.com
Tue Jan 7 01:15:14 UTC 2025


On 12/24/2024 11:10, Daniele Ceraolo Spurio wrote:
> On 12/24/2024 10:13 AM, John Harrison wrote:
>> On 12/23/2024 15:20, Daniele Ceraolo Spurio wrote:
>>> On 12/20/2024 5:19 PM, John.C.Harrison at Intel.com wrote:
>>>> From: John Harrison<John.C.Harrison at Intel.com>
>>>>
>>>> Add debug info to help investigate a very rare bug:
>>>>    https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13385
>>>>
>>>> Signed-off-by: John Harrison<John.C.Harrison at Intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>>>> index b67a15f742762..868195c33f5b3 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
>>>> @@ -7,6 +7,7 @@
>>>>   #include "gt/intel_hwconfig.h"
>>>>   #include "i915_drv.h"
>>>>   #include "i915_memcpy.h"
>>>> +#include "intel_guc_print.h"
>>>>   
>>>>   /*
>>>>    * GuC has a blob containing hardware configuration information (HWConfig).
>>>> @@ -42,6 +43,8 @@ static int __guc_action_get_hwconfig(struct intel_guc *guc,
>>>>   	};
>>>>   	int ret;
>>>>   
>>>> +	guc_dbg(guc, "Querying HW config table: size = %d, offset = 0x%08X\n",
>>>> +		ggtt_size, ggtt_offset);
>>>
>>> This seems to result in a double-log where the first print has no 
>>> useful information, e.g.:
>>>
>>> [drm:__guc_action_get_hwconfig [i915]] GT0: GUC: Querying HW config 
>>> table: size = 0, offset = 0x00000000
>>> [drm:__guc_action_get_hwconfig [i915]] GT0: GUC: Querying HW config 
>>> table: size = 752, offset = 0x00D05000
>>>
>>> Given that only the second log is useful, IMO better to move the 
>>> guc_dbg to guc_hwconfig_fill_buffer(), because the info needed for 
>>> the second print is available there and it is only called once.
>> I disagree that the first print has no useful information. It tells 
>> us that a call is being made and these are the parameters. We do not 
>> know what the failure is. It seems highly unlikely that the size 
>> changes from query to the next given that the table is a fixed 
>> entity. It is much more likely to be a caching type issue with GuC 
>> reading data the KMD did not write. If so, GuC could potentially read 
>> non-zero data for the initial size query and complain that data is 
>> invalid.
>>
>> The intention is to report all calls with their parameters to try to 
>> narrow down exactly what calls are not working.
>
> But we don't need both prints to know which of the 2 calls has failed, 
> if the error comes before we get the second print then we know the 
> failure was in the first call, otherwise it was in the second.
Is it really such a problem to have two lines of debug print instead of 
one? Given that this is a 'this cannot possibly happen' type bug, I 
would much rather have totally explicit debug at the point of operation 
rather than something further back that requires making assumptions 
about what is or is not happening.

John.

>
> Daniele
>
>>
>> John.
>>
>>
>>>
>>> Daniele
>>>
>>>>   	ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
>>>>   	if (ret == -ENXIO)
>>>>   		return -ENOENT;
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20250106/dd261edc/attachment.htm>


More information about the Intel-gfx mailing list