[PATCH v3 2/2] drm/xe/guc: Improve robustness of GuC log dumping to dmesg

Tue May 14 18:31:02 UTC 2024

On 5/14/2024 09:01, Michal Wajdeczko wrote:
> On 09.05.2024 00:49, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> There is a debug mechanism for dumping the GuC log as an ASCII hex
>> stream via dmesg. This is extremely useful for situations where it is
>> not possible to query the log from debugfs (self tests, bugs that cause
>> the driver to fail to load, system hangs, etc.). However, dumping via
>> dmesg is not the most reliable. The dmesg buffer is limited in size,
>> can be rate limited and a simple hex stream is hard to parse by tools.
>>
>> So add extra information to the dump to make it more robust and
>> parsable. This includes adding start and end tags to delimit the dump,
>> using longer lines to reduce the per line overhead, adding a rolling
>> count to check for missing lines and interleaved concurrent dumps and
>> adding other important information such as the GuC version number and
>> timestamp offset.
>>
>> v2: Remove pm get/put as unnecessary (review feedback from Matthew B).
>> v3: Add firmware filename and 'wanted' version number.
>>
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> ---
>>   drivers/gpu/drm/xe/regs/xe_guc_regs.h |  1 +
>>   drivers/gpu/drm/xe/xe_guc_log.c       | 85 ++++++++++++++++++++++-----
>>   2 files changed, 71 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>> index 11682e675e0f..45fb3707fabe 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>> @@ -82,6 +82,7 @@
>>   #define   HUC_LOADING_AGENT_GUC			REG_BIT(1)
>>   #define   GUC_WOPCM_OFFSET_VALID		REG_BIT(0)
>>   #define GUC_MAX_IDLE_COUNT			XE_REG(0xc3e4)
>> +#define GUC_PMTIMESTAMP				XE_REG(0xc3e8)
>>   
>>   #define GUC_SEND_INTERRUPT			XE_REG(0xc4c8)
>>   #define   GUC_SEND_TRIGGER			REG_BIT(0)
>> diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
>> index a37ee3419428..7e7e2fdc9a11 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_log.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_log.c
>> @@ -7,11 +7,19 @@
>>   
>>   #include <drm/drm_managed.h>
>>   
>> +#include "regs/xe_guc_regs.h"
>>   #include "xe_bo.h"
>>   #include "xe_gt.h"
>>   #include "xe_map.h"
>> +#include "xe_mmio.h"
>>   #include "xe_module.h"
>>   
>> +static struct xe_guc *
>> +log_to_guc(struct xe_guc_log *log)
>> +{
>> +	return container_of(log, struct xe_guc, log);
>> +}
>> +
>>   static struct xe_gt *
>>   log_to_gt(struct xe_guc_log *log)
> as you have log_to_guc() then this log_to_gt() could be updated to:
>
> 	return guc_to_gt(log_to_guc(log));
Is there any point? The existing version works fine so why replace a 
single indirection with a double indirection?

>
>>   {
>> @@ -49,32 +57,79 @@ static size_t guc_log_size(void)
>>   		CAPTURE_BUFFER_SIZE;
>>   }
>>   
>> +#define BYTES_PER_WORD		sizeof(u32)
>> +#define WORDS_PER_DUMP		8
>> +#define DUMPS_PER_LINE		4
>> +#define LINES_PER_READ		4
>> +#define WORDS_PER_READ		(WORDS_PER_DUMP * DUMPS_PER_LINE * LINES_PER_READ)
>> +
> as you are heavily updating this function, maybe it's a good time to add
> kernel-doc for it ?
Good idea. Will do.

>
>>   void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p)
>>   {
>> +	static int g_count;
>
>> +	struct xe_gt *gt = log_to_gt(log);
>> +	struct xe_guc *guc = log_to_guc(log);
>> +	struct xe_uc_fw_version *ver_f = &guc->fw.versions.found[XE_UC_FW_VER_RELEASE];
>> +	struct xe_uc_fw_version *ver_w = &guc->fw.versions.wanted;
>>   	struct xe_device *xe = log_to_xe(log);
>>   	size_t size;
>> -	int i, j;
>> +	char line_buff[DUMPS_PER_LINE * WORDS_PER_DUMP * 9 + 1];
>> +	int l_count = g_count++;
>> +	int line = 0;
>> +	int i, j, k;
>> +	u64 ktime;
>> +	u32 stamp;
>>   
>>   	xe_assert(xe, log->bo);
>>   
>>   	size = log->bo->size;
>>   
>> -#define DW_PER_READ		128
>> -	xe_assert(xe, !(size % (DW_PER_READ * sizeof(u32))));
>> -	for (i = 0; i < size / sizeof(u32); i += DW_PER_READ) {
>> -		u32 read[DW_PER_READ];
>> -
>> -		xe_map_memcpy_from(xe, read, &log->bo->vmap, i * sizeof(u32),
>> -				   DW_PER_READ * sizeof(u32));
>> -#define DW_PER_PRINT		4
>> -		for (j = 0; j < DW_PER_READ / DW_PER_PRINT; ++j) {
>> -			u32 *print = read + j * DW_PER_PRINT;
>> -
>> -			drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>> -				   *(print + 0), *(print + 1),
>> -				   *(print + 2), *(print + 3));
>> +	drm_printf(p, "[Capture/%d.%d] Dumping GuC log for %ps...\n",
>> +		   l_count, line++, __builtin_return_address(0));
> this function is also used in debugfs outputs and prefixing all lines
> with "[Capture/n.m]" is pointless there (and will also make collecting
> GuC log over debugfs even more inefficient)
>
> and as you likely don't want to have separate print functions (one for
> reliable dmesg, other for debugfs) then maybe consider use of cascaded
> drm_printer as proposed in [1] - it will also make your code tidier
>
> [1] https://patchwork.freedesktop.org/series/133613/
As already discussed, the intention was to keep this as simple as 
possible and not over-engineer a stopgap measure. Yes, the debugfs 
version gets some extra overhead (but mitigated by using longer lines). 
But size of the debugfs file is really not an issue, and it does provide 
extra robustness. The prefix is also trivially easy to remove if 
desired with "cut -d ' ' -f 2- < in > out".
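
For a tool that wants to use the prefix rather than just discard it, a 
minimal userspace sketch of parsing "[Capture/n.m]" and counting lines 
lost from one dump could look like the following. The helper names 
parse_capture_prefix() and count_missing() are hypothetical and not part 
of this patch, and the sketch assumes line numbers within a dump start at 
zero and increase by one, as in the drm_printf() calls above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Locate "[Capture/<dump>.<line>]" anywhere in s (dmesg may put its own
 * timestamp and driver prefix in front of it) and decode both counters.
 * Returns a pointer to the payload after the prefix and one optional
 * space, or NULL if the line carries no capture prefix.
 */
static const char *parse_capture_prefix(const char *s, int *dump, int *line)
{
	int used = 0;

	s = strstr(s, "[Capture/");
	if (!s)
		return NULL;

	/* %n is only stored if the closing ']' matched as well. */
	if (sscanf(s, "[Capture/%d.%d]%n", dump, line, &used) != 2 || used == 0)
		return NULL;

	s += used;
	return *s == ' ' ? s + 1 : s;
}

/*
 * Count the lines missing from dump 'want_dump'. Unrelated lines and
 * lines from an interleaved concurrent dump are skipped.
 */
static int count_missing(const char *const *lines, int n, int want_dump)
{
	int expect = 0, missing = 0;
	int i;

	for (i = 0; i < n; i++) {
		int dump, line;

		if (!parse_capture_prefix(lines[i], &dump, &line))
			continue;
		if (dump != want_dump)
			continue;
		if (line > expect)
			missing += line - expect;
		expect = line + 1;
	}

	return missing;
}
```

A gap in the per-line counter then shows up as a non-zero return value, 
and a change in the first counter identifies an interleaved dump.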

>
>> +
>> +	drm_printf(p, "[Capture/%d.%d] GuC version %u.%u.%u (wanted %u.%u.%u)\n",
>> +		   l_count, line++,
>> +		   ver_f->major, ver_f->minor, ver_f->patch,
>> +		   ver_w->major, ver_w->minor, ver_w->patch);
> hmm, what's the relation between "wanted version" and actual "guc log
> buffer format" ? IMO it doesn't really matter what driver wanted to
> load, this is supposed to be "GuC-log-print" so then only actually running
> version matters as it implies schema version needed for proper decoding.
It is not necessary but it is potentially useful information that can be 
added pretty much for free, so why not?

>
>> +	drm_printf(p, "[Capture/%d.%d] GuC firmware: %s\n", l_count, line++, guc->fw.path);
> again, why do we want to include firmware filename here? it's not relevant
> to the log buffer content/format (as we already have 'found version')
Actually, it is important. The filename gives the GuC platform. And that 
is required to know what quirks need to be applied when decoding the 
log. E.g. context switch logs on a TGL platform are truncated because 
the hardware has fewer bits for the context id. The decoder needs to 
know that to correctly track context switching.
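
As a rough illustration of how a decoder could pick up the platform from 
the "GuC firmware:" line, here is a minimal sketch. It assumes the path 
has the form "<dir>/<platform>_guc_<version>.bin"; that naming pattern and 
the helper name fw_path_to_platform() are assumptions made for the 
example, not something this patch defines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the platform tag of a firmware path into out (NUL terminated).
 * Assumed (hypothetical) layout: "<dir>/<platform>_guc_<version>.bin".
 * Returns 0 on success, -1 if the name does not match that layout or
 * out is too small to hold the tag.
 */
static int fw_path_to_platform(const char *path, char *out, size_t out_len)
{
	const char *base = strrchr(path, '/');
	const char *end;
	size_t len;

	/* Skip the directory part, if there is one. */
	base = base ? base + 1 : path;

	/* The platform tag is everything before "_guc". */
	end = strstr(base, "_guc");
	if (!end || end == base)
		return -1;

	len = (size_t)(end - base);
	if (len + 1 > out_len)
		return -1;

	memcpy(out, base, len);
	out[len] = '\0';
	return 0;
}
```

The decoder can then key its per-platform quirks, such as the truncated 
context id, off the returned tag.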

>
> maybe more interesting thing would be status of the GuC firmware?
> whether it is still running and writing logs or it is already dead
Not sure how you would get that information? Unless the GuC is actually 
in reset for the duration of the dump, there is no way to know whether 
it is alive, actively logging, idle, or what.

>
>> +
>> +	ktime = ktime_get_boottime_ns();
>> +	drm_printf(p, "[Capture/%d.%d] Kernel timestamp: 0x%08llX [%llu]\n",
>> +		   l_count, line++, ktime, ktime);
>> +
>> +	stamp = xe_mmio_read32(gt, GUC_PMTIMESTAMP);
>> +	drm_printf(p, "[Capture/%d.%d] GuC timestamp: 0x%08X [%u]\n",
>> +		   l_count, line++, stamp, stamp);
>> +
>> +	drm_printf(p, "[Capture/%d.%d] CS timestamp frequency: %u Hz\n",
>> +		   l_count, line++, gt->info.reference_clock);
>> +
>> +	xe_assert(xe, !(size % (WORDS_PER_READ * BYTES_PER_WORD)));
>> +	for (i = 0; i < size / BYTES_PER_WORD; i += WORDS_PER_READ) {
>> +		u32 read[WORDS_PER_READ];
>> +
>> +		xe_map_memcpy_from(xe, read, &log->bo->vmap, i * BYTES_PER_WORD,
>> +				   WORDS_PER_READ * BYTES_PER_WORD);
>> +
>> +		for (j = 0; j < WORDS_PER_READ; ) {
>> +			u32 done = 0;
>> +
>> +			for (k = 0; k < DUMPS_PER_LINE; k++) {
>> +				line_buff[done++] = ' ';
>> +				done += hex_dump_to_buffer(read + j,
>> +							   sizeof(*read) * (WORDS_PER_READ - j),
>> +							   WORDS_PER_DUMP * BYTES_PER_WORD,
>> +							   BYTES_PER_WORD,
>> +							   line_buff + done,
>> +							   sizeof(line_buff) - done,
>> +							   false);
>> +				j += WORDS_PER_DUMP;
>> +			}
> as there could be many holes (zeros) in full GuC log, did you consider
> to skip such lines and update custom parser to understand that?
As per other response, in a real world live system, there won't be big 
chunks of zeros. And if you are specifically debugging a start of day 
issue then you can just use a smaller log size.

Rather than attempting to implement some kind of simple RLE, a real and 
significant benefit would be to re-use the compression mechanism we had 
in i915. That is a lot more effort, though. So that could be done as a 
follow up, but it is not worth holding up the current set of trivial 
fixes for some long term goal.

>
>> +
>> +			drm_printf(p, "[Capture/%d.%d]%s\n", l_count, line++, line_buff);
>>   		}
>>   	}
>> +
>> +	drm_printf(p, "[Capture/%d.%d] Done.\n", l_count, line++);
>>   }
> per BKM shouldn't we #undef not used any more local macros ?
One could. It didn't seem necessary given that this is the end of the file.

John.

>
>>   
>>   int xe_guc_log_init(struct xe_guc_log *log)