[PATCH v3 2/2] drm/xe/guc: Improve robustness of GuC log dumping to dmesg

Tue May 14 21:05:37 UTC 2024

On 14.05.2024 20:31, John Harrison wrote:
> On 5/14/2024 09:01, Michal Wajdeczko wrote:
>> On 09.05.2024 00:49, John.C.Harrison at Intel.com wrote:
>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>
>>> There is a debug mechanism for dumping the GuC log as an ASCII hex
>>> stream via dmesg. This is extremely useful for situations where it is
>>> not possibe to query the log from debugfs (self tests, bugs that cause
>>> the driver to fail to load, system hangs, etc.). However, dumping via
>>> dmesg is not the most reliable. The dmesg buffer is limited in size,
>>> can be rate limited and a simple hex stream is hard to parse by tools.
>>>
>>> So add extra information to the dump to make it more robust and
>>> parsable. This includes adding start and end tags to delimit the dump,
>>> using longer lines to reduce the per line overhead, adding a rolling
>>> count to check for missing lines and interleaved concurrent dumps and
>>> adding other important information such as the GuC version number and
>>> timestamp offset.
>>>
>>> v2: Remove pm get/put as unnecessary (review feedback from Matthew B).
>>> v3: Add firmware filename and 'wanted' version number.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/regs/xe_guc_regs.h |  1 +
>>>   drivers/gpu/drm/xe/xe_guc_log.c       | 85 ++++++++++++++++++++++-----
>>>   2 files changed, 71 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> index 11682e675e0f..45fb3707fabe 100644
>>> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> @@ -82,6 +82,7 @@
>>>   #define   HUC_LOADING_AGENT_GUC            REG_BIT(1)
>>>   #define   GUC_WOPCM_OFFSET_VALID        REG_BIT(0)
>>>   #define GUC_MAX_IDLE_COUNT            XE_REG(0xc3e4)
>>> +#define GUC_PMTIMESTAMP                XE_REG(0xc3e8)
>>>     #define GUC_SEND_INTERRUPT            XE_REG(0xc4c8)
>>>   #define   GUC_SEND_TRIGGER            REG_BIT(0)
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_log.c
>>> b/drivers/gpu/drm/xe/xe_guc_log.c
>>> index a37ee3419428..7e7e2fdc9a11 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_log.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_log.c
>>> @@ -7,11 +7,19 @@
>>>     #include <drm/drm_managed.h>
>>>   +#include "regs/xe_guc_regs.h"
>>>   #include "xe_bo.h"
>>>   #include "xe_gt.h"
>>>   #include "xe_map.h"
>>> +#include "xe_mmio.h"
>>>   #include "xe_module.h"
>>>   +static struct xe_guc *
>>> +log_to_guc(struct xe_guc_log *log)
>>> +{
>>> +    return container_of(log, struct xe_guc, log);
>>> +}
>>> +
>>>   static struct xe_gt *
>>>   log_to_gt(struct xe_guc_log *log)
>> as you have log_to_guc() then this log_to_gt() could be updated to:
>>
>>     return guc_to_gt(log_to_guc(log));
> Is there any point? The existing version works fine so why replace a
> single indirection with a double indirection?

compiler can inline that, but from our POV we will have single place
that must know exact placement of the .log member in our xe tree

> 
> 
>>
>>>   {
>>> @@ -49,32 +57,79 @@ static size_t guc_log_size(void)
>>>           CAPTURE_BUFFER_SIZE;
>>>   }
>>>   +#define BYTES_PER_WORD        sizeof(u32)
>>> +#define WORDS_PER_DUMP        8
>>> +#define DUMPS_PER_LINE        4
>>> +#define LINES_PER_READ        4
>>> +#define WORDS_PER_READ        (WORDS_PER_DUMP * DUMPS_PER_LINE *
>>> LINES_PER_READ)
>>> +
>> as you are heavily updating this function, maybe it's good time to add
>> kernel-doc for it ?
> Good idea. Will do.
> 
>>
>>>   void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p)
>>>   {
>>> +    static int g_count;
>>
>>> +    struct xe_gt *gt = log_to_gt(log);
>>> +    struct xe_guc *guc = log_to_guc(log);
>>> +    struct xe_uc_fw_version *ver_f =
>>> &guc->fw.versions.found[XE_UC_FW_VER_RELEASE];
>>> +    struct xe_uc_fw_version *ver_w = &guc->fw.versions.wanted;
>>>       struct xe_device *xe = log_to_xe(log);
>>>       size_t size;
>>> -    int i, j;
>>> +    char line_buff[DUMPS_PER_LINE * WORDS_PER_DUMP * 9 + 1];
>>> +    int l_count = g_count++;
>>> +    int line = 0;
>>> +    int i, j, k;
>>> +    u64 ktime;
>>> +    u32 stamp;
>>>         xe_assert(xe, log->bo);
>>>         size = log->bo->size;
>>>   -#define DW_PER_READ        128
>>> -    xe_assert(xe, !(size % (DW_PER_READ * sizeof(u32))));
>>> -    for (i = 0; i < size / sizeof(u32); i += DW_PER_READ) {
>>> -        u32 read[DW_PER_READ];
>>> -
>>> -        xe_map_memcpy_from(xe, read, &log->bo->vmap, i * sizeof(u32),
>>> -                   DW_PER_READ * sizeof(u32));
>>> -#define DW_PER_PRINT        4
>>> -        for (j = 0; j < DW_PER_READ / DW_PER_PRINT; ++j) {
>>> -            u32 *print = read + j * DW_PER_PRINT;
>>> -
>>> -            drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>>> -                   *(print + 0), *(print + 1),
>>> -                   *(print + 2), *(print + 3));
>>> +    drm_printf(p, "[Capture/%d.%d] Dumping GuC log for %ps...\n",
>>> +           l_count, line++, __builtin_return_address(0));
>> this function is also used in debugfs outputs and prefixing all lines
>> with "[Capture/n.m]" is pointless there (and will also make collecting
>> GuC log over debugfs even more inefficient)
>>
>> and as you likely don't want to have separate print functions (one for
>> reliable dmesg, other for debugfs) then maybe consider use of cascaded
>> drm_printer as proposed in [1] - it will also make your code tidier
>>
>> [1] https://patchwork.freedesktop.org/series/133613/
> As already discussed, the intention was to keep this as simple as
> possible and not over engineer a stop gap measure. Yes, the debugfs
> version gets some extra overhead (but mitigated by using longer lines).

but OTOH exposing useless data over debugfs could be perceived as signs
of bad design and/or laziness, no?

> But size of the debugfs file is really not an issue, and it does provide
> extra robustness. 

hmm, are you suggesting that reading over debugfs is not robust and
requires extra protection? and it is xe driver specific issue?

> The prefix is also trivially easy to remove the prefix
> if desired with "cut -d ' ' -f 2' < in > out".

equally we can trivially avoid adding this prefix

> 
>>
>>> +
>>> +    drm_printf(p, "[Capture/%d.%d] GuC version %u.%u.%u (wanted
>>> %u.%u.%u)\n",
>>> +           l_count, line++,
>>> +           ver_f->major, ver_f->minor, ver_f->patch,
>>> +           ver_w->major, ver_w->minor, ver_w->patch);
>> hmm, what's the relation between "wanted version" and actual "guc log
>> buffer format" ? IMO it doesn't really matter what driver wanted to
>> load, this supposed to be "GuC-log-print" so then only actually running
>> version matters as it implies schema version needed for proper decoding.
> It is not necessary but it is potentially useful information that can be
> added pretty much for free, so why not?

because this function (undocumented) is still named as xe_guc_log_print
and is located in xe_guc_log.c which strongly suggests that it is about
GuC log, not anything else from the GuC

if you need combo print function, place it in guc.c and name properly

> 
>>
>>> +    drm_printf(p, "[Capture/%d.%d] GuC firmware: %s\n", l_count,
>>> line++, guc->fw.path);
>> again, why do we want include firmware filename here? it's not relevant
>> to the log buffer content/format (as we already have 'found version')
> Actually, it is important. The filename gives the GuC platform. And that
> is required to know what quirks need to be applied when decoding the
> log. E.g. context switch logs on a TGL platform are truncated because
> the hardware has fewer bits for the context id. The decoder needs to
> know that to correctly track context switching.

then you should explicitly capture the platform name, as firmware file
name could be provided by the user as modparam xe.guc_firmware_path and
named anything like foo.bin

> 
>>
>> maybe more interesting thing would be status of the GuC firmware?
>> whether it is still running and writing logs or it is already dead
> Not sure how you would get that information? Unless the GuC is actually
> in reset for the duration of the dump, there is no way to know whether
> it is alive, actively logging, idle, or what.

we could start with capturing GUC_STATUS(0xc000)

but we can also assume that captured log snapshot is always incomplete
due to a living nature of the GuC firmware (so there is expected
mismatch between what we can find in header vs actual last log entry)

> 
>>
>>> +
>>> +    ktime = ktime_get_boottime_ns();
>>> +    drm_printf(p, "[Capture/%d.%d] Kernel timestamp: 0x%08llX
>>> [%llu]\n",
>>> +           l_count, line++, ktime, ktime);
>>> +
>>> +    stamp = xe_mmio_read32(gt, GUC_PMTIMESTAMP);
>>> +    drm_printf(p, "[Capture/%d.%d] GuC timestamp: 0x%08X [%u]\n",
>>> +           l_count, line++, stamp, stamp);
>>> +
>>> +    drm_printf(p, "[Capture/%d.%d] CS timestamp frequency: %u Hz\n",
>>> +           l_count, line++, gt->info.reference_clock);
>>> +
>>> +    xe_assert(xe, !(size % (WORDS_PER_READ * BYTES_PER_WORD)));
>>> +    for (i = 0; i < size / BYTES_PER_WORD; i += WORDS_PER_READ) {
>>> +        u32 read[WORDS_PER_READ];
>>> +
>>> +        xe_map_memcpy_from(xe, read, &log->bo->vmap, i *
>>> BYTES_PER_WORD,
>>> +                   WORDS_PER_READ * BYTES_PER_WORD);
>>> +
>>> +        for (j = 0; j < WORDS_PER_READ; ) {
>>> +            u32 done = 0;
>>> +
>>> +            for (k = 0; k < DUMPS_PER_LINE; k++) {
>>> +                line_buff[done++] = ' ';
>>> +                done += hex_dump_to_buffer(read + j,
>>> +                               sizeof(*read) * (WORDS_PER_READ - j),
>>> +                               WORDS_PER_DUMP * BYTES_PER_WORD,
>>> +                               BYTES_PER_WORD,
>>> +                               line_buff + done,
>>> +                               sizeof(line_buff) - done,
>>> +                               false);
>>> +                j += WORDS_PER_DUMP;
>>> +            }
>> as there could be many holes (zeros) in full GuC log, did you consider
>> to skip such lines and update custom parser to understand that?
> As per other response, in a real world live system, there won't be big
> chunks of zeros. And if you are specifically debugging a start of day
> issue then you can just use a smaller log size.
> 
> Rather than attempting to implement some kind of simple RLE, a real and
> significant benefit would be to re-use the compression mechanism we had
> in i915. That is a lot more effort, though. So that could be done as a
> follow up, but it is not worth holding up the current set of trivial
> fixes for some long term goal.
> 
>>
>>> +
>>> +            drm_printf(p, "[Capture/%d.%d]%s\n", l_count, line++,
>>> line_buff);
>>>           }
>>>       }
>>> +
>>> +    drm_printf(p, "[Capture/%d.%d] Done.\n", l_count, line++);
>>>   }
>> per BKM shouldn't we #undef not used any more local macros ?
> One could. It didn't seem necessary given that this is the end of the file.
> 
> John.
> 
>>
>>>     int xe_guc_log_init(struct xe_guc_log *log)
>