[PATCH v3 2/2] drm/xe/guc: Improve robustness of GuC log dumping to dmesg

Thu May 16 00:00:58 UTC 2024

On 5/14/2024 14:05, Michal Wajdeczko wrote:
> On 14.05.2024 20:31, John Harrison wrote:
>> On 5/14/2024 09:01, Michal Wajdeczko wrote:
>>> On 09.05.2024 00:49, John.C.Harrison at Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>
>>>> There is a debug mechanism for dumping the GuC log as an ASCII hex
>>>> stream via dmesg. This is extremely useful for situations where it is
>>>> not possibe to query the log from debugfs (self tests, bugs that cause
>>>> the driver to fail to load, system hangs, etc.). However, dumping via
>>>> dmesg is not the most reliable. The dmesg buffer is limited in size,
>>>> can be rate limited and a simple hex stream is hard to parse by tools.
>>>>
>>>> So add extra information to the dump to make it more robust and
>>>> parsable. This includes adding start and end tags to delimit the dump,
>>>> using longer lines to reduce the per line overhead, adding a rolling
>>>> count to check for missing lines and interleaved concurrent dumps and
>>>> adding other important information such as the GuC version number and
>>>> timestamp offset.
>>>>
>>>> v2: Remove pm get/put as unnecessary (review feedback from Matthew B).
>>>> v3: Add firmware filename and 'wanted' version number.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/regs/xe_guc_regs.h |  1 +
>>>>    drivers/gpu/drm/xe/xe_guc_log.c       | 85 ++++++++++++++++++++++-----
>>>>    2 files changed, 71 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>>> b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>>> index 11682e675e0f..45fb3707fabe 100644
>>>> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>>> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>>> @@ -82,6 +82,7 @@
>>>>    #define   HUC_LOADING_AGENT_GUC            REG_BIT(1)
>>>>    #define   GUC_WOPCM_OFFSET_VALID        REG_BIT(0)
>>>>    #define GUC_MAX_IDLE_COUNT            XE_REG(0xc3e4)
>>>> +#define GUC_PMTIMESTAMP                XE_REG(0xc3e8)
>>>>      #define GUC_SEND_INTERRUPT            XE_REG(0xc4c8)
>>>>    #define   GUC_SEND_TRIGGER            REG_BIT(0)
>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_log.c
>>>> b/drivers/gpu/drm/xe/xe_guc_log.c
>>>> index a37ee3419428..7e7e2fdc9a11 100644
>>>> --- a/drivers/gpu/drm/xe/xe_guc_log.c
>>>> +++ b/drivers/gpu/drm/xe/xe_guc_log.c
>>>> @@ -7,11 +7,19 @@
>>>>      #include <drm/drm_managed.h>
>>>>    +#include "regs/xe_guc_regs.h"
>>>>    #include "xe_bo.h"
>>>>    #include "xe_gt.h"
>>>>    #include "xe_map.h"
>>>> +#include "xe_mmio.h"
>>>>    #include "xe_module.h"
>>>>    +static struct xe_guc *
>>>> +log_to_guc(struct xe_guc_log *log)
>>>> +{
>>>> +    return container_of(log, struct xe_guc, log);
>>>> +}
>>>> +
>>>>    static struct xe_gt *
>>>>    log_to_gt(struct xe_guc_log *log)
>>> as you have log_to_guc() then this log_to_gt() could be updated to:
>>>
>>>      return guc_to_gt(log_to_guc(log));
>> Is there any point? The existing version works fine so why replace a
>> single indirection with a double indirection?
> compiler can inline that, but from our POV we will have single place
> that must know exact placement of the .log member in our xe tree
>
>>
>>>>    {
>>>> @@ -49,32 +57,79 @@ static size_t guc_log_size(void)
>>>>            CAPTURE_BUFFER_SIZE;
>>>>    }
>>>>    +#define BYTES_PER_WORD        sizeof(u32)
>>>> +#define WORDS_PER_DUMP        8
>>>> +#define DUMPS_PER_LINE        4
>>>> +#define LINES_PER_READ        4
>>>> +#define WORDS_PER_READ        (WORDS_PER_DUMP * DUMPS_PER_LINE *
>>>> LINES_PER_READ)
>>>> +
>>> as you are heavily updating this function, maybe it's good time to add
>>> kernel-doc for it ?
>> Good idea. Will do.
>>
>>>>    void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p)
>>>>    {
>>>> +    static int g_count;
>>>> +    struct xe_gt *gt = log_to_gt(log);
>>>> +    struct xe_guc *guc = log_to_guc(log);
>>>> +    struct xe_uc_fw_version *ver_f =
>>>> &guc->fw.versions.found[XE_UC_FW_VER_RELEASE];
>>>> +    struct xe_uc_fw_version *ver_w = &guc->fw.versions.wanted;
>>>>        struct xe_device *xe = log_to_xe(log);
>>>>        size_t size;
>>>> -    int i, j;
>>>> +    char line_buff[DUMPS_PER_LINE * WORDS_PER_DUMP * 9 + 1];
>>>> +    int l_count = g_count++;
>>>> +    int line = 0;
>>>> +    int i, j, k;
>>>> +    u64 ktime;
>>>> +    u32 stamp;
>>>>          xe_assert(xe, log->bo);
>>>>          size = log->bo->size;
>>>>    -#define DW_PER_READ        128
>>>> -    xe_assert(xe, !(size % (DW_PER_READ * sizeof(u32))));
>>>> -    for (i = 0; i < size / sizeof(u32); i += DW_PER_READ) {
>>>> -        u32 read[DW_PER_READ];
>>>> -
>>>> -        xe_map_memcpy_from(xe, read, &log->bo->vmap, i * sizeof(u32),
>>>> -                   DW_PER_READ * sizeof(u32));
>>>> -#define DW_PER_PRINT        4
>>>> -        for (j = 0; j < DW_PER_READ / DW_PER_PRINT; ++j) {
>>>> -            u32 *print = read + j * DW_PER_PRINT;
>>>> -
>>>> -            drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>>>> -                   *(print + 0), *(print + 1),
>>>> -                   *(print + 2), *(print + 3));
>>>> +    drm_printf(p, "[Capture/%d.%d] Dumping GuC log for %ps...\n",
>>>> +           l_count, line++, __builtin_return_address(0));
>>> this function is also used in debugfs outputs and prefixing all lines
>>> with "[Capture/n.m]" is pointless there (and will also make collecting
>>> GuC log over debugfs even more inefficient)
>>>
>>> and as you likely don't want to have separate print functions (one for
>>> reliable dmesg, other for debugfs) then maybe consider use of cascaded
>>> drm_printer as proposed in [1] - it will also make your code tidier
>>>
>>> [1] https://patchwork.freedesktop.org/series/133613/
>> As already discussed, the intention was to keep this as simple as
>> possible and not over engineer a stop gap measure. Yes, the debugfs
>> version gets some extra overhead (but mitigated by using longer lines).
> but OTOH exposing useless data over debugfs could be perceived as signs
> of bad design and/or laziness, no?
Meaning just the 'Capture X/Y' prefix or all the meta data as well? The 
meta data is absolutely not  useless data.

For the Capture prefix, if you are so set on removing it then feel free 
to post a patch on top. Personally, I do not consider it useless. See below.

>> But size of the debugfs file is really not an issue, and it does provide
>> extra robustness.
> hmm, are you suggesting that reading over debugfs is not robust and
> requires extra protection? and it is xe driver specific issue?
Not debugfs itself, but copying files around afterwards. It might be 
extremely unlikely but it is not impossible. Many file formats have CRCs 
and such built in for that reason.

>
>> The prefix is also trivially easy to remove the prefix
>> if desired with "cut -d ' ' -f 2' < in > out".
> equally we can trivially avoid adding this prefix
Not that trivially.

>
>>>> +
>>>> +    drm_printf(p, "[Capture/%d.%d] GuC version %u.%u.%u (wanted
>>>> %u.%u.%u)\n",
>>>> +           l_count, line++,
>>>> +           ver_f->major, ver_f->minor, ver_f->patch,
>>>> +           ver_w->major, ver_w->minor, ver_w->patch);
>>> hmm, what's the relation between "wanted version" and actual "guc log
>>> buffer format" ? IMO it doesn't really matter what driver wanted to
>>> load, this supposed to be "GuC-log-print" so then only actually running
>>> version matters as it implies schema version needed for proper decoding.
>> It is not necessary but it is potentially useful information that can be
>> added pretty much for free, so why not?
> because this function (undocumented) is still named as xe_guc_log_print
> and is located in xe_guc_log.c which strongly suggests that it is about
> GuC log, not anything else from the GuC
>
> if you need combo print function, place it in guc.c and name properly
It is about the GuC log. It is for dumping everything and anything that 
is required to get a properly decoded and understood GuC log in a single 
file. When we get time to implement VFDs, all of this data would be 
added to the log structure itself as driver specified data blocks.

>
>>>> +    drm_printf(p, "[Capture/%d.%d] GuC firmware: %s\n", l_count,
>>>> line++, guc->fw.path);
>>> again, why do we want include firmware filename here? it's not relevant
>>> to the log buffer content/format (as we already have 'found version')
>> Actually, it is important. The filename gives the GuC platform. And that
>> is required to know what quirks need to be applied when decoding the
>> log. E.g. context switch logs on a TGL platform are truncated because
>> the hardware has fewer bits for the context id. The decoder needs to
>> know that to correctly track context switching.
> then you should explicitly capture the platform name, as firmware file
> name could be provided by the user as modparam xe.guc_firmware_path and
> named anything like foo.bin
If the that is happening then it is probably a custom build firmware 
file and requires a matching custom build of the schema in which case 
nothing will work unless the person doing the decode is passing custom 
options to the decoder anyway. And if an end user is arbitrarily 
renaming files for no valid reason, then they won't get their bug 
investigated. Besides, there is no driver side state that identifies the 
GuC platform. It is not necessarily the same as the actual hardware 
platform because multiple hardware platforms share the same GuC 
firmware. Indeed, there are also multiple GuC platform builds that can 
be run on a single hardware platform.

>
>>> maybe more interesting thing would be status of the GuC firmware?
>>> whether it is still running and writing logs or it is already dead
>> Not sure how you would get that information? Unless the GuC is actually
>> in reset for the duration of the dump, there is no way to know whether
>> it is alive, actively logging, idle, or what.
> we could start with capturing GUC_STATUS(0xc000)
>
> but we can also assume that captured log snapshot is always incomplete
> due to a living nature of the GuC firmware (so there is expected
> mismatch between what we can find in header vs actual last log entry)
Which is why the decoder script has support for detecting unexpected 
wrap points and re-wrapping the log before passing it to the actual 
decoder executable.

But you still have to do a single atomic-as-possible dump of the buffer 
into a single log file to get as close as possible to coherency. The 
more separate reads you split it into, the more likely you are to have 
inconsistencies.

John.

>
>>>> +
>>>> +    ktime = ktime_get_boottime_ns();
>>>> +    drm_printf(p, "[Capture/%d.%d] Kernel timestamp: 0x%08llX
>>>> [%llu]\n",
>>>> +           l_count, line++, ktime, ktime);
>>>> +
>>>> +    stamp = xe_mmio_read32(gt, GUC_PMTIMESTAMP);
>>>> +    drm_printf(p, "[Capture/%d.%d] GuC timestamp: 0x%08X [%u]\n",
>>>> +           l_count, line++, stamp, stamp);
>>>> +
>>>> +    drm_printf(p, "[Capture/%d.%d] CS timestamp frequency: %u Hz\n",
>>>> +           l_count, line++, gt->info.reference_clock);
>>>> +
>>>> +    xe_assert(xe, !(size % (WORDS_PER_READ * BYTES_PER_WORD)));
>>>> +    for (i = 0; i < size / BYTES_PER_WORD; i += WORDS_PER_READ) {
>>>> +        u32 read[WORDS_PER_READ];
>>>> +
>>>> +        xe_map_memcpy_from(xe, read, &log->bo->vmap, i *
>>>> BYTES_PER_WORD,
>>>> +                   WORDS_PER_READ * BYTES_PER_WORD);
>>>> +
>>>> +        for (j = 0; j < WORDS_PER_READ; ) {
>>>> +            u32 done = 0;
>>>> +
>>>> +            for (k = 0; k < DUMPS_PER_LINE; k++) {
>>>> +                line_buff[done++] = ' ';
>>>> +                done += hex_dump_to_buffer(read + j,
>>>> +                               sizeof(*read) * (WORDS_PER_READ - j),
>>>> +                               WORDS_PER_DUMP * BYTES_PER_WORD,
>>>> +                               BYTES_PER_WORD,
>>>> +                               line_buff + done,
>>>> +                               sizeof(line_buff) - done,
>>>> +                               false);
>>>> +                j += WORDS_PER_DUMP;
>>>> +            }
>>> as there could be many holes (zeros) in full GuC log, did you consider
>>> to skip such lines and update custom parser to understand that?
>> As per other response, in a real world live system, there won't be big
>> chunks of zeros. And if you are specifically debugging a start of day
>> issue then you can just use a smaller log size.
>>
>> Rather than attempting to implement some kind of simple RLE, a real and
>> significant benefit would be to re-use the compression mechanism we had
>> in i915. That is a lot more effort, though. So that could be done as a
>> follow up, but it is not worth holding up the current set of trivial
>> fixes for some long term goal.
>>
>>>> +
>>>> +            drm_printf(p, "[Capture/%d.%d]%s\n", l_count, line++,
>>>> line_buff);
>>>>            }
>>>>        }
>>>> +
>>>> +    drm_printf(p, "[Capture/%d.%d] Done.\n", l_count, line++);
>>>>    }
>>> per BKM shouldn't we #undef not used any more local macros ?
>> One could. It didn't seem necessary given that this is the end of the file.
>>
>> John.
>>
>>>>      int xe_guc_log_init(struct xe_guc_log *log)