[PATCH] drm/xe/guc: Improve robustness of GuC log dumping to dmesg
Matthew Brost
matthew.brost at intel.com
Wed May 1 02:26:11 UTC 2024
On Tue, Apr 30, 2024 at 04:56:49PM -0700, John Harrison wrote:
> On 4/30/2024 16:44, Matthew Brost wrote:
> > On Tue, Apr 30, 2024 at 04:34:36PM -0700, John.C.Harrison at Intel.com wrote:
> > > From: John Harrison <John.C.Harrison at Intel.com>
> > >
> > > There is a debug mechanism for dumping the GuC log as an ASCII hex
> > > stream via dmesg. This is extremely useful for situations where it is
> > > not possible to query the log from debugfs (self tests, bugs that cause
> > > the driver to fail to load, system hangs, etc.). However, dumping via
> > > dmesg is not the most reliable. The dmesg buffer is limited in size,
> > > can be rate limited, and a simple hex stream is hard for tools to parse.
> > >
> > > So add extra information to the dump to make it more robust and
> > > parsable. This includes adding start and end tags to delimit the dump,
> > > using longer lines to reduce the per-line overhead, adding a rolling
> > > count to check for missing lines and interleaved concurrent dumps, and
> > > adding other important information such as the GuC version number and
> > > timestamp offset.
> > >
> > > Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> > > ---
> > > drivers/gpu/drm/xe/regs/xe_guc_regs.h | 1 +
> > > drivers/gpu/drm/xe/xe_guc_log.c | 78 ++++++++++++++++++++++-----
> > > 2 files changed, 66 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > index 11682e675e0f..45fb3707fabe 100644
> > > --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
> > > @@ -82,6 +82,7 @@
> > > #define HUC_LOADING_AGENT_GUC REG_BIT(1)
> > > #define GUC_WOPCM_OFFSET_VALID REG_BIT(0)
> > > #define GUC_MAX_IDLE_COUNT XE_REG(0xc3e4)
> > > +#define GUC_PMTIMESTAMP XE_REG(0xc3e8)
> > > #define GUC_SEND_INTERRUPT XE_REG(0xc4c8)
> > > #define GUC_SEND_TRIGGER REG_BIT(0)
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
> > > index a37ee3419428..ea269efd9c21 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_log.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_log.c
> > > @@ -7,10 +7,19 @@
> > > #include <drm/drm_managed.h>
> > > +#include "regs/xe_guc_regs.h"
> > > #include "xe_bo.h"
> > > #include "xe_gt.h"
> > > #include "xe_map.h"
> > > +#include "xe_mmio.h"
> > > #include "xe_module.h"
> > > +#include "xe_pm.h"
> > > +
> > > +static struct xe_guc *
> > > +log_to_guc(struct xe_guc_log *log)
> > > +{
> > > + return container_of(log, struct xe_guc, log);
> > > +}
> > > static struct xe_gt *
> > > log_to_gt(struct xe_guc_log *log)
> > > @@ -49,32 +58,75 @@ static size_t guc_log_size(void)
> > > CAPTURE_BUFFER_SIZE;
> > > }
> > > +#define BYTES_PER_WORD sizeof(u32)
> > > +#define WORDS_PER_DUMP 8
> > > +#define DUMPS_PER_LINE 4
> > > +#define LINES_PER_READ 4
> > > +#define WORDS_PER_READ (WORDS_PER_DUMP * DUMPS_PER_LINE * LINES_PER_READ)
> > > +
> > > void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p)
> > > {
> > > + static int g_count;
> > > + struct xe_uc_fw_version *ver = &log_to_guc(log)->fw.versions.found[XE_UC_FW_VER_RELEASE];
> > > struct xe_device *xe = log_to_xe(log);
> > > size_t size;
> > > - int i, j;
> > > + char line_buff[DUMPS_PER_LINE * WORDS_PER_DUMP * 9 + 1];
> > > + int l_count = g_count++;
> > > + int line = 0;
> > > + int i, j, k;
> > > + u64 ktime;
> > > + u32 stamp;
> > > xe_assert(xe, log->bo);
> > > size = log->bo->size;
> > > -#define DW_PER_READ 128
> > > - xe_assert(xe, !(size % (DW_PER_READ * sizeof(u32))));
> > > - for (i = 0; i < size / sizeof(u32); i += DW_PER_READ) {
> > > - u32 read[DW_PER_READ];
> > > + drm_printf(p, "[Capture/%d.%d] Dumping GuC log for %ps...\n",
> > > + l_count, line++, __builtin_return_address(0));
> > > +
> > > + drm_printf(p, "[Capture/%d.%d] GuC version %u.%u.%u\n",
> > > + l_count, line++, ver->major, ver->minor, ver->patch);
> > > +
> > > + ktime = ktime_get_boottime_ns();
> > > + drm_printf(p, "[Capture/%d.%d] Kernel timestamp: 0x%08llX [%llu]\n",
> > > + l_count, line++, ktime, ktime);
> > > - xe_map_memcpy_from(xe, read, &log->bo->vmap, i * sizeof(u32),
> > > - DW_PER_READ * sizeof(u32));
> > > -#define DW_PER_PRINT 4
> > > - for (j = 0; j < DW_PER_READ / DW_PER_PRINT; ++j) {
> > > - u32 *print = read + j * DW_PER_PRINT;
> > > + xe_pm_runtime_get(xe);
> > > + stamp = xe_mmio_read32(log_to_gt(log), GUC_PMTIMESTAMP);
> > > + xe_pm_runtime_put(xe);
> > Just a quick drive-by comment: fine-grained PM control like this is frowned
> > upon in Xe, or maybe even not allowed. Rodrigo / Matt Auld would be the
> > experts here.
> >
> > Presumably all the callers should already have a PM ref.
> It wasn't clear that this would be the case.
>
> And the whole pm_runtime interface is exceedingly confusing. _get calls
> get_noresume then does a task ownership test and calls resume,
> _resume_and_get does exactly the same but in a different order, plus there
> is _get_if_active and _get_if_in_use. Plus _get_ioctl which is for calling
> *before* making an IOCTL (where an IOCTL is a call from the user into the
> kernel!). And none of these have any documentation beyond a single sentence
> that simply re-writes the function name with prepositions added.
>
> So I am unsurprised if this is not the correct way to do it but I have no
> idea what would be the correct method.
>
There is a little documentation in xe_pm.c:
60 * Also, Xe PM provides get and put functions that Xe driver will use to
61 * indicate activity. In order to avoid locking complications with the memory
62 * management, whenever possible, these get and put functions needs to be called
63 * from the higher/outer levels.
64 * The main cases that need to be protected from the outer levels are: IOCTL,
65 * sysfs, debugfs, dma-buf sharing, GPU execution.
The functions appear to have kernel-doc too. Agreed, it could use a
little improvement.
I don't think the work on PM is completely done either, but the general
rule is that PM refs are taken at the outermost functions, per the doc above.
So if this function can be called during driver load, a GT reset,
debugfs, or a self test, those are the places where the PM ref would be
taken. With this, most parts of the driver should never have to take a PM
ref at a fine-grained level like this. Again, Rodrigo is the expert here
if you want more details. Or he can correct me if I'm wrong about
something :)
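
To make the idea concrete, here is a rough sketch (the function name and
wiring are made up for illustration, only the placement of the get/put
matters) of how an outer entry point such as a debugfs handler would own
the PM ref, so xe_guc_log_print() itself never has to touch PM:

	/*
	 * Illustrative outer-level caller; not the actual debugfs code.
	 * The PM ref is taken once here and held across the whole dump.
	 */
	static void guc_log_dump_example(struct xe_guc_log *log,
					 struct drm_printer *p)
	{
		struct xe_device *xe = log_to_xe(log);

		xe_pm_runtime_get(xe);		/* outer level owns the ref */
		xe_guc_log_print(log, p);	/* no get/put needed inside */
		xe_pm_runtime_put(xe);
	}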
> >
> > > + drm_printf(p, "[Capture/%d.%d] GuC timestamp: 0x%08X [%u]\n",
> > > + l_count, line++, stamp, stamp);
> > > - drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> > > - *(print + 0), *(print + 1),
> > > - *(print + 2), *(print + 3));
> > > + drm_printf(p, "[Capture/%d.%d] CS timestamp frequency: %u Hz\n",
> > > + l_count, line++, log_to_gt(log)->info.reference_clock);
> > > +
> > > + xe_assert(xe, !(size % (WORDS_PER_READ * BYTES_PER_WORD)));
> > > + for (i = 0; i < size / BYTES_PER_WORD; i += WORDS_PER_READ) {
> > > + u32 read[WORDS_PER_READ];
> > > +
> > > + xe_map_memcpy_from(xe, read, &log->bo->vmap, i * BYTES_PER_WORD,
> > > + WORDS_PER_READ * BYTES_PER_WORD);
> > > +
> > Also I think the above pukes without a PM ref too.
> If that is the case, should there not have been an assert about having a PM
> reference at the start of the function?
>
Any MMIO or memory access already has that assert. Adding it to every
function that touches MMIO or memory would be a bit redundant.
> Although I don't see why a PM reference would be required to read from
> system memory on an integrated part? Maybe for a discrete part. But I would
It might not be... but in Xe we enforce that all MMIO and memory access
(at least memory access through xe_map.h) has a PM ref. That design
makes sense to me rather than special-casing SRAM vs. VRAM. It also
aligns with coarser-grained PM control.
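
Roughly, the enforcement looks like the sketch below: the low-level
accessor asserts that the device is already awake instead of taking a
reference itself. The wrapper and the exact assert condition are
illustrative, not a copy of the real accessor:

	/* Sketch only; the real xe_mmio_read32() carries the check itself. */
	static u32 mmio_read32_checked_example(struct xe_gt *gt, struct xe_reg reg)
	{
		/* Fires if no outer caller is holding a runtime PM ref. */
		xe_assert(gt_to_xe(gt), !pm_runtime_suspended(gt_to_xe(gt)->drm.dev));

		return xe_mmio_read32(gt, reg);
	}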
> assume that having a vmap in existence would mean the memory is locked and
> accessible. Either that, or a vmap is only allowed to be used by accessor
> functions which themselves do whatever is required to make the access work.
>
We pin memory that has a vmap so it is not moving. A vmap ultimately just
resolves to a pointer too. On PM suspend / resume we restore it to the same
physical location, so I don't think we ever unmap / remap it either.
I could be mistaken about this, haven't looked at it in a while.
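
I.e. the access in the patch boils down to something like this
(simplified on my part: for a system-memory BO the iosys_map behind
log->bo->vmap is a plain kernel pointer, while VRAM-backed maps go
through the _fromio variants):

	u32 first_word;

	/* Read the first dword of the (pinned) log BO through its vmap. */
	xe_map_memcpy_from(log_to_xe(log), &first_word, &log->bo->vmap, 0,
			   sizeof(first_word));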
Matt
> John.
>
> >
> > Matt
> >
> > > + for (j = 0; j < WORDS_PER_READ; ) {
> > > + u32 done = 0;
> > > +
> > > + for (k = 0; k < DUMPS_PER_LINE; k++) {
> > > + line_buff[done++] = ' ';
> > > + done += hex_dump_to_buffer(read + j,
> > > + sizeof(*read) * (WORDS_PER_READ - j),
> > > + WORDS_PER_DUMP * BYTES_PER_WORD,
> > > + BYTES_PER_WORD,
> > > + line_buff + done,
> > > + sizeof(line_buff) - done,
> > > + false);
> > > + j += WORDS_PER_DUMP;
> > > + }
> > > +
> > > + drm_printf(p, "[Capture/%d.%d]%s\n", l_count, line++, line_buff);
> > > }
> > > }
> > > +
> > > + drm_printf(p, "[Capture/%d.%d] Done.\n", l_count, line++);
> > > }
> > > int xe_guc_log_init(struct xe_guc_log *log)
> > > --
> > > 2.43.2
> > >
>