[PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target

Thu Jun 12 19:52:55 UTC 2025

-----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com> 
Sent: Thursday, June 12, 2025 11:47 AM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> 
> On 6/12/2025 11:27 AM, Cavitt, Jonathan wrote:
> > ----Original Message-----
> > From: Harrison, John C <john.c.harrison at intel.com>
> > Sent: Thursday, June 12, 2025 10:50 AM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
> > Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> >> On 6/12/2025 7:32 AM, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Harrison, John C <john.c.harrison at intel.com>
> >>> Sent: Wednesday, June 11, 2025 4:57 PM
> >>> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
> >>> Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> >>>> On 6/11/2025 3:04 PM, Cavitt, Jonathan wrote:
> >>>>> -----Original Message-----
> >>>>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of John.C.Harrison at Intel.com
> >>>>> Sent: Wednesday, June 11, 2025 2:06 PM
> >>>>> To: Intel-Xe at Lists.FreeDesktop.Org
> >>>>> Cc: Harrison, John C <john.c.harrison at intel.com>
> >>>>> Subject: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> >>>>>> From: John Harrison <John.C.Harrison at Intel.com>
> >>>>>>
> >>>>>> The GuC has an option to write log data via NPK. This is basically a
> >>>>>> magic IO address that GuC writes arbitrary data to and which can be
> >>>>>> logged by a suitable hardware logger. This can allow retrieval of the
> >>>>>> GuC log in hardware debug environments even when the system as a whole
> >>>>>> dies horribly.
> >>>>>>
> >>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> >>>>> So, this is basically a new modparam value that redirects GuC logs to
> >>>>> a specific IO address?  I take it guc_log_target = 2 is the default value, and
> >>>>> guc_log_target = 1 would print the logs to stdout, then?  I'd ask why we
> >>>>> use 0 as a default value and not just default to 2 all the time, but I think I
> >>>>> already know why (we need to guard against guc_log_target = 0 anyways
> >>>>> to prevent printing to stdin).
> >>>> Um, read the patch - "(0=memory [default], 1 = NPK, 2 = memory + NPK)".
> >>>> The default is zero. And no, nothing prints to stdout. This is about
> >>>> hardware level debugging. It has nothing to do with stdin/stdout/stderr.
> >>>> Those concepts do not exist in hardware nor in the KMD. If you send the
> >>>> GuC log to the NPK target then you need a hardware debugger (JTAG, etc.)
> >>>> to read it, as described in the commit message.
> >>> Ah, okay.  When I read "This is basically a magic IO address that GuC writes
> >>> arbitrary data to", I thought that was indicating that we're writing to the
> >>> in/out/err IO addresses, and that the data being logged "by a suitable
> >> You say "the in/out/err IO addresses" like there is such a thing.
> >> In/out/err generally refers to stdin/stdout/stderr, being the default
> >> first three file handles of a unix process. File handles are not IO
> >> addresses. Such file handles also do not exist in the kernel. And they
> >> absolutely do not exist inside the GT hardware. My comment was referring
> > Okay, right.  The last time the distinction between I/O as a device concept
> > and I/O as an interface concept was relevant to me was about 8 years ago,
> > so I can understand how I got confused there.
> >
> >> to memory mapped IO addresses, i.e. hardware registers. On a normal
> >> system, there is nothing connected to said hardware so the logged output
> >> goes nowhere. However, if you have a hardware debugger attached to the
> >> system then it can trap those accesses and record the log.
> > You know, sometimes I forget that the intended customers for these
> > products are major companies that end up shoving these cards into
> > giant server racks and not PC users with screens and keyboards.
> >
> >>> hardware logger" was occurring separately.  I guess I also thought NPK
> >>> was a writing protocol and not a hardware address.
> >> NPK is neither a protocol nor an address. It is a block of silicon
> >> called North Peak, also known as the Intel Trace Hub.
> > "The GuC has an option to write log data via NPK. This is basically a magic IO address..."
> >
> > If NPK is "a block of silicon" and not an IO address, then perhaps this would be better
> > worded as:
> >
> > "The GuC has the option to write log data to NPK, which is basically a block of silicon..."
> NPK is a block which implements a register which is written to by GuC as 
> an MMIO access. The point of saying 'basically' is because this is very 
> simple view - GuC writes to a register. All else is irrelevant. If you 
> want the full details then there is a white paper about it. But 
> generally speaking, if you don't know already then it isn't relevant to 
> you because you don't have the hundreds of thousands of dollars of 
> hardware required to make use of it. So complicated descriptions are 
> pointless.

We can still keep the description precise without overloading it with "pointless"
details:

"""
The GuC has the option to write log data through NPK.  NPK acts as a
transport layer that can send arbitrary data to a dedicated hardware
logger if one is connected.  This allows for the retrieval of the GuC log
even in the event of catastrophic hardware failure.
"""

> 
> >>> It didn't occur to me that we'd need to directly write the logs to the
> >>> hardware logger in cases of catastrophic failure because we already have
> >>> methods of streaming the logs directly via the serial ports.  Though I
> >>> suppose that we're talking about different "logs" at this point?
> >> I don't know what logs you are talking about. This patch is quite
> >> clearly only talking about the GuC log. Which is generally accessible
> >> via debugfs snapshots or as part of a devcoredump capture.
> >> It is not ever 'streamed directly via the serial ports'.
> > I was referring to the dmesg logs at the time, though I will admit that I forgot
> > there were classes of logs that never get printed to dmesg.  I don't
> > personally agree with the practice and think that all relevant logs should
> > be printed to dmesg if possible, even if only at certain debug levels or
> Sure, lets write to dmesg from inside the GT hardware. I'm sure we can 
> add that in to the next product...

Your sarcasm is noted...

> 
> DMesg is for super important kernel messages. Sometimes, it is the only 
> way to debug kernel related problems because the system dies and 
> userland is no longer functional. But it is absolutely not meant to be 
> the default output method for all possible logging systems. For example, 
> the ftrace mechanism exists precisely because dmesg is not the right way 
> to log many things.

We can agree to disagree, I guess.

> 
> > upon direct request.  However, I can at least see why we'd want to store those
> > separately in the NPK silicon block in case of catastrophic failure given the files
> > they normally get saved to are wiped on system reset.
> Nothing is stored. NPK is simply a transport mechanism here. If there is 
> nothing connected to it then it goes nowhere. /dev/null if you prefer. 
> And there is no file to get wiped. The normal target for GuC logging is 
> a memory buffer. Any access via a debugfs or sysfs 'file' is just code 
> being run inside the kernel to read that memory buffer and return it to 
> the user.

Regardless, information is being wiped, and that's preventing us from
accessing the log after system reset because the log (or I guess "the data
used to construct the log" if we're being pedantic) doesn't exist anymore.

> 
> John.
> 
> >
> >> John.
> >>
> >>> The reviewed-by still stands.
> >>> -Jonathan Cavitt
> >>>
> >>>>> I also take it this is modified on boot by, for example, writing
> >>>>> "xe.guc_log_target=1" to CMDLINE_LINUX_DEFAULT as a part of the grub file.
> >>>> That is generally how module parameters work. You can also set via
> >>>> modprobe.conf files as long as the Xe driver is a module and not
> >>>> compiled in.
> >>>>
> >>>> John.
> >>>>
> >>>>> Yeah, seems good.
> >>>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> >>>>> -Jonathan Cavitt
> >>>>>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/xe/xe_guc.c    | 4 ++++
> >>>>>>     drivers/gpu/drm/xe/xe_module.c | 4 ++++
> >>>>>>     drivers/gpu/drm/xe/xe_module.h | 1 +
> >>>>>>     3 files changed, 9 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >>>>>> index e16d19b44bcc..9c0e3113f7d5 100644
> >>>>>> --- a/drivers/gpu/drm/xe/xe_guc.c
> >>>>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >>>>>> @@ -35,6 +35,7 @@
> >>>>>>     #include "xe_guc_submit.h"
> >>>>>>     #include "xe_memirq.h"
> >>>>>>     #include "xe_mmio.h"
> >>>>>> +#include "xe_module.h"
> >>>>>>     #include "xe_platform_types.h"
> >>>>>>     #include "xe_sriov.h"
> >>>>>>     #include "xe_uc.h"
> >>>>>> @@ -74,6 +75,9 @@ static u32 guc_ctl_debug_flags(struct xe_guc *guc)
> >>>>>>     	else
> >>>>>>     		flags |= FIELD_PREP(GUC_LOG_VERBOSITY, GUC_LOG_LEVEL_TO_VERBOSITY(level));
> >>>>>>     
> >>>>>> +	if (xe_modparam.guc_log_target)
> >>>>>> +		flags |= FIELD_PREP(GUC_LOG_DESTINATION, xe_modparam.guc_log_target);
> >>>>>> +
> >>>>>>     	return flags;
> >>>>>>     }
> >>>>>>     
> >>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> >>>>>> index 1c4dfafbcd0b..fc8c681819b9 100644
> >>>>>> --- a/drivers/gpu/drm/xe/xe_module.c
> >>>>>> +++ b/drivers/gpu/drm/xe/xe_module.c
> >>>>>> @@ -21,6 +21,7 @@
> >>>>>>     struct xe_modparam xe_modparam = {
> >>>>>>     	.probe_display = true,
> >>>>>>     	.guc_log_level = 3,
> >>>>>> +	.guc_log_target = 0,
> >>>>>>     	.force_probe = CONFIG_DRM_XE_FORCE_PROBE,
> >>>>>>     #ifdef CONFIG_PCI_IOV
> >>>>>>     	.max_vfs = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? ~0 : 0,
> >>>>>> @@ -45,6 +46,9 @@ MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size (in MiB) - <0=disable-res
> >>>>>>     module_param_named(guc_log_level, xe_modparam.guc_log_level, int, 0600);
> >>>>>>     MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (0=disable, 1..5=enable with verbosity min..max)");
> >>>>>>     
> >>>>>> +module_param_named(guc_log_target, xe_modparam.guc_log_target, int, 0600);
> >>>>>> +MODULE_PARM_DESC(guc_log_target, "GuC firmware logging target (0=memory [default], 1 = NPK, 2 = memory + NPK)");
> >>>>>> +
> >>>>>>     module_param_named_unsafe(guc_firmware_path, xe_modparam.guc_firmware_path, charp, 0400);
> >>>>>>     MODULE_PARM_DESC(guc_firmware_path,
> >>>>>>     		 "GuC firmware path to use instead of the default one");
> >>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> >>>>>> index 5a3bfea8b7b4..4d978f6f26b6 100644
> >>>>>> --- a/drivers/gpu/drm/xe/xe_module.h
> >>>>>> +++ b/drivers/gpu/drm/xe/xe_module.h
> >>>>>> @@ -14,6 +14,7 @@ struct xe_modparam {
> >>>>>>     	bool probe_display;
> >>>>>>     	u32 force_vram_bar_size;
> >>>>>>     	int guc_log_level;
> >>>>>> +	int guc_log_target;
> >>>>>>     	char *guc_firmware_path;
> >>>>>>     	char *huc_firmware_path;
> >>>>>>     	char *gsc_firmware_path;
> >>>>>> -- 
> >>>>>> 2.49.0
> >>>>>>
> >>>>>>
> >>
> 
>