[PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
Cavitt, Jonathan
jonathan.cavitt at intel.com
Thu Jun 12 18:27:16 UTC 2025
----Original Message-----
From: Harrison, John C <john.c.harrison at intel.com>
Sent: Thursday, June 12, 2025 10:50 AM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
>
> On 6/12/2025 7:32 AM, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Harrison, John C <john.c.harrison at intel.com>
> > Sent: Wednesday, June 11, 2025 4:57 PM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
> > Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> >> On 6/11/2025 3:04 PM, Cavitt, Jonathan wrote:
> >>> -----Original Message-----
> >>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of John.C.Harrison at Intel.com
> >>> Sent: Wednesday, June 11, 2025 2:06 PM
> >>> To: Intel-Xe at Lists.FreeDesktop.Org
> >>> Cc: Harrison, John C <john.c.harrison at intel.com>
> >>> Subject: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
> >>>> From: John Harrison <John.C.Harrison at Intel.com>
> >>>>
> >>>> The GuC has an option to write log data via NPK. This is basically a
> >>>> magic IO address that GuC writes arbitrary data to and which can be
> >>>> logged by a suitable hardware logger. This can allow retrieval of the
> >>>> GuC log in hardware debug environments even when the system as a whole
> >>>> dies horribly.
> >>>>
> >>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> >>> So, this is basically a new modparam value that redirects GuC logs to
> >>> a specific IO address? I take it guc_log_target = 2 is the default value, and
> >>> guc_log_target = 1 would print the logs to stdout, then? I'd ask why we
> >>> use 0 as a default value and not just default to 2 all the time, but I think I
> >>> already know why (we need to guard against guc_log_target = 0 anyways
> >>> to prevent printing to stdin).
> >> Um, read the patch - "(0=memory [default], 1 = NPK, 2 = memory + NPK)".
> >> The default is zero. And no, nothing prints to stdout. This is about
> >> hardware level debugging. It has nothing to do with stdin/stdout/stderr.
> >> Those concepts do not exist in hardware nor in the KMD. If you send the
> >> GuC log to the NPK target then you need a hardware debugger (JTAG, etc.)
> >> to read it, as described in the commit message.
> > Ah, okay. When I read "This is basically a magic IO address that GuC writes
> > arbitrary data to", I thought that was indicating that we're writing to the
> > in/out/err IO addresses, and that the data being logged "by a suitable
> You say "the in/out/err IO addresses" like there is such a thing.
> In/out/err generally refers to stdin/stdout/stderr, being the default
> first three file handles of a unix process. File handles are not IO
> addresses. Such file handles also do not exist in the kernel. And they
> absolutely do not exist inside the GT hardware. My comment was referring
Okay, right. The last time the distinction between I/O as a device concept
and I/O as an interface concept was relevant to me was about 8 years ago,
so I can understand how I got confused there.
> to memory mapped IO addresses, i.e. hardware registers. On a normal
> system, there is nothing connected to said hardware so the logged output
> goes nowhere. However, if you have a hardware debugger attached to the
> system then it can trap those accesses and record the log.
You know, sometimes I forget that the intended customers for these
products are major companies that end up shoving these cards into
giant server racks and not PC users with screens and keyboards.
>
> > hardware logger" was occurring separately. I guess I also thought NPK
> > was a writing protocol and not a hardware address.
> NPK is neither a protocol nor an address. It is a block of silicon
> called North Peak, also known as the Intel Trace Hub.
"The GuC has an option to write log data via NPK. This is basically a magic IO address..."
If NPK is "a block of silicon" and not an IO address, then perhaps this would be better
worded as:
"The GuC has the option to write log data to NPK, which is basically a block of silicon..."
>
> >
> > It didn't occur to me that we'd need to directly write the logs to the
> > hardware logger in cases of catastrophic failure because we already have
> > methods of streaming the logs directly via the serial ports. Though I
> > suppose that we're talking about different "logs" at this point?
> I don't know what logs you are talking about. This patch is quite
> clearly only talking about the GuC log. Which is generally accessible
> via debugfs snapshots or as part of a devcoredump capture.
> It is not ever 'streamed directly via the serial ports'.
I was referring to the dmesg logs at the time, though I will admit that I forgot
there were classes of logs that never get printed to dmesg. I don't
personally agree with the practice and think that all relevant logs should
be printed to dmesg if possible, even if only at certain debug levels or
upon direct request. However, I can at least see why we'd want to store those
separately in the NPK silicon block in case of catastrophic failure given the files
they normally get saved to are wiped on system reset.
>
> John.
>
> >
> > The reviewed-by still stands.
> > -Jonathan Cavitt
> >
> >>> I also take it this is modified on boot by, for example, writing
> >>> "xe.guc_log_target=1" to CMDLINE_LINUX_DEFAULT as a part of the grub file.
> >> That is generally how module parameters work. You can also set via
> >> modprobe.conf files as long as the Xe driver is a module and not
> >> compiled in.
> >>
> >> John.
> >>
> >>> Yeah, seems good.
> >>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> >>> -Jonathan Cavitt
> >>>
> >>>> ---
> >>>> drivers/gpu/drm/xe/xe_guc.c | 4 ++++
> >>>> drivers/gpu/drm/xe/xe_module.c | 4 ++++
> >>>> drivers/gpu/drm/xe/xe_module.h | 1 +
> >>>> 3 files changed, 9 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> >>>> index e16d19b44bcc..9c0e3113f7d5 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_guc.c
> >>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
> >>>> @@ -35,6 +35,7 @@
> >>>> #include "xe_guc_submit.h"
> >>>> #include "xe_memirq.h"
> >>>> #include "xe_mmio.h"
> >>>> +#include "xe_module.h"
> >>>> #include "xe_platform_types.h"
> >>>> #include "xe_sriov.h"
> >>>> #include "xe_uc.h"
> >>>> @@ -74,6 +75,9 @@ static u32 guc_ctl_debug_flags(struct xe_guc *guc)
> >>>> else
> >>>> flags |= FIELD_PREP(GUC_LOG_VERBOSITY, GUC_LOG_LEVEL_TO_VERBOSITY(level));
> >>>>
> >>>> + if (xe_modparam.guc_log_target)
> >>>> + flags |= FIELD_PREP(GUC_LOG_DESTINATION, xe_modparam.guc_log_target);
> >>>> +
> >>>> return flags;
> >>>> }
> >>>>
> >>>> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> >>>> index 1c4dfafbcd0b..fc8c681819b9 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_module.c
> >>>> +++ b/drivers/gpu/drm/xe/xe_module.c
> >>>> @@ -21,6 +21,7 @@
> >>>> struct xe_modparam xe_modparam = {
> >>>> .probe_display = true,
> >>>> .guc_log_level = 3,
> >>>> + .guc_log_target = 0,
> >>>> .force_probe = CONFIG_DRM_XE_FORCE_PROBE,
> >>>> #ifdef CONFIG_PCI_IOV
> >>>> .max_vfs = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? ~0 : 0,
> >>>> @@ -45,6 +46,9 @@ MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size (in MiB) - <0=disable-res
> >>>> module_param_named(guc_log_level, xe_modparam.guc_log_level, int, 0600);
> >>>> MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (0=disable, 1..5=enable with verbosity min..max)");
> >>>>
> >>>> +module_param_named(guc_log_target, xe_modparam.guc_log_target, int, 0600);
> >>>> +MODULE_PARM_DESC(guc_log_target, "GuC firmware logging target (0=memory [default], 1 = NPK, 2 = memory + NPK)");
> >>>> +
> >>>> module_param_named_unsafe(guc_firmware_path, xe_modparam.guc_firmware_path, charp, 0400);
> >>>> MODULE_PARM_DESC(guc_firmware_path,
> >>>> "GuC firmware path to use instead of the default one");
> >>>> diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
> >>>> index 5a3bfea8b7b4..4d978f6f26b6 100644
> >>>> --- a/drivers/gpu/drm/xe/xe_module.h
> >>>> +++ b/drivers/gpu/drm/xe/xe_module.h
> >>>> @@ -14,6 +14,7 @@ struct xe_modparam {
> >>>> bool probe_display;
> >>>> u32 force_vram_bar_size;
> >>>> int guc_log_level;
> >>>> + int guc_log_target;
> >>>> char *guc_firmware_path;
> >>>> char *huc_firmware_path;
> >>>> char *gsc_firmware_path;
> >>>> --
> >>>> 2.49.0
> >>>>
> >>>>
> >>
>
>
More information about the Intel-xe
mailing list