[PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target

Thu Jun 12 18:47:00 UTC 2025

On 6/12/2025 11:27 AM, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Harrison, John C <john.c.harrison at intel.com>
> Sent: Thursday, June 12, 2025 10:50 AM
> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
> Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
>> On 6/12/2025 7:32 AM, Cavitt, Jonathan wrote:
>>> -----Original Message-----
>>> From: Harrison, John C <john.c.harrison at intel.com>
>>> Sent: Wednesday, June 11, 2025 4:57 PM
>>> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Intel-Xe at Lists.FreeDesktop.Org
>>> Subject: Re: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
>>>> On 6/11/2025 3:04 PM, Cavitt, Jonathan wrote:
>>>>> -----Original Message-----
>>>>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of John.C.Harrison at Intel.com
>>>>> Sent: Wednesday, June 11, 2025 2:06 PM
>>>>> To: Intel-Xe at Lists.FreeDesktop.Org
>>>>> Cc: Harrison, John C <john.c.harrison at intel.com>
>>>>> Subject: [PATCH 2/2] drm/xe/guc: Add support for NPK as a GuC log target
>>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>>
>>>>>> The GuC has an option to write log data via NPK. This is basically a
>>>>>> magic IO address that GuC writes arbitrary data to and which can be
>>>>>> logged by a suitable hardware logger. This can allow retrieval of the
>>>>>> GuC log in hardware debug environments even when the system as a whole
>>>>>> dies horribly.
>>>>>>
>>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>> So, this is basically a new modparam value that redirects GuC logs to
>>>>> a specific IO address?  I take it guc_log_target = 2 is the default value, and
>>>>> guc_log_target = 1 would print the logs to stdout, then?  I'd ask why we
>>>>> use 0 as a default value and not just default to 2 all the time, but I think I
>>>>> already know why (we need to guard against guc_log_target = 0 anyways
>>>>> to prevent printing to stdin).
>>>> Um, read the patch - "(0=memory [default], 1 = NPK, 2 = memory + NPK)".
>>>> The default is zero. And no, nothing prints to stdout. This is about
>>>> hardware level debugging. It has nothing to do with stdin/stdout/stderr.
>>>> Those concepts do not exist in hardware nor in the KMD. If you send the
>>>> GuC log to the NPK target then you need a hardware debugger (JTAG, etc.)
>>>> to read it, as described in the commit message.
>>> Ah, okay.  When I read "This is basically a magic IO address that GuC writes
>>> arbitrary data to", I thought that was indicating that we're writing to the
>>> in/out/err IO addresses, and that the data being logged "by a suitable
>> You say "the in/out/err IO addresses" like there is such a thing.
>> In/out/err generally refers to stdin/stdout/stderr, being the default
>> first three file handles of a unix process. File handles are not IO
>> addresses. Such file handles also do not exist in the kernel. And they
>> absolutely do not exist inside the GT hardware. My comment was referring
> Okay, right.  The last time the distinction between I/O as a device concept
> and I/O as an interface concept was relevant to me was about 8 years ago,
> so I can understand how I got confused there.
>
>> to memory mapped IO addresses, i.e. hardware registers. On a normal
>> system, there is nothing connected to said hardware so the logged output
>> goes nowhere. However, if you have a hardware debugger attached to the
>> system then it can trap those accesses and record the log.
> You know, sometimes I forget that the intended customers for these
> products are major companies that end up shoving these cards into
> giant server racks and not PC users with screens and keyboards.
>
>>> hardware logger" was occurring separately.  I guess I also thought NPK
>>> was a writing protocol and not a hardware address.
>> NPK is neither a protocol nor an address. It is a block of silicon
>> called North Peak, also known as the Intel Trace Hub.
> "The GuC has an option to write log data via NPK. This is basically a magic IO address..."
>
> If NPK is "a block of silicon" and not an IO address, then perhaps this would be better
> worded as:
>
> "The GuC has the option to write log data to NPK, which is basically a block of silicon..."
NPK is a block which implements a register which is written to by GuC as
an MMIO access. The point of saying 'basically' is that this is a very
simple view: GuC writes to a register. All else is irrelevant. If you
want the full details then there is a white paper about it. But
generally speaking, if you don't know already then it isn't relevant to
you because you don't have the hundreds of thousands of dollars of
hardware required to make use of it. So complicated descriptions are
pointless.
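
For anyone who wants that simple view spelled out, here is a rough
userspace sketch of the three target values from the modparam
description (0 = memory, 1 = NPK, 2 = memory + NPK). It is a toy model
only, not GuC firmware or Xe code: struct fake_guc, log_word() and the
TARGET_* names are made up for illustration, and the "register" is just
a volatile variable standing in for an MMIO address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the guc_log_target description. */
enum log_target { TARGET_MEMORY = 0, TARGET_NPK = 1, TARGET_BOTH = 2 };

#define FAKE_LOG_WORDS 16

struct fake_guc {
	volatile uint32_t npk_reg;     /* stand-in for the MMIO register */
	uint32_t mem[FAKE_LOG_WORDS];  /* stand-in for the in-memory log buffer */
	size_t mem_count;              /* words currently held in mem[] */
	unsigned int npk_writes;       /* bookkeeping for this sketch only */
};

/* Emit one log word to whichever target(s) are selected. */
static void log_word(struct fake_guc *g, enum log_target t, uint32_t word)
{
	if (t == TARGET_MEMORY || t == TARGET_BOTH) {
		if (g->mem_count < FAKE_LOG_WORDS)
			g->mem[g->mem_count++] = word;
	}

	if (t == TARGET_NPK || t == TARGET_BOTH) {
		/*
		 * A plain register write. Each write replaces the last, so
		 * unless something is attached to capture it, the data is gone.
		 */
		g->npk_reg = word;
		g->npk_writes++;
	}
}
```

With TARGET_NPK the memory buffer stays empty, which is the point: the
only record of the write is whatever an attached debugger captured.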

>>> It didn't occur to me that we'd need to directly write the logs to the
>>> hardware logger in cases of catastrophic failure because we already have
>>> methods of streaming the logs directly via the serial ports.  Though I
>>> suppose that we're talking about different "logs" at this point?
>> I don't know what logs you are talking about. This patch is quite
>> clearly only talking about the GuC log. Which is generally accessible
>> via debugfs snapshots or as part of a devcoredump capture.
>> It is not ever 'streamed directly via the serial ports'.
> I was referring to the dmesg logs at the time, though I will admit that I forgot
> there were classes of logs that never get printed to dmesg.  I don't
> personally agree with the practice and think that all relevant logs should
> be printed to dmesg if possible, even if only at certain debug levels or
Sure, let's write to dmesg from inside the GT hardware. I'm sure we can
add that in to the next product...

dmesg is for super important kernel messages. Sometimes, it is the only
way to debug kernel related problems because the system dies and 
userland is no longer functional. But it is absolutely not meant to be 
the default output method for all possible logging systems. For example, 
the ftrace mechanism exists precisely because dmesg is not the right way 
to log many things.

> upon direct request.  However, I can at least see why we'd want to store those
> separately in the NPK silicon block in case of catastrophic failure given the files
> they normally get saved to are wiped on system reset.
Nothing is stored. NPK is simply a transport mechanism here. If there is 
nothing connected to it then it goes nowhere. /dev/null if you prefer. 
And there is no file to get wiped. The normal target for GuC logging is 
a memory buffer. Any access via a debugfs or sysfs 'file' is just code 
being run inside the kernel to read that memory buffer and return it to 
the user.
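
As a minimal sketch of what "just code being run to read that memory
buffer" means, assuming a flat buffer and a caller-supplied offset: the
names struct log_buffer and log_read() are hypothetical and this is
plain userspace C, not the actual Xe debugfs handler. It is similar in
spirit to what the kernel's simple_read_from_buffer() helper does for a
read() call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a log held in memory; there is no file behind it. */
struct log_buffer {
	const char *data;
	size_t size;
};

/*
 * Copy up to len bytes starting at *pos into out and advance *pos.
 * Returns the number of bytes copied, or 0 once the end is reached.
 */
static size_t log_read(const struct log_buffer *buf, char *out,
		       size_t len, size_t *pos)
{
	size_t remaining;

	if (*pos >= buf->size)
		return 0;

	remaining = buf->size - *pos;
	if (len > remaining)
		len = remaining;

	memcpy(out, buf->data + *pos, len);
	*pos += len;
	return len;
}
```

Every read copies whatever is in the buffer at that moment. Once the
buffer is gone (reset, power loss), there is nothing left to read.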

John.

>
>> John.
>>
>>> The reviewed-by still stands.
>>> -Jonathan Cavitt
>>>
>>>>> I also take it this is modified on boot by, for example, writing
>>>>> "xe.guc_log_target=1" to CMDLINE_LINUX_DEFAULT as a part of the grub file.
>>>> That is generally how module parameters work. You can also set via
>>>> modprobe.conf files as long as the Xe driver is a module and not
>>>> compiled in.
>>>>
>>>> John.
>>>>
>>>>> Yeah, seems good.
>>>>> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>>>> -Jonathan Cavitt
>>>>>
>>>>>> ---
>>>>>>     drivers/gpu/drm/xe/xe_guc.c    | 4 ++++
>>>>>>     drivers/gpu/drm/xe/xe_module.c | 4 ++++
>>>>>>     drivers/gpu/drm/xe/xe_module.h | 1 +
>>>>>>     3 files changed, 9 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>>>>>> index e16d19b44bcc..9c0e3113f7d5 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_guc.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>>> @@ -35,6 +35,7 @@
>>>>>>     #include "xe_guc_submit.h"
>>>>>>     #include "xe_memirq.h"
>>>>>>     #include "xe_mmio.h"
>>>>>> +#include "xe_module.h"
>>>>>>     #include "xe_platform_types.h"
>>>>>>     #include "xe_sriov.h"
>>>>>>     #include "xe_uc.h"
>>>>>> @@ -74,6 +75,9 @@ static u32 guc_ctl_debug_flags(struct xe_guc *guc)
>>>>>>     	else
>>>>>>     		flags |= FIELD_PREP(GUC_LOG_VERBOSITY, GUC_LOG_LEVEL_TO_VERBOSITY(level));
>>>>>>     
>>>>>> +	if (xe_modparam.guc_log_target)
>>>>>> +		flags |= FIELD_PREP(GUC_LOG_DESTINATION, xe_modparam.guc_log_target);
>>>>>> +
>>>>>>     	return flags;
>>>>>>     }
>>>>>>     
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>>>>>> index 1c4dfafbcd0b..fc8c681819b9 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_module.c
>>>>>> +++ b/drivers/gpu/drm/xe/xe_module.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>>     struct xe_modparam xe_modparam = {
>>>>>>     	.probe_display = true,
>>>>>>     	.guc_log_level = 3,
>>>>>> +	.guc_log_target = 0,
>>>>>>     	.force_probe = CONFIG_DRM_XE_FORCE_PROBE,
>>>>>>     #ifdef CONFIG_PCI_IOV
>>>>>>     	.max_vfs = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? ~0 : 0,
>>>>>> @@ -45,6 +46,9 @@ MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size (in MiB) - <0=disable-res
>>>>>>     module_param_named(guc_log_level, xe_modparam.guc_log_level, int, 0600);
>>>>>>     MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (0=disable, 1..5=enable with verbosity min..max)");
>>>>>>     
>>>>>> +module_param_named(guc_log_target, xe_modparam.guc_log_target, int, 0600);
>>>>>> +MODULE_PARM_DESC(guc_log_target, "GuC firmware logging target (0=memory [default], 1 = NPK, 2 = memory + NPK)");
>>>>>> +
>>>>>>     module_param_named_unsafe(guc_firmware_path, xe_modparam.guc_firmware_path, charp, 0400);
>>>>>>     MODULE_PARM_DESC(guc_firmware_path,
>>>>>>     		 "GuC firmware path to use instead of the default one");
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
>>>>>> index 5a3bfea8b7b4..4d978f6f26b6 100644
>>>>>> --- a/drivers/gpu/drm/xe/xe_module.h
>>>>>> +++ b/drivers/gpu/drm/xe/xe_module.h
>>>>>> @@ -14,6 +14,7 @@ struct xe_modparam {
>>>>>>     	bool probe_display;
>>>>>>     	u32 force_vram_bar_size;
>>>>>>     	int guc_log_level;
>>>>>> +	int guc_log_target;
>>>>>>     	char *guc_firmware_path;
>>>>>>     	char *huc_firmware_path;
>>>>>>     	char *gsc_firmware_path;
>>>>>> -- 
>>>>>> 2.49.0
>>>>>>
>>>>>>
>>