[Intel-xe] [PATCH v2 2/2] drm/xe: Add reg read/write trace

Lucas De Marchi lucas.demarchi at intel.com
Wed Jun 21 22:57:47 UTC 2023


On Tue, Jun 20, 2023 at 07:10:43PM -0700, Matt Roper wrote:
>On Tue, Jun 20, 2023 at 05:25:49PM -0700, Lucas De Marchi wrote:
>> On Tue, Jun 20, 2023 at 10:38:50PM +0000, Radhakrishna Sripada wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: De Marchi, Lucas <lucas.demarchi at intel.com>
>> > > Sent: Tuesday, June 20, 2023 2:28 PM
>> > > To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>
>> > > Cc: intel-xe at lists.freedesktop.org; Roper, Matthew D
>> > > <matthew.d.roper at intel.com>
>> > > Subject: Re: [Intel-xe] [PATCH v2 2/2] drm/xe: Add reg read/write trace
>> > >
>> > > On Tue, Jun 20, 2023 at 12:32:29PM -0700, Radhakrishna Sripada wrote:
>> > > >This will help debug register read/writes and provide
>> > > >a way to trace all the mmio transactions.
>> > > >
>> > > >In order to avoid spam from xe_mmio_wait32,
>> > > >xe_mmio_read32_notrace is introduced and used.
>> > >
>> > > feedback given before was to avoid the exported _notrace stuff. If there
>> > > was a justification for it somewhere I probably lost it. Could you
>> > > clarify? If I understand the above paragraph correctly, it's due to
>> > > the fact the xe_mmio_wait32() reads the register multiple times...
>> > > why would we want to avoid the "spam"? If we are reading it multiple
>> > > times, the trace should show it instead of hiding.  If we demonstrate
>> > > there is an actual impact, then xe_mmio_wait32() should be adjusted, not
>> > > the rest of the shim layer in intel_uncore.h
>> > Unless I missed something somewhere, the earlier feedback I received was on the usage of inline in function definitions.
>> > xe_mmio_wait32() is, I believe, loosely based on __intel_wait_for_register() (at least based on compat-headers),
>> > where during the actual check the no-trace version of the register read is used and just the final value is traced
>> > before returning from the function. A similar kind of approach is adopted here.
>> >
>> > If it does not add much value like that, then I believe we can use a traceable version.
>> > But in case developers want to have a stack trace for each trace output, the spam becomes even
>> > messier.
>>
>> there was a chat last week about that... maybe I misremembered and you
>> were not part of it.  The trace part should be cheap enough to be kept
>> everywhere; in particular, since it's not enabled by default, it's
>> nothing more than a few nop instructions.
>>
>> If we don't trace everything and hide some of them behind functions like
>> xe_mmio_wait32() or intel_uncore_read_fw(), with no indication in their
>> name that they won't trace, then when we are reading the trace we spend
>> a long time tracking back what exactly was happening.
>> When we have the complete trace, filtering should be pretty
>> easy. Eliminating random traces means we lose information that is not
>> possible to recover after the fact.
>
>I could see an argument for avoiding tracepoint i915:i915_reg_rw within
>xe_mmio_wait32() if that function utilized separate tracepoints like
>i915:i915_reg_wait_start and i915:i915_reg_wait_end instead.  That could
>potentially consolidate the repeated trace output down to just two lines
>without losing any information.  We don't have tracepoints like those
>today though.

you mean xe:xe_mmio_wait_start/xe:xe_mmio_wait_end, right?
Yes, that would be good, and the non-tracing functions would be local to
xe_mmio rather than exported for everybody to shoot themselves in the foot.
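
Something along these lines, perhaps (just a rough sketch to illustrate the
idea -- event names and fields are mine, not from this patch, and would need
bikeshedding):

```c
/*
 * Illustrative sketch only, not part of this patch. One event when
 * xe_mmio_wait32() starts polling and one when it stops, so the
 * repeated per-poll reads collapse to two trace lines without losing
 * information about what was being waited on and what came back.
 */
TRACE_EVENT(xe_mmio_wait_start,
	    TP_PROTO(u32 addr, u32 mask, u32 val),
	    TP_ARGS(addr, mask, val),

	    TP_STRUCT__entry(
		    __field(u32, addr)
		    __field(u32, mask)
		    __field(u32, val)
		    ),

	    TP_fast_assign(
		    __entry->addr = addr;
		    __entry->mask = mask;
		    __entry->val = val;
		    ),

	    TP_printk("reg=0x%x, mask=0x%x, expect=0x%x",
		      __entry->addr, __entry->mask, __entry->val)
);

TRACE_EVENT(xe_mmio_wait_end,
	    TP_PROTO(u32 addr, u32 read, int ret),
	    TP_ARGS(addr, read, ret),

	    TP_STRUCT__entry(
		    __field(u32, addr)
		    __field(u32, read)
		    __field(int, ret)
		    ),

	    TP_fast_assign(
		    __entry->addr = addr;
		    __entry->read = read;
		    __entry->ret = ret;
		    ),

	    TP_printk("reg=0x%x, read=0x%x, ret=%d",
		      __entry->addr, __entry->read, __entry->ret)
);
```

With that, xe_mmio_wait32() would call trace_xe_mmio_wait_start() once before
the loop and trace_xe_mmio_wait_end() once after, and the read inside the loop
could stay a local non-tracing helper.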

Lucas De Marchi

>
>
>Matt
>
>>
>> >
>> > >
>> > > >
>> > > >v2: Do not use inline for the function declarations.(MattR)
>> > > >
>> > > >Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> > > >Cc: Matt Roper <matthew.d.roper at intel.com>
>> > > >Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
>> > > >---
>> > > > .../drm/xe/compat-i915-headers/intel_uncore.h |  8 +-
>> > > > drivers/gpu/drm/xe/xe_mmio.c                  | 80 ++++++++++++++++++-
>> > > > drivers/gpu/drm/xe/xe_mmio.h                  |  5 ++
>> > > > drivers/gpu/drm/xe/xe_trace.h                 | 26 ++++++
>> > > > 4 files changed, 111 insertions(+), 8 deletions(-)
>> > > >
>> > > >diff --git a/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h b/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
>> > > >index fae6213d26f1..036f3b963a84 100644
>> > > >--- a/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
>> > > >+++ b/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
>> > > >@@ -113,7 +113,7 @@ static inline u32 intel_uncore_read_fw(struct fake_uncore *uncore,
>> > > > {
>> > > > 	struct xe_reg reg = XE_REG(i915_mmio_reg_offset(i915_reg));
>> > > >
>> > > >-	return xe_mmio_read32(__fake_uncore_to_gt(uncore), reg);
>> > > >+	return xe_mmio_read32_notrace(__fake_uncore_to_gt(uncore), reg);
>> > >
>> > > how would you even know that intel_uncore_read_fw() is non-traceable?
>> > > ditto for all the other functions in this header.
>> > In i915, intel_uncore_read_fw() and the other _fw functions translate to raw_uncore_read()
>> > and its variants, which do not include the GEN6 header/footer that has the traces
>> > included, and hence are the non-trace variants. Unless I am missing something.
>>
>> I meant "as a user of the API". There's no clue whatsoever in
>> intel_uncore_read_fw() that it will not trace. I think the selective
>> trace/no-trace in random places in i915 was a mistake.
>>
>> Lucas De Marchi
>>
>> >
>> > --Radhakrishna(RK) Sripada
>> > >
>> > >
>> > > Lucas De Marchi
>> > >
>> > > > }
>> > > >
>> > > > static inline void intel_uncore_write_fw(struct fake_uncore *uncore,
>> > > >@@ -121,7 +121,7 @@ static inline void intel_uncore_write_fw(struct fake_uncore *uncore,
>> > > > {
>> > > > 	struct xe_reg reg = XE_REG(i915_mmio_reg_offset(i915_reg));
>> > > >
>> > > >-	xe_mmio_write32(__fake_uncore_to_gt(uncore), reg, val);
>> > > >+	xe_mmio_write32_notrace(__fake_uncore_to_gt(uncore), reg, val);
>> > > > }
>> > > >
>> > > > static inline u32 intel_uncore_read_notrace(struct fake_uncore *uncore,
>> > > >@@ -129,7 +129,7 @@ static inline u32 intel_uncore_read_notrace(struct fake_uncore *uncore,
>> > > > {
>> > > > 	struct xe_reg reg = XE_REG(i915_mmio_reg_offset(i915_reg));
>> > > >
>> > > >-	return xe_mmio_read32(__fake_uncore_to_gt(uncore), reg);
>> > > >+	return xe_mmio_read32_notrace(__fake_uncore_to_gt(uncore), reg);
>> > > > }
>> > > >
>> > > > static inline void intel_uncore_write_notrace(struct fake_uncore *uncore,
>> > > >@@ -137,7 +137,7 @@ static inline void intel_uncore_write_notrace(struct fake_uncore *uncore,
>> > > > {
>> > > > 	struct xe_reg reg = XE_REG(i915_mmio_reg_offset(i915_reg));
>> > > >
>> > > >-	xe_mmio_write32(__fake_uncore_to_gt(uncore), reg, val);
>> > > >+	xe_mmio_write32_notrace(__fake_uncore_to_gt(uncore), reg, val);
>> > > > }
>> > > >
>> > > > #endif /* __INTEL_UNCORE_H__ */
>> > > >diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
>> > > >index 78c7f839f783..ef2192a2d060 100644
>> > > >--- a/drivers/gpu/drm/xe/xe_mmio.c
>> > > >+++ b/drivers/gpu/drm/xe/xe_mmio.c
>> > > >@@ -19,6 +19,7 @@
>> > > > #include "xe_gt_mcr.h"
>> > > > #include "xe_macros.h"
>> > > > #include "xe_module.h"
>> > > >+#include "xe_trace.h"
>> > > >
>> > > > #define XEHP_MTCFG_ADDR		XE_REG(0x101800)
>> > > > #define TILE_COUNT		REG_GENMASK(15, 8)
>> > > >@@ -429,7 +430,7 @@ static const struct xe_reg mmio_read_whitelist[] = {
>> > > > 	RING_TIMESTAMP(RENDER_RING_BASE),
>> > > > };
>> > > >
>> > > >-u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)
>> > > >+u8 xe_mmio_read8_notrace(struct xe_gt *gt, struct xe_reg reg)
>> > > > {
>> > > > 	struct xe_tile *tile = gt_to_tile(gt);
>> > > >
>> > > >@@ -439,6 +440,31 @@ u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)
>> > > > 	return readb(tile->mmio.regs + reg.addr);
>> > > > }
>> > > >
>> > > >+u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)
>> > > >+{
>> > > >+	struct xe_tile *tile = gt_to_tile(gt);
>> > > >+	u8 val;
>> > > >+
>> > > >+	if (reg.addr < gt->mmio.adj_limit)
>> > > >+		reg.addr += gt->mmio.adj_offset;
>> > > >+
>> > > >+	val = readb(tile->mmio.regs + reg.addr);
>> > > >+
>> > > >+	trace_xe_reg_rw(false, &reg, val, sizeof(val));
>> > > >+
>> > > >+	return val;
>> > > >+}
>> > > >+
>> > > >+void xe_mmio_write32_notrace(struct xe_gt *gt, struct xe_reg reg, u32 val)
>> > > >+{
>> > > >+	struct xe_tile *tile = gt_to_tile(gt);
>> > > >+
>> > > >+	if (reg.addr < gt->mmio.adj_limit)
>> > > >+		reg.addr += gt->mmio.adj_offset;
>> > > >+
>> > > >+	writel(val, tile->mmio.regs + reg.addr);
>> > > >+}
>> > > >+
>> > > > void xe_mmio_write32(struct xe_gt *gt, struct xe_reg reg, u32 val)
>> > > > {
>> > > > 	struct xe_tile *tile = gt_to_tile(gt);
>> > > >@@ -446,10 +472,12 @@ void xe_mmio_write32(struct xe_gt *gt, struct xe_reg reg, u32 val)
>> > > > 	if (reg.addr < gt->mmio.adj_limit)
>> > > > 		reg.addr += gt->mmio.adj_offset;
>> > > >
>> > > >+	trace_xe_reg_rw(true, &reg, val, sizeof(val));
>> > > >+
>> > > > 	writel(val, tile->mmio.regs + reg.addr);
>> > > > }
>> > > >
>> > > >-u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg)
>> > > >+u32 xe_mmio_read32_notrace(struct xe_gt *gt, struct xe_reg reg)
>> > > > {
>> > > > 	struct xe_tile *tile = gt_to_tile(gt);
>> > > >
>> > > >@@ -459,6 +487,21 @@ u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg)
>> > > > 	return readl(tile->mmio.regs + reg.addr);
>> > > > }
>> > > >
>> > > >+u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg)
>> > > >+{
>> > > >+	struct xe_tile *tile = gt_to_tile(gt);
>> > > >+	u32 val;
>> > > >+
>> > > >+	if (reg.addr < gt->mmio.adj_limit)
>> > > >+		reg.addr += gt->mmio.adj_offset;
>> > > >+
>> > > >+	val = readl(tile->mmio.regs + reg.addr);
>> > > >+
>> > > >+	trace_xe_reg_rw(false, &reg, val, sizeof(val));
>> > > >+
>> > > >+	return val;
>> > > >+}
>> > > >+
>> > > > u32 xe_mmio_rmw32(struct xe_gt *gt, struct xe_reg reg, u32 clr, u32 set)
>> > > > {
>> > > > 	u32 old, reg_val;
>> > > >@@ -470,6 +513,16 @@ u32 xe_mmio_rmw32(struct xe_gt *gt, struct xe_reg reg, u32 clr, u32 set)
>> > > > 	return old;
>> > > > }
>> > > >
>> > > >+void xe_mmio_write64_notrace(struct xe_gt *gt, struct xe_reg reg, u64 val)
>> > > >+{
>> > > >+	struct xe_tile *tile = gt_to_tile(gt);
>> > > >+
>> > > >+	if (reg.addr < gt->mmio.adj_limit)
>> > > >+		reg.addr += gt->mmio.adj_offset;
>> > > >+
>> > > >+	writeq(val, tile->mmio.regs + reg.addr);
>> > > >+}
>> > > >+
>> > > > void xe_mmio_write64(struct xe_gt *gt, struct xe_reg reg, u64 val)
>> > > > {
>> > > > 	struct xe_tile *tile = gt_to_tile(gt);
>> > > >@@ -477,10 +530,12 @@ void xe_mmio_write64(struct xe_gt *gt, struct xe_reg reg, u64 val)
>> > > > 	if (reg.addr < gt->mmio.adj_limit)
>> > > > 		reg.addr += gt->mmio.adj_offset;
>> > > >
>> > > >+	trace_xe_reg_rw(true, &reg, val, sizeof(val));
>> > > >+
>> > > > 	writeq(val, tile->mmio.regs + reg.addr);
>> > > > }
>> > > >
>> > > >-u64 xe_mmio_read64(struct xe_gt *gt, struct xe_reg reg)
>> > > >+u64 xe_mmio_read64_notrace(struct xe_gt *gt, struct xe_reg reg)
>> > > > {
>> > > > 	struct xe_tile *tile = gt_to_tile(gt);
>> > > >
>> > > >@@ -490,6 +545,21 @@ u64 xe_mmio_read64(struct xe_gt *gt, struct xe_reg reg)
>> > > > 	return readq(tile->mmio.regs + reg.addr);
>> > > > }
>> > > >
>> > > >+u64 xe_mmio_read64(struct xe_gt *gt, struct xe_reg reg)
>> > > >+{
>> > > >+	struct xe_tile *tile = gt_to_tile(gt);
>> > > >+	u64 val;
>> > > >+
>> > > >+	if (reg.addr < gt->mmio.adj_limit)
>> > > >+		reg.addr += gt->mmio.adj_offset;
>> > > >+
>> > > >+	val = readq(tile->mmio.regs + reg.addr);
>> > > >+
>> > > >+	trace_xe_reg_rw(false, &reg, val, sizeof(val));
>> > > >+
>> > > >+	return val;
>> > > >+}
>> > > >+
>> > > > int xe_mmio_write32_and_verify(struct xe_gt *gt, struct xe_reg reg, u32 val,
>> > > > 			       u32 mask, u32 eval)
>> > > > {
>> > > >@@ -511,7 +581,7 @@ int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 val, u32 mask,
>> > > > 	u32 read;
>> > > >
>> > > > 	for (;;) {
>> > > >-		read = xe_mmio_read32(gt, reg);
>> > > >+		read = xe_mmio_read32_notrace(gt, reg);
>> > > > 		if ((read & mask) == val) {
>> > > > 			ret = 0;
>> > > > 			break;
>> > > >@@ -531,6 +601,8 @@ int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 val, u32 mask,
>> > > > 		wait <<= 1;
>> > > > 	}
>> > > >
>> > > >+	trace_xe_reg_rw(false, &reg, read, sizeof(read));
>> > > >+
>> > > > 	if (out_val)
>> > > > 		*out_val = read;
>> > > >
>> > > >diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>> > > >index c06fcac2cc5b..02364ba9dda2 100644
>> > > >--- a/drivers/gpu/drm/xe/xe_mmio.h
>> > > >+++ b/drivers/gpu/drm/xe/xe_mmio.h
>> > > >@@ -20,11 +20,16 @@ struct xe_device;
>> > > > #define GEN12_LMEM_BAR		2
>> > > >
>> > > > int xe_mmio_init(struct xe_device *xe);
>> > > >+u8 xe_mmio_read8_notrace(struct xe_gt *gt, struct xe_reg reg);
>> > > > u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg);
>> > > >+void xe_mmio_write32_notrace(struct xe_gt *gt, struct xe_reg reg, u32 val);
>> > > > void xe_mmio_write32(struct xe_gt *gt, struct xe_reg reg, u32 val);
>> > > >+u32 xe_mmio_read32_notrace(struct xe_gt *gt, struct xe_reg reg);
>> > > > u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg);
>> > > > u32 xe_mmio_rmw32(struct xe_gt *gt, struct xe_reg reg, u32 clr, u32 set);
>> > > >+void xe_mmio_write64_notrace(struct xe_gt *gt, struct xe_reg reg, u64 val);
>> > > > void xe_mmio_write64(struct xe_gt *gt, struct xe_reg reg, u64 val);
>> > > >+u64 xe_mmio_read64_notrace(struct xe_gt *gt, struct xe_reg reg);
>> > > > u64 xe_mmio_read64(struct xe_gt *gt, struct xe_reg reg);
>> > > > int xe_mmio_write32_and_verify(struct xe_gt *gt, struct xe_reg reg, u32 val,
>> > > > 			       u32 mask, u32 eval);
>> > > >diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>> > > >index 2f8eb7ebe9a7..a5d514c190a2 100644
>> > > >--- a/drivers/gpu/drm/xe/xe_trace.h
>> > > >+++ b/drivers/gpu/drm/xe/xe_trace.h
>> > > >@@ -12,6 +12,7 @@
>> > > > #include <linux/tracepoint.h>
>> > > > #include <linux/types.h>
>> > > >
>> > > >+#include "regs/xe_reg_defs.h"
>> > > > #include "xe_bo_types.h"
>> > > > #include "xe_engine_types.h"
>> > > > #include "xe_gt_tlb_invalidation_types.h"
>> > > >@@ -507,6 +508,31 @@ DEFINE_EVENT(xe_vm, xe_vm_rebind_worker_exit,
>> > > > 	     TP_ARGS(vm)
>> > > > );
>> > > >
>> > > >+TRACE_EVENT(xe_reg_rw,
>> > > >+	    TP_PROTO(bool write, struct xe_reg *reg, u64 val, int len),
>> > > >+	    TP_ARGS(write, reg, val, len),
>> > > >+
>> > > >+	    TP_STRUCT__entry(
>> > > >+		    __field(u64, val)
>> > > >+		    __field(struct xe_reg *, reg)
>> > > >+		    __field(u16, write)
>> > > >+		    __field(u16, len)
>> > > >+		    ),
>> > > >+
>> > > >+	    TP_fast_assign(
>> > > >+		   __entry->val = (u64)val;
>> > > >+		   __entry->reg = reg;
>> > > >+		   __entry->write = write;
>> > > >+		   __entry->len = len;
>> > > >+		),
>> > > >+
>> > > >+	    TP_printk("%s reg=0x%x, len=%d, val=(0x%x, 0x%x)",
>> > > >+		      __entry->write ? "write" : "read",
>> > > >+		      __entry->reg->addr, __entry->len,
>> > > >+		      (u32)(__entry->val & 0xffffffff),
>> > > >+		      (u32)(__entry->val >> 32))
>> > > >+);
>> > > >+
>> > > > TRACE_EVENT(xe_guc_ct_h2g_flow_control,
>> > > > 	    TP_PROTO(u32 _head, u32 _tail, u32 size, u32 space, u32 len),
>> > > > 	    TP_ARGS(_head, _tail, size, space, len),
>> > > >--
>> > > >2.34.1
>> > > >
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation
