[Intel-gfx] [PATCH 3/3] drm/i915: reorganize the unclaimed register detection code

Paulo Zanoni przanoni at gmail.com
Wed May 21 21:23:22 CEST 2014


From: Paulo Zanoni <paulo.r.zanoni at intel.com>

The current code only runs when we do an I915_WRITE operation. It
checks if the unclaimed register flag is set before we do the
operation, and then it checks it again after we do the operation. This
double check allows us to find out if the I915_WRITE operation in
question is the bad one, or if some previous code is the bad one. When
it finds a problem, our code uses DRM_ERROR to signal it.

The good thing about the current code is that it detects the problem,
so at least we can know we did something wrong. The problem is that
even though we find the problem, we don't really have much information
to actually debug it. So whenever I see one of these DRM_ERROR
messages on my systems, the first thing I do is apply a patch to
change the DRM_ERROR to a WARN and also check for unclaimed registers
on I915_READ operations. This local patch makes things even slower,
but it usually helps a lot in finding the bad code.

The first point here is that since the current code is only useful to
detect whether we have a problem or not, but it is not really good to
find the cause of the problem, I don't think we should be checking
both before and after every I915_WRITE operation: just doing the check
once should be enough for us to quickly detect problems. With this
change, the code that runs by default for every single user will only
do 1 read operation for every single I915_WRITE, instead of 2. This
patch does this change.

The second point is that the local patch I have should be upstream,
but since it makes things slower it should be disabled by default. So
I added the i915.mmio_debug option to enable it.

So after this patch, this is what will happen:
 - By default, we will try to detect unclaimed registers once after
   every I915_WRITE operation. Previously we tried twice for every
   I915_WRITE.
 - When we find an unclaimed register we will still print a DRM_ERROR
   message, but we will now tell the user to try again with
   i915.mmio_debug=1.
 - When we use i915.mmio_debug=1 we will try to find unclaimed
   registers both before and after every I915_READ and I915_WRITE
   operation, and we will print stack traces in case we find them.
   This should really help locating the exact point of the bad code
   (or at least finding out that i915.ko is not the problem).

This commit also opens space for really-slow register debugging
operations on other platforms. In theory we can now add lots and lots
of debug code behind i915.mmio_debug, enable this option on our tests,
and catch more problems.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h     |  1 +
 drivers/gpu/drm/i915/i915_params.c  |  6 ++++
 drivers/gpu/drm/i915/intel_uncore.c | 69 +++++++++++++++++++++++++++++++++----
 3 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8daf4f7..60cf78f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2045,6 +2045,7 @@ struct i915_params {
 	bool reset;
 	bool disable_display;
 	bool disable_vtd_wa;
+	bool mmio_debug;
 };
 extern struct i915_params i915 __read_mostly;
 
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index d05a2af..d9fa00f 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -48,6 +48,7 @@ struct i915_params i915 __read_mostly = {
 	.disable_display = 0,
 	.enable_cmd_parser = 1,
 	.disable_vtd_wa = 0,
+	.mmio_debug = 0,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -156,3 +157,8 @@ MODULE_PARM_DESC(disable_vtd_wa, "Disable all VT-d workarounds (default: false)"
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
 		 "Enable command parsing (1=enabled [default], 0=disabled)");
+
+module_param_named(mmio_debug, i915.mmio_debug, bool, 0600);
+MODULE_PARM_DESC(disable_vtd_wa,
+	"Enable the MMIO debug code (default: false). This may negatively "
+	"affect performance.");
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index cd0d6e2..b72f5f5 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -506,21 +506,73 @@ ilk_dummy_write(struct drm_i915_private *dev_priv)
 	__raw_i915_write32(dev_priv, MI_MODE, 0);
 }
 
+/**
+ * hsw_unclaimed_reg_debug - tries to find unclaimed registers
+ * @dev_priv: device private data
+ * @reg: register we're about to touch or just have touched
+ * @read: are we reading or writing a register now?
+ * @before: are we running this before or after we touch the register?
+ *
+ * This function tries to detect unclaimed registers and provide as much
+ * information as possible. It will only do something if the i915.mmio_debug
+ * option is enabled.
+ *
+ * If we detect an unclaimed register when @before is true, it means some
+ * unknown code wrote to an unclaimed register, and we're just detecting it now.
+ * The bad code can be i915.ko, some other Kernel module (e.g., snd_hda_intel,
+ * vgacon) or even user space. If we detect it when @before is false, then
+ * there's a really good chance that the read/write operation we just did to
+ * @reg is what triggered the unclaimed register message, but there's also the
+ * possibility that after the operation we did, and before this function,
+ * something else touched another unclaimed register, and we are detecting it
+ * now.
+ *
+ * The unclaimed register bit can get set when we touch a register that does not
+ * exist and when we touch a register that exists but is powered down. Please
+ * also notice that the HW can only check some register ranges, so there's no
+ * guarantee that all read and write operations to inexistent registers will be
+ * caught by this function.
+ *
+ * Also please notice that we don't run this function on I915_READ_NOTRACE,
+ * I915_WRITE_NOTRACE, and in many other cases, so the case where @before is
+ * true is quite common.
+ */
 static void
-hsw_unclaimed_reg_clear(struct drm_i915_private *dev_priv, u32 reg)
+hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, bool read,
+			bool before)
 {
+	const char *op = read ? "reading" : "writing to";
+	const char *when = before ? "before" : "after";
+
+	if (!i915.mmio_debug)
+		return;
+
 	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
-		DRM_ERROR("Unknown unclaimed register before writing to %x\n",
-			  reg);
+		WARN(1, "Unclaimed register detected %s %s register 0x%x\n",
+		     when, op, reg);
 		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
 	}
 }
 
+/**
+ * hsw_unclaimed_reg_detect - tries to find unclaimed registers
+ * @dev_priv: device private data
+ *
+ * This function will try to detect if something touched and unclaimed register,
+ * triggering the FPGA_DBG bit. If this happens, we will print a message telling
+ * the user to use i915.mmio_debug=1 so we can properly debug the problem.
+ *
+ * This way, we can still detect our bugs without giving the user the
+ * performance impact of hsw_unclaimed_reg_debug().
+ */
 static void
-hsw_unclaimed_reg_check(struct drm_i915_private *dev_priv, u32 reg)
+hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
 {
+	if (i915.mmio_debug)
+		return;
+
 	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
-		DRM_ERROR("Unclaimed write to %x\n", reg);
+		DRM_ERROR("Unclaimed register detected. Please use the i915.mmio_debug=1 to debug this problem.");
 		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
 	}
 }
@@ -557,6 +609,7 @@ gen5_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 static u##x \
 gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 	REG_READ_HEADER(x); \
+	hsw_unclaimed_reg_debug(dev_priv, reg, true, true); \
 	if (dev_priv->uncore.forcewake_count == 0 && \
 	    NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
 		dev_priv->uncore.funcs.force_wake_get(dev_priv, \
@@ -567,6 +620,7 @@ gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 	} else { \
 		val = __raw_i915_read##x(dev_priv, reg); \
 	} \
+	hsw_unclaimed_reg_debug(dev_priv, reg, true, false); \
 	REG_READ_FOOTER; \
 }
 
@@ -664,12 +718,13 @@ hsw_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
 	if (NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
 		__fifo_ret = __gen6_gt_wait_for_fifo(dev_priv); \
 	} \
-	hsw_unclaimed_reg_clear(dev_priv, reg); \
+	hsw_unclaimed_reg_debug(dev_priv, reg, false, true); \
 	__raw_i915_write##x(dev_priv, reg, val); \
 	if (unlikely(__fifo_ret)) { \
 		gen6_gt_check_fifodbg(dev_priv); \
 	} \
-	hsw_unclaimed_reg_check(dev_priv, reg); \
+	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
+	hsw_unclaimed_reg_detect(dev_priv); \
 	REG_WRITE_FOOTER; \
 }
 
-- 
1.9.0




More information about the Intel-gfx mailing list