[Intel-gfx] [PATCH 4/5] drm/i915: reorganize the unclaimed register detection code

Daniel Vetter daniel at ffwll.ch
Mon Jul 7 23:34:08 CEST 2014


On Fri, Jul 04, 2014 at 11:50:32AM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> 
> The current code only runs when we do an I915_WRITE operation. It
> checks if the unclaimed register flag is set before we do the
> operation, and then it checks it again after we do the operation. This
> double check allows us to find out if the I915_WRITE operation in
> question is the bad one, or if some previous code is the bad one. When
> it finds a problem, our code uses DRM_ERROR to signal it.
> 
> The good thing about the current code is that it detects the problem,
> so at least we can know we did something wrong. The problem is that
> even though we find the problem, we don't really have much information
> to actually debug it. So whenever I see one of these DRM_ERROR
> messages on my systems, the first thing I do is apply a patch to
> change the DRM_ERROR to a WARN and also check for unclaimed registers
> on I915_READ operations. This local patch makes things even slower,
> but it usually helps a lot in finding the bad code.
> 
> The first point here is that since the current code is only useful to
> detect whether we have a problem or not, but it is not really good to
> find the cause of the problem, I don't think we should be checking
> both before and after every I915_WRITE operation: just doing the check
> once should be enough for us to quickly detect problems. With this
> change, the code that runs by default for every single user will only
> do 1 read operation for every single I915_WRITE, instead of 2. This
> patch does this change.
> 
> The second point is that the local patch I have should be upstream,
> but since it makes things slower it should be disabled by default. So
> I added the i915.mmio_debug option to enable it.
> 
> So after this patch, this is what will happen:
>  - By default, we will try to detect unclaimed registers once after
>    every I915_WRITE operation. Previously we tried twice for every
>    I915_WRITE.
>  - When we find an unclaimed register we will still print a DRM_ERROR
>    message, but we will now tell the user to try again with
>    i915.mmio_debug=1.
>  - When we use i915.mmio_debug=1 we will try to find unclaimed
>    registers both before and after every I915_READ and I915_WRITE
>    operation, and we will print stack traces in case we find them.
>    This should really help locating the exact point of the bad code
>    (or at least finding out that i915.ko is not the problem).
> 
> This commit also opens space for really-slow register debugging
> operations on other platforms. In theory we can now add lots and lots
> of debug code behind i915.mmio_debug, enable this option on our tests,
> and catch more problems.
> 
> Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  1 +
>  drivers/gpu/drm/i915/i915_params.c  |  6 ++++
>  drivers/gpu/drm/i915/intel_uncore.c | 68 +++++++++++++++++++++++++++++++++----
>  3 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ac06c0f..51d867f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2092,6 +2092,7 @@ struct i915_params {
>  	bool disable_display;
>  	bool disable_vtd_wa;
>  	int use_mmio_flip;
> +	bool mmio_debug;
>  };
>  extern struct i915_params i915 __read_mostly;
>  
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 8145729..7977872 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -49,6 +49,7 @@ struct i915_params i915 __read_mostly = {
>  	.enable_cmd_parser = 1,
>  	.disable_vtd_wa = 0,
>  	.use_mmio_flip = 0,
> +	.mmio_debug = 0,
>  };
>  
>  module_param_named(modeset, i915.modeset, int, 0400);
> @@ -161,3 +162,8 @@ MODULE_PARM_DESC(enable_cmd_parser,
>  module_param_named(use_mmio_flip, i915.use_mmio_flip, int, 0600);
>  MODULE_PARM_DESC(use_mmio_flip,
>  		 "use MMIO flips (-1=never, 0=driver discretion [default], 1=always)");
> +
> +module_param_named(mmio_debug, i915.mmio_debug, bool, 0600);
> +MODULE_PARM_DESC(disable_vtd_wa,
> +	"Enable the MMIO debug code (default: false). This may negatively "
> +	"affect performance.");
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 29145df..de5402f 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -513,21 +513,72 @@ ilk_dummy_write(struct drm_i915_private *dev_priv)
>  	__raw_i915_write32(dev_priv, MI_MODE, 0);
>  }
>  
> +/**
> + * hsw_unclaimed_reg_debug - tries to find unclaimed registers
> + * @dev_priv: device private data
> + * @reg: register we're about to touch or just have touched
> + * @read: are we reading or writing a register now?
> + * @before: are we running this before or after we touch the register?
> + *
> + * This function tries to detect unclaimed registers and provide as much
> + * information as possible. It will only do something if the i915.mmio_debug
> + * option is enabled.
> + *
> + * If we detect an unclaimed register when @before is true, it means some
> + * unknown code wrote to an unclaimed register, and we're just detecting it now.
> + * The bad code can be i915.ko, some other Kernel module (e.g., snd_hda_intel,
> + * vgacon) or even user space. If we detect it when @before is false, then
> + * there's a really good chance that the read/write operation we just did to
> + * @reg is what triggered the unclaimed register message, but there's also the
> + * possibility that after the operation we did, and before this function,
> + * something else touched another unclaimed register, and we are detecting it
> + * now.
> + *
> + * The unclaimed register bit can get set when we touch a register that does not
> + * exist and when we touch a register that exists but is powered down. Please
> + * also notice that the HW can only check some register ranges, so there's no
> + * guarantee that all read and write operations to inexistent registers will be
> + * caught by this function.
> + *
> + * Also please notice that we don't run this function on __raw operations and in
> + * many other cases, so the case where @before is true is quite common.
> + */

Pet peeve of mine: Comments have a cost since they get stale really fast
and then can be awfully misleading. We have way too many examples for
that. My rule of interface kerneldoc for functions is that we should only
have it for exported functions used in multiple different places in the
driver. If there's only one caller (like for _init/fini) functions the
comment should be very terse at most.

For static inline helpers in the same file otoh code should simply be
readable enough to not require comments. Imo this is the case here, so the
kerneldoc should imo be dropped.

If you want to keep some of it I guess we could document i915.mmio_debug
with kerneldoc and then also pull it into our i915 section in the drm
docbook. Maybe under a new "debug options" setting. We probably need a
DOC: section to make this work. A few sentences should be more than
enough.

>  static void
> -hsw_unclaimed_reg_clear(struct drm_i915_private *dev_priv, u32 reg)
> +hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, bool read,
> +			bool before)
>  {
> +	const char *op = read ? "reading" : "writing to";
> +	const char *when = before ? "before" : "after";
> +
> +	if (!i915.mmio_debug)
> +		return;
> +
>  	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> -		DRM_ERROR("Unknown unclaimed register before writing to %x\n",
> -			  reg);
> +		WARN(1, "Unclaimed register detected %s %s register 0x%x\n",
> +		     when, op, reg);
>  		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>  	}
>  }
>  
> +/**
> + * hsw_unclaimed_reg_detect - tries to find unclaimed registers
> + * @dev_priv: device private data
> + *
> + * This function will try to detect if something touched and unclaimed register,
> + * triggering the FPGA_DBG bit. If this happens, we will print a message telling
> + * the user to use i915.mmio_debug=1 so we can properly debug the problem.
> + *
> + * This way, we can still detect our bugs without giving the user the
> + * performance impact of hsw_unclaimed_reg_debug().
> + */
>  static void
> -hsw_unclaimed_reg_check(struct drm_i915_private *dev_priv, u32 reg)
> +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
>  {
> +	if (i915.mmio_debug)
> +		return;
> +
>  	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> -		DRM_ERROR("Unclaimed write to %x\n", reg);
> +		DRM_ERROR("Unclaimed register detected. Please use the i915.mmio_debug=1 to debug this problem.");

But imo really the only piece that's required is this helpful message here
into dmesg.

Anyway, the concept looks solid.
-Daniel

>  		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>  	}
>  }
> @@ -564,6 +615,7 @@ gen5_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
>  static u##x \
>  gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
>  	REG_READ_HEADER(x); \
> +	hsw_unclaimed_reg_debug(dev_priv, reg, true, true); \
>  	if (dev_priv->uncore.forcewake_count == 0 && \
>  	    NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
>  		dev_priv->uncore.funcs.force_wake_get(dev_priv, \
> @@ -574,6 +626,7 @@ gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
>  	} else { \
>  		val = __raw_i915_read##x(dev_priv, reg); \
>  	} \
> +	hsw_unclaimed_reg_debug(dev_priv, reg, true, false); \
>  	REG_READ_FOOTER; \
>  }
>  
> @@ -700,12 +753,13 @@ hsw_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
>  	if (NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
>  		__fifo_ret = __gen6_gt_wait_for_fifo(dev_priv); \
>  	} \
> -	hsw_unclaimed_reg_clear(dev_priv, reg); \
> +	hsw_unclaimed_reg_debug(dev_priv, reg, false, true); \
>  	__raw_i915_write##x(dev_priv, reg, val); \
>  	if (unlikely(__fifo_ret)) { \
>  		gen6_gt_check_fifodbg(dev_priv); \
>  	} \
> -	hsw_unclaimed_reg_check(dev_priv, reg); \
> +	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
> +	hsw_unclaimed_reg_detect(dev_priv); \
>  	REG_WRITE_FOOTER; \
>  }
>  
> -- 
> 2.0.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list