[Intel-gfx] [PATCH 2/2] drm/i915: restrict unclaimed register checking

Zanoni, Paulo R paulo.r.zanoni at intel.com
Wed Sep 2 13:20:52 PDT 2015


Em Qua, 2015-08-26 às 08:44 +0100, Chris Wilson escreveu:
> On Tue, Aug 25, 2015 at 07:03:42PM -0300, Paulo Zanoni wrote:
> > The unclaimed register bit is only triggered when someone touches 
> > the
> > specified register range.
> > 
> > For the normal use case (with i915.mmio_debug=0), this commit will
> > avoid the extra __raw_i915_read32() call for every register outside
> > the specified range, at the expense of a few additional "if"
> > statements.
> > 
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> 
> 
> > @@ -612,7 +614,7 @@ hsw_unclaimed_reg_debug(struct drm_i915_private 
> > *dev_priv, u32 reg, bool read,
> >  	const char *op = read ? "reading" : "writing to";
> >  	const char *when = before ? "before" : "after";
> >  
> > -	if (!i915.mmio_debug)
> > +	if (!i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg))
> >  		return;
> 
> Place the register check on the preceding line so that it is not
> confused with checking the debug-enabled status. (Also you can argue
> that reg will be register/cache-hot and so a cheaper test to do first
> than loading a module parameter.)

That makes sense, I'll do it.

> 
> >  	if (__raw_i915_read32(dev_priv, FPGA_DBG) & 
> > FPGA_DBG_RM_NOCLAIM) {
> > @@ -624,11 +626,11 @@ hsw_unclaimed_reg_debug(struct 
> > drm_i915_private *dev_priv, u32 reg, bool read,
> >  }
> >  
> >  static void
> > -hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
> > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv, u32 
> > reg)
> >  {
> >  	static bool mmio_debug_once = true;
> >  
> > -	if (i915.mmio_debug || !mmio_debug_once)
> > +	if (i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg) || 
> > !mmio_debug_once)
> >  		return;
> 
> The use is wrong here though because you never reviewed my patch to
> enable the single-shot debugging from the interrupt handler to catch
> invalid usage from elsewhere.

If you're talking about intel_uncore_check_errors(), I think we can
just kill it now. I'll submit a patch with the arguments, so we can
continue this topic there.

Moving back to the main subject:

In the last time you sent the patch to do the unclaimed checks only on
forcewake code, I started the conversations with the HW guys about the
range of registers relevant by FPGA_DBG (and CCd you on these
conversations). My plan was to write this patch at that time, but
priorities happened and it got postponed :(. I also hoped that maybe
you would eventually end up writing it and I would just have to review
it :)

My main argument was that by doing unclaimed register checking only at
forcewake time we would be running a check that is only relevant to
display code during non-display code. So my idea is that if we restrict
the unclaimed check to the actual range of registers that can be caught
we basically remove unclaimed register checking for most (all?) of the
performance-sensitive code.

Since we have multiple solutions, I decided to do some experiments.
First of all, since I was not really seeing hsw_unclaimed_reg_detect()
on perf, I decided to patch it so it would do the read/write check
around FPGA_DBG 50 times per call instead of just one. With this, by
running "perf record glxgears" for a few seconds I could see
hsw_unclaimed_reg_detect() taking about 4.6% of the total time.

Then I applied just this patch, and the time went down to 0.13%. I also
applied your patch on top of it all, and it went up to 0.52% (I guess
the extra checks at forcewake time cost a little bit). Also notice that
since I add a "reg" argument to hsw_unclaimed_reg_detect(), I changed
the forcewake call to use a register in the display range.
I also tested your patch without my patch, and the measured result was
about 0.56%.

Now this isn't a super relevant experiment: just glxgears with a
modified hsw_unclaimed_reg_detect(), but I thought it would be useful
information, and maybe you could provide me suggestions for better
workloads.

So I'd like to know if you're ok with proceeding with just this patch
(considering I implement your suggestions), or if you think we should
go with just your patch or both or none.

Thanks,
Paulo

> -Chris
> 


More information about the Intel-gfx mailing list