[Intel-gfx] [PATCH 2/2] drm/i915: restrict unclaimed register checking
Paulo Zanoni
przanoni at gmail.com
Wed Sep 2 15:13:05 PDT 2015
2015-09-02 17:53 GMT-03:00 chris at chris-wilson.co.uk <chris at chris-wilson.co.uk>:
> On Wed, Sep 02, 2015 at 08:20:52PM +0000, Zanoni, Paulo R wrote:
>> Em Qua, 2015-08-26 às 08:44 +0100, Chris Wilson escreveu:
>> > On Tue, Aug 25, 2015 at 07:03:42PM -0300, Paulo Zanoni wrote:
>> > > The unclaimed register bit is only triggered when someone touches
>> > > the
>> > > specified register range.
>> > >
>> > > For the normal use case (with i915.mmio_debug=0), this commit will
>> > > avoid the extra __raw_i915_read32() call for every register outside
>> > > the specified range, at the expense of a few additional "if"
>> > > statements.
>> > >
>> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
>> >
>> >
>> > > @@ -612,7 +614,7 @@ hsw_unclaimed_reg_debug(struct drm_i915_private
>> > > *dev_priv, u32 reg, bool read,
>> > > const char *op = read ? "reading" : "writing to";
>> > > const char *when = before ? "before" : "after";
>> > >
>> > > - if (!i915.mmio_debug)
>> > > + if (!i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg))
>> > > return;
>> >
>> > Place the register check on the preceding line so that it is not
>> > confused with checking the debug-enabled status. (Also you can argue
>> > that reg will be register/cache-hot and so a cheaper test to do first
>> > than loading a module parameter.)
>>
>> That makes sense, I'll do it.
>>
>> >
>> > > if (__raw_i915_read32(dev_priv, FPGA_DBG) &
>> > > FPGA_DBG_RM_NOCLAIM) {
>> > > @@ -624,11 +626,11 @@ hsw_unclaimed_reg_debug(struct
>> > > drm_i915_private *dev_priv, u32 reg, bool read,
>> > > }
>> > >
>> > > static void
>> > > -hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
>> > > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv, u32
>> > > reg)
>> > > {
>> > > static bool mmio_debug_once = true;
>> > >
>> > > - if (i915.mmio_debug || !mmio_debug_once)
>> > > + if (i915.mmio_debug || !UNCLAIMED_CHECK_RANGE(reg) ||
>> > > !mmio_debug_once)
>> > > return;
>> >
>> > The use is wrong here though because you never reviewed my patch to
>> > enable the single-shot debugging from the interrupt handler to catch
>> > invalid usage from elsewhere.
>>
>> If you're talking about intel_uncore_check_errors(), I think we can
>> just kill it now. I'll submit a patch with the arguments, so we can
>> continue this topic there.
>>
>> Moving back to the main subject:
>>
>> In the last time you sent the patch to do the unclaimed checks only on
>> forcewake code, I started the conversations with the HW guys about the
>> range of registers relevant by FPGA_DBG (and CCd you on these
>> conversations). My plan was to write this patch at that time, but
>> priorities happened and it got postponed :(. I also hoped that maybe
>> you would eventually end up writing it and I would just have to review
>> it :)
>>
>> My main argument was that by doing unclaimed register checking only at
>> forcewake time we would be running a check that is only relevant to
>> display code during non-display code. So my idea is that if we restrict
>> the unclaimed check to the actual range of registers that can be caught
>> we basically remove unclaimed register checking for most (all?) of the
>> performance-sensitive code.
>>
>> Since we have multiple solutions, I decided to do some experiments.
>> First of all, since I was not really seeing hsw_unclaimed_reg_detect()
>> on perf, I decided to patch it so it would do the read/write check
>> around FPGA_DBG 50 times per call instead of just one. With this, by
>> running "perf record glxgears" for a few seconds I could see
>> hsw_unclaimed_reg_detect() taking about 4.6% of the total time.
>
> Something is very suspect with your system if you are not observing much
> hsw_unclaimed_reg_check during trivial workloads.
Ok, I should have said "almost not seeing" instead of "not really
seeing": my bad. I see about 0.24% on perf with plain
drm-intel-nightly. The problem is that any of the next patches reduces
this down to 0.01% or 0.02%, so it's hard to compare the optimized
patches. By doing 50 reg reads on detect() I can make the numbers of
the optimized patches a little bigger, so easier to compare.
>
>> Then I applied just this patch, and the time went down to 0.13%. I also
>> applied your patch on top of it all, and it went up to 0.52% (I guess
>> the extra checks at forcewake time cost a little bit). Also notice that
>> since I add a "reg" argument to hsw_unclaimed_reg_detect(), I changed
>> the forcewake call to use a register in the display range.
>> I also tested your patch without my patch, and the measured result was
>> about 0.56%.
>>
>> Now this isn't a super relevant experiment: just glxgears with a
>> modified hsw_unclaimed_reg_detect(), but I thought it would be useful
>> information, and maybe you could provide me suggestions for better
>> workloads.
>>
>> So I'd like to know if you're ok with proceeding with just this patch
>> (considering I implement your suggestions), or if you think we should
>> go with just your patch or both or none.
>
> If you really wanted to, you could combine approaches a check in the
> forcewake handler as demonstrated is an order of magnitude less frequent
> than the register accesses. The key point is that we *need* the checking
> against other users, not just our known register access. Checking our own
> access basically amounts to detecting invalid registers, which your
> approach more or less erradicates (since it is a flag we add to known good
> registers, we may as well just compile it and only turn it on during hw
> bringup) and that we have the power well awake.
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Paulo Zanoni
More information about the Intel-gfx
mailing list