[PATCH 2/3] drm/i915: WARN if RPS adjustment is more than range

Kamble, Sagar A sagar.a.kamble at intel.com
Fri Jan 13 17:31:48 UTC 2017


Hi Chris,

Details of the issue with GuC and trace that shows the adjustment overflow to negative values is given below. PFA trace for reference.
Even when Host RPS has requested RP0, Up Threshold Interrupts are coming due to kernel bug.
With many such interrupts driver adjustment goes 1,2,4,8...,0x40000000,0x80000000(-2147483648) and then request goes down to Rpe Then ramp starts again which is impacting workload performance.
Patch at https://patchwork.freedesktop.org/patch/133084/ will mask interrupts correctly with GuC.

Configuration:
Rpe = 300MHz
Rp0 = 900MHz
Kernel = 4.10.0-rc3+
System = SKL GT3 RVP7

Scores:
Submission/RPS/glb_egypt_fixedtime score
Host/Host/165
GuC/Host/153
GuC/Fix at Rp0/165
GuC/SLPC/165

Ftrace Data: Analysis shows after reaching RP0, RP UP interrupts keep coming as they are unmasked due to kernel bug.
Due to multiple such interrupts (32) driver adjustment of frequency goes bad and sets frequency to min and from thereonwards ramp starts back again. Fix is to avoid the unnecessary interrupts that are leading to miscalculation of adjustments.

Thanks
Sagar

-----Original Message-----
From: Chris Wilson [mailto:chris at chris-wilson.co.uk] 
Sent: Friday, January 13, 2017 8:33 PM
To: Kamble, Sagar A <sagar.a.kamble at intel.com>
Cc: Intel-gfx-trybot at lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/i915: WARN if RPS adjustment is more than range

On Fri, Jan 13, 2017 at 08:21:45PM +0530, Kamble, Sagar A wrote:
> 
> 
> On 1/13/2017 4:26 PM, Chris Wilson wrote:
> >On Fri, Jan 13, 2017 at 04:12:18PM +0530, Kamble, Sagar A wrote:
> >>
> >>On 1/13/2017 3:55 PM, Chris Wilson wrote:
> >>>On Fri, Jan 13, 2017 at 03:46:11PM +0530, Sagar Arun Kamble wrote:
> >>>>Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
> >>>>---
> >>>>  drivers/gpu/drm/i915/i915_irq.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/i915_irq.c 
> >>>>b/drivers/gpu/drm/i915/i915_irq.c index ce5663d..b85bef9 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_irq.c
> >>>>+++ b/drivers/gpu/drm/i915/i915_irq.c
> >>>>@@ -1195,6 +1195,8 @@ static void gen6_pm_rps_work(struct work_struct *work)
> >>>>  		adj = 0;
> >>>>  	}
> >>>>+	WARN_ON(abs(adj) > (max-min));
> >>>Do we actually care? The adjustment is just an exponential function 
> >>>that will be clamped (and then when it is finally clamped will become zero).
> >>>Before it is clamped, it is quite permissible that the last adj 
> >>>will be larger than the entire range.
> >>>-Chris
> >>Oh. right. I coded this incorrectly. Will it be okay if we WARN on 
> >>-ve adjustment for Up interrupts and +ve for Down interrupts.
> >The code has to assume that the pm_iir may contain multiple 
> >interrupts (i.e. both up, down, timeout) because we coalesce the hw 
> >interrupt into the worker. I fear you will just end up repeating the 
> >if-else chain, at which point you could just review the code above.
> >-Chris
> Ok. I am thinking of following options. Could you suggest which one to 
> go ahead.
> 1. Adjustment rollover to negative value due to overflow of adj on 
> consecutive UP interrupts can be avoided if Up interrupts are masked 
> properly. Masking with GUC is fixed in earlier patch.

How can we overflow? Hmm, we are generating interrupts that we don't care for. That's the actual bug, right?

>     Should we just have that patch and ignore the WARN_ON or bounds 
> check altogether.
> 2. Add WARN_ON in individual condition blocks for Up/Down interrupts 
> on invalid adjustment.
> 3. Without WARN_ON on invalid adjustment just reset adjustment to zero

Something like:

if (WARN_ON_ONCE(slpc && (pm_iir & INVALID_SLPC_MASK)))
	pm_iir &= ~INVALID_SLPC_MASK

near the start?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trace1.log
Type: application/octet-stream
Size: 2102513 bytes
Desc: trace1.log
URL: <https://lists.freedesktop.org/archives/intel-gfx-trybot/attachments/20170113/66505276/attachment-0001.obj>


More information about the Intel-gfx-trybot mailing list