[Intel-gfx] [PATCH v3] drm/i915: Use exponential backoff for wait_for()
Sagar Arun Kamble
sagar.a.kamble at intel.com
Thu Nov 30 06:19:40 UTC 2017
On 11/30/2017 8:34 AM, John Harrison wrote:
> On 11/24/2017 6:12 AM, Chris Wilson wrote:
>> Quoting Michał Winiarski (2017-11-24 12:37:56)
>>> Since we see the effects for GuC preemption, let's gather some evidence.
>>>
>>> (SKL)
>>> intel_guc_send_mmio latency: 100 rounds of gem_exec_latency --r '*-preemption'
>>>
>>> drm-tip:
>>> usecs : count distribution
>>> 0 -> 1 : 0 | |
>>> 2 -> 3 : 0 | |
>>> 4 -> 7 : 0 | |
>>> 8 -> 15 : 44 | |
>>> 16 -> 31 : 1088 | |
>>> 32 -> 63 : 832 | |
>>> 64 -> 127 : 0 | |
>>> 128 -> 255 : 0 | |
>>> 256 -> 511 : 12 | |
>>> 512 -> 1023 : 0 | |
>>> 1024 -> 2047 : 29899 |********* |
>>> 2048 -> 4095 : 131033 |****************************************|
>> Such pretty graphs. Reminds me of the bpf hist output, I wonder if we
>> could create a tracepoint/kprobe that would output a histogram for each
>> waiter (filterable ofc). Benefit? Just thinking of tuning the
>> spin/sleep, in which case overall metrics are best
>> (intel_wait_for_register needs to be optimised for the typical case). I
>> am wondering if we could tune the spin period down to 5us, 2us? And then
>> have the 10us sleep.
>>
>> We would also need a typical workload to run, it's profile-guided
>> optimisation after all. Hmm.
>> -Chris
>
> It took me a while to get back to this, but I've now had a chance to run
> with this exponential backoff scheme on the original system that
> showed the problem. It was a slightly messy back port due to the
> customer tree being much older than current nightly. I'm pretty sure I
> got it correct though. However, I'm not sure what the recommendation
> is for the two timeout values. Using the default of '10, 10' in the
> patch, I still get lots of very long delays.
The currently recommended settings are Wmin=10, Wmax=10 for wait_for_us and
Wmin=10, Wmax=1000 for wait_for.
Exponential backoff inside wait_for helps most when the wait_for_us that
precedes it is short.
Setting Wmax lower than Wmin effectively turns the backoff strategy into
plain linear waits of Wmin.
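To make the Wmin/Wmax semantics concrete, here is a minimal sketch of this
kind of backoff loop in plain kernel C. It is only illustrative: the helper
name wait_for_cond and its parameters are made up for this example and are
not the actual i915 _wait_for() macro.

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/types.h>

/*
 * Poll cond() until it returns true or timeout_us expires, sleeping between
 * polls. The sleep starts at wmin_us and doubles after each miss while it is
 * still below wmax_us. Note that if wmax_us < wmin_us the doubling never
 * triggers, so the loop degenerates to fixed sleeps of wmin_us.
 */
static int wait_for_cond(bool (*cond)(void *data), void *data,
			 unsigned long timeout_us,
			 unsigned long wmin_us, unsigned long wmax_us)
{
	unsigned long timeout = jiffies + usecs_to_jiffies(timeout_us) + 1;
	unsigned long wait_us = wmin_us;

	might_sleep();
	for (;;) {
		bool expired = time_after(jiffies, timeout);

		if (cond(data))
			return 0;
		if (expired)
			return -ETIMEDOUT;

		/* Sleep somewhere in [wait_us, 2 * wait_us). */
		usleep_range(wait_us, wait_us * 2);
		if (wait_us < wmax_us)
			wait_us <<= 1;
	}
}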
> I have to up the Wmin value to at least 140 to get a stall-free
> result, which is plausible given that the big spike in the results of
> any fast version is at 110-150us. Also of note is that a Wmin between
> 10 and 110 actually makes things worse. Changing Wmax has no effect.
>
> In the following table, 'original' is the original driver before any
> changes and 'retry loop' is the version using the first workaround of
> just running the busy poll wait in a 10x loop. The other columns are
> using the backoff patch with the given Wmin/Wmax values. Note that the
> times are bucketed to 10us up to 500us and then in 500us lumps
> thereafter. The value listed is the lower limit, i.e. there were no
> times of <10us measured. Each case was run for 1000 samples.
>
The settings below, as in current nightly, should suit this workload and, as
you have found, will likely complete most waits in <150us.
If many samples had fallen beyond 160us but below 300us, we might have needed
to raise Wmin to perhaps 15 or 20 so that the exponential rise caps out
around 300us.
wait_for_us(10, 10)
wait_for(): #define wait_for _wait_for(10, 1000)
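As a rough worked example (assuming each sleep lands near its lower bound and
ignoring wakeup latency): with Wmin=10, Wmax=1000 the successive sleeps are
about 10, 20, 40, 80, 160us, so the cumulative sleep only crosses the
110-150us window after the fourth or fifth sleep (~150-310us in total). With
Wmin=140 the very first sleep already spans that window, which fits the
stall-free results John sees once Wmin reaches 140.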
> Time       Original  10/10  50/10  100/10  110/10  130/10  140/10  RetryLoop
> 10us:      2 2 2 2 2 2 2 2
> 30us:      1 1 1 1 1
> 50us:      1
> 70us:      14 63 56 64 63 61
> 80us:      8 41 52 44 46 41
> 90us:      6 24 10 28 12 17
> 100us:     2 4 20 16 17 17 22
> 110us:     13 21 14 13 11
> 120us:     6 366 633 636 660 650
> 130us:     2 2 46 125 95 86 95
> 140us:     3 2 16 18 32 46 48
> 150us:     210 3 12 13 37 32 31
> 160us:     322 1 18 10 14 12 17
> 170us:     157 4 5 5 3 5 2
> 180us:     62 11 3 1 2 1 1
> 190us:     32 212 1 1 2
> 200us:     27 266 1 1
> 210us:     16 181 1
> 220us:     16 51 1
> 230us:     10 43 4
> 240us:     12 22 62 1
> 250us:     4 12 112 3
> 260us:     3 13 73 8
> 270us:     5 12 12 8 2
> 280us:     4 7 12 5 1
> 290us:     9 4
> 300us:     1 3 9 1 1
> 310us:     2 3 5 1 1
> 320us:     1 4 2 3
> 330us:     1 5 1
> 340us:     1 2 1
> 350us:     2 1
> 360us:     2 1
> 370us:     2 2
> 380us:     1
> 390us:     2 1 2 1
> 410us:     1
> 420us:     3
> 430us:     2 2 1
> 440us:     2 1
> 450us:     4
> 460us:     3 1
> 470us:     3 1
> 480us:     2 2
> 490us:     1
> 500us:     19 13 17
> 1000us:    249 22 30 11
> 1500us:    393 4 4 2 1
> 2000us:    132 7 8 8 2 1 1
> 2500us:    63 4 4 6 1 1 1
> 3000us:    59 9 7 6 1
> 3500us:    34 2 1 1
> 4000us:    17 9 4 1
> 4500us:    8 2 1 1
> 5000us:    7 1 2
> 5500us:    7 2 1
> 6000us:    4 2 1 1
> 6500us:    3 1
> 7000us:    6 2 1
> 7500us:    4 1 1
> 8000us:    5 1
> 8500us:    1 1
> 9000us:    2
> 9500us:    2 1
> >10000us:  3 1
>
>
> John.
>
>
>