[Intel-gfx] [PATCH v3] drm/i915: Use exponential backoff for wait_for()

Thu Nov 30 07:15:48 UTC 2017

On 11/29/2017 10:19 PM, Sagar Arun Kamble wrote:
> On 11/30/2017 8:34 AM, John Harrison wrote:
>> On 11/24/2017 6:12 AM, Chris Wilson wrote:
>>> Quoting Michał Winiarski (2017-11-24 12:37:56)
>>>> Since we see the effects for GuC preeption, let's gather some evidence.
>>>>
>>>> (SKL)
>>>> intel_guc_send_mmio latency: 100 rounds of gem_exec_latency --r '*-preemption'
>>>>
>>>> drm-tip:
>>>>       usecs               : count     distribution
>>>>           0 -> 1          : 0        |                                        |
>>>>           2 -> 3          : 0        |                                        |
>>>>           4 -> 7          : 0        |                                        |
>>>>           8 -> 15         : 44       |                                        |
>>>>          16 -> 31         : 1088     |                                        |
>>>>          32 -> 63         : 832      |                                        |
>>>>          64 -> 127        : 0        |                                        |
>>>>         128 -> 255        : 0        |                                        |
>>>>         256 -> 511        : 12       |                                        |
>>>>         512 -> 1023       : 0        |                                        |
>>>>        1024 -> 2047       : 29899    |*********                               |
>>>>        2048 -> 4095       : 131033   |****************************************|
>>> Such pretty graphs. Reminds me of the bpf hist output, I wonder if we
>>> could create a tracepoint/kprobe that would output a histogram for each
>>> waiter (filterable ofc). Benefit? Just thinking of tuning the
>>> spin/sleep, in which case overall metrics are best
>>> (intel_eait_for_register needs to be optimised for the typical case). I
>>> am wondering if we could tune the spin period down to 5us, 2us? And then
>>> have the 10us sleep.
>>>
>>> We would also need a typical workload to run, it's profile-guided
>>> optimisation after all. Hmm.
>>> -Chris
>>
>> It took me a while to get back to this but I've now had chance to run 
>> with this exponential backoff scheme on the original system that 
>> showed the problem. It was a slightly messy back port due to the 
>> customer tree being much older than current nightly. I'm pretty sure 
>> I got it correct though. However, I'm not sure what the 
>> recommendation is for the two timeout values. Using the default of 
>> '10, 10' in the patch, I still get lots of very long delays. 
> Recommended setting currently is Wmin=10, Wmax=10 for wait_for_us and 
> Wmin=10, Wmax=1000 for wait_for.
>
> Exponential backoff is more helpful inside wait_for if wait_for_us 
> prior to wait_for is smaller.
> Setting Wmax less than Wmin is effectively changing the backoff 
> strategy to just linear waits of Wmin.
>> I have to up the Wmin value to at least 140 to get a stall free 
>> result. Which is plausible given that the big spike in the results of 
>> any fast version is at 110-150us. Also of note is that a Wmin between 
>> 10 and 110 actually makes things worse. Changing Wmax has no effect.
>>
>> In the following table, 'original' is the original driver before any 
>> changes and 'retry loop' is the version using the first workaround of 
>> just running the busy poll wait in a 10x loop. The other columns are 
>> using the backoff patch with the given Wmin/Wmax values. Note that 
>> the times are bucketed to 10us up to 500us and then in 500us lumps 
>> thereafter. The value listed is the lower limit, i.e. there were no 
>> times of <10us measured. Each case was run for 1000 samples.
>>
> Below setting like in current nightly will suit this workload and as 
> you have found this will also likely complete most waits in <150us.
> If many samples had been beyond 160us and less than 300us we might 
> have been needed to change Wmin to may be 15 or 20 to ensure the
> exponential rise caps around 300us.
>
> wait_for_us(10, 10)
> wait_for()
>
> #define wait_for _wait_for(10, 1000)
>
But as shown in the table, a setting of 10/10 does not work well for 
this workload. The best results possible are a large spike of waits in 
the 120-130us bucket with a small tail out to 150us. Whereas, the 10/10 
setting produces a spike from 150-170us with the tail extending to 240us 
and an appreciable number of samples stretching all the way out to the 
1-10ms range. A regular delay of multiple milliseconds is not acceptable 
when this path is supposed to be a low latency pre-emption to switch to 
some super high priority time critical task. And as noted, I did try a 
bunch of different settings for Wmax but nothing seemed to make much of 
a difference. E.g. 10/10 vs 10/1000 produced pretty much identical 
results. Hence it didn't seem worth including those in the table.


>>     Time        Original    10/10 50/10    100/10    110/10    
>> 130/10    140/10  RetryLoop
>>     10us:          2         2         2         2 2         
>> 2         2         2
>>     30us:                              1         1 1         1         1
>>     50us:                              1
>>     70us:                             14        63 56        
>> 64        63        61
>>     80us:                              8        41 52        
>> 44        46        41
>>     90us:                              6        24 10        
>> 28        12        17
>>    100us:                    2         4        20 16        
>> 17        17        22
>>    110us:                                       13 21        
>> 14        13        11
>>    120us:                              6       366 633       
>> 636       660       650
>>    130us:                    2         2        46 125        
>> 95        86        95
>>    140us:                    3         2        16 18        
>> 32        46        48
>>    150us:                  210         3        12 13        
>> 37        32        31
>>    160us:                  322         1        18 10        
>> 14        12        17
>>    170us:                  157         4         5 5         
>> 3         5         2
>>    180us:                   62        11         3 1         
>> 2         1         1
>>    190us:                   32       212 1                   1         2
>>    200us:                   27       266 1                   1
>>    210us:                   16 
>> 181                                                 1
>>    220us:                   16 51                                       1
>>    230us:                   10        43         4
>>    240us:                   12        22        62 1
>>    250us:                    4        12       112 3
>>    260us:                    3        13        73 8
>>    270us:                    5        12        12 8         2
>>    280us:                    4         7        12 5         1
>>    290us:                              9         4
>>    300us:                    1         3         9 1         1
>>    310us:                    2         3         5 1         1
>>    320us:                    1         4         2 3
>>    330us:                    1         5         1
>>    340us:                    1 2                   1
>>    350us:                              2         1
>>    360us:                              2         1
>>    370us:                    2                   2
>>    380us:                                        1
>>    390us:                    2         1         2 1
>>    410us:                    1
>>    420us:                    3
>>    430us:                    2         2         1
>>    440us:                    2         1
>>    450us:                              4
>>    460us:                    3         1
>>    470us:                              3         1
>>    480us:                    2                   2
>>    490us:                                        1
>>    500us:                   19        13        17
>>   1000us:        249        22        30        11
>>   1500us:        393         4         4         2 1
>>   2000us:        132         7         8         8 2         
>> 1                   1
>>   2500us:         63         4         4         6 1         1         1
>>   3000us:         59         9         7         6 1
>>   3500us:         34         2 1                             1
>>   4000us:         17         9         4         1
>>   4500us:          8         2         1         1
>>   5000us:          7         1         2
>>   5500us:          7         2                   1
>>   6000us:          4         2         1         1
>>   6500us:          3                             1
>>   7000us:          6         2                   1
>>   7500us:          4         1 1
>>   8000us:          5                             1
>>   8500us:                    1         1
>>   9000us:          2
>>   9500us:          2         1
>> >10000us:          3                             1
>>
>>
>> John.
>>
>>
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20171129/eb984137/attachment-0001.html>