[Intel-gfx] [PATCH v3 2/2] drm/i915: Increase busyspin limit before a context-switch

Tue Nov 28 17:15:52 UTC 2017

On 26/11/2017 12:20, Chris Wilson wrote:
> Looking at the distribution of i915_wait_request for a set of GL
> benchmarks, we see:
> 
> broadwell# python bcc/tools/funclatency.py -u i915_wait_request
>     usecs               : count     distribution
>         0 -> 1          : 29184    |****************************************|
>         2 -> 3          : 5767     |*******                                 |
>         4 -> 7          : 3000     |****                                    |
>         8 -> 15         : 491      |                                        |
>        16 -> 31         : 140      |                                        |
>        32 -> 63         : 203      |                                        |
>        64 -> 127        : 543      |                                        |
>       128 -> 255        : 881      |*                                       |
>       256 -> 511        : 1209     |*                                       |
>       512 -> 1023       : 1739     |**                                      |
>      1024 -> 2047       : 22855    |*******************************         |
>      2048 -> 4095       : 1725     |**                                      |
>      4096 -> 8191       : 5813     |*******                                 |
>      8192 -> 16383      : 5348     |*******                                 |
>     16384 -> 32767      : 1000     |*                                       |
>     32768 -> 65535      : 4400     |******                                  |
>     65536 -> 131071     : 296      |                                        |
>    131072 -> 262143     : 225      |                                        |
>    262144 -> 524287     : 4        |                                        |
>    524288 -> 1048575    : 1        |                                        |
>   1048576 -> 2097151    : 1        |                                        |
>   2097152 -> 4194303    : 1        |                                        |
> 
> broxton# python bcc/tools/funclatency.py -u i915_wait_request
>     usecs               : count     distribution
>         0 -> 1          : 5523     |*************************************   |
>         2 -> 3          : 1340     |*********                               |
>         4 -> 7          : 2100     |**************                          |
>         8 -> 15         : 755      |*****                                   |
>        16 -> 31         : 211      |*                                       |
>        32 -> 63         : 53       |                                        |
>        64 -> 127        : 71       |                                        |
>       128 -> 255        : 113      |                                        |
>       256 -> 511        : 262      |*                                       |
>       512 -> 1023       : 358      |**                                      |
>      1024 -> 2047       : 1105     |*******                                 |
>      2048 -> 4095       : 848      |*****                                   |
>      4096 -> 8191       : 1295     |********                                |
>      8192 -> 16383      : 5894     |****************************************|
>     16384 -> 32767      : 4270     |****************************            |
>     32768 -> 65535      : 5622     |**************************************  |
>     65536 -> 131071     : 306      |**                                      |
>    131072 -> 262143     : 50       |                                        |
>    262144 -> 524287     : 76       |                                        |
>    524288 -> 1048575    : 34       |                                        |
>   1048576 -> 2097151    : 0        |                                        |
>   2097152 -> 4194303    : 1        |                                        |
> 
> Picking 20us for the context-switch busyspin has the dual advantage of
> catching the most frequent short waits while avoiding the cost of a
> context switch. 20us is the typical latency of two context switches,
> i.e. the cost of taking the sleep, without the secondary effects of
> cache flushing.

The next thing I wanted to ask about is the cumulative time spent spinning
versus the test duration; in other words, the CPU usage before and after.
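
One rough way to start on this from the data already posted is to bound the total busyspin time implied by the histograms. The sketch below assumes each traced wait spins once, for min(wait, limit), before sleeping; the driver may spin again after each wakeup, so treat the totals as an approximation. `spin_time_bounds` is a hypothetical helper, and turning the totals into a CPU share would still need each benchmark's wall-clock duration, which the thread does not give.

```python
# Same funclatency histograms as quoted above (counts per log2 bucket:
# bucket 0 is [0, 2) us, bucket i >= 1 is [2**i, 2**(i+1)) us).
BROADWELL = [29184, 5767, 3000, 491, 140, 203, 543, 881, 1209, 1739, 22855,
             1725, 5813, 5348, 1000, 4400, 296, 225, 4, 1, 1, 1]
BROXTON = [5523, 1340, 2100, 755, 211, 53, 71, 113, 262, 358, 1105,
           848, 1295, 5894, 4270, 5622, 306, 50, 76, 34, 0, 1]


def spin_time_bounds(counts, spin_us):
    """Bound the total busyspin time, in microseconds, for one histogram.

    Assumption: every wait spins once, for min(wait, spin_us), before
    sleeping. Using each bucket's start gives a lower bound, and its
    exclusive end gives an upper bound.
    """
    lower = upper = 0
    for i, n in enumerate(counts):
        start = 0 if i == 0 else 1 << i
        end = 1 << (i + 1)
        lower += n * min(start, spin_us)
        upper += n * min(end, spin_us)
    return lower, upper


if __name__ == "__main__":
    for name, counts in (("broadwell", BROADWELL), ("broxton", BROXTON)):
        for spin_us in (2, 20):  # old and proposed Kconfig defaults
            lo, hi = spin_time_bounds(counts, spin_us)
            print(f"{name}: {spin_us:2d}us limit spins {lo / 1000:.0f} to {hi / 1000:.0f} ms in total")
```

Dividing these totals by the run time of each benchmark would give the before/after CPU share being asked about.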

And of course: was the benefit on the benchmark results measurable, by how
much, and what does perf per Watt say?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Sagar Kamble <sagar.a.kamble at intel.com>
> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Ben Widawsky <ben at bwidawsk.net>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski at intel.com>
> ---
>   drivers/gpu/drm/i915/Kconfig.profile | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index a1aed0e2aad5..c8fe5754466c 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -11,7 +11,7 @@ config DRM_I915_SPIN_REQUEST_IRQ
>   
>   config DRM_I915_SPIN_REQUEST_CS
>   	int
> -	default 2 # microseconds
> +	default 20 # microseconds
>   	help
>   	  After sleeping for a request (GPU operation) to complete, we will
>   	  be woken up on the completion of every request prior to the one
>
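
The option being changed bounds how long the driver busyspins before a context switch, i.e. before giving up the CPU and sleeping. A minimal user-space sketch of that spin-then-sleep pattern follows; it is not the i915 implementation. `wait_for_request`, the use of `threading.Event` as the completion signal, and the (completed, slept) return value are illustrative assumptions, and in the driver this spin applies after a wakeup rather than once up front.

```python
import threading
import time


def wait_for_request(done, spin_us, timeout_s):
    """Busyspin on `done` for up to spin_us microseconds, then sleep on it.

    Returns (completed, slept): whether the request completed before the
    timeout, and whether we had to fall back to a blocking sleep.
    """
    deadline = time.perf_counter() + spin_us / 1_000_000
    while True:
        # Check at least once, even if the spin budget is already spent.
        if done.is_set():
            return True, False  # caught by the spin: no sleep/wakeup cost
        if time.perf_counter() >= deadline:
            break
    # Spin budget exhausted: block until signalled or the timeout expires.
    return done.wait(timeout_s), True


if __name__ == "__main__":
    done = threading.Event()
    # Signal completion from another thread after 5 ms, far longer than a
    # 20us spin, so this wait should fall back to sleeping.
    threading.Timer(0.005, done.set).start()
    print(wait_for_request(done, spin_us=20, timeout_s=1.0))
```

Raising spin_us from 2 to 20 widens the window in which a completion is caught without paying for a sleep and wakeup, at the price of burning up to 20us of CPU on every wait that does not complete in time.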