[Intel-gfx] [PATCH v3 1/2] drm/i915: Expose the busyspin durations for i915_wait_request

Mon Nov 27 07:25:05 UTC 2017


On 11/27/2017 12:50 PM, Sagar Arun Kamble wrote:
>
>
> On 11/26/2017 5:50 PM, Chris Wilson wrote:
>> An interesting discussion regarding "hybrid interrupt polling" for NVMe
>> came to the conclusion that the ideal busyspin before sleeping
>
> I think hybrid approach suggests sleep (1/2 duration) and then 
> busyspin (1/2 duration)
> (for small I/O size), so this should be "busyspin after sleeping" 
> although we are not doing exactly same.
>
>>   was half
>> of the expected request latency (and better if it was already halfway
>> through that request). This suggested that we too should look again at
>> our tradeoff between spinning and waiting. Currently, our spin simply
>> tries to hide the cost of enabling the interrupt, which is good to avoid
>> penalising nop requests (i.e. test throughput) and not much else.
>> Studying real world workloads suggests that a spin of upto 500us can
>> dramatically boost performance, but the suggestion is that this is not
>> from avoiding interrupt latency per-se, but from secondary effects of
>> sleeping such as allowing the CPU reduce cstate and context switch away.
>>
>> v2: Expose the spin setting via Kconfig options for easier adjustment
>> and testing.
>> v3: Don't get caught sneaking in a change to the busyspin parameters.
>>
>> Suggested-by: Sagar Kamble <sagar.a.kamble at intel.com>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Sagar Kamble <sagar.a.kamble at intel.com>
>> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Ben Widawsky <ben at bwidawsk.net>
>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> Cc: Michał Winiarski <michal.winiarski at intel.com>
>> ---
>>   drivers/gpu/drm/i915/Kconfig            |  6 ++++++
>>   drivers/gpu/drm/i915/Kconfig.profile    | 23 +++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_request.c | 28 
>> ++++++++++++++++++++++++----
>>   3 files changed, 53 insertions(+), 4 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/Kconfig.profile
>>
>> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
>> index dfd95889f4b7..eae90783f8f9 100644
>> --- a/drivers/gpu/drm/i915/Kconfig
>> +++ b/drivers/gpu/drm/i915/Kconfig
>> @@ -131,3 +131,9 @@ depends on DRM_I915
>>   depends on EXPERT
>>   source drivers/gpu/drm/i915/Kconfig.debug
>>   endmenu
>> +
>> +menu "drm/i915 Profile Guided Optimisation"
>> +    visible if EXPERT
>> +    depends on DRM_I915
>> +    source drivers/gpu/drm/i915/Kconfig.profile
>> +endmenu
>> diff --git a/drivers/gpu/drm/i915/Kconfig.profile 
>> b/drivers/gpu/drm/i915/Kconfig.profile
>> new file mode 100644
>> index 000000000000..a1aed0e2aad5
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/Kconfig.profile
>> @@ -0,0 +1,23 @@
>> +config DRM_I915_SPIN_REQUEST_IRQ
>> +    int
>> +    default 5 # microseconds
>> +    help
>> +      Before sleeping waiting for a request (GPU operation) to 
>> complete,
>> +      we may spend some time polling for its completion. As the IRQ may
>> +      take a non-negligible time to setup, we do a short spin first to
>> +      check if the request will complete quickly.
>> +
>> +      May be 0 to disable the initial spin.
>> +
>> +config DRM_I915_SPIN_REQUEST_CS
>> +    int
>> +    default 2 # microseconds
>> +    help
>> +      After sleeping for a request (GPU operation) to complete, we will
>> +      be woken up on the completion of every request prior to the one
>> +      being waited on. For very short requests, going back to sleep and
>> +      be woken up again may add considerably to the wakeup latency. To
>> +      avoid incurring extra latency from the scheduler, we may 
>> choose to
>> +      spin prior to sleeping again.
>> +
>> +      May be 0 to disable spinning after being woken.
>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c 
>> b/drivers/gpu/drm/i915/i915_gem_request.c
>> index a90bdd26571f..7ac72a0a949c 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>> @@ -1198,8 +1198,21 @@ long i915_wait_request(struct 
>> drm_i915_gem_request *req,
>>       GEM_BUG_ON(!intel_wait_has_seqno(&wait));
>>       GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
>>   -    /* Optimistic short spin before touching IRQs */
>> -    if (__i915_spin_request(req, wait.seqno, state, 5))
>> +    /* Optimistic spin before touching IRQs.
>> +     *
>> +     * We may use a rather large value here to offset the penalty of
>> +     * switching away from the active task. Frequently, the client will
>> +     * wait upon an old swapbuffer to throttle itself to remain 
>> within a
>> +     * frame of the gpu. If the client is running in lockstep with 
>> the gpu,
>> +     * then it should not be waiting long at all, and a sleep now 
>> will incur
>> +     * extra scheduler latency in producing the next frame. So we sleep
>> +     * for longer to try and keep the client running.
>> +     *
>> +     * We need ~5us to enable the irq, ~20us to hide a context switch.
>
> This comment fits more with next patch.
>
>> +     */
>> +    if (CONFIG_DRM_I915_SPIN_REQUEST_IRQ &&
>
> If this is set to 0 we still want to sleep and not go to complete.
Hit wicket :) ... Ignore this comment.
With above comments above commit message and comment update this change 
looks fine to me.
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>
>> +        __i915_spin_request(req, wait.seqno, state,
>> +                CONFIG_DRM_I915_SPIN_REQUEST_IRQ))
>>           goto complete;
>>         set_current_state(state);
>> @@ -1255,8 +1268,15 @@ long i915_wait_request(struct 
>> drm_i915_gem_request *req,
>>               __i915_wait_request_check_and_reset(req))
>>               continue;
>>   -        /* Only spin if we know the GPU is processing this request */
>> -        if (__i915_spin_request(req, wait.seqno, state, 2))
>> +        /*
>> +         * A quick spin now we are on the CPU to offset the cost of
>> +         * context switching away (and so spin for roughly the same as
>> +         * the scheduler latency). We only spin if we know the GPU is
>> +         * processing this request, and so likely to finish shortly.
>> +         */
>> +        if (CONFIG_DRM_I915_SPIN_REQUEST_CS &&
>
> Same here.
>
>> +            __i915_spin_request(req, wait.seqno, state,
>> +                    CONFIG_DRM_I915_SPIN_REQUEST_CS))
>>               break;
>>             if (!intel_wait_check_request(&wait, req)) {
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx