[Intel-gfx] [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Apr 4 12:38:36 UTC 2017
On 04/04/2017 13:05, Chris Wilson wrote:
> If the engine is continually completing nops, we can saturate the
> signaler and keep it working indefinitely. This angers the NMI watchdog!
>
> A good example is to disable semaphores on snb and run igt/gem_exec_nop -
> the parallel, multi-engine workloads are more than sufficient to hog the
> CPU, preventing the system from even processing ICMP echo replies.
>
> v2: Tvrtko dug into cond_resched() on x86 and found that it only
> depended upon preempt_cound and not tif_need_resched() - which means
> that we would always call schedule() at that point.
>
> Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 308c56a021ab..9ccbf26124c6 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
> signaler_set_rtpriority();
>
> do {
> + bool do_schedule = true;
> +
> set_current_state(TASK_INTERRUPTIBLE);
>
> /* We are either woken up by the interrupt bottom-half,
> @@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
> spin_unlock_irq(&b->rb_lock);
>
> i915_gem_request_put(request);
> - } else {
> +
> + /* If the engine is saturated we may be continually
> + * processing completed requests. This angers the
> + * NMI watchdog if we never let anything else
> + * have access to the CPU. Let's pretend to be nice
> + * and relinquish the CPU if we burn through the
> + * entire RT timeslice!
> + */
> + do_schedule = need_resched();
> + }
> +
> + if (unlikely(do_schedule)) {
> DEFINE_WAIT(exec);
>
> if (kthread_should_park())
>
My thinking was to add a check before the request assignment like:
rcu_read_lock();
request = ...;
if (request && !need_resched())
request = ...;
else
request = NULL;
rcu_read_unlock();
But this looks correct as well, maybe it is just my preference on what
would have been easier to understand.
I trust that you have tested it both for solving the NMI lockup detector
issue and that it doesn't affect the signaller latency a lot.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
More information about the Intel-gfx
mailing list