[Intel-gfx] [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Apr 4 12:38:36 UTC 2017


On 04/04/2017 13:05, Chris Wilson wrote:
> If the engine is continually completing nops, we can saturate the
> signaler and keep it working indefinitely. This angers the NMI watchdog!
>
> A good example is to disable semaphores on snb and run igt/gem_exec_nop -
> the parallel, multi-engine workloads are more than sufficient to hog the
> CPU, preventing the system from even processing ICMP echo replies.
>
> v2: Tvrtko dug into cond_resched() on x86 and found that it only
> depended upon preempt_cound and not tif_need_resched() - which means
> that we would always call schedule() at that point.
>
> Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 308c56a021ab..9ccbf26124c6 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
>  	signaler_set_rtpriority();
>
>  	do {
> +		bool do_schedule = true;
> +
>  		set_current_state(TASK_INTERRUPTIBLE);
>
>  		/* We are either woken up by the interrupt bottom-half,
> @@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
>  			spin_unlock_irq(&b->rb_lock);
>
>  			i915_gem_request_put(request);
> -		} else {
> +
> +			/* If the engine is saturated we may be continually
> +			 * processing completed requests. This angers the
> +			 * NMI watchdog if we never let anything else
> +			 * have access to the CPU. Let's pretend to be nice
> +			 * and relinquish the CPU if we burn through the
> +			 * entire RT timeslice!
> +			 */
> +			do_schedule = need_resched();
> +		}
> +
> +		if (unlikely(do_schedule)) {
>  			DEFINE_WAIT(exec);
>
>  			if (kthread_should_park())
>

My thinking was to add a check before the request assignment like:

	rcu_read_lock();
	request = ...;
	if (request && !need_resched())
		request = ...;
	else
		request = NULL;
	rcu_read_unlock();

But this looks correct as well, maybe it is just my preference on what 
would have been easier to understand.

I trust that you have tested it both for solving the NMI lockup detector 
issue and that it doesn't affect the signaller latency a lot.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko



More information about the Intel-gfx mailing list