[Intel-gfx] [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending

Thu Feb 16 11:17:11 UTC 2017

On 16/02/2017 10:47, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 10:38:08AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 09:29, Chris Wilson wrote:
>>> If the timer expires for enabling the fake interrupt, check to see
>>> if there is a real interrupt queued before making the decision to start
>>> polling. This helps in situations where we have a very slow irq-seqno
>>> barrier that may accrue more breadcrumb interrupts before it is able to
>>> catch up. It still leaves a hole for the timer to expire as we are
>>> processing the last irq-seqno barrier, but it appears to help reduce the
>>> frequency of "missed-interrupts" on Ironlake, at least.
>>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> index ef3adfd37d7d..21269421bd2a 100644
>>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> @@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>>> 		return;
>>> 	}
>>>
>>> +	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>> +		mod_timer(&b->hangcheck, jiffies + 1);
>>> +		return;
>>> +	}
>>> +
>>> 	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
>>> 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>>> 	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
>>>
>>
>> Another hmm :), barriers are so much shorter than the hangcheck
>> interval (1500ms) so I don't quite understand what is the problem.
>
> We may queue the wait many, many seconds before it is even being
> processed. The point of this timer is to detect when we haven't seen an
> interrupt for some time and that happens to be the waiter missed (backup
> for seqno-barrier failing).

But it is the same as a slow batch which completes just as the hangcheck 
timer fires.

Is the problem is seqno barrier on some platforms can slow down the 
signaller thread a lot? Hm, if the request duration is right it would 
sleep on every invocation. So in effect extend the request duration as 
seen form userspace by the barrier duration. Why would that be a problem 
though? Haven't fully figured it out.

>> Should we instead put a mod_timer when raising the user irq?
>
> Frequency, mod_timer may require reprogramming of the apic and often
> does when profiling ;)

Agreed on this one so no complaints, just trying to understand the 
precise problem.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko