[Intel-gfx] [PATCH v2 3/3] drm/i915: Defer declaration of missed-interrupt until the waiter is asleep
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Feb 17 11:33:26 UTC 2017
On 17/02/2017 10:58, Chris Wilson wrote:
> On Fri, Feb 17, 2017 at 10:48:50AM +0000, Tvrtko Ursulin wrote:
>>
>> On 17/02/2017 10:18, Chris Wilson wrote:
>>> If the waiter was currently running, assume it hasn't had a chance
>>> to process the pending interrupt (e.g., low priority task on a loaded
>>> system) and wait until it sleeps before declaring a missed interrupt.
>>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_breadcrumbs.c | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> index 4395b177493e..2ad29fb77b2d 100644
>>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> @@ -45,6 +45,15 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>>>  		return;
>>>  	}
>>>
>>> +	/* If the waiter was currently running, assume it hasn't had a chance
>>> +	 * to process the pending interrupt (e.g., low priority task on a loaded
>>> +	 * system) and wait until it sleeps before declaring a missed interrupt.
>>> +	 */
>>> +	if (!intel_engine_wakeup(engine)) {
>>> +		mod_timer(&b->hangcheck, wait_timeout());
>>> +		return;
>>> +	}
>>> +
>>>  	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
>>>  	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>>>  	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
>>>
>>
>> The change here is that we would never declare a GPU hang if userspace
>> would just wait indefinitely, or in other words with this patch we
>> would rely on userspace timing out on their waits in order to
>> declare a hang.
>
> Surely you mean the other way around? The only way we get to now declare a
> missed-interrupt and then queue a hangcheck here is if userspace sleeps.
>
>> Hm, in fact even with the current code, if the userspace keeps
>> exiting and re-entering the wait we would be re-arming the hangcheck
>> timer and so also never notice a GPU hang.
>
> Correct. It is not the only way we arm the GPU hangcheck.
> gem_busy/hang, gem_wait/busy-hang check that we do detect hangs even if
> userspace never sleeps.
Looks good after some more digging through the code and a brief IRC
discussion. We only fall back to rapid wakeups (fake_irq) if there are
waiters now, which is in line with the rest of the code.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
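
For readers following the thread, the decision the patch adds to the
hangcheck timer can be sketched as a small userspace model. This is only
an illustration, not the driver code: struct model_engine, model_wakeup()
and model_hangcheck() are hypothetical names, and the model assumes (going
by the patch comment) that intel_engine_wakeup() returns false when the
waiter was already running and true when it had to wake a sleeping one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one hangcheck timer expiry in this model. */
enum hangcheck_action {
	HANGCHECK_REARM,	/* waiter was running: re-arm and check later */
	HANGCHECK_MISSED_IRQ,	/* waiter was asleep: declare missed interrupt */
};

/* Hypothetical stand-in for the engine state; not the i915 structures. */
struct model_engine {
	unsigned int id;		/* bit index in missed_irq_rings */
	bool waiter_running;		/* is the waiter currently on a CPU? */
	unsigned long missed_irq_rings;	/* one bit per engine */
};

/*
 * Models the assumed return value of intel_engine_wakeup():
 * true only if a sleeping waiter had to be woken.
 */
static bool model_wakeup(const struct model_engine *e)
{
	return !e->waiter_running;
}

/* Models the branch added by the patch plus the pre-existing fallback. */
static enum hangcheck_action model_hangcheck(struct model_engine *e)
{
	assert(e != NULL);

	/* A running waiter has not yet had a chance to see the interrupt. */
	if (!model_wakeup(e))
		return HANGCHECK_REARM;

	/* The waiter slept through the timeout: flag the missed interrupt. */
	e->missed_irq_rings |= 1UL << e->id;
	return HANGCHECK_MISSED_IRQ;
}
```

With waiter_running set, model_hangcheck() returns HANGCHECK_REARM and
leaves missed_irq_rings untouched. Once the waiter is asleep, the same
call sets the engine's bit, which in the real driver is the point where
the fake_irq timer takes over.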