[Intel-gfx] [PATCH v5 4/4] drm/i915: Delay disabling the user interrupt for breadcrumbs

Mon Feb 27 19:05:59 UTC 2017

On 27/02/2017 14:13, Chris Wilson wrote:
> On Mon, Feb 27, 2017 at 02:06:58PM +0000, Chris Wilson wrote:
>> On Mon, Feb 27, 2017 at 01:57:35PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 27/02/2017 13:24, Chris Wilson wrote:
>>>> 	if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) {
>>>> @@ -67,7 +76,7 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>>>> 	 * to process the pending interrupt (e.g, low priority task on a loaded
>>>> 	 * system) and wait until it sleeps before declaring a missed interrupt.
>>>> 	 */
>>>> -	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ACTIVE) {
>>>> +	if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) {
>>>
>>> I did not get the explanation from the previous round of why you had
>>> to switch the check from ACTIVE to ASLEEP. Here it now even looks
>>> wrong, because I thought you didn't want to re-queue the hangcheck
>>> when there are no waiters?
>>
>> No waiters: result = 0
>> Running waiter: result = WAKEUP_WAITER
>> Sleeping waiter: result = WAKEUP_WAITER | WAKEUP_ASLEEP
>>
>> We only want to declare a missed irq if we wake up a sleeping waiter, and
>> keep the hangcheck timer running until the device is idle (when the irq
>> is disarmed).
>>
>> How about:
>>
>> if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
>> 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>> 	mod_timer(&b->fake_irq, jiffies + 1);
>> } else {
>> 	mod_timer(&b->hangcheck, wait_timeout());
>> }
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index ebb7bc0be9fb..004eb4c0c531 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -72,18 +72,25 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>                 return;
>         }
>
> -       /* If the waiter was currently running, assume it hasn't had a chance
> +       /* We keep the hangcheck timer alive until we disarm the irq, even
> +        * if there are no waiters at present.
> +        *
> +        * If the waiter was currently running, assume it hasn't had a chance
>          * to process the pending interrupt (e.g, low priority task on a loaded
>          * system) and wait until it sleeps before declaring a missed interrupt.
> +        *
> +        * If the waiter was asleep (and not even pending a wakeup), then we
> +        * must have missed an interrupt as the GPU has stopped advancing
> +        * but we still have a waiter. Assuming all batches complete within
> +        * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
>          */
> -       if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) {
> +       if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
> +               DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> +               set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> +               mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +       } else {
>                 mod_timer(&b->hangcheck, wait_timeout());
> -               return;
>         }
> -
> -       DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> -       set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> -       mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
>  }
>
>  static void intel_breadcrumbs_fake_irq(unsigned long data)
> @@ -167,6 +174,10 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
>
>         spin_lock_irqsave(&b->lock, flags);
>
> +       /* We only disarm the irq when we are idle (all requests completed),
> +        * so if there remains a sleeping waiter, it missed the request
> +        * completion.
> +        */
>         if (__intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
>                 set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>
>
>

Looks okay after some re-re-re-reading. I guess shuffling things back
and forth within this series, so soon after the area was last changed,
made it a bit challenging to follow.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko