Fwd: [PATCH v1] drm/i915/guc: Always disable interrupt ahead of synchronize_irq

Dong, Zhanjun zhanjun.dong at intel.com
Mon Feb 3 23:37:10 UTC 2025


Just found my previous response click on "reply", not the "reply all", 
so add Cc list.

Regards,
Zhanjun Dong


-------- Forwarded Message --------
Subject: Re: [PATCH v1] drm/i915/guc: Always disable interrupt ahead of 
synchronize_irq
Date: Mon, 27 Jan 2025 17:17:33 -0500
From: Dong, Zhanjun <zhanjun.dong at intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

Please see my comments below.
Daniele raised a good point on GuC interrupt, I will dig deeper for more 
info.

Regards,
Zhanjun Dong

On 2025-01-27 12:12 p.m., Daniele Ceraolo Spurio wrote:
> 
> 
> 
> On 1/23/2025 8:23 AM, Zhanjun Dong wrote:
>> The purpose of synchronize_irq is to wait for any pending IRQ handlers 
>> for the
>> interrupt to complete, if synchronize_irq called before interrupt 
>> disabled, an
>> tiny timing window created, where no more pending IRQ, but interrupt not
>> disabled yet. Meanwhile, if the interrupt event happened in this 
>> timing window,
>> an unexpected IRQ handling will be triggered.
>>
>> Fixed by always disable interrupt ahead of synchronize_irq.
>>
>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13454
>> Fixes: 26705e20752a ("drm/i915: Support for GuC interrupts")
>> Fixes: 54c52a841250 ("drm/i915/guc: Correctly handle GuC interrupts on 
>> Gen11")
>> Fixes: 2ae096872a2c ("drm/i915/pxp: Implement PXP irq handler")
>> Fixes: 3e7abf814193 ("drm/i915: Extract GT render power state 
>> management")
>>
>> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
>>
>> ---
>> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> Cc: Andi Shyti <andi.shyti at intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_rps.c      | 3 +--
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.c   | 4 ++--
>>   drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 2 +-
>>   3 files changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/ 
>> i915/gt/intel_rps.c
>> index fa304ea088e4..0fe7a8d7f460 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>> @@ -244,8 +244,8 @@ static void rps_disable_interrupts(struct 
>> intel_rps *rps)
>>       gen6_gt_pm_disable_irq(gt, GEN6_PM_RPS_EVENTS);
>>       spin_unlock_irq(gt->irq_lock);
>> +    rps_reset_interrupts(rps);
>>       intel_synchronize_irq(gt->i915);
> 
> I don't think this is an issue, because we set the irq mask in 
> gen6_gt_pm_disable_irq, so there is no chance of getting any new 
> interrupts after that. Not saying that we shouldn't do the re-order, but 
> we don't need a fixes tag for this.
Good point, the interrupt already disabled by gen6_gt_pm_disable_irq, 
the rps_reset_interrupts is just clear interrupt bits, so no risk here, 
therefor no issue at all.
Anyway, re-order is still a good practice, the interrupt bits set/and 
clear is all about the interrupt bits, these 2 actions could be consider 
logically tight relationship, the intel_synchronize_irq considered to be 
the next step.
So fixes tag should be removed. As not an issue fix, this work could be 
done separately.

> 
>> -
>>       /*
>>        * Now that we will not be generating any more work, flush any
>>        * outstanding tasks. As we are called on the RPS idle path,
>> @@ -254,7 +254,6 @@ static void rps_disable_interrupts(struct 
>> intel_rps *rps)
>>        */
>>       cancel_work_sync(&rps->work);
>> -    rps_reset_interrupts(rps);
>>       GT_TRACE(gt, "interrupts:off\n");
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/ 
>> i915/gt/uc/intel_guc.c
>> index 5949ff0b0161..3e7b2c6cdca4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> @@ -116,9 +116,9 @@ static void gen9_disable_guc_interrupts(struct 
>> intel_guc *guc)
>>       gen6_gt_pm_disable_irq(gt, gt->pm_guc_events);
>>       spin_unlock_irq(gt->irq_lock);
>> -    intel_synchronize_irq(gt->i915);
>>       gen9_reset_guc_interrupts(guc);
>> +    intel_synchronize_irq(gt->i915);
> 
> Same as above with gen6_gt_pm_disable_irq
> 
>>   }
>>   static bool __gen11_reset_guc_interrupts(struct intel_gt *gt)
>> @@ -154,9 +154,9 @@ static void gen11_disable_guc_interrupts(struct 
>> intel_guc *guc)
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       guc->interrupts.enabled = false;
>> -    intel_synchronize_irq(gt->i915);
>>       gen11_reset_guc_interrupts(guc);
>> +    intel_synchronize_irq(gt->i915);
> 
> No early disabling here, but I don't think this change helps either. 
> AFAICS gen11_reset_guc_interrupts() only calls gen11_gt_reset_one_iir(), 
> which just clears the IIR bits for the GuC. There are no changes in 
> interrupt enable/mask status, so interrupts can still be generated. The 
> way interrupts are stopped for gen11+ is by setting guc- 
>  >interrupts.enabled, because that's checked from both guc_irq_handler() 
> and intel_guc_to_host_event_handler(), so any new interrupts generated 
> after we set that should be immediately dropped.
Good point! Because there are no changes in interrupt enable/mask 
status, guc interrupt won't stops, therefor synchronize irq won't make 
changes.
Let me dig deeper and get back to you once I have more info.

> 
>>   }
>>   static void guc_dead_worker_func(struct work_struct *w)
>> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c b/drivers/gpu/ 
>> drm/i915/pxp/intel_pxp_irq.c
>> index d81750b9bdda..b82a667e7ac0 100644
>> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
>> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
>> @@ -101,9 +101,9 @@ void intel_pxp_irq_disable(struct intel_pxp *pxp)
>>       __pxp_set_interrupts(gt, 0);
>>       spin_unlock_irq(gt->irq_lock);
>> -    intel_synchronize_irq(gt->i915);
>>       pxp_irq_reset(gt);
>> +    intel_synchronize_irq(gt->i915);
> 
> Again not a bug here, __pxp_set_interrupts is doing the interrupts 
> disabling and that is already happening before intel_synchronize_irq().
Same to above rps case, not a bug.
> 
> Daniele
> 
>>       flush_work(&pxp->session_work);
>>   }
> 



More information about the Intel-gfx mailing list