[PATCH] drm/i915/guc: Add synchronization on interrupt enable flag

Andi Shyti andi.shyti at kernel.org
Wed Aug 20 22:05:46 UTC 2025


Hi Zhanjun,

...

> > On Tue, Aug 19, 2025 at 12:00:10PM -0400, Zhanjun Dong wrote:
> > > Boolean flag access from interrupt context might have synchronous issue on
> > 
> > /synchronous/synchronization/
> > /issue/issues/
> > 
> > > multiple processor platform, flage modified by one core might be read as an
> > 
> > /flage/flags/
> > 
> > > old value by another core. This issue on interrupt enable flag might causes
> > > interrupt missing or leakage.
> > 
> > /missing/misses/
> > 
> > > Fixed by change the data type as automic to add memory synchronization.
> > 
> > This can be re-written: "Change the interrupts.enable type to
> > atomic to ensure memory synchronization."
> Will follow all above comments

No need to resend for this if the patch is fine.

> > > Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
> > 
> > What issue are you exactly trying to fix? And have you tested
> Will add:
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14834
> 
> > that this patch is this really fixing it? Where is the flag's
> CI report this issue about every 1 or 2 months and close it because no
> reproduce.
> I can reproduce it in about 1000 rounds(about 9 hours) of IGT test but not
> at 100% rate, so it is hard to say really fixed because of hard to
> reproduce.
> My latest test runs 2200 rounds in total has no reproduce.
> 
> > misalignment happening?
> > 
> > Please remember that when in interrupt context you are already
> > running in atomic, so that probably you don't need to have an
> > additional atomic access to variables.
> Interrupt context only read it. When the flag was changed and interrupt was
> triggered in very short time, the flag read at interrupt context might read
> out old value, if other core(non interrupt context) set the flag and ISR
> read out 0, ISR will do simple return and misses interrupt handling, making
> it appear as lost interrupt; if other core clear the flag and ISR read out
> 1, ISR will continue to run and not stop as expected, making it appear as an
> interrupt leak. >

Heh... it looks to me very unlikely to happen, but if you say
that this fixes it, then I'm OK with the patch.

I see still one case missing: the error comes here:

  ct_try_receive_message+0x336/0xa10

that is the tasklet that starts after the IRQ. Has the flag
changed from the irq to the tasklet? :-)

Would it make sense to add something like:

@@ -1338,6 +1338,9 @@ static void ct_receive_tasklet_func(struct tasklet_struct *t)
 {
        struct intel_guc_ct *ct = from_tasklet(ct, t, receive_tasklet);

+       if (!atomic_read(&guc->interrupts.enabled))
+               return;
+
        ct_try_receive_message(ct);
 }

BTW, have you tried adding a simple memory barrier (even though
in the i915 community people dislike memory barriers, but with a
proper explanation it might save us all this churn).

Thanks,
Andi


More information about the Intel-gfx mailing list