[PATCH v5 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault

Thu May 15 17:13:42 UTC 2025

On Thu, May 15, 2025 at 10:47 AM Will Deacon <will at kernel.org> wrote:
>
> On Tue, May 06, 2025 at 11:18:44AM -0400, Connor Abbott wrote:
> > On Tue, May 6, 2025 at 10:53 AM Will Deacon <will at kernel.org> wrote:
> > >
> > > On Tue, May 06, 2025 at 10:08:05AM -0400, Connor Abbott wrote:
> > > > On Tue, May 6, 2025 at 8:24 AM Will Deacon <will at kernel.org> wrote:
> > > > > On Wed, Mar 19, 2025 at 10:44:02AM -0400, Connor Abbott wrote:
> > > > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > index c7b5d7c093e71050d29a834c8d33125e96b04d81..9927f3431a2eab913750e6079edc6393d1938c98 100644
> > > > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > @@ -470,13 +470,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> > > > > >       if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
> > > > > >               return IRQ_NONE;
> > > > > >
> > > > > > +     /*
> > > > > > +      * On some implementations FSR.SS asserts a context fault
> > > > > > +      * interrupt. We do not want this behavior, because resolving the
> > > > > > +      * original context fault typically requires operations that cannot be
> > > > > > +      * performed in IRQ context but leaving the stall unacknowledged will
> > > > > > +      * immediately lead to another spurious interrupt as FSR.SS is still
> > > > > > +      * set. Work around this by disabling interrupts for this context bank.
> > > > > > +      * It's expected that interrupts are re-enabled after resuming the
> > > > > > +      * translation.
> > > > >
> > > > > s/translation/transaction/
> > > > >
> > > > > > +      *
> > > > > > +      * We have to do this before report_iommu_fault() so that we don't
> > > > > > +      * leave interrupts disabled in case the downstream user decides the
> > > > > > +      * fault can be resolved inside its fault handler.
> > > > > > +      *
> > > > > > +      * There is a possible race if there are multiple context banks sharing
> > > > > > +      * the same interrupt and both signal an interrupt in between writing
> > > > > > +      * RESUME and SCTLR. We could disable interrupts here before we
> > > > > > +      * re-enable them in the resume handler, leaving interrupts enabled.
> > > > > > +      * Lock the write to serialize it with the resume handler.
> > > > > > +      */
> > > > >
> > > > > I'm struggling to understand this last part. If the resume handler runs
> > > > > synchronously from report_iommu_fault(), then there's no need for
> > > > > locking because we're in interrupt context. If the resume handler can
> > > > > run asynchronously from report_iommu_fault(), then the locking doesn't
> > > > > help because the code below could clear CFIE right after the resume
> > > > > handler has set it.
> > > >
> > > > The problem is indeed when the resume handler runs asynchronously.
> > > > Clearing CFIE right after the resume handler has set it is normal and
> > > > expected. The issue is the opposite, i.e. something like:
> > > >
> > > > - Resume handler writes RESUME and stalls for some reason
> > > > - The interrupt handler runs through and clears CFIE while it's already cleared
> > > > - Resume handler sets CFIE, assuming that the handler hasn't run yet
> > > > but it actually has
> > > >
> > > > This wouldn't happen with only one context bank, because we wouldn't
> > > > get an interrupt until the resume handler sets CFIE, but with multiple
> > > > context banks and a shared interrupt line we could get a "spurious"
> > > > interrupt due to a fault in an earlier context bank that becomes not
> > > > spurious if the resume handler writes RESUME before the context fault
> > > > handler for this bank reads FSR above.
> > >
> > > Ah, gotcha. Thanks for the explanation.
> > >
> > > If we moved the RESUME+CFIE into the interrupt handler after the call
> > > to report_iommu_fault(), would it be possible to run the handler as a
> > > threaded irq (see 'context_fault_needs_threaded_irq') and handle the
> > > callback synchronously? In that case, I think we could avoid taking the
> > > lock if we wrote CFIE _before_ RESUME.
> > >
> >
> > We need the lock anyway due to the parallel manipulation of CFCFG in
> > the same register introduced in the next patch. Expanding it to also
> > cover the write to RESUME is not a huge deal. Also, doing it
> > synchronously would require rewriting the fault handling in drm/msm
> > and again I'm trying to fix this serious stability problem now as soon
> > as possible without getting dragged into rewriting the whole thing.
>
> This has never worked though, right? In which case, we should fix it
> properly rather than papering over the mess.

It has never worked upstream, which means that everyone is either
carrying this series downstream or blowing up sometimes. The number
of places carrying this series is quickly multiplying, so it's
becoming more and more painful that this isn't upstream. Not to
mention the downstreams that still aren't aware of this and
sometimes hang the whole system.

>
> Georgi (CC'd) added support for threaded interrupts specifically to
> permit sleeping operations in the fault handler. You should be able to
> use that and I don't understand why that would require "rewriting the
> whole thing". You can kick the async work and then wait for it to
> complete, no?

It would certainly require rewriting the iommu side of it, although it
does get simpler.

Properly handling the drm/msm side of it would also require getting
rid of the fault worker, although I suppose we could just wait on it
as a hack.

I've started trying to see how it would look, but the biggest problem
is that it's going to introduce a lot of complicated cross-tree
dependencies. To fully follow the recommended sequence, we'd have to
do something like:

1. Enable threaded IRQ on Adreno SMMU.
2. Make drm/msm do its devcoredump business immediately.
3. Put the iommu driver in charge of writing RESUME, and do it after
   clearing FSR.

The problem is that if we only do 1 and 2, things will be way worse
than before. Today we get some duplicate faults while the devcoredump
is pending, which sometimes (but not always) results in a full system
hang if the devcoredump is scheduled on the same core. With only 1 and
2, the interrupt will never be cleared, because MMU-500 ignores writes
to RESUME if FSR isn't cleared, and the entire system will hang every
time there's a context fault.
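
To make the FSR/RESUME ordering constraint concrete, here is a toy
userspace model, not driver code. Everything in it is an assumption
for illustration: the toy_* names, the single TOY_FSR_FAULT bit, and
the rule that the interrupt stays asserted while either a fault is
recorded or a transaction is stalled. It only encodes the behavior
described above, i.e. that a write to RESUME is dropped while FSR
isn't cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit, not the real FSR layout. */
#define TOY_FSR_FAULT (1u << 1)

/* Toy context bank: a fault status word plus a "transaction stalled" flag. */
struct toy_cb {
	uint32_t fsr;
	bool stalled;
};

/* Assumed model: the line stays asserted while a fault or a stall remains. */
static bool toy_irq_pending(const struct toy_cb *cb)
{
	return (cb->fsr & TOY_FSR_FAULT) || cb->stalled;
}

/* FSR is modeled as write-one-to-clear. */
static void toy_write_fsr(struct toy_cb *cb, uint32_t val)
{
	cb->fsr &= ~val;
}

/*
 * Models the behavior described in the text: a RESUME write is ignored
 * while FSR still records a fault. Returns true if the write took effect.
 */
static bool toy_write_resume(struct toy_cb *cb)
{
	if (cb->fsr & TOY_FSR_FAULT)
		return false;
	cb->stalled = false;
	return true;
}

/* Walks both orderings on a freshly faulted, stalled context bank. */
static void toy_sequence_demo(void)
{
	struct toy_cb cb = { .fsr = TOY_FSR_FAULT, .stalled = true };

	/* RESUME before clearing FSR: dropped, interrupt stays asserted. */
	assert(!toy_write_resume(&cb));
	assert(toy_irq_pending(&cb));

	/* Clear FSR first, then RESUME: the stall ends and the line drops. */
	toy_write_fsr(&cb, TOY_FSR_FAULT);
	assert(toy_write_resume(&cb));
	assert(!toy_irq_pending(&cb));
}
```

Calling toy_sequence_demo() runs both orderings. In this model, the
first half is what doing only 1 and 2 would look like: nothing ever
clears FSR before RESUME is written, so the line never drops.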

I suppose I could put 3 before 2, and temporarily break devcoredumps?

>
> That would then open the door to handling the RESUME in the core driver
> in future based on the return value from report_iommu_fault().
>
> You also need to fix qcom_tbu_halt() as I mentioned before.
>
> Will