[PATCH v3 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
Rob Clark
robdclark at gmail.com
Tue Jan 28 22:09:48 UTC 2025
On Tue, Jan 28, 2025 at 3:08 AM Prakash Gupta <quic_guptap at quicinc.com> wrote:
>
> On Thu, Jan 23, 2025 at 03:14:16PM -0500, Connor Abbott wrote:
> > On Thu, Jan 23, 2025 at 2:26 PM Prakash Gupta <quic_guptap at quicinc.com> wrote:
> > >
> > > On Thu, Jan 23, 2025 at 09:00:17AM -0500, Connor Abbott wrote:
> > > > On Thu, Jan 23, 2025 at 6:10 AM Prakash Gupta <quic_guptap at quicinc.com> wrote:
> > > > >
> > > > > On Wed, Jan 22, 2025 at 03:00:58PM -0500, Connor Abbott wrote:
> > > > > The context would remain stalled till we write to CBn_RESUME. Which is done
> > > > > in qcom_adreno_smmu_resume_translation(). For a stalled context further
> > > > > transactions are not processed and we shouldn't see further faults and
> > > > > or fault inerrupts. Do you observe faults with stalled context?
> > > >
> > > > Yes. I've observed that on MMU-500 writing RESUME before the interrupt
> > > > has been cleared doesn't clear SS. This happened with v2 in the case
> > > > where there was already a devcoredump and drm/msm called
> > > > qcom_adreno_smmu_resume_translation() immediately from its fault
> > > > handler, and we'd get a storm of unhandled interrupts until it was
> > > > disabled. Given that the architecture spec says we're supposed to
> > > > clear the interrupt first this may have been an attempt to "help"
> > > > developers.
> > > >
> > >
> > > Just to clarify, present sequence is:
> > > 1. context fault with stall enable. FSR.SS set.
> > > 2. Report fault to client
> > > 3. resume/terminate stalled transaction
> > > 4. clear FSR
> > >
> > > At what point when you try #2->#3->#4 or #4->#2->#3 sequence, is FSR.SS
> > > cleared and interrupt storm is observed.
> >
> > With #2->#3->#4 FSR.SS is *not* cleared and there is a subsequent
> > interrupt storm with only FSR.SS asserted. With #4->#2->#3 there is no
> > interrupt storm. From this we conclude that MMU-500 does not clear
> > FSR.SS unless #4 happens before #3.
> >
> Thanks Connor for clarification. I get your point now. I think it's
> expected interrupt storm with #2->#3->#4 sequence is expected. With
> CONFIG_ARM_SMMU_QCOM_DEBUG enabled, context fault follows the sequence of
> #1->#2->#4->#3, which is spec recommended.
>
> so, common fault handler can be modified to follow the same sequence, but I
> have concern regarding clearing FSR before reporting fault to client.
> qcom_adreno_smmu_get_fault_info() is an example I gave but other client
> handler may have similar expecation of fault register not changed before
> client fault handler is called.
Simple solution would be to move the clearing of FSR to after the
fault is reported. It doesn't really matter if it is before or after,
as long as it happens before the irq handler returns, AFAIU. And
drm/msm will collect the fault info from the irq handler.
BR,
-R
> > > The way CFIE disable is helping
> > > with current patch indicates write FSR is unstalling context and subsequent
> > > transactions are faulted.
> >
> > No, it does not indicate that. The interrupt storm happens even when
> > there is exactly one context fault, and when the interrupt storm
> > happens *only* FSR.SS is asserted. I've verified this with debug
> > prints. Once more, MMU-500 will assert an interrupt when only FSR.SS
> > is asserted. This has nothing to do with subsequent transactions.
> >
> > > Do you stop getting interrupt storm after write
> > > RESUME.
> >
> > Yes, as long as the write to RESUME happens after all other bits in
> > FSR are cleared.
> >
> > > If you can mention your SCTLR configuration and FSR state in test
> > > sequence, it would be clearer.
> >
> > SCTLR has both HUPCF and CFCFG enabled.
> >
> > >
> > > An aside, If reducing delay between FSR and RESUME write helps then both
> > > can be done as part of qcom_adreno_smmu_resume_translation(). This will
> > > follow spec recommendation and also make sure fault registers are not
> > > cleared before reporting fault to client.
> I think this sequence should address the issue you are observing.
>
> > >
> > > Thanks,
> > > Prakash
More information about the Freedreno
mailing list