[PATCH v6 0/7] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault

Connor Abbott cwabbott0 at gmail.com
Tue May 20 14:42:49 UTC 2025


On Tue, May 20, 2025 at 10:19 AM Will Deacon <will at kernel.org> wrote:
>
> Hi Connor,
>
> On Thu, May 15, 2025 at 03:58:42PM -0400, Connor Abbott wrote:
> > drm/msm uses the stall-on-fault model to record the GPU state on the
> > first GPU page fault to help debugging. On systems where the GPU is
> > paired with an MMU-500, there were two problems:
> >
> > 1. The MMU-500 doesn't de-assert its interrupt line until the fault is
> >    resumed, which led to a storm of interrupts until the fault handler
> >    was called. If we got unlucky and the fault handler was on the same
> >    CPU as the interrupt, there was a deadlock.
> > 2. The GPU is capable of generating page faults much faster than we can
> >    resume them. The GMU (GPU Management Unit) shares the same context
> >    bank as the GPU, so a sudden burst of page faults would effectively
> >    starve it and trigger a watchdog reset. This was made even worse
> >    because the GPU cannot be reset while there is a pending transaction,
> >    which left the GPU permanently wedged.
> >
> > Patches 1-2 and 4 fix the first problem by switching the IRQ to be a
> > threaded IRQ and then making drm/msm do its devcoredump work
> > synchronously in the threaded IRQ. Patch 4 is dependent on patches 1-2.
> > Patch 6 fixes the second problem and is dependent on patch 3. Patch 5 is
> > a cleanup for patch 4, and patch 7 is a further cleanup that gets rid of
> > the resume_fault() callback once resuming is done by the SMMU's fault
> > handler.
>
> Thanks for reworking this; I think it looks much better now from the
> SMMU standpoint.
>
> > I've organized the series in the order that it should be picked up:
> >
> > - Patches 1-3 need to be applied to the iommu tree first.
>
> Which kernel version did you base these on? I can't seem to apply the
> second patch, as you appear to have a stale copy of arm-smmu-qcom.c?
>
> Will

Sorry about that; for the next version I'll rebase on linux-next. I
had been using an older version of msm-next for a while.

Connor