[PATCH v8 0/7] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault

Will Deacon will at kernel.org
Wed May 21 11:01:17 UTC 2025


On Tue, 20 May 2025 15:08:53 -0400, Connor Abbott wrote:
> drm/msm uses the stall-on-fault model to record the GPU state on the
> first GPU page fault to help debugging. On systems where the GPU is
> paired with an MMU-500, there were two problems:
> 
> 1. The MMU-500 doesn't de-assert its interrupt line until the fault is
>    resumed, which led to a storm of interrupts until the fault handler
>    was called. If we got unlucky and the fault handler was on the same
>    CPU as the interrupt, there was a deadlock.
> 2. The GPU is capable of generating page faults much faster than we can
>    resume them. GMU (GPU Management Unit) shares the same context bank
>    as the GPU, so if there was a sudden spurt of page faults it would be
>    effectively starved and would trigger a watchdog reset, made even
>    worse because the GPU cannot be reset while there's a pending
>    transaction, leaving the GPU permanently wedged.
> 
> [...]

Applied first three SMMU driver changes to iommu (arm/smmu/updates), thanks!

[1/7] iommu/arm-smmu-qcom: Enable threaded IRQ for Adreno SMMUv2/MMU500
      https://git.kernel.org/iommu/c/1650620774fa
[2/7] iommu/arm-smmu: Move handing of RESUME to the context fault handler
      https://git.kernel.org/iommu/c/3053a2c5086d
[3/7] iommu/arm-smmu-qcom: Make set_stall work when the device is on
      https://git.kernel.org/iommu/c/70892277ca2d

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

