[PATCH v2 0/3] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault
Connor Abbott
cwabbott0 at gmail.com
Mon Jan 20 15:46:44 UTC 2025
drm/msm uses the stall-on-fault model to record the GPU state on the
first GPU page fault to help with debugging. On systems where the GPU is
paired with an MMU-500, there were two problems:
1. The MMU-500 doesn't de-assert its interrupt line until the fault is
   resumed, which led to a storm of interrupts until the fault handler
   was called. If we got unlucky and the fault handler was on the same
   CPU as the interrupt, there was a deadlock.
2. The GPU is capable of generating page faults much faster than we can
   resume them. The GMU (GPU Management Unit) shares the same context
   bank as the GPU, so a sudden burst of page faults would effectively
   starve it and trigger a watchdog reset. This was made even worse
   because the GPU cannot be reset while there's a pending transaction,
   which left the GPU permanently wedged.
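
To illustrate the first problem, here is a minimal userspace model, not
driver code. It assumes the interrupt line stays asserted for as long as
a stalled fault is pending and the context fault interrupt enable (CFIE)
bit is set, and that masking CFIE in the IRQ handler until the fault is
resumed is one way to stop the storm. All names (cb_model, irq_handler,
resume_fault, simulate_fault) are hypothetical and do not appear in the
arm-smmu driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one SMMU context bank; not the driver's state. */
struct cb_model {
	bool fault_pending;	/* a stalled transaction awaits resume */
	bool cfie;		/* context fault interrupt enable bit */
	unsigned int irq_count;	/* number of IRQ handler invocations */
};

/* Assumption: the line is level-triggered on a pending, unmasked fault. */
static bool irq_line_asserted(const struct cb_model *cb)
{
	return cb->fault_pending && cb->cfie;
}

/* IRQ handler: cannot resume here, so optionally mask the interrupt. */
static void irq_handler(struct cb_model *cb, bool mask_on_stall)
{
	cb->irq_count++;
	if (mask_on_stall)
		cb->cfie = false;
}

/* Deferred fault handler: resume the transaction, then unmask. */
static void resume_fault(struct cb_model *cb)
{
	cb->fault_pending = false;
	cb->cfie = true;
}

/*
 * Deliver interrupts for up to max_polls rounds before the deferred
 * fault handler gets to run, and return how often the IRQ handler ran.
 */
static unsigned int simulate_fault(bool mask_on_stall, unsigned int max_polls)
{
	struct cb_model cb = { .fault_pending = true, .cfie = true };

	for (unsigned int i = 0; i < max_polls && irq_line_asserted(&cb); i++)
		irq_handler(&cb, mask_on_stall);

	resume_fault(&cb);
	assert(!irq_line_asserted(&cb));
	return cb.irq_count;
}
```

In this model, simulate_fault(false, 1000) runs the handler on every
round until the fault is resumed, while simulate_fault(true, 1000) runs
it once.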
Patch 1 fixes the first problem and is independent of the rest of the
series. Patch 3 fixes the second problem and is dependent on patch 2, so
there will have to be some cross-tree coordination.
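
As a rough sketch of the idea behind patch 3, the following userspace
model disables stall-on-fault after a page fault and re-enables it
lazily once a cooldown period has passed, without using a timer. It is
an illustration only: the names (stall_state, on_page_fault,
reenable_stall_if_cooled_down) and the STALL_COOLDOWN_MS value are
assumptions made for this example, not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_COOLDOWN_MS 500	/* hypothetical cooldown length */

/* Hypothetical per-GPU state; not the driver's actual layout. */
struct stall_state {
	bool stall_enabled;	/* is stall-on-fault currently on? */
	uint64_t reenable_at_ms;	/* earliest time it may be turned on again */
};

static void stall_init(struct stall_state *s)
{
	assert(s);
	s->stall_enabled = true;
	s->reenable_at_ms = 0;
}

/*
 * Called on every page fault. Returns true if this fault stalled, i.e.
 * the GPU state can be captured for it. Every fault pushes the
 * re-enable time further out, so a burst keeps stalling disabled.
 */
static bool on_page_fault(struct stall_state *s, uint64_t now_ms)
{
	bool stalled = s->stall_enabled;

	s->stall_enabled = false;
	s->reenable_at_ms = now_ms + STALL_COOLDOWN_MS;
	return stalled;
}

/*
 * Called from ordinary, non-fault paths (for example when work is
 * submitted) instead of from a timer: turn stalling back on only once
 * the cooldown period has ended.
 */
static void reenable_stall_if_cooled_down(struct stall_state *s,
					  uint64_t now_ms)
{
	if (!s->stall_enabled && now_ms >= s->reenable_at_ms)
		s->stall_enabled = true;
}
```

With this shape, only the first fault of a burst stalls; later faults
in the same burst are not stalled, so they cannot pile up faster than
they are resumed.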
I've rebased this series on the latest linux-next to avoid rebase
troubles.
Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
---
Changes in v2:
- Remove unnecessary _irqsave when locking in IRQ handler (Robin)
- Reuse existing spinlock for CFIE manipulation (Robin)
- Lock CFCFG manipulation against concurrent CFIE manipulation
- Don't use timer to re-enable stall-on-fault (Rob)
- Use more descriptive name for the function that re-enables
  stall-on-fault if the cooldown period has ended (Rob)
- Link to v1: https://lore.kernel.org/r/20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com
---
Connor Abbott (3):
      iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
      iommu/arm-smmu-qcom: Make set_stall work when the device is on
      drm/msm: Temporarily disable stall-on-fault after a page fault

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  2 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      |  4 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 42 +++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h    | 24 ++++++++++++++++
 drivers/gpu/drm/msm/msm_iommu.c            |  9 ++++++
 drivers/gpu/drm/msm/msm_mmu.h              |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 45 +++++++++++++++++++++++++++---
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 30 ++++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
 9 files changed, 152 insertions(+), 6 deletions(-)
---
base-commit: 0907e7fb35756464aa34c35d6abb02998418164b
change-id: 20250117-msm-gpu-fault-fixes-next-96e3098023e1
Best regards,
--
Connor Abbott <cwabbott0 at gmail.com>