[PATCH v4 0/7] Improve GPU Recovery
Akhil P Oommen
quic_akhilpo at quicinc.com
Wed Aug 17 15:14:13 UTC 2022
Recently, I debugged a few device crashes which occured during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chance at a successful gpu recovery.
First one is to ensure that CX GDSC collapses which clears the internal
states in gpu's CX domain. First 5 patches tries to handle this.
Rest of the patches are to ensure that few internal blocks like CP, GMU
and GBIF are halted properly before proceeding for a snapshot followed by
recovery. Also, handle 'prepare slumber' hfi failure correctly. These
are A6x specific improvements.
This series is rebased on top of v2 version of [1] which is based on
linus's master branch.
[1] https://patchwork.freedesktop.org/series/106860/
Changes in v4:
- Keep active_submit lock across the suspend & resume (Rob)
- Clear gpu->active_submits to silence a WARN() during runpm suspend (Rob)
Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
https://patchwork.freedesktop.org/series/106860/
- Use single pm refcount for all active submits
Changes in v2:
- Rebased on msm-next tip
Akhil P Oommen (7):
drm/msm: Remove unnecessary pm_runtime_get/put
drm/msm: Take single rpm refcount on behalf of all submits
drm/msm: Correct pm_runtime votes in recover worker
drm/msm: Fix cx collapse issue during recovery
drm/msm/a6xx: Ensure CX collapse during gpu recovery
drm/msm/a6xx: Improve gpu recovery sequence
drm/msm/a6xx: Handle GMU prepare-slumber hfi failure
drivers/gpu/drm/msm/adreno/a6xx.xml.h | 4 ++
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 83 ++++++++++++++++++++++-------------
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 43 ++++++++++++++++--
drivers/gpu/drm/msm/msm_gpu.c | 21 ++++++---
drivers/gpu/drm/msm/msm_gpu.h | 4 ++
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 --
6 files changed, 114 insertions(+), 45 deletions(-)
--
2.7.4
More information about the dri-devel
mailing list