[Intel-xe] [PATCH] drm/xe: fix pvc unload issue

Lucas De Marchi lucas.demarchi at intel.com
Tue Apr 4 18:48:39 UTC 2023


On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
>Currently, unload pvc driver will generate a null dereference
>and the call stack is as below.
>
>[ 4850.618000] Call Trace:
>[ 4850.620740]  <TASK>
>[ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
>[ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
>[ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
>[ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
>[ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
>[ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
>[ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
>[ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
>[ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
>[ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
>[ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>
>There are a couple issues, but the main one is due to TTM has only
>one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
>1 TTM_PL_TT each tile. The second will overwrite the first one.
>
>During unload time, the first tile will reset the TTM_PL_TT manger
>and when the second tile is trying to free Bo and it will generate
>the null reference since the TTM manage is already got reset to 0.
>
>The fix is to use one global TTM_PL_TT manager.
>
>v2: make gtt mgr global and change the name to sys_mgr
>
>Cc: Stuart Summers <stuart.summers at intel.com>
>Cc: Matthew Brost <matthew.brost at intel.com>
>Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>Signed-off-by: Bruce Chang <yu.bruce.chang at intel.com>
>---
> drivers/gpu/drm/xe/Makefile                   |  2 +-
> drivers/gpu/drm/xe/xe_device.c                |  2 +
> drivers/gpu/drm/xe/xe_device.h                |  1 +
> drivers/gpu/drm/xe/xe_device_types.h          |  2 +
> drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
> drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
> drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
> drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
> .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
> 9 files changed, 31 insertions(+), 97 deletions(-)
> delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
>
>diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>index 669c7b6f552c..7e19a15af3dc 100644
>--- a/drivers/gpu/drm/xe/Makefile
>+++ b/drivers/gpu/drm/xe/Makefile
>@@ -83,7 +83,7 @@ xe-y += xe_bb.o \
> 	xe_step.o \
> 	xe_sync.o \
> 	xe_trace.o \
>-	xe_ttm_gtt_mgr.o \
>+	xe_ttm_sys_mgr.o \
> 	xe_ttm_stolen_mgr.o \
> 	xe_ttm_vram_mgr.o \
> 	xe_tuning.o \
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index a79f934e3d2d..4fb9aff27686 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
> 	if (err)
> 		goto err_irq_shutdown;
>
>+	xe_ttm_sys_mgr_init(xe);
>+
> 	for_each_gt(gt, xe, id) {
> 		err = xe_gt_init_noalloc(gt);
> 		if (err)
>diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>index d277f8985f7b..d9d1b09a8e38 100644
>--- a/drivers/gpu/drm/xe/xe_device.h
>+++ b/drivers/gpu/drm/xe/xe_device.h
>@@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
> }
>
> u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
>+int xe_ttm_sys_mgr_init(struct xe_device *xe);

   CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
    99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
       |     ^~~~~~~~~~~~~~~~~~~


couple of issues:

1) you moved the implementation to a new file and didn't include
xe_device.h, creating the build issue above
2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h

Lucas De Marchi


More information about the Intel-xe mailing list