[Intel-xe] [PATCH] drm/xe: fix pvc unload issue
Rodrigo Vivi
rodrigo.vivi at intel.com
Tue Apr 4 19:45:05 UTC 2023
On Tue, Apr 04, 2023 at 11:48:39AM -0700, Lucas De Marchi wrote:
> On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
> > Currently, unload pvc driver will generate a null dereference
> > and the call stack is as below.
> >
> > [ 4850.618000] Call Trace:
> > [ 4850.620740] <TASK>
> > [ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> > [ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
> > [ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> > [ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
> > [ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
> > [ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
> > [ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
> > [ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
> > [ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
> > [ 4850.666617] drm_dev_release+0x30/0x50 [drm]
> > [ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]
> >
> > There are a couple issues, but the main one is due to TTM has only
> > one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
> >
> > During unload time, the first tile will reset the TTM_PL_TT manger
> > and when the second tile is trying to free Bo and it will generate
> > the null reference since the TTM manage is already got reset to 0.
> >
> > The fix is to use one global TTM_PL_TT manager.
> >
> > v2: make gtt mgr global and change the name to sys_mgr
> >
> > Cc: Stuart Summers <stuart.summers at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > Signed-off-by: Bruce Chang <yu.bruce.chang at intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile | 2 +-
> > drivers/gpu/drm/xe/xe_device.c | 2 +
> > drivers/gpu/drm/xe/xe_device.h | 1 +
> > drivers/gpu/drm/xe/xe_device_types.h | 2 +
> > drivers/gpu/drm/xe/xe_gt.c | 18 -----
> > drivers/gpu/drm/xe/xe_gt_types.h | 2 -
> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h | 16 -----
> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h | 18 -----
> > .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
> > 9 files changed, 31 insertions(+), 97 deletions(-)
> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> > rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 669c7b6f552c..7e19a15af3dc 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
> > xe_step.o \
> > xe_sync.o \
> > xe_trace.o \
> > - xe_ttm_gtt_mgr.o \
> > + xe_ttm_sys_mgr.o \
> > xe_ttm_stolen_mgr.o \
> > xe_ttm_vram_mgr.o \
> > xe_tuning.o \
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index a79f934e3d2d..4fb9aff27686 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
> > if (err)
> > goto err_irq_shutdown;
> >
> > + xe_ttm_sys_mgr_init(xe);
> > +
> > for_each_gt(gt, xe, id) {
> > err = xe_gt_init_noalloc(gt);
> > if (err)
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index d277f8985f7b..d9d1b09a8e38 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
> > }
> >
> > u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
> > +int xe_ttm_sys_mgr_init(struct xe_device *xe);
>
> CC [M] drivers/gpu/drm/xe/tests/xe_bo_test.o
> ../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
> 99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
> | ^~~~~~~~~~~~~~~~~~~
>
that's awkward... I didn't get issue here, not CI got it...
>
> couple of issues:
>
> 1) you moved the implementation to a new file and didn't include
> xe_device.h, creating the build issue above
I was wondering about gcc version, but this wouldn't be magic in older gcc...
> 2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
> like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h
yeap, that sounds the way to go...
>
> Lucas De Marchi
More information about the Intel-xe
mailing list