[PATCH 1/2] drm/xe: Fix missing workqueue destroy in xe_gt_pagefault

Tue Jun 25 17:17:52 UTC 2024

On Mon, 2024-06-10 at 22:36 +0000, Matthew Brost wrote:
> On Mon, Jun 10, 2024 at 08:32:59PM +0000, Stuart Summers wrote:
> > On driver reload we never free up the memory for the pagefault and
> > access counter workqueues. Add those destroy calls here.
> > 
> 
> These queues should, long term, be moved to the xe device, since they are
> indexed per VM, and become an array of ordered_wq. Something like [1]. Not
> sure if it is worth holding up this patch though.
> 
> [1]
> https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svm-post/-/commit/e9fd233a5ab4db54104970cd8d7c0e92a36f5220
> 
> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_gt.c           |  2 ++
> >  drivers/gpu/drm/xe/xe_gt_pagefault.c | 11 +++++++++++
> >  drivers/gpu/drm/xe/xe_gt_pagefault.h |  1 +
> >  3 files changed, 14 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index 57d84751e160..68dc6920112b 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -107,6 +107,8 @@ void xe_gt_remove(struct xe_gt *gt)
> >  
> >         xe_uc_remove(&gt->uc);
> >  
> > +       xe_gt_pagefault_fini(gt);
> > +
> >         for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
> >                 xe_hw_fence_irq_finish(&gt->fence_irq[i]);
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index eaf68f0135c1..3858c8e0b707 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -415,6 +415,17 @@ int xe_gt_pagefault_init(struct xe_gt *gt)
> >         return 0;
> >  }
> >  
> > +void xe_gt_pagefault_fini(struct xe_gt *gt)
> 
> 
> Use drmm_add_action_or_reset rather than exporting a fini function.

Just getting back from sick leave. Sorry for the delayed reply!

Yes, makes sense. I'll look at that for the next version.
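
For reference, here is a rough, untested sketch of how the managed
cleanup could be wired up. To keep it compilable on its own it is a
userspace model, not kernel code: the struct layouts, alloc_wq_stub(),
drm_release_model(), live_wqs, and the simplified gt_to_xe() and
drmm_add_action_or_reset() are all stand-ins I made up for illustration.
The real drmm_add_action_or_reset() lives in <drm/drm_managed.h> and runs
the action immediately if registration fails, which this model does not
exercise.

```c
/* Userspace model of the drmm-managed cleanup pattern; NOT kernel code. */
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* --- simplified stand-ins so the sketch builds outside the kernel --- */
struct workqueue_struct { int unused; };
struct drm_device;
typedef void (*drmres_release_t)(struct drm_device *drm, void *arg);

/* Holds a single release action; the kernel keeps a list of them. */
struct drm_device {
	drmres_release_t action;
	void *data;
};

struct xe_device {
	struct drm_device drm;
	struct { bool has_usm; } info;
};

struct xe_gt {
	struct xe_device *xe;
	struct { struct workqueue_struct *pf_wq, *acc_wq; } usm;
};

int live_wqs;	/* instrumentation for this model only */

struct xe_device *gt_to_xe(struct xe_gt *gt)
{
	return gt->xe;
}

struct workqueue_struct *alloc_wq_stub(void)
{
	struct workqueue_struct *wq = calloc(1, sizeof(*wq));

	if (wq)
		live_wqs++;
	return wq;
}

void destroy_workqueue(struct workqueue_struct *wq)
{
	free(wq);
	live_wqs--;
}

/* Stand-in for the real helper; this model never fails to register. */
int drmm_add_action_or_reset(struct drm_device *drm,
			     drmres_release_t action, void *data)
{
	drm->action = action;
	drm->data = data;
	return 0;
}

/* Stand-in for the DRM core running managed actions at teardown. */
void drm_release_model(struct drm_device *drm)
{
	if (drm->action)
		drm->action(drm, drm->data);
	drm->action = NULL;
}

/* --- the part that mirrors the review suggestion --- */

/* Only registered when has_usm is set, so no has_usm check is needed. */
void xe_gt_pagefault_fini(struct drm_device *drm, void *arg)
{
	struct xe_gt *gt = arg;

	(void)drm;
	destroy_workqueue(gt->usm.acc_wq);
	destroy_workqueue(gt->usm.pf_wq);
}

int xe_gt_pagefault_init(struct xe_gt *gt)
{
	struct xe_device *xe = gt_to_xe(gt);

	if (!xe->info.has_usm)
		return 0;

	gt->usm.pf_wq = alloc_wq_stub();
	if (!gt->usm.pf_wq)
		return -ENOMEM;

	gt->usm.acc_wq = alloc_wq_stub();
	if (!gt->usm.acc_wq) {
		destroy_workqueue(gt->usm.pf_wq);
		return -ENOMEM;
	}

	/* Cleanup is tied to the drm_device; nothing to call from xe_gt_remove(). */
	return drmm_add_action_or_reset(&xe->drm, xe_gt_pagefault_fini, gt);
}
```

With this shape the fini callback would stay local to xe_gt_pagefault.c
in the real driver, so neither the header nor xe_gt.c would need to
change.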

Thanks,
Stuart

> 
> Matt
> 
> > +{
> > +       struct xe_device *xe = gt_to_xe(gt);
> > +
> > +       if (!xe->info.has_usm)
> > +               return;
> > +
> > +       destroy_workqueue(gt->usm.acc_wq);
> > +       destroy_workqueue(gt->usm.pf_wq);
> > +}
> > +
> >  void xe_gt_pagefault_reset(struct xe_gt *gt)
> >  {
> >         struct xe_device *xe = gt_to_xe(gt);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
> > index 839c065a5e4c..d37b790ce8bb 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
> > @@ -12,6 +12,7 @@ struct xe_gt;
> >  struct xe_guc;
> >  
> >  int xe_gt_pagefault_init(struct xe_gt *gt);
> > +void xe_gt_pagefault_fini(struct xe_gt *gt);
> >  void xe_gt_pagefault_reset(struct xe_gt *gt);
> >  int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len);
> >  int xe_guc_access_counter_notify_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > -- 
> > 2.34.1
> >