[PATCH 03/11] drm/xe: Implement xe_pagefault_reset

Wed Aug 6 23:16:26 UTC 2025

On Tue, 2025-08-05 at 23:22 -0700, Matthew Brost wrote:
> Squash any pending faults on the GT being reset by setting the GT
> field
> in struct xe_pagefault to NULL.
> 
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c        |  2 ++
>  drivers/gpu/drm/xe/xe_pagefault.c | 23 ++++++++++++++++++++++-
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 390394bbaadc..5aa03f89a062 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -50,6 +50,7 @@
>  #include "xe_map.h"
>  #include "xe_migrate.h"
>  #include "xe_mmio.h"
> +#include "xe_pagefault.h"
>  #include "xe_pat.h"
>  #include "xe_pm.h"
>  #include "xe_mocs.h"
> @@ -846,6 +847,7 @@ static int gt_reset(struct xe_gt *gt)
>  
>         xe_uc_gucrc_disable(&gt->uc);
>         xe_uc_stop_prepare(&gt->uc);
> +       xe_pagefault_reset(gt_to_xe(gt), gt);

Can we just pass the GT in here and then extrapolate xe from there? I
realize you're thinking of dropping the GT piece, but maybe we can
change the parameters around at that time. Just feels weird passing
these both in at this point.

>         xe_gt_pagefault_reset(gt);
>  
>         xe_uc_stop(&gt->uc);
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c
> b/drivers/gpu/drm/xe/xe_pagefault.c
> index 14304c41eb23..aef389e51612 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -122,6 +122,24 @@ int xe_pagefault_init(struct xe_device *xe)
>         return err;
>  }
>  
> +static void xe_pagefault_queue_reset(struct xe_device *xe, struct
> xe_gt *gt,
> +                                    struct xe_pagefault_queue
> *pf_queue)
> +{
> +       u32 i;
> +
> +       /* Squash all pending faults on the GT */
> +
> +       spin_lock_irq(&pf_queue->lock);
> +       for (i = pf_queue->tail; i != pf_queue->head;
> +            i = (i + xe_pagefault_entry_size()) % pf_queue->size) {

Should we add a check in here that pf_queue->head is some multiple of
xe_pagefault_entry_size and pf_queue->size is aligned to
xe_pagefault_entry_size()?

> +               struct xe_pagefault *pf = pf_queue->data + i;
> +
> +               if (pf->gt == gt)
> +                       pf->gt = NULL;

Not sure I fully get the intent here... so we loop back around from
TAIL to HEAD and clear all of the GTs in pf_queue->data for each one?
Is the expectation that each entry in the pf_queue has the same GT or
is NULL? And then setting to NULL is a way we can abstract out the GT?

Still getting through the series, so appologize if this is also
answered later in the series...

Thanks,
Stuart

> +       }
> +       spin_unlock_irq(&pf_queue->lock);
> +}
> +
>  /**
>   * xe_pagefault_reset() - Page fault reset for a GT
>   * @xe: xe device instance
> @@ -132,7 +150,10 @@ int xe_pagefault_init(struct xe_device *xe)
>   */
>  void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
>  {
> -       /* TODO - implement */
> +       int i;
> +
> +       for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
> +               xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue +
> i);
>  }
>  
>  /**