[PATCH 03/11] drm/xe: Implement xe_pagefault_reset
Summers, Stuart
stuart.summers at intel.com
Wed Aug 6 23:16:26 UTC 2025
On Tue, 2025-08-05 at 23:22 -0700, Matthew Brost wrote:
> Squash any pending faults on the GT being reset by setting the GT field
> in struct xe_pagefault to NULL.
>
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt.c | 2 ++
> drivers/gpu/drm/xe/xe_pagefault.c | 23 ++++++++++++++++++++++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 390394bbaadc..5aa03f89a062 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -50,6 +50,7 @@
> #include "xe_map.h"
> #include "xe_migrate.h"
> #include "xe_mmio.h"
> +#include "xe_pagefault.h"
> #include "xe_pat.h"
> #include "xe_pm.h"
> #include "xe_mocs.h"
> @@ -846,6 +847,7 @@ static int gt_reset(struct xe_gt *gt)
>
> xe_uc_gucrc_disable(&gt->uc);
> xe_uc_stop_prepare(&gt->uc);
> + xe_pagefault_reset(gt_to_xe(gt), gt);
Can we pass just the GT in here and derive xe from it? I realize
you're thinking of dropping the GT piece, but we could change the
parameters around at that time. It just feels odd to pass both of
them in at this point.
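Roughly what I have in mind, as a standalone sketch. Everything prefixed
mock_/MOCK_ is a stand-in I made up so this compiles outside the kernel;
the real structs and the real gt_to_xe() are not reproduced here, and the
back-pointer layout is an assumption:

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_QUEUE_COUNT 4	/* stand-in for XE_PAGEFAULT_QUEUE_COUNT */

struct mock_xe_device {
	int queues_reset;	/* counts how many queues were visited */
};

struct mock_xe_gt {
	struct mock_xe_device *xe;	/* assumed back-pointer to the device */
};

/* Stand-in for gt_to_xe(): look the device up from the GT. */
static struct mock_xe_device *mock_gt_to_xe(struct mock_xe_gt *gt)
{
	return gt->xe;
}

/* Takes only the GT; the device is derived inside the function. */
static void mock_pagefault_reset(struct mock_xe_gt *gt)
{
	struct mock_xe_device *xe;
	int i;

	assert(gt && gt->xe);
	xe = mock_gt_to_xe(gt);

	for (i = 0; i < MOCK_QUEUE_COUNT; ++i)
		xe->queues_reset++;	/* placeholder for the per-queue reset */
}
```

The caller in gt_reset() would then only need to hand over the GT.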
> xe_gt_pagefault_reset(gt);
>
> xe_uc_stop(&gt->uc);
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index 14304c41eb23..aef389e51612 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -122,6 +122,24 @@ int xe_pagefault_init(struct xe_device *xe)
> return err;
> }
>
> +static void xe_pagefault_queue_reset(struct xe_device *xe, struct xe_gt *gt,
> + struct xe_pagefault_queue *pf_queue)
> +{
> + u32 i;
> +
> + /* Squash all pending faults on the GT */
> +
> + spin_lock_irq(&pf_queue->lock);
> + for (i = pf_queue->tail; i != pf_queue->head;
> + i = (i + xe_pagefault_entry_size()) % pf_queue->size) {
Should we add a check in here that pf_queue->head is a multiple of
xe_pagefault_entry_size() and that pf_queue->size is aligned to
xe_pagefault_entry_size()?
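The reason I ask: the loop only terminates when i lands exactly on head,
so an off-stride head or a size that isn't a multiple of the entry size
would leave us spinning with the lock held. A standalone sketch of the
kind of check I mean (the function name and the plain uint32_t
parameters are hypothetical, not anything in the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a walk from tail in entry_size steps, wrapping at
 * size, is guaranteed to hit head exactly.
 */
static bool ring_offsets_aligned(uint32_t head, uint32_t tail,
				 uint32_t size, uint32_t entry_size)
{
	assert(entry_size != 0);	/* caller bug, not a runtime condition */

	if (size == 0 || size % entry_size != 0)
		return false;
	if (head >= size || tail >= size)
		return false;

	return head % entry_size == 0 && tail % entry_size == 0;
}
```

In the driver this would presumably be a warn/assert before the loop
rather than a bool helper, but the conditions would be the same.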
> + struct xe_pagefault *pf = pf_queue->data + i;
> +
> + if (pf->gt == gt)
> + pf->gt = NULL;
Not sure I fully get the intent here... so we loop around from TAIL to
HEAD and clear the GT for each entry in pf_queue->data? Is the
expectation that each entry in the pf_queue either has the same GT or
is NULL? And is setting it to NULL a way to abstract out the GT?
I'm still getting through the series, so apologies if this is
answered later in the series...
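For reference, here is my reading of the loop as a standalone mock. All
mock_ types and the byte-offset layout are my assumptions, not the real
definitions. If this reading is right, only entries whose GT matches the
one being reset are cleared, and entries belonging to other GTs stay
pending:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_gt {
	int id;
};

struct mock_pagefault {
	struct mock_gt *gt;	/* NULL means "squashed" */
	uint64_t addr;
};

struct mock_pf_queue {
	uint8_t *data;		/* backing storage, addressed in bytes */
	uint32_t size;		/* total bytes, multiple of the entry size */
	uint32_t head;		/* byte offset of the producer */
	uint32_t tail;		/* byte offset of the consumer */
};

/* Walk the pending entries from tail to head and squash those on @gt. */
static void mock_queue_reset(struct mock_pf_queue *q, struct mock_gt *gt)
{
	const uint32_t entry = sizeof(struct mock_pagefault);
	uint32_t i;

	/* Without these the walk below could step past head forever. */
	assert(q->size % entry == 0);
	assert(q->head % entry == 0 && q->tail % entry == 0);

	for (i = q->tail; i != q->head; i = (i + entry) % q->size) {
		struct mock_pagefault *pf =
			(struct mock_pagefault *)(q->data + i);

		if (pf->gt == gt)
			pf->gt = NULL;
	}
}
```

I assume the consumer side then skips any entry with a NULL GT, but I
haven't seen that part of the series yet.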
Thanks,
Stuart
> + }
> + spin_unlock_irq(&pf_queue->lock);
> +}
> +
> /**
> * xe_pagefault_reset() - Page fault reset for a GT
> * @xe: xe device instance
> @@ -132,7 +150,10 @@ int xe_pagefault_init(struct xe_device *xe)
> */
> void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> {
> - /* TODO - implement */
> + int i;
> +
> + for (i = 0; i < XE_PAGEFAULT_QUEUE_COUNT; ++i)
> + xe_pagefault_queue_reset(xe, gt, xe->usm.pf_queue + i);
> }
>
> /**