[PATCH 01/11] drm/xe: Stub out new pagefault layer
Matthew Brost
matthew.brost at intel.com
Thu Aug 7 18:10:30 UTC 2025
On Thu, Aug 07, 2025 at 11:20:06AM -0600, Summers, Stuart wrote:
> On Wed, 2025-08-06 at 16:53 -0700, Matthew Brost wrote:
> > On Wed, Aug 06, 2025 at 05:01:12PM -0600, Summers, Stuart wrote:
> > > Few basic comments below to start. I personally would rather this be
> > > brought over from the existing fault handler rather than creating
> > > something entirely new and then clobbering the older stuff - just so
> > > it's easier to review the message format requests/replies and where
> > > we're deviating from the existing external interfaces (HW/FW/GuC/etc).
> > > You already have this here though, so not a huge deal.
> > > I think most of this was in the giant blob of patches that got merged
> > > with the initial driver, so I guess the counter argument is we can have
> > > easy-to-reference historical reviews now.
> > >
> >
> > Yes, the page fault code is largely just a big blob from the original Xe
> > patch and wasn't the most well-thought-out code. We still have that
> > history in the tree, just git blame won't work, so you'd need to know
> > where to look if you want that.
> >
> > I don't think there is a great way to pull this over, unless patches 2-7
> > are squashed into a single patch + a couple of 'git mv' are used.
>
> No, I definitely don't think that's worth it. Let's just review as-is.
>
> >
> > > On Tue, 2025-08-05 at 23:22 -0700, Matthew Brost wrote:
> > > > Stub out the new page fault layer and add kernel documentation. This
> > > > is intended as a replacement for the GT page fault layer, enabling
> > > > multiple producers to hook into a shared page fault consumer interface.
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/Makefile | 1 +
> > > > drivers/gpu/drm/xe/xe_pagefault.c | 63 ++++++++++++
> > > > drivers/gpu/drm/xe/xe_pagefault.h | 19 ++++
> > > > drivers/gpu/drm/xe/xe_pagefault_types.h | 125 ++++++++++++++++++++++++
> > > > 4 files changed, 208 insertions(+)
> > > > create mode 100644 drivers/gpu/drm/xe/xe_pagefault.c
> > > > create mode 100644 drivers/gpu/drm/xe/xe_pagefault.h
> > > > create mode 100644 drivers/gpu/drm/xe/xe_pagefault_types.h
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > > index 8e0c3412a757..6fbebafe79c9 100644
> > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > @@ -93,6 +93,7 @@ xe-y += xe_bb.o \
> > > > xe_nvm.o \
> > > > xe_oa.o \
> > > > xe_observation.o \
> > > > + xe_pagefault.o \
> > > > xe_pat.o \
> > > > xe_pci.o \
> > > > xe_pcode.o \
> > > > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> > > > new file mode 100644
> > > > index 000000000000..3ce0e8d74b9d
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> > > > @@ -0,0 +1,63 @@
> > > > +// SPDX-License-Identifier: MIT
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#include "xe_pagefault.h"
> > > > +#include "xe_pagefault_types.h"
> > > > +
> > > > +/**
> > > > + * DOC: Xe page faults
> > > > + *
> > > > + * Xe page faults are handled in two layers. The producer layer interacts with
> > > > + * hardware or firmware to receive and parse faults into struct xe_pagefault,
> > > > + * then forwards them to the consumer. The consumer layer services the faults
> > > > + * (e.g., memory migration, page table updates) and acknowledges the result back
> > > > + * to the producer, which then forwards the results to the hardware or firmware.
> > > > + * The consumer uses a page fault queue sized to absorb all potential faults and
> > > > + * a multi-threaded worker to process them. Multiple producers are supported,
> > > > + * with a single shared consumer.
> > > > + */
> > > > +
> > > > +/**
> > > > + * xe_pagefault_init() - Page fault init
> > > > + * @xe: xe device instance
> > > > + *
> > > > + * Initialize Xe page fault state. Must be done after reading fuses.
> > > > + *
> > > > + * Return: 0 on Success, errno on failure
> > > > + */
> > > > +int xe_pagefault_init(struct xe_device *xe)
> > > > +{
> > > > + /* TODO - implement */
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_pagefault_reset() - Page fault reset for a GT
> > > > + * @xe: xe device instance
> > > > + * @gt: GT being reset
> > > > + *
> > > > + * Reset the Xe page fault state for a GT; that is, squash any pending faults on
> > > > + * the GT.
> > > > + */
> > > > +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt)
> > > > +{
> > > > + /* TODO - implement */
> > > > +}
> > > > +
> > > > +/**
> > > > + * xe_pagefault_handler() - Page fault handler
> > > > + * @xe: xe device instance
> > > > + * @pf: Page fault
> > > > + *
> > > > + * Sink the page fault to a queue (i.e., a memory buffer) and queue a worker to
> > > > + * service it. Safe to be called from IRQ or process context. Reclaim safe.
> > > > + *
> > > > + * Return: 0 on success, errno on failure
> > > > + */
> > > > +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf)
> > > > +{
> > > > + /* TODO - implement */
> > > > + return 0;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/xe/xe_pagefault.h b/drivers/gpu/drm/xe/xe_pagefault.h
> > > > new file mode 100644
> > > > index 000000000000..bd0cdf9ed37f
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_pagefault.h
> > > > @@ -0,0 +1,19 @@
> > > > +/* SPDX-License-Identifier: MIT */
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#ifndef _XE_PAGEFAULT_H_
> > > > +#define _XE_PAGEFAULT_H_
> > > > +
> > > > +struct xe_device;
> > > > +struct xe_gt;
> > > > +struct xe_pagefault;
> > > > +
> > > > +int xe_pagefault_init(struct xe_device *xe);
> > > > +
> > > > +void xe_pagefault_reset(struct xe_device *xe, struct xe_gt *gt);
> > > > +
> > > > +int xe_pagefault_handler(struct xe_device *xe, struct xe_pagefault *pf);
> > > > +
> > > > +#endif
> > > > diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > > > new file mode 100644
> > > > index 000000000000..fcff84f93dd8
> > > > --- /dev/null
> > > > +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
> > > > @@ -0,0 +1,125 @@
> > > > +/* SPDX-License-Identifier: MIT */
> > > > +/*
> > > > + * Copyright © 2025 Intel Corporation
> > > > + */
> > > > +
> > > > +#ifndef _XE_PAGEFAULT_TYPES_H_
> > > > +#define _XE_PAGEFAULT_TYPES_H_
> > > > +
> > > > +#include <linux/workqueue.h>
> > > > +
> > > > +struct xe_pagefault;
> > > > +struct xe_gt;
> > >
> > > Nit: Maybe reverse these structs to be in alphabetical order
> > >
> >
> > Yes, that is the preferred style. Will fix.
> >
> > > > +
> > > > +/** enum xe_pagefault_access_type - Xe page fault access type */
> > > > +enum xe_pagefault_access_type {
> > > > + /** @XE_PAGEFAULT_ACCESS_TYPE_READ: Read access type */
> > > > + XE_PAGEFAULT_ACCESS_TYPE_READ = 0,
> > > > + /** @XE_PAGEFAULT_ACCESS_TYPE_WRITE: Write access type */
> > > > + XE_PAGEFAULT_ACCESS_TYPE_WRITE = 1,
> > > > + /** @XE_PAGEFAULT_ACCESS_TYPE_ATOMIC: Atomic access type */
> > > > + XE_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
> > > > +};
> > > > +
> > > > +/** enum xe_pagefault_type - Xe page fault type */
> > > > +enum xe_pagefault_type {
> > > > + /** @XE_PAGEFAULT_TYPE_NOT_PRESENT: Not present */
> > > > + XE_PAGEFAULT_TYPE_NOT_PRESENT = 0,
> > > > + /** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Write access violation */
> > > > + XE_PAGEFAULT_WRITE_ACCESS_VIOLATION = 1,
> > > > + /** @XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION: Atomic access violation */
> > >
> > > XE_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION ->
> > > XE_PAGEFAULT_ACCESS_TYPE_ATOMIC
> > >
> >
> > The intended prefix here is 'XE_PAGEFAULT_TYPE_' to normalize the naming
> > with 'enum xe_pagefault_type'.
>
> Ah, sorry, you're right. I also should have been more specific that I
> meant this should be ATOMIC access vs WRITE access, so:
> XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION
>
> >
> > > > + XE_PAGEFAULT_ATOMIC_ACCESS_VIOLATION = 2,
> > > > +};
> > > > +
> > > > +/** struct xe_pagefault_ops - Xe pagefault ops (producer) */
> > > > +struct xe_pagefault_ops {
> > > > + /**
> > > > + * @ack_fault: Ack fault
> > > > + * @pf: Page fault
> > > > + * @err: Error state of fault
> > > > + *
> > > > + * Page fault producer receives acknowledgment from the consumer and
> > > > + * sends the result to the HW/FW interface.
> > > > + */
> > > > + void (*ack_fault)(struct xe_pagefault *pf, int err);
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct xe_pagefault - Xe page fault
> > > > + *
> > > > + * Generic page fault structure for communication between producer and consumer.
> > > > + * Carefully sized to be 64 bytes.
> > > > + */
> > > > +struct xe_pagefault {
> > > > + /**
> > > > + * @gt: GT of fault
> > > > + *
> > > > + * XXX: We may want to decouple the GT from individual faults, as it's
> > > > + * unclear whether future platforms will always have a GT for all page
> > > > + * fault producers. Internally, the GT is used for stats, identifying
> > > > + * the appropriate VRAM region, and locating the migration queue.
> > > > + * Leaving this as-is for now, but we can revisit later to see if we
> > > > + * can convert it to use the Xe device pointer instead.
> > > > + */
> > >
> > > What if, instead of assuming the GT stays static and eventually
> > > removing it if we get some new HW abstraction layer that isn't a GT but
> > > still uses the page fault path, we instead push to have said theoretical
> > > abstraction layer overload the GT here, like we're doing with primary
> > > and media today? Then we can keep the interface here simple and just
> > > leave this in there, or change it in the future if that doesn't make
> > > sense, without the suggestive comment.
> > >
> >
> > I can remove this comment, as it adds some confusion. Hopefully, we
> > always have a GT. I was just speculating about future cases where we
> > might not have one. From a purely interface perspective, it would be
> > ideal to completely decouple the GT here.
> >
> > > > + struct xe_gt *gt;
> > > > + /**
> > > > + * @consumer: State for the software handling the fault. Populated by
> > > > + * the producer and may be modified by the consumer to communicate
> > > > + * information back to the producer upon fault acknowledgment.
> > > > + */
> > > > + struct {
> > > > + /** @consumer.page_addr: address of page fault */
> > > > + u64 page_addr;
> > > > + /** @consumer.asid: address space ID */
> > > > + u32 asid;
> > >
> > > Can we just call this an ID instead of a pasid or asid? I.e. the ID
> > > could be anything, not strictly process-bound.
> > >
> >
> > I think the idea here is that this serves as the ID for our reverse VM
> > lookup mechanism in the KMD. We call it ASID throughout the codebase
> > today, so we’re stuck with the name—though it may or may not have any
> > actual meaning in hardware, depending on the producer. For example, if
> > the producer receives a fault based on a queue ID, we’d look up the
> > queue and then pass in q->vm.asid.
> >
> > We could even have the producer look up the VM directly, if preferred,
> > and just pass that over. However, that would require a few more bits
> > here and might introduce lifetime issues—for example, we’d have to
> > refcount the VM.
>
> Yeah, I mean some of those problems we can solve if they come up later.
> I was just thinking having something more generic here would be nice. But
> I agree on the cross-KMD usage. We can keep this and change it more
> broadly if that makes sense later.
>
I think the point here is that we are always going to have a VM, which the
consumer requires to service the fault. The producer side needs to parse
the fault, figure out a known value in the KMD which corresponds to a VM,
and pass it over. We call this value asid today (it's also the name of the
hardware interface and what we program into the LRC), but we could rename
it everywhere in the KMD if that makes sense, e.g., kmd_vm_id (vm_id is a
user space name / value which means something different).
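
Roughly what I have in mind on the producer side (a sketch only; the
'example_*' helpers, the exact asid plumbing, and the use of the usual
gt_to_xe() helper are illustrative assumptions, not from this series):

/* Hypothetical producer ops; ack_fault would formulate the HW/FW reply
 * from pf->producer.msg and send it.
 */
static void example_ack_fault(struct xe_pagefault *pf, int err)
{
	/* e.g., build the response to the HW/FW interface from
	 * pf->producer.msg and send it, nacking on err
	 */
}

static const struct xe_pagefault_ops example_producer_ops = {
	.ack_fault = example_ack_fault,
};

/* Hypothetical producer path: fault already parsed from the HW/FW
 * interface, asid derived via the reverse VM lookup discussed above
 * (e.g., q->vm.asid after a queue lookup).
 */
static void example_producer_fault(struct xe_gt *gt, u64 page_addr, u32 asid)
{
	struct xe_pagefault pf = {
		.gt = gt,
		.consumer = {
			.page_addr = page_addr,
			.asid = asid,
			.access_type = XE_PAGEFAULT_ACCESS_TYPE_WRITE,
			.fault_type = XE_PAGEFAULT_TYPE_NOT_PRESENT,
		},
		.producer = {
			.private = gt,
			.ops = &example_producer_ops,
		},
	};

	/* Sink into the shared consumer queue; the consumer acks back via
	 * pf->producer.ops->ack_fault() once serviced (or nacked).
	 */
	xe_pagefault_handler(gt_to_xe(gt), &pf);
}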
> >
> > > > + /** @consumer.access_type: access type */
> > > > + u8 access_type;
> > > > + /** @consumer.fault_type: fault type */
> > > > + u8 fault_type;
> > > > +#define XE_PAGEFAULT_LEVEL_NACK 0xff /* Producer indicates nack fault */
> > > > + /** @consumer.fault_level: fault level */
> > > > + u8 fault_level;
> > > > + /** @consumer.engine_class: engine class */
> > > > + u8 engine_class;
> > > > + /** consumer.reserved: reserved bits for future expansion */
> > > > + u64 reserved;
> > >
> > > What about engine instance? Or is that going to overload reserved
> > > here?
> > >
> >
> > reserved could be used to include 'engine instance' if required; it is
> > there for future expansion and also to keep the structure sized to 64
> > bytes.
> >
> > I included fault_level and engine_class as I thought both were used by
> > [1], but now that I looked again only fault_level is used, so I guess
> > engine_class can be pulled out too, unless we want to keep it for the
> > only place in which it is used (debug messages).
>
> I think today hardware/GuC provides both engine class and engine
> instance, which is why I mentioned it. We can ignore those fields if we
> don't feel they are valuable/relevant, but at least today we are
> reading those and printing them out.
>
Yes, the debug message drops engine instance since that value isn't passed
over. I think that is ok; engine class is typically all we care about
anyway.
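
For reference, the kind of consumer-side debug print I'd expect with only
the consumer fields available (sketch, assuming a drm_dbg() style print;
engine instance simply isn't there to report):

static void example_print_fault(struct xe_device *xe, struct xe_pagefault *pf)
{
	drm_dbg(&xe->drm,
		"Fault: asid=%u, addr=0x%016llx, access=%u, type=%u, level=%u, engine_class=%u",
		pf->consumer.asid, pf->consumer.page_addr,
		pf->consumer.access_type, pf->consumer.fault_type,
		pf->consumer.fault_level, pf->consumer.engine_class);
}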
Matt
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > [1] https://patchwork.freedesktop.org/series/148727/
> >
> > > Thanks,
> > > Stuart
> > >
> > > > + } consumer;
> > > > + /**
> > > > + * @producer: State for the producer (i.e., HW/FW interface). Populated
> > > > + * by the producer and should not be modified—or even inspected—by the
> > > > + * consumer, except for calling operations.
> > > > + */
> > > > + struct {
> > > > + /** @producer.private: private pointer */
> > > > + void *private;
> > > > + /** @producer.ops: operations */
> > > > + const struct xe_pagefault_ops *ops;
> > > > +#define XE_PAGEFAULT_PRODUCER_MSG_LEN_DW 4
> > > > + /**
> > > > + * producer.msg: page fault message, used by producer in fault
> > > > + * acknowledgement to formulate response to HW/FW interface.
> > > > + */
> > > > + u32 msg[XE_PAGEFAULT_PRODUCER_MSG_LEN_DW];
> > > > + } producer;
> > > > +};
> > > > +
> > > > +/** struct xe_pagefault_queue: Xe pagefault queue (consumer) */
> > > > +struct xe_pagefault_queue {
> > > > + /**
> > > > + * @data: Data in queue containing struct xe_pagefault, protected by
> > > > + * @lock
> > > > + */
> > > > + void *data;
> > > > + /** @size: Size of queue in bytes */
> > > > + u32 size;
> > > > + /** @head: Head pointer in bytes, moved by producer, protected by @lock */
> > > > + u32 head;
> > > > + /** @tail: Tail pointer in bytes, moved by consumer, protected by @lock */
> > > > + u32 tail;
> > > > + /** @lock: protects page fault queue */
> > > > + spinlock_t lock;
> > > > + /** @worker: to process page faults */
> > > > + struct work_struct worker;
> > > > +};
> > > > +
> > > > +#endif
> > >
>