[PATCH v2 29/29] drm/doc: gpusvm: Add GPU SVM documentation
Matthew Brost
matthew.brost at intel.com
Tue Dec 17 23:14:47 UTC 2024
On Mon, Dec 02, 2024 at 02:00:44PM +0100, Thomas Hellström wrote:
> On Tue, 2024-10-15 at 20:25 -0700, Matthew Brost wrote:
> > Add documentation for agree upon GPU SVM design principles, current
> > status, and future plans.
> >
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > Documentation/gpu/rfc/gpusvm.rst | 70
> > ++++++++++++++++++++++++++++++++
> > Documentation/gpu/rfc/index.rst | 4 ++
> > 2 files changed, 74 insertions(+)
> > create mode 100644 Documentation/gpu/rfc/gpusvm.rst
> >
> > diff --git a/Documentation/gpu/rfc/gpusvm.rst
> > b/Documentation/gpu/rfc/gpusvm.rst
> > new file mode 100644
> > index 000000000000..2d3f79a6c30a
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/gpusvm.rst
> > @@ -0,0 +1,70 @@
> > +===============
> > +GPU SVM Section
> > +===============
> > +
> > +Agreed upon design principles
> > +=============================
> > +
> > +* migrate_to_ram path
> > + * Rely on core MM concepts (migration ptes, page refs, and
> > page locking)
> > + only
> > + * No driver specific locks other than locks for hardware
> > interaction in
> > + this path
>
> We have previously been discussing the bo lock to protect the bo from
> eviction during migrate, if the vram allocation is bo-based. This is a
> cross-driver lock with a well-established locking order and I suggest
> we allow this. Apart from that I think the above statement needs some
Not allowing additional locks was suggested by Sima and I think it makes
sense but I do agree taking a dma-resv in migrate_to_ram would be safe.
However, the way GPU SVM is structured there is not any hooks to enable
this.
> elaboration: What is the problem we are trying to avoid with driver-
> specific locks, written so that it's easy to understand it's a bad
> idea.
>
Sure, will try to elaborate.
> > + * Partial migration is supported
>
> Exactly what do we mean by partial migration.
>
Will
> > + * Driver handles mixed migrations via retry loops rather
> > than locking
> > +* Eviction
> > + * Only looking at physical memory datastructures and locks
> as opposed to...
>
As opposed looking at virtual memory data structures. Can elaborate why
this is a bad idea too - basically mremap + fork fall apart when you
start looking at virtual things.
> > + * No looking at mm/vma structs or relying on those being
> > locked
> We're violating this with the current implementation, aren't we?
>
Aside from when calling migrate_vma_* or creating initial range. Can
elaborate on this and certainly say drivers should not look at CPU VMA.
>
> > +* GPU fault side
> > + * mmap_read only used around core MM functions which require
> > this lock
> > + * Big retry loop to handle all races with the mmu notifier
> > under the gpu
> > + pagetable locks/mmu notifier range lock/whatever we end up
> > calling
> > + those
> > + * Races (especially against concurrent
> > eviction/migrate_to_ram) should
> > + not be handled on the fault side by trying to hold locks
>
> This actually contradicts my comment written above about using the bo
> lock to block eviction here. The alternative would be to pin vram
> allocations during migration until the mm_truct has references on the
> allocation, but it'd be good to clarify exactly why locking here is a
> bad idea, and why we can't rely on lockdep?
>
I'll try to clarify.
> > +* Physical memory to virtual backpointer
> > + * Does not work, no pointers from physical memory to virtual
> > should
> > + exist
>
> We actually still have the private zdd structure, but it's strictly not
> to virtual but to the allocation metadata. Is it verified that the
> zone_device_data field is allowed to be modified by the pagemap between
> allocation and migration?
>
We don't modify zdd between allocation and migration aside from the ref
count of the zdd.
>
> > +* GPU pagetable locking
> > + * Notifier lock only protects range tree, pages, pagetable
> > entries, and
> > + mmu notifier seqno tracking, it is not a global lock to
> > protect
> > + against races
> > + * All races handled with big retry as mentioned above
>
> Adding a note here about "pages valid" for subranges rather than
> relying on the wider notifer seqno. I.E. a subrange can be valid even
> if the notifier seqno says otherwise.
>
Sure.
> Performance considerations:
> Perhaps mention that notifier (core mm) performance is more important
> than gpu fault (driver) performance when considering optimizations that
> improves one at the cost of the other?
>
Hmm, that is kinda speculation IMO. I have heard that feedback but
unsure if I agree with it nor do we have any data to backup that claim.
I'd rather not write something down like this that is based on
speculation. I do agree we should profile the code to really understand
the hot spots and write down our findings once we have done that.
> > +
> > +Overview of current design
> > +==========================
> > +
> > +Current design is simple as possible to get a working basline in
> > which can built
>
> can be built
>
+1
Matt
> > +upon.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/xe/drm_gpusvm.c
> > + :doc: Overview
> > + :doc: Locking
> > + :doc: Migrataion
> > + :doc: Partial Unmapping of Ranges
> > + :doc: Examples
> > +
> > +Possible future design features
> > +===============================
> > +
> > +* Concurrent GPU faults
> > + * CPU faults are concurrent so makes sense to have
> > concurrent GPU faults
> > + * Should be possible with fined grained locking in the
> > driver GPU
> > + fault handler
> > + * No expected GPU SVM changes required
> > +* Ranges with mixed system and device pages
> > + * Can be added if required to drm_gpusvm_get_pages fairly
> > easily
> > +* Multi-GPU support
> > + * Work in progress and patches expected after initially
> > landing on GPU
> > + SVM
> > + * Ideally can be done with little to no changes to GPU SVM
> > +* Drop ranges in favor of radix tree
> > + * May be desirable for faster notifiers
> > +* Compound device pages
> > + * Nvidia, AMD, and Intel all have agreed expensive core MM
> > functions in
> > + migrate device layer are a performance bottleneck, having
> > compound
> > + device pages should help increase performance by reducing
> > the number
> > + of these expensive calls
> > +* Higher order dma mapping for migration
> > + * 4k dma mapping adversely affects migration performance on
> > Intel
> > + hardware, higher order (2M) dma mapping should help here
> > diff --git a/Documentation/gpu/rfc/index.rst
> > b/Documentation/gpu/rfc/index.rst
> > index 476719771eef..396e535377fb 100644
> > --- a/Documentation/gpu/rfc/index.rst
> > +++ b/Documentation/gpu/rfc/index.rst
> > @@ -16,6 +16,10 @@ host such documentation:
> > * Once the code has landed move all the documentation to the right
> > places in
> > the main core, helper or driver sections.
> >
> > +.. toctree::
> > +
> > + gpusvm.rst
> > +
> > .. toctree::
> >
> > i915_gem_lmem.rst
>
> Thanks,
> Thomas
>
More information about the Intel-xe
mailing list