[PATCH v2 05/29] drm/gpusvm: Add support for GPU Shared Virtual Memory

Tue Nov 5 16:12:19 UTC 2024

On Tue, Nov 05, 2024 at 11:22:12AM +0100, Thomas Hellström wrote:
> On Mon, 2024-11-04 at 15:07 -0800, Matthew Brost wrote:
> > > We
> > > have
> > > https://elixir.bootlin.com/linux/v6.12-rc6/source/include/linux/int
> > > erval_tree_generic.h#L24
> > > 
> > > to relate to. Now GPUVM can't use the generic version since it
> > > needs
> > > u64 intervals. These trees need unsigned long only so it should be
> > > ok.
> > > And safe removal, isn't that possible to implement without the
> > > list?
> > > Then it's really only the linked list as a perf optimization I
> > > guess,
> > > but we have a lot of those pending...
> > > 
> > 
> > See my other comments. Let me just follow on using a maple tree and
> > perhaps a
> > list isn't required if we use that. Will have definite answer in my
> > next rev.
> 
> Note, though, that IIRC maple trees do not allow overlapping ranges,
> and If we need to support multiple svm VMAs with different offsets,
> like Christian suggests, we will likely have overlapping ranges for the
> range tree but not for the notifier tree.
> 

I don't think that is how overlapping ranges would look though. We'd
have multiple GPU VMAs / GPU ptes to pointing the same SVM range. The
SVM ranges speak in the CPU address space - we'd attach multiple GPU
VMAs to the SVM so in the notifier we can find all the GPU pages to
invalidate. At least I think it would look this way - can cross that
bridge if / when we get to it though.

> Thinking a bit more about this, my concern is mostly around needlessly
> instantiating new interval trees instead of using the generic
> instantiation, because that is clearly against recommended practice. 
> 

Ok, so with this statement then I think both the interval trees in GPU
VM / xe_range_fence are going again the recommended practice too?

> But the list could probably be added anyway if needed, and it does
> indeed AFAICT reduce the traversal complexity from O(N ln N) to O(N).
> 

I think this will show in the notifier. We currently walk the notifier's
range tree twice - once to do the invalidate, once to unmap the pages /
add to the garbage collector. I even optimize this so the 2nd walk
doesn't have to lookup first range again making the complexity O(ln N +
2 * N) vs (2 * N * ln N) without a list.

Matt

> /Thomas
>