[PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation

Mon Sep 23 23:41:31 UTC 2024

(adding dri-devel)

On Mon, Sep 23, 2024 at 02:24:02PM +0000, Zeng, Oak wrote:
> > > This patch is an old one in my back log. I roughly remember I ran into
> > a situation where there were two duplicated VMAs covering
> > > Same virtual address range are kept in gpuvm's RB-tree. One VMA
> > was actually already destroyed. This further caused issues as
> > > The destroyed VMA was found during a GPUVM RB-tree walk. This
> > triggered me to look into the gpuvm merge split logic and end
> > > Up with this patch. This patch did fix that issue.
> > 
> > That would indeed be a big issue. As Matt suggests, is there a
> > reproducer?
> > 
> > Either way, adding merge support can't be the fix for this, we need a
> > separate
> > one, that's back-portable.
> > 
> 
> The discussion went on when you were away. See https://patchwork.freedesktop.org/patch/614941/?series=138835&rev=1

Yes, I'm aware. But I don't see how this is related to what I said above?

> 
> Matt and me agreed to implement a merge logic in gpuvm, but gpuvm need to check a driver cookie/callback to decide merge or not.
> We reached this conclusion based on some requirement from system allocator design. See more details in above link.
> 
> Can you take a look and let us know whether you agree?

Generally, I'm fine with that, one of my early versions of GPUVM had this. But I
dropped it because I don't saw an immediate benefit.

>From my old change log:

    "Remove merging of GPUVAs; the kernel has limited to none knowlege about
    the semantics of mapping sequences. Hence, merging is purely speculative.
    It seems that gaining a significant (or at least a measurable) performance
    increase through merging is way more likely to happen when userspace is
    responsible for merging mappings up to the next larger page size if
    possible."

If the pure number of GPUVA structures is a concern though, I think it's fair to
add it. So, feel to send a patch.

It's probably also a good idea to double check with my old merge implementation
[1]. It's pretty easy to get this wrong. I'm not saying I got it right, but if
we both ended up with the same logic, it's a good indicator. :)

However, this should *not* be a solution for an existing bug. Above you mention
a bug related to "duplicated VMAs covering the same virtual address range". This
is unrelated and must be fixed separately. Do you have a way to reproduce this?

[1] https://lore.kernel.org/dri-devel/20230217134422.14116-6-dakr@redhat.com/

> 
> > Also, can we move this on DRI-devel please?
> 
> Yes will do.
> 
> Oak