[PATCH v4 01/40] drm/gpuvm: Don't require obj lock in destructor path

Tue May 20 22:56:15 UTC 2025

On Tue, May 20, 2025 at 3:31 PM Dave Airlie <airlied at gmail.com> wrote:
>
> On Wed, 21 May 2025 at 07:53, Rob Clark <robdclark at gmail.com> wrote:
> >
> > On Tue, May 20, 2025 at 2:25 PM Dave Airlie <airlied at gmail.com> wrote:
> > >
> > > On Sat, 17 May 2025 at 02:20, Rob Clark <robdclark at gmail.com> wrote:
> > > >
> > > > On Fri, May 16, 2025 at 2:01 AM Danilo Krummrich <dakr at kernel.org> wrote:
> > > > >
> > > > > On Thu, May 15, 2025 at 02:57:46PM -0700, Rob Clark wrote:
> > > > > > On Thu, May 15, 2025 at 10:55 AM Danilo Krummrich <dakr at kernel.org> wrote:
> > > > > > > Anyways, I don't agree with that. Even if you can tweak your driver to not run
> > > > > > > into trouble with this, we can't introduce a mode that violates GOUVM's internal
> > > > > > > lifetimes and subsequently fix it up with WARN_ON() or BUG_ON().
> > > > > > >
> > > > > > > I still don't see a real technical reason why msm can't be reworked to follow
> > > > > > > those lifetime rules.
> > > > > >
> > > > > > The basic issue is that (a) it would be really awkward to have two
> > > > > > side-by-side VM/VMA management/tracking systems.  But in legacy mode,
> > > > > > we have the opposite direction of reference holding.  (But at the same
> > > > > > time, don't need/use most of the features of gpuvm.)
> > > > >
> > > > > Ok, let's try to move this forward; I see three options (in order of descending
> > > > > preference):
> > > > >
> > > > >   1) Rework the legacy code to properly work with GPUVM.
> > > > >   2) Don't use GPUVM for the legacy mode.
> > > > >   .
> > > > >   .
> > > > >   .
> > > > >   3) Get an ACK from Dave / Sima to implement those workarounds for MSM in
> > > > >      GPUVM.
> > > > >
> > > > > If you go for 3), the code introduced by those two patches should be guarded
> > > > > with a flag that makes it very clear that this is a workaround specifically
> > > > > for MSM legacy mode and does not give any guarantees in terms of correctness
> > > > > regarding lifetimes etc., e.g. DRM_GPUVM_MSM_LEGACY_QUIRK.
> > > >
> > > > I'm not even sure how #2 would work, other than just copy/pasta all of
> > > > drm_gpuvm into msm, which doesn't really seem great.
> > > >
> > > > As for #1, even if I could get it to work, it would still be a lot
> > > > more mmu map/unmap (like on every pageflip, vs the current state that
> > > > the vma is kept around until the object is freed).  For the
> > > > non-VM_BIND world, there are advantages to the BO holding the ref to
> > > > the VMA, rather than the other way around.  Even at just a modest
> > > > single layer 1080p the map takes ~.2ms and unmap ~.3ms (plus the unmap
> > > > costs a tlbinv).  So from that standpoint, #3 is the superior option.
> > > >
> > >
> > > Before we get to #3, I'll need a bit more info here on why you have to
> > > map/unmap the VMA on every pageflip.
> >
> > Previously we'd keep the VMA hanging around until the GEM obj is
> > freed.  But that can't work if the VMA (via the VM_BO) is holding a
> > reference to the GEM obj.
> >
> > I was kinda thinking about keeping the VMA around until the handle is
> > closed.. but that doesn't cover the dma-buf case (ie. when you
> > re-import the dma-buf fd each frame.. I know android does this, unsure
> > about other wsi's).
> >
> > > But actually I think 2 is the best option, I think in nouveau this is
> > > where we ended up, we didn't modify the old submission paths at all
> > > and kept the old bo/vm lifetimes.
> > >
> > > We just added completely new bind/exec ioctls and you can only use one
> > > method once you've opened an fd.
> >
> > hmm, but that means tracking VMAs against a single BO differently..
> > which.. at least seems ugly..
>
> I don't think it is if you already have the code to do that, and just
> add gpuvm support in parallel.
>
> You also have to figure out that the world is moving towards Vulkan
> for everything so any optimisations you've made for particular legacy
> paths will need to be dealt with in the future picture anyways.

fwiw, the case I'm more worried about is the kms vm for scanout, that
won't be using vk

BR,
-R

> But I'd rather not hack gpuvm into being something it isn't, if there
> is a meaningful commonality in legacy bo/vm bindings across drivers,
> we could create something new, but the ref counting and handling is
> pretty fundamental to gpuvm architecture.
>
> There should only be two paths, legacy and gpuvm, and you shouldn't
> ever be mixing them on a particular exec path, since you should only
> have a vm per userspace fd, and can pick which way to use it the first
> time someone calls it.
>
> Dave.
> Dave.