[PATCH v4 02/40] drm/gpuvm: Allow VAs to hold soft reference to BOs
Danilo Krummrich
dakr at kernel.org
Thu May 15 15:30:16 UTC 2025
On Thu, May 15, 2025 at 07:59:16AM -0700, Rob Clark wrote:
Thanks for the detailed explanation!
> On Thu, May 15, 2025 at 2:00 AM Danilo Krummrich <dakr at kernel.org> wrote:
> >
> > On Wed, May 14, 2025 at 10:53:16AM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdclark at chromium.org>
> > >
> > > Eases migration for drivers where VAs don't hold hard references to
> > > their associated BO, avoiding reference loops.
> > >
> > > In particular, msm uses soft references to optimistically keep around
> > > mappings until the BO is distroyed. Which obviously won't work if the
> > > VA (the mapping) is holding a reference to the BO.
> >
> > Ick! This is all complicated enough. Allow drivers to bypass the proper
> > reference counting for GEM objects in the context of VM_BO structures seems like
> > an insane footgun.
> >
> > I don't understand why MSM would need weak references here. Why does msm need
> > that, but nouveau, Xe, panthor, PowerVR do not?
>
> Most of those drivers were designed (and had their UABI designed) with
> gpuvm, or at least sparse, in mind from the get go. I'm not sure
> about nouveau, but I guess it just got lucky that it's UABI semantics
> fit having the VMA hold a reference to the BO.
>
> Unfortunately, msm pre-dates sparse.. and in the beginning there was
> only a single global VM, multiple VMs was something retrofitted ~6yrs
> (?) back. For existing msm, the VMA(s) are implicitly torn down when
> the GEM obj is freed. This won't work with the VMA(s) holding hard
> references to the BO.
Ok, that makes sense to me, but why can't this be changed? I don't see how the
uAPI would be affected, this is just an implementation detail, no?
> When userspace opts-in to "VM_BIND" mode, which it has to do before
> the VM is created, then we don't set this flag, the VMA holds a hard
> reference to the BO as it does with other drivers. But consider this
> use-case, which is perfectly valid for old (existing) userspace:
>
> 1) Userspace creates a BO
> 2) Submits rendering referencing the BO
> 3) Immediately closes the BO handle, without waiting for the submit to complete
>
> In this case, the submit holds a reference to the BO which holds a
> reference to the VMA.
Can't you just instead create the VMAs, which hold a reference to the VM_BO,
which holds a reference to the BO, then drop the drop the original BO reference
and finally, when everything is completed, remove all VMAs of the VM_BO?
This should do exactly the same *and* be conformant with GPUVM design.
> Everything is torn down gracefully when the
> submit completes. But if the VMA held a hard reference to the BO then
> you'd have a reference loop.
>
> So there really is no other way to use gpuvm _and_ maintain backwards
> compatibility with the semantics of the pre-VM_BIND UAPI without this
> flag.
Again, how is this important for maintaining backwards compatibility with the
uAPI? This all seems like a driver internal implementation detail to me.
So, is there a technical reason, or is it more that it would be more effort on
the driver end to rework things accordingly?
> Fortunately DRM_GPUVM_VA_WEAK_REF is minimally intrusive. Otherwise I
> probably would have had to fork my own copy of gpuvm.
>
> BR,
> -R
More information about the dri-devel
mailing list