[PATCH v4 02/40] drm/gpuvm: Allow VAs to hold soft reference to BOs
Rob Clark
robdclark at gmail.com
Thu May 15 20:10:34 UTC 2025
On Thu, May 15, 2025 at 10:51 AM Danilo Krummrich <dakr at kernel.org> wrote:
>
> On Thu, May 15, 2025 at 10:34:07AM -0700, Rob Clark wrote:
> > On Thu, May 15, 2025 at 8:30 AM Danilo Krummrich <dakr at kernel.org> wrote:
> > >
> > > On Thu, May 15, 2025 at 07:59:16AM -0700, Rob Clark wrote:
> > >
> > > Thanks for the detailed explanation!
> > >
> > > > On Thu, May 15, 2025 at 2:00 AM Danilo Krummrich <dakr at kernel.org> wrote:
> > > > >
> > > > > On Wed, May 14, 2025 at 10:53:16AM -0700, Rob Clark wrote:
> > > > > > From: Rob Clark <robdclark at chromium.org>
> > > > > >
> > > > > > Eases migration for drivers where VAs don't hold hard references to
> > > > > > their associated BO, avoiding reference loops.
> > > > > >
> > > > > > In particular, msm uses soft references to optimistically keep around
> > > > > > mappings until the BO is distroyed. Which obviously won't work if the
> > > > > > VA (the mapping) is holding a reference to the BO.
> > > > >
> > > > > Ick! This is all complicated enough. Allow drivers to bypass the proper
> > > > > reference counting for GEM objects in the context of VM_BO structures seems like
> > > > > an insane footgun.
> > > > >
> > > > > I don't understand why MSM would need weak references here. Why does msm need
> > > > > that, but nouveau, Xe, panthor, PowerVR do not?
> > > >
> > > > Most of those drivers were designed (and had their UABI designed) with
> > > > gpuvm, or at least sparse, in mind from the get go. I'm not sure
> > > > about nouveau, but I guess it just got lucky that it's UABI semantics
> > > > fit having the VMA hold a reference to the BO.
> > > >
> > > > Unfortunately, msm pre-dates sparse.. and in the beginning there was
> > > > only a single global VM, multiple VMs was something retrofitted ~6yrs
> > > > (?) back. For existing msm, the VMA(s) are implicitly torn down when
> > > > the GEM obj is freed. This won't work with the VMA(s) holding hard
> > > > references to the BO.
> > >
> > > Ok, that makes sense to me, but why can't this be changed? I don't see how the
> > > uAPI would be affected, this is just an implementation detail, no?
> >
> > It's about the behaviour of the API, there is no explicit VMA
> > creation/destruction in the uAPI.
>
> But that shouldn't matter? Userspace gives you a BO, the driver creates VMAs
> itself, which can have a reference on the VM_BO, which references the original
> BO. At this point you can drop the original reference of the BO and just destroy
> all corresponding VMAs once the driver fulfilled the request from userspace?
Having the submit hold a reference to the VM_BO, and then this funny
looking bit of code in gem_close() gets us part way there:
vm_bo = drm_gpuvm_bo_find(ctx->vm, obj);
if (vm_bo) {
drm_gpuvm_bo_put(vm_bo);
drm_gpuvm_bo_put(vm_bo);
}
But we still leak BO's used in other VMs.. scanout, and various other
fw and other internal BOs... those would all have to be tracked down
and to find _someplace_ to break the VM_BO circular reference loop.
> > > > When userspace opts-in to "VM_BIND" mode, which it has to do before
> > > > the VM is created, then we don't set this flag, the VMA holds a hard
> > > > reference to the BO as it does with other drivers. But consider this
> > > > use-case, which is perfectly valid for old (existing) userspace:
> > > >
> > > > 1) Userspace creates a BO
> > > > 2) Submits rendering referencing the BO
> > > > 3) Immediately closes the BO handle, without waiting for the submit to complete
> > > >
> > > > In this case, the submit holds a reference to the BO which holds a
> > > > reference to the VMA.
> > >
> > > Can't you just instead create the VMAs, which hold a reference to the VM_BO,
> > > which holds a reference to the BO, then drop the drop the original BO reference
> > > and finally, when everything is completed, remove all VMAs of the VM_BO?
> >
> > Perhaps the submit could hold a ref to the VM_BO instead of the BO to
> > cover that particular case.
> >
> > But for the legacy world, the VMA is implicitly torn down when the BO
> > is freed. Which will never happen if the VM_BO holds a reference to
> > the BO.
>
> Sure, I get that; what I do not get is why it can't be changed, e.g. in the way
> described above.
>
> > > This should do exactly the same *and* be conformant with GPUVM design.
> > >
> > > > Everything is torn down gracefully when the
> > > > submit completes. But if the VMA held a hard reference to the BO then
> > > > you'd have a reference loop.
> > > >
> > > > So there really is no other way to use gpuvm _and_ maintain backwards
> > > > compatibility with the semantics of the pre-VM_BIND UAPI without this
> > > > flag.
> > >
> > > Again, how is this important for maintaining backwards compatibility with the
> > > uAPI? This all seems like a driver internal implementation detail to me.
> > >
> > > So, is there a technical reason, or is it more that it would be more effort on
> > > the driver end to rework things accordingly?
> >
> > If there were a way to work without WEAK_REF, it seems like it would
> > be harder and much less of a drop in change.
>
> So, you're saying there is no technical blocker to rework it?
Not clear.. it would certainly make conversion to gpuvm a much bigger
flag-day, because without WEAK_REF the way gpuvm works is exactly
backwards from how the thing it is replacing works.
BR,
-R
More information about the dri-devel
mailing list