[PATCH v2 2/2] rust: drm: Add GPUVM abstraction
Daniel Almeida
daniel.almeida at collabora.com
Fri Jun 13 16:42:59 UTC 2025
Danilo,
> <snip>
>
>>>> +// SAFETY: DRM GpuVmBo objects are always reference counted and the get/put functions
>>>> +// satisfy the requirements.
>>>> +unsafe impl<T: DriverGpuVm> AlwaysRefCounted for GpuVmBo<T> {
>>>> + fn inc_ref(&self) {
>>>> + // SAFETY: The drm_gpuvm_get function satisfies the requirements for inc_ref().
>>>> + unsafe { bindings::drm_gpuvm_bo_get(&self.bo as *const _ as *mut _) };
>>>> + }
>>>> +
>>>> + unsafe fn dec_ref(mut obj: NonNull<Self>) {
>>>> + // SAFETY: drm_gpuvm_bo_put() requires holding the gpuva lock, which is
>>>> + // the dma_resv lock by default.
>>>> + // The drm_gpuvm_put function satisfies the requirements for dec_ref().
>>>> + // (We do not support custom locks yet.)
>>>> + unsafe {
>>>> + let resv = (*obj.as_mut().bo.obj).resv;
>>>> + bindings::dma_resv_lock(resv, core::ptr::null_mut());
>>>> + bindings::drm_gpuvm_bo_put(&mut obj.as_mut().bo);
>>>> + bindings::dma_resv_unlock(resv);
>>>
>>> What if the resv_lock is held already? Please also make sure to put multiple
>>> unsafe calls each in a separate unsafe block.
>>
>> By whom?
>
> The lock might be held already by the driver or by TTM when things are called
> from TTM callbacks.
>
> This is why GPUVM never takes locks by itself, but asserts that the correct lock
> is held.
>
> I think we really want to get proof by the driver by providing lock guard
> references.
>
There doesn't seem to be a solution that ticks all the boxes here.
As you said, at this point the status of the resv is unknown inside dec_ref(). If
we simply assume that it is not taken, we run into the problem you pointed out:
recursive locking when TTM or some other layer already holds it.
Alternatively, if we require that the resv be locked before dec_ref() runs, we
could build a lock::Guard from it and assert that it is held, but it is very
confusing to expect the reservation to already be locked on a dec_ref() call.
The fact that dec_ref() is called automatically on drop massively complicates
the call sites:
We would have to ensure that the resv is locked everywhere we interface with a
GpuVmBo, because any of those points could be dropping the last active ref. If
we miss one, we've introduced a race where the gpuvm_bo list is modified
without the lock held, which is a very easy mistake to make. This seems to also
be the case in C, and is something we should try to improve upon.
My suggestion is to introduce a separate GPU-VA lock here:
/// A base GEM object.
#[repr(C)]
#[pin_data]
pub struct Object<T: DriverObject> {
obj: bindings::drm_gem_object,
// The DRM core ensures the Device exists as long as its objects exist, so we don't need to
// manage the reference count here.
dev: *const bindings::drm_device,
#[pin]
inner: T,
#[pin]
_p: PhantomPinned,
// Add a GPU-VA lock here <--------
}
And only support custom locks in Rust, at the cost of the optimization where
the resv doubles as the GPU-VA lock, and of any performance improvements that
reusing the reservation might bring to the table.
Notice that this would sidestep this entire discussion: nobody else would be
aware of this new lock, so we could safely take it in dec_ref(). We would also
transparently manage the locking on behalf of drivers at all the other points
where the VA list is accessed, which, as I said above, is another plus.
I understand that most C drivers do not need an extra lock, but it's getting
hard to emulate this behavior in Rust.
Also, the fact that they don't need an extra lock does not invalidate the fact
that it would simply be safer to have one anyway. In other words, it is
currently entirely possible to use GPUVM without locking anything at all, and
IMHO we shouldn't carry that over into Rust if we can help it.
— Daniel