[PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Danilo Krummrich dakr at redhat.com
Wed Oct 4 18:24:20 UTC 2023



On 10/4/23 19:57, Thomas Hellström wrote:
> On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote:
>> On 10/4/23 17:29, Thomas Hellström wrote:
>>>
>>> On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
>>>> On 10/3/23 11:11, Thomas Hellström wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to
>>>>>>> /
>>>>>>> from the &drm_gpuvms
>>>>>>> + * evicted list
>>>>>>> + * @vm_bo: the &drm_gpuvm_bo to add or remove
>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>> + *
>>>>>>> + * Adds a &drm_gpuvm_bo to or removes it from the
>>>>>>> &drm_gpuvms
>>>>>>> evicted list.
>>>>>>> + */
>>>>>>> +void
>>>>>>> +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
>>>>>>> +{
>>>>>>> +    struct drm_gem_object *obj = vm_bo->obj;
>>>>>>> +
>>>>>>> +    dma_resv_assert_held(obj->resv);
>>>>>>> +
>>>>>>> +    /* Always lock list transactions, even if
>>>>>>> DRM_GPUVM_RESV_PROTECTED is
>>>>>>> +     * set. This is required to protect multiple
>>>>>>> concurrent
>>>>>>> calls to
>>>>>>> +     * drm_gpuvm_bo_evict() with BOs with different
>>>>>>> dma_resv.
>>>>>>> +     */
>>>>>>
>>>>>> This doesn't work. The RESV_PROTECTED case requires the
>>>>>> evicted
>>>>>> flag we discussed before. The list is either protected by the
>>>>>> spinlock or the resv. Otherwise a list add could race with a
>>>>>> list
>>>>>> removal elsewhere.
>>>>
>>>> I think it does unless I miss something, but it might be a bit
>>>> subtle
>>>> though.
>>>>
>>>> Concurrent drm_gpuvm_bo_evict() are protected by the spinlock.
>>>> Additionally, when
>>>> drm_gpuvm_bo_evict() is called we hold the dma-resv of the
>>>> corresponding GEM object.
>>>>
>>>> In drm_gpuvm_validate() I assert that we hold *all* dma-resv,
>>>> which
>>>> implies that no
>>>> one can call drm_gpuvm_bo_evict() on any of the VM's objects and
>>>> no
>>>> one can add a new
>>>> one and directly call drm_gpuvm_bo_evict() on it either.
>>>
>>> But translated into how the data (the list in this case) is
>>> protected
>>> it becomes
>>>
>>> "Either the spinlock and the bo resv of a single list item OR the
>>> bo
>>> resvs of all bos that can potentially be on the list",
>>>
>>> while this is certainly possible to assert, any new / future code
>>> that
>>> manipulates the evict list will probably get this wrong and as a
>>> result
>>> the code becomes pretty fragile. I think drm_gpuvm_bo_destroy()
>>> already
>>> gets it wrong in that it, while holding a single resv, doesn't take
>>> the
>>> spinlock.
>>
>> That's true and I don't like it either. Unfortunately, with the dma-
>> resv
>> locking scheme we can't really protect the evict list without the
>> drm_gpuvm_bo::evicted trick properly.
>>
>> But as pointed out in my other reply, I'm a bit worried about the
>> drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's
>> fine
>> doing it in the RESV_PROTECTED case.
> 
> Ah, indeed. I misread that as discussing the current code rather than
> the drm_gpuvm_bo::evicted trick. If validating only a subset, or a
> range, then with the drm_gpuvm_bo::evicted trick would be valid only
> for that subset.
> 
> But the current code would break because the condition of locking "the
> resvs of all bos that can potentially be on the list" doesn't hold
> anymore, and you'd get list corruption.
> 
> What *would* work, though, is the solution currently in xe, The
> original evict list, and a staging evict list whose items are copied
> over on validation. The staging evict list being protected by the
> spinlock, the original evict list by the resv, and they'd use separate
> list heads in the drm_gpuvm_bo, but that is yet another complication.
> 
> But I think if this becomes an issue, those VMs (perhaps OpenGL UMD
> VMs) only wanting to validate a subset, would simply initially rely on
> the current non-RESV solution. It looks like it's only a matter of
> flipping the flag on a per-vm basis.

If such a driver locks a range it can also just validate all locked
objects I guess.

And for everything else, we still have the spinlock protected variant,
where drivers can freely move things around by just taking the spinlock.

I think I will go ahead and add drm_gpuvm_bo::evicted, plus the helpers
I mentioned.

> 
> /Thomas
> 
> 
>>
>>>
>>> So I think that needs fixing, and if keeping that protection I
>>> think it
>>> needs to be documented with the list member and ideally an assert.
>>> But
>>> also note that lockdep_assert_held will typically give false true
>>> for
>>> dma_resv locks; as long as the first dma_resv lock locked in a
>>> drm_exec
>>> sequence  remains locked, lockdep thinks *all* dma_resv locks are
>>> held.
>>> (or something along those lines), so the resv lockdep asserts are
>>> currently pretty useless.
>>>
>>> /Thomas
>>>
>>>
>>>
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 



More information about the dri-devel mailing list