[PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2

Tue Jun 20 09:04:54 UTC 2023

On 6/20/23 17:16, Tatsuyuki Ishi wrote:
> On 6/20/23 17:12, Christian König wrote:
>> On 20.06.23 at 06:07, Tatsuyuki Ishi wrote:
>>>> @@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>>           e->user_invalidated = userpage_invalidated;
>>>>       }
>>>>   -    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>>> -                   &duplicates);
>>>> -    if (unlikely(r != 0)) {
>>>> -        if (r != -ERESTARTSYS)
>>>> -            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
>>>> -        goto out_free_user_pages;
>>>> +    drm_exec_while_not_all_locked(&p->exec) {
>>>> +        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
>>>> +        drm_exec_continue_on_contention(&p->exec);
>>>
>>> Duplicate handling is needed for pretty much every call of amdgpu_vm_lock_pd, as bo->tbo.base.resv == vm->root.bo->tbo.base.resv for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID.
>>
>> Well no. AMDGPU_GEM_CREATE_VM_ALWAYS_VALID means that BOs should *not* be part of the relocation list. So when those cause an EALREADY here then userspace has a bug.
> 
> Sounds fair, lemme check how RADV is handling this again.

I checked again and the relocation list was actually fine, but other places were not. For example, amdgpu_gem_object_close
locks both bo->tbo.base.resv and vm->root.bo->tbo.base.resv (the PD) on its own, and for ALWAYS_VALID BOs these are the same reservation object.

This was the easily debuggable case since it produced an error log, but some other BO operations on ALWAYS_VALID
BOs are presumably also broken for the same reason.
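
To illustrate the failure mode, here is a minimal userspace sketch. It is not the amdgpu or drm_exec code: every toy_* name is hypothetical, and the only behaviour it assumes from the real locks is that taking the same reservation lock twice under one acquire context yields -EALREADY. An ALWAYS_VALID BO is modelled as a BO whose resv pointer aliases the PD's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a reservation lock: remembers which context holds it. */
struct toy_resv {
	const void *owner;
};

/* Toy stand-in for a BO; resv points at the lock protecting it. */
struct toy_bo {
	struct toy_resv *resv;
};

/* Assumed semantics: relocking under the same context returns -EALREADY. */
static int toy_lock(struct toy_resv *resv, const void *ctx)
{
	if (resv->owner == ctx)
		return -EALREADY;
	if (resv->owner != NULL)
		return -EDEADLK;	/* held by another context */
	resv->owner = ctx;
	return 0;
}

static void toy_unlock(struct toy_resv *resv, const void *ctx)
{
	assert(resv->owner == ctx);
	resv->owner = NULL;
}

/* Models a path that locks the BO and then the PD with no duplicate handling. */
static int toy_lock_bo_and_pd(struct toy_bo *bo, struct toy_bo *pd,
			      const void *ctx)
{
	int r = toy_lock(bo->resv, ctx);

	if (r)
		return r;

	r = toy_lock(pd->resv, ctx);
	if (r) {
		toy_unlock(bo->resv, ctx);
		return r;
	}
	return 0;
}

/* Same path, but a duplicate on the PD is treated as "already locked". */
static int toy_lock_bo_and_pd_dedup(struct toy_bo *bo, struct toy_bo *pd,
				    const void *ctx)
{
	int r = toy_lock(bo->resv, ctx);

	if (r)
		return r;

	r = toy_lock(pd->resv, ctx);
	if (r == -EALREADY)
		return 0;	/* bo->resv == pd->resv, lock is already held */
	if (r) {
		toy_unlock(bo->resv, ctx);
		return r;
	}
	return 0;
}
```

With a BO that has its own resv, both variants succeed. With a BO that shares the PD's resv, the first variant fails with -EALREADY while the second one ends up holding the shared lock once.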

Tatsuyuki