[PATCH v3 1/2] drm/amdgpu: Auto-validate DMABuf imports in compute VMs

Felix Kuehling felix.kuehling at amd.com
Fri Dec 15 15:19:09 UTC 2023


On 2023-12-15 07:30, Christian König wrote:
>> @@ -1425,11 +1451,21 @@ int amdgpu_vm_handle_moved(struct 
>> amdgpu_device *adev,
>>           }
>>             r = amdgpu_vm_bo_update(adev, bo_va, clear);
>> -        if (r)
>> -            return r;
>>             if (unlock)
>>               dma_resv_unlock(resv);
>> +        if (r)
>> +            return r;
>> +
>> +        /* Remember evicted DMABuf imports in compute VMs for later
>> +         * validation
>> +         */
>> +        if (vm->is_compute_context && bo_va->base.bo &&
>> +            bo_va->base.bo->tbo.base.import_attach &&
>> +            (!bo_va->base.bo->tbo.resource ||
>> +             bo_va->base.bo->tbo.resource->mem_type == TTM_PL_SYSTEM))
>> +            amdgpu_vm_bo_evicted(&bo_va->base);
>> +
>
> The change looks mostly good now. Just one thing which worries me is 
> that when GFX and compute is mixed in the same VM this here might 
> cause problems when we run into an error during command submission.
>
> E.g. GFX validates the VM BOs, but then the IOCTL fails before calling 
> amdgpu_vm_handle_moved().
>
> In this case the DMA-buf wouldn't be validated any more.

This code path shouldn't be relevant for command submission, but for the 
amdgpu_vm_handle_moved call in amdgpu_dma_buf_move_notify. That's where 
the BO is first found to be evicted and its PTEs invalidated. That's 
where we need to remember it so it can be validated in the next call to 
amdgpu_vm_validate.

Currently the amdgpu_cs code path calls amdgpu_vm_validate with 
ticket=NULL, so it won't validate these imports. The only place that 
auto-validates evicted imports is amdgpu_amdkfd_restore_process_bos. So 
none of this should affect amdgpu_cs command submission.

The flow for amdgpu_amdkfd_restore_process_bos is:

  * amdgpu_vm_validate validates the evicted dmabuf imports (moves the
    bo_vas from "evicted" to "invalidated")
  * amdgpu_vm_handle_moved iterates over invalidated bo_vas and updates
    the PTEs with valid entries (moves the bo_vas to "done")


Regards,
   Felix


>
> Regards,
> Christian. 


More information about the amd-gfx mailing list