[PATCH] drm/amdgpu: Handle duplicate BOs during process restore
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Mar 11 16:33:11 UTC 2024
Am 11.03.24 um 16:33 schrieb Felix Kuehling:
> On 2024-03-11 11:25, Joshi, Mukul wrote:
>> [AMD Official Use Only - General]
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Monday, March 11, 2024 2:50 AM
>>> To: Joshi, Mukul <Mukul.Joshi at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>
>>> Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process
>>> restore
>>>
>>> Caution: This message originated from an External Source. Use proper
>>> caution
>>> when opening attachments, clicking links, or responding.
>>>
>>>
>>> Am 08.03.24 um 17:22 schrieb Mukul Joshi:
>>>> In certain situations, some apps can import a BO multiple times
>>>> (through IPC for example). To restore such processes successfully, we
>>>> need to tell drm to ignore duplicate BOs.
>>>> While at it, also add additional logging to prevent silent failures
>>>> when process restore fails.
>>>>
>>>> Signed-off-by: Mukul Joshi <mukul.joshi at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14
>>> ++++++++++----
>>>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index bf8e6653341f..65d808d8b5da 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -2869,14 +2869,16 @@ int
>>>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>>>> __rcu *
>>>>
>>>> mutex_lock(&process_info->lock);
>>>>
>>>> - drm_exec_init(&exec, 0);
>>>> + drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
>>>> drm_exec_until_all_locked(&exec) {
>>>> list_for_each_entry(peer_vm,
>>>> &process_info->vm_list_head,
>>>> vm_list_node) {
>>>> ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
>>>> drm_exec_retry_on_contention(&exec);
>>>> - if (unlikely(ret))
>>>> + if (unlikely(ret)) {
>>>> + pr_err("Locking VM PD failed, ret:
>>>> + %d\n", ret);
>>>> goto ttm_reserve_fail;
>>>> + }
>>> That's a bad idea. Locking can always be interrupted and that would
>>> print an
>>> error here.
>>>
>> Thanks Christian. Will send out a patch to change it to pr_debug.
>
> We cannot get interrupted here because we're in a worker thread. We
> should be running in non-interruptible mode.
Ah! Ok in that case this isn't necessary.
But in general I think we should avoid error printing like that. If we
want to know where something failed there is a function tracker for that.
Regards,
Christian.
>
> Regards,
> Felix
>
>
>>
>> Regards,
>> Mukul
>>
>>> Regards,
>>> Christian.
>>>
>>>> }
>>>>
>>>> /* Reserve all BOs and page tables/directory. Add all
>>>> BOs from @@ -2889,8 +2891,10 @@ int
>>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>>> __rcu *
>>>> gobj = &mem->bo->tbo.base;
>>>> ret = drm_exec_prepare_obj(&exec, gobj, 1);
>>>> drm_exec_retry_on_contention(&exec);
>>>> - if (unlikely(ret))
>>>> + if (unlikely(ret)) {
>>>> + pr_err("drm_exec_prepare_obj failed,
>>>> + ret: %d\n", ret);
>>>> goto ttm_reserve_fail;
>>>> + }
>>>> }
>>>> }
>>>>
>>>> @@ -2950,8 +2954,10 @@ int
>>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>>> __rcu *
>>>> * validations above would invalidate DMABuf imports again.
>>>> */
>>>> ret = process_validate_vms(process_info, &exec.ticket);
>>>> - if (ret)
>>>> + if (ret) {
>>>> + pr_err("Validating VMs failed, ret: %d\n", ret);
>>>> goto validate_map_fail;
>>>> + }
>>>>
>>>> /* Update mappings not managed by KFD */
>>>> list_for_each_entry(peer_vm, &process_info->vm_list_head,
More information about the amd-gfx
mailing list