[PATCH] drm/amdgpu: Handle duplicate BOs during process restore

Felix Kuehling felix.kuehling at amd.com
Mon Mar 11 15:33:14 UTC 2024


On 2024-03-11 11:25, Joshi, Mukul wrote:
> [AMD Official Use Only - General]
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Monday, March 11, 2024 2:50 AM
>> To: Joshi, Mukul <Mukul.Joshi at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process
>> restore
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>>
>>
>> Am 08.03.24 um 17:22 schrieb Mukul Joshi:
>>> In certain situations, some apps can import a BO multiple times
>>> (through IPC for example). To restore such processes successfully, we
>>> need to tell drm to ignore duplicate BOs.
>>> While at it, also add additional logging to prevent silent failures
>>> when process restore fails.
>>>
>>> Signed-off-by: Mukul Joshi <mukul.joshi at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14
>> ++++++++++----
>>>    1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index bf8e6653341f..65d808d8b5da 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -2869,14 +2869,16 @@ int
>>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>>> __rcu *
>>>
>>>        mutex_lock(&process_info->lock);
>>>
>>> -     drm_exec_init(&exec, 0);
>>> +     drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
>>>        drm_exec_until_all_locked(&exec) {
>>>                list_for_each_entry(peer_vm, &process_info->vm_list_head,
>>>                                    vm_list_node) {
>>>                        ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
>>>                        drm_exec_retry_on_contention(&exec);
>>> -                     if (unlikely(ret))
>>> +                     if (unlikely(ret)) {
>>> +                             pr_err("Locking VM PD failed, ret:
>>> + %d\n", ret);
>>>                                goto ttm_reserve_fail;
>>> +                     }
>> That's a bad idea. Locking can always be interrupted and that would print an
>> error here.
>>
> Thanks Christian. Will send out a patch to change it to pr_debug.

We cannot get interrupted here because we're in a worker thread. We 
should be running in non-interruptible mode.

Regards,
   Felix


>
> Regards,
> Mukul
>
>> Regards,
>> Christian.
>>
>>>                }
>>>
>>>                /* Reserve all BOs and page tables/directory. Add all
>>> BOs from @@ -2889,8 +2891,10 @@ int
>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>> __rcu *
>>>                        gobj = &mem->bo->tbo.base;
>>>                        ret = drm_exec_prepare_obj(&exec, gobj, 1);
>>>                        drm_exec_retry_on_contention(&exec);
>>> -                     if (unlikely(ret))
>>> +                     if (unlikely(ret)) {
>>> +                             pr_err("drm_exec_prepare_obj failed,
>>> + ret: %d\n", ret);
>>>                                goto ttm_reserve_fail;
>>> +                     }
>>>                }
>>>        }
>>>
>>> @@ -2950,8 +2954,10 @@ int
>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
>> __rcu *
>>>         * validations above would invalidate DMABuf imports again.
>>>         */
>>>        ret = process_validate_vms(process_info, &exec.ticket);
>>> -     if (ret)
>>> +     if (ret) {
>>> +             pr_err("Validating VMs failed, ret: %d\n", ret);
>>>                goto validate_map_fail;
>>> +     }
>>>
>>>        /* Update mappings not managed by KFD */
>>>        list_for_each_entry(peer_vm, &process_info->vm_list_head,


More information about the amd-gfx mailing list