[PATCH] drm/amdgpu: Handle duplicate BOs during process restore
Joshi, Mukul
Mukul.Joshi at amd.com
Mon Mar 11 15:25:08 UTC 2024
[AMD Official Use Only - General]
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Monday, March 11, 2024 2:50 AM
> To: Joshi, Mukul <Mukul.Joshi at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process
> restore
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> Am 08.03.24 um 17:22 schrieb Mukul Joshi:
> > In certain situations, some apps can import a BO multiple times
> > (through IPC for example). To restore such processes successfully, we
> > need to tell drm to ignore duplicate BOs.
> > While at it, also add additional logging to prevent silent failures
> > when process restore fails.
> >
> > Signed-off-by: Mukul Joshi <mukul.joshi at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14
> ++++++++++----
> > 1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > index bf8e6653341f..65d808d8b5da 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > @@ -2869,14 +2869,16 @@ int
> > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
> > __rcu *
> >
> > mutex_lock(&process_info->lock);
> >
> > - drm_exec_init(&exec, 0);
> > + drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
> > drm_exec_until_all_locked(&exec) {
> > list_for_each_entry(peer_vm, &process_info->vm_list_head,
> > vm_list_node) {
> > ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
> > drm_exec_retry_on_contention(&exec);
> > - if (unlikely(ret))
> > + if (unlikely(ret)) {
> > + pr_err("Locking VM PD failed, ret:
> > + %d\n", ret);
> > goto ttm_reserve_fail;
> > + }
>
> That's a bad idea. Locking can always be interrupted and that would print an
> error here.
>
Thanks Christian. Will send out a patch to change it to pr_debug.
Regards,
Mukul
> Regards,
> Christian.
>
> > }
> >
> > /* Reserve all BOs and page tables/directory. Add all
> > BOs from @@ -2889,8 +2891,10 @@ int
> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
> __rcu *
> > gobj = &mem->bo->tbo.base;
> > ret = drm_exec_prepare_obj(&exec, gobj, 1);
> > drm_exec_retry_on_contention(&exec);
> > - if (unlikely(ret))
> > + if (unlikely(ret)) {
> > + pr_err("drm_exec_prepare_obj failed,
> > + ret: %d\n", ret);
> > goto ttm_reserve_fail;
> > + }
> > }
> > }
> >
> > @@ -2950,8 +2954,10 @@ int
> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
> __rcu *
> > * validations above would invalidate DMABuf imports again.
> > */
> > ret = process_validate_vms(process_info, &exec.ticket);
> > - if (ret)
> > + if (ret) {
> > + pr_err("Validating VMs failed, ret: %d\n", ret);
> > goto validate_map_fail;
> > + }
> >
> > /* Update mappings not managed by KFD */
> > list_for_each_entry(peer_vm, &process_info->vm_list_head,
More information about the amd-gfx
mailing list