[PATCH v2 2/2] drm/amdkfd: pause autosuspend when creating pdd

Joshi, Mukul Mukul.Joshi at amd.com
Fri Oct 11 15:07:23 UTC 2024


[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Yunxiang
> Li
> Sent: Thursday, October 10, 2024 12:18 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>; Gui, Jack <Jack.Gui at amd.com>; Li, Yunxiang
> (Teddy) <Yunxiang.Li at amd.com>
> Subject: [PATCH v2 2/2] drm/amdkfd: pause autosuspend when creating pdd
>
> When using MES creating a pdd will require talking to the GPU to setup the relevant
> context. The code here forgot to wake up the GPU in case it was in suspend, this
> causes KVM to EFAULT for passthrough GPU for example.
>
> Also, change the other place where we pause suspend to use the cleaner
> pm_runtime_resume_and_get helper.
>
> Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3")
> Signed-off-by: Yunxiang Li <Yunxiang.Li at amd.com>
> ---
>
> It is unclear to me if kfd_process_destroy_pdds also have this problem, or is freeing
> gtt mem guaranteed to succeed even with the GPU in suspend.
>
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index c9ce95bc3ec8e..c3d510f1dabfa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1633,12 +1633,19 @@ struct kfd_process_device
> *kfd_create_process_device_data(struct kfd_node *dev,
>       atomic64_set(&pdd->evict_duration_counter, 0);
>
>       if (dev->kfd->shared_resources.enable_mes) {
> +             retval = pm_runtime_resume_and_get(bdev);
> +             if (retval < 0) {
> +                     pr_err("failed to stop autosuspend\n");
> +                     goto err_free_pdd;
> +             }

I am not 100% sure if it makes sense to resume here and then put it back on suspend queue after the
allocation. And then again call pm_runtime_resume in kfd_bind_process_to_device().
Maybe you can set pdd->runtime_inuse = true here itself and not call the pm_runtime_put_autosuspend here.

I will probably let @Kuehling, Felix comment on this.

Regards,
Mukul

>               retval = amdgpu_amdkfd_alloc_gtt_mem(adev,
>                                               AMDGPU_MES_PROC_CTX_SIZE,
>                                               &pdd->proc_ctx_bo,
>                                               &pdd->proc_ctx_gpu_addr,
>                                               &pdd->proc_ctx_cpu_ptr,
>                                               false);
> +             pm_runtime_mark_last_busy(bdev);
> +             pm_runtime_put_autosuspend(bdev);
>               if (retval) {
>                       dev_err(bdev,
>                               "failed to allocate process context bo\n"); @@ -
> 1768,11 +1775,9 @@ struct kfd_process_device
> *kfd_bind_process_to_device(struct kfd_node *dev,
>        * pdd is destroyed.
>        */
>       if (!pdd->runtime_inuse) {
> -             err = pm_runtime_get_sync(bdev);
> -             if (err < 0) {
> -                     pm_runtime_put_autosuspend(adev_to_drm(dev->adev)-
> >dev);
> +             err = pm_runtime_resume_and_get(bdev);
> +             if (err < 0)
>                       return ERR_PTR(err);
> -             }
>       }
>
>       /*
> --
> 2.34.1



More information about the amd-gfx mailing list