[PATCH v3 2/2] drm/amdkfd: pause autosuspend when creating pdd
Felix Kuehling
felix.kuehling at amd.com
Wed Dec 4 23:36:29 UTC 2024
On 2024-12-03 09:30, Yunxiang Li wrote:
> When using MES creating a pdd will require talking to the GPU to setup
> the relevant context. The code here forgot to wake up the GPU in case it
> was in suspend, this causes KVM to EFAULT for passthrough GPU for
> example. This issue can be masked if the GPU was woken up by other
> things (e.g. opening the KMS node) first and have not yet gone to sleep.
>
> Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3")
> Signed-off-by: Yunxiang Li <Yunxiang.Li at amd.com>
> ---
> v3: remove the cleanup in kfd_bind_process_to_device and document why
> this issue doesn't always happen
>
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 555a892fcf963..c81c020af75d1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1635,12 +1635,19 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev,
> atomic64_set(&pdd->evict_duration_counter, 0);
>
> if (dev->kfd->shared_resources.enable_mes) {
> + retval = pm_runtime_resume_and_get(bdev);
> + if (retval < 0) {
> + pr_err("failed to stop autosuspend\n");
> + goto err_free_pdd;
> + }
> retval = amdgpu_amdkfd_alloc_gtt_mem(adev,
> AMDGPU_MES_PROC_CTX_SIZE,
> &pdd->proc_ctx_bo,
> &pdd->proc_ctx_gpu_addr,
> &pdd->proc_ctx_cpu_ptr,
> false);

As far as I can see from grepping the code, this BO is never used. It is
allocated here and freed in kfd_process_destroy_pdds, and that's it.

I see a different proc_ctx_bo allocation in amdgpu_mes_create_process
but I don't see that function being called anywhere. Either my grep-Fu
is getting rusty, or there is some dead code and data structures
surrounding MES here.

So unless I'm missing something, we can just remove this proc_ctx_bo
completely.

Regards,
Felix

> + pm_runtime_mark_last_busy(bdev);
> + pm_runtime_put_autosuspend(bdev);
> if (retval) {
> dev_err(bdev,
> "failed to allocate process context bo\n");
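
For readers less familiar with the runtime PM bracket used in the quoted hunk, the following is a minimal userspace sketch of the pattern, not kernel code. Every name in it (struct mock_dev, mock_resume_and_get, mock_put_autosuspend, mock_setup_context, mock_create_pdd) is hypothetical and only models the general contract of pm_runtime_resume_and_get() and pm_runtime_put_autosuspend(). The -EFAULT return is an illustrative stand-in for the failure described in the commit message, and the model assumes the bracketed step really does touch the hardware.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a runtime-PM managed device; not a kernel type. */
struct mock_dev {
	int usage_count;	/* outstanding "keep awake" references */
	bool suspended;		/* true while the device is powered down */
};

/*
 * Models pm_runtime_resume_and_get(): wake the device and take a usage
 * reference. This mock never fails; the real call can return an error.
 */
static int mock_resume_and_get(struct mock_dev *dev)
{
	dev->suspended = false;
	dev->usage_count++;
	return 0;
}

/*
 * Models pm_runtime_put_autosuspend(): drop the reference. The real call
 * arms a delayed suspend; this mock suspends at once when no references
 * remain.
 */
static void mock_put_autosuspend(struct mock_dev *dev)
{
	if (--dev->usage_count == 0)
		dev->suspended = true;
}

/* Hypothetical hardware-touching step: fails if the device is asleep. */
static int mock_setup_context(const struct mock_dev *dev)
{
	return dev->suspended ? -EFAULT : 0;
}

/* Bracket the hardware access with get/put, mirroring the quoted hunk. */
static int mock_create_pdd(struct mock_dev *dev, bool hold_pm_ref)
{
	int ret;

	if (hold_pm_ref) {
		ret = mock_resume_and_get(dev);
		if (ret < 0)
			return ret;
	}

	ret = mock_setup_context(dev);

	if (hold_pm_ref)
		mock_put_autosuspend(dev);

	return ret;
}
```

In this model, calling mock_create_pdd() on a suspended device with hold_pm_ref set to false fails, while the bracketed call succeeds and leaves the usage count balanced at zero. If the bracketed allocation turns out to be unused, as suggested above, the bracket goes away together with it.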