[PATCH 4/8] drm/amdkfd: Fix debug unregister procedure on process termination

Oded Gabbay oded.gabbay at gmail.com
Wed Nov 1 09:56:13 UTC 2017


On Thu, Oct 26, 2017 at 1:41 AM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
> From: Yair Shachar <yair.shachar at amd.com>
>
> Take the dbgmgr lock and unregister before destroying the debug manager.
> Do this before destroying the queues.
>
> Signed-off-by: Yair Shachar <yair.shachar at amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 37 +++++++++++++++++++++++---------
>  1 file changed, 27 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 5d93a5c..6caf6df 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -229,17 +229,26 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>
>         mutex_lock(&p->mutex);
>
> +       /* Iterate over all process device data structures and if the
> +        * pdd is in debug mode, we should first force unregistration,
> +        * then we will be able to destroy the queues
> +        */
> +       list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> +               struct kfd_dev *dev = pdd->dev;
> +
> +               mutex_lock(kfd_get_dbgmgr_mutex());
> +               if (dev && dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> +                       if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> +                               kfd_dbgmgr_destroy(dev->dbgmgr);
> +                               dev->dbgmgr = NULL;
> +                       }
> +               }
> +               mutex_unlock(kfd_get_dbgmgr_mutex());
> +       }
> +
>         kfd_process_dequeue_from_all_devices(p);
>         pqm_uninit(&p->pqm);
>
> -       /* Iterate over all process device data structure and check
> -        * if we should delete debug managers
> -        */
> -       list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> -               if ((pdd->dev->dbgmgr) &&
> -                               (pdd->dev->dbgmgr->pasid == p->pasid))
> -                       kfd_dbgmgr_destroy(pdd->dev->dbgmgr);
> -
>         mutex_unlock(&p->mutex);
>
>         /*
> @@ -468,8 +477,16 @@ void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid)
>
>         pr_debug("Unbinding process %d from IOMMU\n", pasid);
>
> -       if ((dev->dbgmgr) && (dev->dbgmgr->pasid == p->pasid))
> -               kfd_dbgmgr_destroy(dev->dbgmgr);
> +       mutex_lock(kfd_get_dbgmgr_mutex());
> +
> +       if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> +               if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> +                       kfd_dbgmgr_destroy(dev->dbgmgr);
> +                       dev->dbgmgr = NULL;
> +               }
> +       }
> +
> +       mutex_unlock(kfd_get_dbgmgr_mutex());
>
>         pdd = kfd_get_process_device_data(dev, p);
>         if (pdd)
> --
> 2.7.4
>
The patch is fine but to prevent theoretical deadlocks, I suggest to
add to this patch a change in the order of locks at
kfd_ioctl_dbg_register(), so that p->mutex is taken *before* dbgmgr
mutex

With that change, this patch is:
Reviewed-by: Oded Gabbay <oded.gabbay at gmail.com>


More information about the amd-gfx mailing list