[PATCH 13/19] drm/amdkfd: Handle remaining BUG_ONs more gracefully

Kuehling, Felix Felix.Kuehling at amd.com
Sat Aug 12 18:37:40 UTC 2017


Sorry about the weird quoting format. I'm using outlook web access from home. Comments inline [FK]

________________________________________
From: Oded Gabbay <oded.gabbay at gmail.com>
Sent: Saturday, August 12, 2017 10:39 AM
To: Kuehling, Felix
Cc: amd-gfx list
Subject: Re: [PATCH 13/19] drm/amdkfd: Handle remaining BUG_ONs more gracefully

On Sat, Aug 12, 2017 at 12:56 AM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
> In most cases, BUG_ONs can be replaced with WARN_ON with an error
> return. In some void functions just turn them into a WARN_ON and
> possibly an early exit.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c            |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c            |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 16 ++++----
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 19 ++++-----
>  .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |  2 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager_vi.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      | 20 +++++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 45 +++++++++++++---------
>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c             |  4 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  9 ++---
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |  7 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |  4 +-
>  14 files changed, 84 insertions(+), 56 deletions(-)
[snip]

>> @@ -610,12 +616,15 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
>                                 queue_sel__mes_unmap_queues__perform_request_on_dynamic_queues_only;
>                 break;
>         default:
> -               BUG();
> -               break;
> +               WARN(1, "filter %d", mode);
> +               retval = -EINVAL;
>         }
>
> -       pm->priv_queue->ops.submit_packet(pm->priv_queue);
> -
> +err_invalid:
> +       if (!retval)
> +               pm->priv_queue->ops.submit_packet(pm->priv_queue);
> +       else
> +               pm->priv_queue->ops.rollback_packet(pm->priv_queue);

I don't feel comfortable putting a valid code path under an "err_invalid" label.
This defeats the purpose of goto statement and common cleanup code,
making the code unreadable.

[FK] It is common clean-up code that is needed both in the error and success case. If you prefer, I can separate the error cleanup from the normal cleanup, but it will result in some duplication.

Also, the rollback packet function was not in the original code. Why
did you add it here ?

[FK] With the BUG(), this function would not return in case of an error. Without the BUG it will return with an error code, so it needs to clean up after itself. So it needs to release the space it allocated on the queue. Otherwise the next potential user of the queue will submit garbage.

>  err_acquire_packet_buffer:
>         mutex_unlock(&pm->lock);
>         return retval;
[snip]

> @@ -202,10 +200,10 @@ static void kfd_process_destroy_delayed(struct rcu_head *rcu)
>         struct kfd_process_release_work *work;
>         struct kfd_process *p;
>
> -       BUG_ON(!kfd_process_wq);
> +       WARN_ON(!kfd_process_wq);
I think this is redundant, as kfd_process_wq is later derefernced
inside queue_work (as *wq). So we will get a violation there anyway.

[FK] OK.

Regards,
  Felix

>
>         p = container_of(rcu, struct kfd_process, rcu);
> -       BUG_ON(atomic_read(&p->mm->mm_count) <= 0);
> +       WARN_ON(atomic_read(&p->mm->mm_count) <= 0);
>
>         mmdrop(p->mm);
>
> @@ -229,7 +227,8 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>          * mmu_notifier srcu is read locked
>          */
>         p = container_of(mn, struct kfd_process, mmu_notifier);
> -       BUG_ON(p->mm != mm);
> +       if (WARN_ON(p->mm != mm))
> +               return;
>
>         mutex_lock(&kfd_processes_mutex);
>         hash_del_rcu(&p->kfd_processes);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index f6ecdff..1cae95e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -218,8 +218,8 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>                                                         kq, &pdd->qpd);
>                 break;
>         default:
> -               BUG();
> -               break;
> +               WARN(1, "Invalid queue type %d", type);
> +               retval = -EINVAL;
>         }
>
>         if (retval != 0) {
> @@ -272,7 +272,8 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
>                 dev = pqn->kq->dev;
>         if (pqn->q)
>                 dev = pqn->q->device;
> -       BUG_ON(!dev);
> +       if (WARN_ON(!dev))
> +               return -ENODEV;
>
>         pdd = kfd_get_process_device_data(dev, pqm->process);
>         if (!pdd) {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index e5486f4..19ce590 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -799,10 +799,12 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>         int ret;
>         uint32_t i;
>
> +       if (WARN_ON(dev->kobj_node))
> +               return -EEXIST;
> +
>         /*
>          * Creating the sysfs folders
>          */
> -       BUG_ON(dev->kobj_node);
>         dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
>         if (!dev->kobj_node)
>                 return -ENOMEM;
> --
> 2.7.4
>


More information about the amd-gfx mailing list