[PATCH 13/19] drm/amdkfd: Handle remaining BUG_ONs more gracefully

Oded Gabbay oded.gabbay at gmail.com
Sun Aug 13 08:48:18 UTC 2017


On Sat, Aug 12, 2017 at 9:37 PM, Kuehling, Felix <Felix.Kuehling at amd.com> wrote:
> Sorry about the weird quoting format. I'm using outlook web access from home. Comments inline [FK]
>
> ________________________________________
> From: Oded Gabbay <oded.gabbay at gmail.com>
> Sent: Saturday, August 12, 2017 10:39 AM
> To: Kuehling, Felix
> Cc: amd-gfx list
> Subject: Re: [PATCH 13/19] drm/amdkfd: Handle remaining BUG_ONs more gracefully
>
> On Sat, Aug 12, 2017 at 12:56 AM, Felix Kuehling <Felix.Kuehling at amd.com> wrote:
>> In most cases, BUG_ONs can be replaced with WARN_ON with an error
>> return. In some void functions just turn them into a WARN_ON and
>> possibly an early exit.
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c            |  3 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c            |  3 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 16 ++++----
>>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 19 ++++-----
>>  .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |  2 +-
>>  .../drm/amd/amdkfd/kfd_device_queue_manager_vi.c   |  2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      | 20 +++++++---
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |  3 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |  3 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 45 +++++++++++++---------
>>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c             |  4 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  9 ++---
>>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |  7 ++--
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |  4 +-
>>  14 files changed, 84 insertions(+), 56 deletions(-)
> [snip]
>
>>> @@ -610,12 +616,15 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
>>                                 queue_sel__mes_unmap_queues__perform_request_on_dynamic_queues_only;
>>                 break;
>>         default:
>> -               BUG();
>> -               break;
>> +               WARN(1, "filter %d", mode);
>> +               retval = -EINVAL;
>>         }
>>
>> -       pm->priv_queue->ops.submit_packet(pm->priv_queue);
>> -
>> +err_invalid:
>> +       if (!retval)
>> +               pm->priv_queue->ops.submit_packet(pm->priv_queue);
>> +       else
>> +               pm->priv_queue->ops.rollback_packet(pm->priv_queue);
>
> I don't feel comfortable putting a valid code path under an "err_invalid" label.
> This defeats the purpose of goto statement and common cleanup code,
> making the code unreadable.
>
> [FK] It is common clean-up code that is needed both in the error and success case. If you prefer, I can separate the error cleanup from the normal cleanup, but it will result in some duplication.

I believe the correct way is to do the following because then you
don't need to check retval in common cleanup code, which to me looks
strange.

<start> pm_send_unmap_queue()

        if (retval != 0)
-                goto err_acquire_packet_buffer;
+               goto unlock_mutex;

<...>

+               WARN(1, "filter %d", mode);
+               retval = -EINVAL;
+               goto err_invalid;
        }

        pm->priv_queue->ops.submit_packet(pm->priv_queue);
+      goto unlock_mutex;

+err_invalid:
+      pm->priv_queue->ops.rollback_packet(pm->priv_queue);
-err_acquire_packet_buffer:
+unlock_mutex:
        mutex_unlock(&pm->lock);
        return retval;
}

<end> pm_send_unmap_queue()

>
> Also, the rollback packet function was not in the original code. Why
> did you add it here ?
>
> [FK] With the BUG(), this function would not return in case of an error. Without the BUG it will return with an error code, so it needs to clean up after itself. So it needs to release the space it allocated on the queue. Otherwise the next potential user of the queue will submit garbage.
>
>>  err_acquire_packet_buffer:
>>         mutex_unlock(&pm->lock);
>>         return retval;
> [snip]
>
>> @@ -202,10 +200,10 @@ static void kfd_process_destroy_delayed(struct rcu_head *rcu)
>>         struct kfd_process_release_work *work;
>>         struct kfd_process *p;
>>
>> -       BUG_ON(!kfd_process_wq);
>> +       WARN_ON(!kfd_process_wq);
> I think this is redundant, as kfd_process_wq is later derefernced
> inside queue_work (as *wq). So we will get a violation there anyway.
>
> [FK] OK.
>
> Regards,
>   Felix
>
>>
>>         p = container_of(rcu, struct kfd_process, rcu);
>> -       BUG_ON(atomic_read(&p->mm->mm_count) <= 0);
>> +       WARN_ON(atomic_read(&p->mm->mm_count) <= 0);
>>
>>         mmdrop(p->mm);
>>
>> @@ -229,7 +227,8 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>>          * mmu_notifier srcu is read locked
>>          */
>>         p = container_of(mn, struct kfd_process, mmu_notifier);
>> -       BUG_ON(p->mm != mm);
>> +       if (WARN_ON(p->mm != mm))
>> +               return;
>>
>>         mutex_lock(&kfd_processes_mutex);
>>         hash_del_rcu(&p->kfd_processes);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index f6ecdff..1cae95e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -218,8 +218,8 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>>                                                         kq, &pdd->qpd);
>>                 break;
>>         default:
>> -               BUG();
>> -               break;
>> +               WARN(1, "Invalid queue type %d", type);
>> +               retval = -EINVAL;
>>         }
>>
>>         if (retval != 0) {
>> @@ -272,7 +272,8 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
>>                 dev = pqn->kq->dev;
>>         if (pqn->q)
>>                 dev = pqn->q->device;
>> -       BUG_ON(!dev);
>> +       if (WARN_ON(!dev))
>> +               return -ENODEV;
>>
>>         pdd = kfd_get_process_device_data(dev, pqm->process);
>>         if (!pdd) {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index e5486f4..19ce590 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -799,10 +799,12 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>         int ret;
>>         uint32_t i;
>>
>> +       if (WARN_ON(dev->kobj_node))
>> +               return -EEXIST;
>> +
>>         /*
>>          * Creating the sysfs folders
>>          */
>> -       BUG_ON(dev->kobj_node);
>>         dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
>>         if (!dev->kobj_node)
>>                 return -ENOMEM;
>> --
>> 2.7.4
>>


More information about the amd-gfx mailing list