[PATCH v5 3/5] drm/amdkfd: set activated flag true when event age unmatchs

Felix Kuehling felix.kuehling at amd.com
Mon Jun 12 16:31:27 UTC 2023


Testing for intermittent failures or race conditions is not easy. If we 
create such a test, we need to make sure it can catch the problem when 
not using the event ages, just to know that the test is good enough.

I guess it could be a parametrized test that can run with or without 
event age. Without event age, we'd expect to catch a timeout. Not 
catching a timeout would be a test failure (indicating that the test is 
not good enough). With event age it should not time out, i.e. a timeout 
would be considered a failure in this case (indicating a problem with 
the event age mechanism).

That said, I'd feel better about a ROCr test that doesn't just cover the 
KFD event age mechanism, but also its use in the ROCr implementation of 
HSA signal waiting.

Regards,
   Felix


Am 2023-06-12 um 12:19 schrieb Yat Sin, David:
> [AMD Official Use Only - General]
>
> The current ROCr patches already address my previous feedback. I am ok with the current ROCr patches.
>
> Currently, there is no ROCrtst that would stress this multiple-waiters issue. I was thinking something like the KFDTest, but with by calling the waiters from different threads. @Zhu, James Would you have time to look into this?
>
> ~David
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling at amd.com>
>> Sent: Friday, June 9, 2023 6:44 PM
>> To: Zhu, James <James.Zhu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Yat Sin, David <David.YatSin at amd.com>; Zhu, James
>> <James.Zhu at amd.com>
>> Subject: Re: [PATCH v5 3/5] drm/amdkfd: set activated flag true when event
>> age unmatchs
>>
>>   From the KFD perspective, the series is
>>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>
>> David, I looked at the ROCr and Thunk changes as well, and they look
>> reasonable to me. Do you have any feedback on these patches from a ROCr
>> point of view? Is there a reasonable stress test that could be used check that
>> this handles the race conditions as expected and allows all waiters to sleep?
>>
>> Regards,
>>     Felix
>>
>>
>> On 2023-06-09 16:43, James Zhu wrote:
>>> Set waiter's activated flag true when event age unmatchs with
>> last_event_age.
>>> -v4: add event type check
>>> -v5: improve on event age enable and activated flags
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdkfd/kfd_events.c | 17 +++++++++++++----
>>>    1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> index c7689181cc22..b2586a1dd35d 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> @@ -41,6 +41,7 @@ struct kfd_event_waiter {
>>>      wait_queue_entry_t wait;
>>>      struct kfd_event *event; /* Event to wait for */
>>>      bool activated;          /* Becomes true when event is signaled */
>>> +   bool event_age_enabled;  /* set to true when last_event_age is
>>> +non-zero */
>>>    };
>>>
>>>    /*
>>> @@ -797,9 +798,9 @@ static struct kfd_event_waiter
>>> *alloc_event_waiters(uint32_t num_events)
>>>
>>>    static int init_event_waiter(struct kfd_process *p,
>>>              struct kfd_event_waiter *waiter,
>>> -           uint32_t event_id)
>>> +           struct kfd_event_data *event_data)
>>>    {
>>> -   struct kfd_event *ev = lookup_event_by_id(p, event_id);
>>> +   struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id);
>>>
>>>      if (!ev)
>>>              return -EINVAL;
>>> @@ -808,6 +809,15 @@ static int init_event_waiter(struct kfd_process *p,
>>>      waiter->event = ev;
>>>      waiter->activated = ev->signaled;
>>>      ev->signaled = ev->signaled && !ev->auto_reset;
>>> +
>>> +   /* last_event_age = 0 reserved for backward compatible */
>>> +   if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL &&
>>> +           event_data->signal_event_data.last_event_age) {
>>> +           waiter->event_age_enabled = true;
>>> +           if (ev->event_age != event_data-
>>> signal_event_data.last_event_age)
>>> +                   waiter->activated = true;
>>> +   }
>>> +
>>>      if (!waiter->activated)
>>>              add_wait_queue(&ev->wq, &waiter->wait);
>>>      spin_unlock(&ev->lock);
>>> @@ -948,8 +958,7 @@ int kfd_wait_on_events(struct kfd_process *p,
>>>                      goto out_unlock;
>>>              }
>>>
>>> -           ret = init_event_waiter(p, &event_waiters[i],
>>> -                                   event_data.event_id);
>>> +           ret = init_event_waiter(p, &event_waiters[i], &event_data);
>>>              if (ret)
>>>                      goto out_unlock;
>>>      }


More information about the amd-gfx mailing list