[PATCH v5 3/5] drm/amdkfd: set activated flag true when event age unmatchs
Felix Kuehling
felix.kuehling at amd.com
Mon Jun 12 16:31:27 UTC 2023
Testing for intermittent failures or race conditions is not easy. If we
create such a test, we need to make sure it can catch the problem when
not using the event ages, just to know that the test is good enough.
I guess it could be a parametrized test that can run with or without
event age. Without event age, we'd expect to catch a timeout. Not
catching a timeout would be a test failure (indicating that the test is
not good enough). With event age it should not time out, i.e. a timeout
would be considered a failure in this case (indicating a problem with
the event age mechanism).
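To make the expected outcomes concrete, here is a rough user-space sketch of the two parameterizations. It is only a model, not kernel or KFDTest code: the `model_*` names and `second_waiter_would_time_out()` are hypothetical, and it assumes the event age starts at 1 and is incremented on every signal (that part lives elsewhere in the series, not in this patch). Only `model_init_waiter()` follows the logic of the patched `init_event_waiter()`, minus the event type check. Two waiters snapshot the age, the auto-reset event is signaled once, and then both start waiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of an auto-reset signal event (hypothetical names). */
struct model_event {
	bool signaled;
	bool auto_reset;
	uint64_t event_age;	/* assumption: starts at 1, bumped per signal */
};

struct model_waiter {
	bool activated;
	bool event_age_enabled;
};

static void model_event_init(struct model_event *ev)
{
	ev->signaled = false;
	ev->auto_reset = true;
	ev->event_age = 1;
}

/* Assumption: signaling increments the age before marking the event. */
static void model_signal(struct model_event *ev)
{
	ev->event_age++;
	ev->signaled = true;
}

/*
 * Follows the patched init_event_waiter() logic: last_event_age == 0
 * means the caller does not use event ages (backward compatible path).
 */
static void model_init_waiter(struct model_event *ev,
			      struct model_waiter *w,
			      uint64_t last_event_age)
{
	w->event_age_enabled = false;
	w->activated = ev->signaled;
	ev->signaled = ev->signaled && !ev->auto_reset;

	if (last_event_age) {
		w->event_age_enabled = true;
		if (ev->event_age != last_event_age)
			w->activated = true;
	}
}

/*
 * Two waiters snapshot the age, then one signal arrives before either
 * of them starts waiting. Returns true if the second waiter would be
 * left sleeping, i.e. would run into its timeout.
 */
static bool second_waiter_would_time_out(bool use_event_age)
{
	struct model_event ev;
	struct model_waiter first, second;
	uint64_t seen;

	model_event_init(&ev);
	seen = use_event_age ? ev.event_age : 0;

	model_signal(&ev);
	model_init_waiter(&ev, &first, seen);	/* consumes the auto-reset signal */
	model_init_waiter(&ev, &second, seen);

	return !second.activated;
}
```

In this model `second_waiter_would_time_out(false)` reports the missed wakeup and `second_waiter_would_time_out(true)` does not, which matches the pass/fail criteria above. A real test would have to provoke the same interleaving with actual threads and real timeouts.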
That said, I'd feel better about a ROCr test that doesn't just cover the
KFD event age mechanism, but also its use in the ROCr implementation of
HSA signal waiting.
Regards,
Felix
On 2023-06-12 at 12:19, Yat Sin, David wrote:
> [AMD Official Use Only - General]
>
> The current ROCr patches already address my previous feedback. I am ok with the current ROCr patches.
>
> Currently, there is no ROCrtst test that would stress this multiple-waiters issue. I was thinking of something like the KFDTest, but calling the waiters from different threads. @Zhu, James Would you have time to look into this?
>
> ~David
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling at amd.com>
>> Sent: Friday, June 9, 2023 6:44 PM
>> To: Zhu, James <James.Zhu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Yat Sin, David <David.YatSin at amd.com>; Zhu, James
>> <James.Zhu at amd.com>
>> Subject: Re: [PATCH v5 3/5] drm/amdkfd: set activated flag true when event
>> age unmatchs
>>
>> From the KFD perspective, the series is
>>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>
>> David, I looked at the ROCr and Thunk changes as well, and they look
>> reasonable to me. Do you have any feedback on these patches from a ROCr
>> point of view? Is there a reasonable stress test that could be used to check that
>> this handles the race conditions as expected and allows all waiters to sleep?
>>
>> Regards,
>> Felix
>>
>>
>> On 2023-06-09 16:43, James Zhu wrote:
>>> Set waiter's activated flag true when event age unmatchs with last_event_age.
>>> -v4: add event type check
>>> -v5: improve on event age enable and activated flags
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 17 +++++++++++++----
>>> 1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> index c7689181cc22..b2586a1dd35d 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> @@ -41,6 +41,7 @@ struct kfd_event_waiter {
>>> wait_queue_entry_t wait;
>>> struct kfd_event *event; /* Event to wait for */
>>> bool activated; /* Becomes true when event is signaled */
>>> + bool event_age_enabled; /* set to true when last_event_age is non-zero */
>>> };
>>>
>>> /*
>>> @@ -797,9 +798,9 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
>>>
>>> static int init_event_waiter(struct kfd_process *p,
>>> struct kfd_event_waiter *waiter,
>>> - uint32_t event_id)
>>> + struct kfd_event_data *event_data)
>>> {
>>> - struct kfd_event *ev = lookup_event_by_id(p, event_id);
>>> + struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id);
>>>
>>> if (!ev)
>>> return -EINVAL;
>>> @@ -808,6 +809,15 @@ static int init_event_waiter(struct kfd_process *p,
>>> waiter->event = ev;
>>> waiter->activated = ev->signaled;
>>> ev->signaled = ev->signaled && !ev->auto_reset;
>>> +
>>> + /* last_event_age = 0 reserved for backward compatible */
>>> + if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL &&
>>> + event_data->signal_event_data.last_event_age) {
>>> + waiter->event_age_enabled = true;
>>> + if (ev->event_age != event_data->signal_event_data.last_event_age)
>>> + waiter->activated = true;
>>> + }
>>> +
>>> if (!waiter->activated)
>>> add_wait_queue(&ev->wq, &waiter->wait);
>>> spin_unlock(&ev->lock);
>>> @@ -948,8 +958,7 @@ int kfd_wait_on_events(struct kfd_process *p,
>>> goto out_unlock;
>>> }
>>>
>>> - ret = init_event_waiter(p, &event_waiters[i],
>>> - event_data.event_id);
>>> + ret = init_event_waiter(p, &event_waiters[i], &event_data);
>>> if (ret)
>>> goto out_unlock;
>>> }