[PATCH 3/3] drm/amdkfd: don't sleep when event age unmatch
James Zhu
jamesz at amd.com
Thu Jun 1 22:06:18 UTC 2023
On 2023-06-01 17:17, Felix Kuehling wrote:
> On 2023-06-01 16:47, James Zhu wrote:
>> Don't sleep when event age unmatch, and update last_event_age.
>> It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>>
>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index b27a79c5f826..23e154811471 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -964,6 +964,19 @@ int kfd_wait_on_events(struct kfd_process *p,
>> event_data.event_id);
>> if (ret)
>> goto out_unlock;
>> +
>> + /* last_event_age = 0 reserved for backward compatible */
>> + if (event_data.last_event_age &&
>> + event_waiters[i].event->event_age !=
>> event_data.last_event_age) {
>> + event_data.last_event_age =
>> event_waiters[i].event->event_age;
>> + WRITE_ONCE(event_waiters[i].activated, true);
>
> I think this is probably not necessary if you're returning before the
> first call to test_event_condition.
[JZ] Currently, the returning is with test_event_condition. Since some
conditions needs return with all events signaled.
so all waiters need check and update if any event interrupts are missing
here(unmatched event age).
>
>
>> +
>> + if (copy_to_user(&events[i], &event_data,
>> + sizeof(struct kfd_event_data))) {
>> + ret = -EFAULT;
>> + goto out_unlock;
>> + }
>> + }
>
> When waiting on multiple events, it would be more efficient to catch
> all events with outdated age in a single system call, instead of
> returning after finding the first one. Then return after the loop if
> it found one or more outdated events.
[JZ] Yes, the code is working in this way, when all events' waiters are
activated, the following test_event_condition ==
KFD_IOC_WAIT_RESULT_COMPLETE, then return to user mode without sleep.
>
>
> Also EFAULT is the wrong error code. It means "bad address". I think
> we could return 0 here because there is really no error. It's just a
> normal condition to allow user mode to update its event information
> before going to sleep. If you want a non-0 return code, I'd recommend
> -EDEADLK, because sleeping without outdated event information can lead
> to deadlocks. We'd also need to translate that into a meaningful
> status code in the Thunk to let ROCr know what's going on. I don't see
> a suitable status code in the current Thunk API for this.
[JZ] Basically, this failure is for copy_to_user. It will lead to busy
wait instead of deadlock.
> **
> Regards,
> Felix
>
>
>> }
>> /* Check condition once. */
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230601/9100de18/attachment-0001.htm>
More information about the amd-gfx
mailing list