[PATCH 3/3] drm/amdkfd: don't sleep when event age unmatch

James Zhu jamesz at amd.com
Thu Jun 1 22:06:18 UTC 2023


On 2023-06-01 17:17, Felix Kuehling wrote:
> On 2023-06-01 16:47, James Zhu wrote:
>> Don't sleep when event age unmatch, and update last_event_age.
>> It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>>
>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_events.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index b27a79c5f826..23e154811471 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -964,6 +964,19 @@ int kfd_wait_on_events(struct kfd_process *p,
>>                       event_data.event_id);
>>           if (ret)
>>               goto out_unlock;
>> +
>> +        /* last_event_age = 0 reserved for backward compatible */
>> +        if (event_data.last_event_age &&
>> +            event_waiters[i].event->event_age != 
>> event_data.last_event_age) {
>> +            event_data.last_event_age = 
>> event_waiters[i].event->event_age;
>> +            WRITE_ONCE(event_waiters[i].activated, true);
>
> I think this is probably not necessary if you're returning before the 
> first call to test_event_condition.

[JZ] Currently, the returning is with test_event_condition. Since some 
conditions needs return with all events signaled.

so all waiters need check and update if any event interrupts are missing 
here(unmatched event age).

>
>
>> +
>> +            if (copy_to_user(&events[i], &event_data,
>> +                sizeof(struct kfd_event_data))) {
>> +                ret = -EFAULT;
>> +                goto out_unlock;
>> +            }
>> +        }
>
> When waiting on multiple events, it would be more efficient to catch 
> all events with outdated age in a single system call, instead of 
> returning after finding the first one. Then return after the loop if 
> it found one or more outdated events.
[JZ] Yes, the code is working in this way, when all events' waiters are 
activated, the following test_event_condition == 
KFD_IOC_WAIT_RESULT_COMPLETE, then return to user mode without sleep.
>
>
> Also EFAULT is the wrong error code. It means "bad address". I think 
> we could return 0 here because there is really no error. It's just a 
> normal condition to allow user mode to update its event information 
> before going to sleep. If you want a non-0 return code, I'd recommend 
> -EDEADLK, because sleeping without outdated event information can lead 
> to deadlocks. We'd also need to translate that into a meaningful 
> status code in the Thunk to let ROCr know what's going on. I don't see 
> a suitable status code in the current Thunk API for this.
[JZ] Basically, this failure is for copy_to_user. It will lead to busy 
wait instead of deadlock.
> **
> Regards,
>   Felix
>
>
>>       }
>>         /* Check condition once. */
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230601/9100de18/attachment-0001.htm>


More information about the amd-gfx mailing list