[PATCH 3/3] drm/amdkfd: don't sleep when event age unmatch

Felix Kuehling felix.kuehling at amd.com
Thu Jun 1 21:17:34 UTC 2023


On 2023-06-01 16:47, James Zhu wrote:
> Don't sleep when event age unmatch, and update last_event_age.
> It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>
> Signed-off-by: James Zhu <James.Zhu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index b27a79c5f826..23e154811471 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -964,6 +964,19 @@ int kfd_wait_on_events(struct kfd_process *p,
>   					event_data.event_id);
>   		if (ret)
>   			goto out_unlock;
> +
> +		/* last_event_age = 0 reserved for backward compatible */
> +		if (event_data.last_event_age &&
> +			event_waiters[i].event->event_age != event_data.last_event_age) {
> +			event_data.last_event_age = event_waiters[i].event->event_age;
> +			WRITE_ONCE(event_waiters[i].activated, true);

I think this is probably not necessary if you're returning before the 
first call to test_event_condition.


> +
> +			if (copy_to_user(&events[i], &event_data,
> +				sizeof(struct kfd_event_data))) {
> +				ret = -EFAULT;
> +				goto out_unlock;
> +			}
> +		}

When waiting on multiple events, it would be more efficient to catch all 
events with outdated age in a single system call, instead of returning 
after finding the first one. Then return after the loop if it found one 
or more outdated events.

Also EFAULT is the wrong error code. It means "bad address". I think we 
could return 0 here because there is really no error. It's just a normal 
condition to allow user mode to update its event information before 
going to sleep. If you want a non-0 return code, I'd recommend -EDEADLK, 
because sleeping without outdated event information can lead to 
deadlocks. We'd also need to translate that into a meaningful status 
code in the Thunk to let ROCr know what's going on. I don't see a 
suitable status code in the current Thunk API for this.

Regards,
   Felix


>   	}
>   
>   	/* Check condition once. */


More information about the amd-gfx mailing list