[PATCH] drm/amdkfd: improve performance with XNACK enable
Christian König
christian.koenig at amd.com
Thu May 8 14:50:56 UTC 2025
On 5/8/25 16:46, James Zhu wrote:
> When XNACK is on, hangs or low performance are observed with some test
> cases. The restore-pages process can get stuck during eviction/restore
> when a BO has the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag set:
> 1. With XNACK on, a retry page fault invokes the restore-pages process.
> 2. A. If there is enough VRAM space, pages are simply migrated from RAM
> to VRAM.
> B. If there is not enough VRAM space left, the resource LRU list is
> searched and a new eviction work queue is scheduled to evict the LRU
> BO from VRAM to RAM first; then the restore-pages process resumes, or
> it waits for the eviction timeout and tries to evict the next LRU BO.
> 3. For case 2B, if the BO has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED set,
> a queue eviction is triggered, so the restore work queue is scheduled.
> 4. In step 1, the restore-pages process holds one mm->mmap_lock read
> until restoring pages completes.
> In step 2B, the eviction work queue process holds one mm->mmap_lock
> read until evicting the BO completes.
> In step 3, the restore work queue process tries to acquire the
> mm->mmap_lock write after the above two reads are released, and in the
> meantime it blocks all following mm->mmap_lock read requests.
> 5. In step 2, if the first evicted BO is big enough for the step 1
> restore-pages request, everything is fine. If not, the step 1
> mm->mmap_lock read will not be released right away. In step 3, the
> first evicted BO's restore work queue competes for the mm->mmap_lock
> write, and the second and following LRU BOs' eviction work queues are
> blocked trying to acquire the mm->mmap_lock read until timeout. All
> restore-pages processes get stuck here.
> Using down_write_trylock instead of mmap_write_lock avoids blocking the
> second and following eviction work queue processes.
>
> Signed-off-by: James Zhu <James.Zhu at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 72be6e152e88..5f6ed70559b7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1794,7 +1794,11 @@ svm_range_list_lock_and_flush_work(struct svm_range_list *svms,
> {
> retry_flush_work:
> flush_work(&svms->deferred_list_work);
> - mmap_write_lock(mm);
> + while (true) {
> + if (down_write_trylock(&(mm->mmap_lock)))
> + break;
> + schedule();
> + }
Oh, stuff like that is usually an absolutely clear NAK from upstream.
As far as I know that is not something we can do so easily.
Would it be possible to wait for progress instead of calling schedule() here?
Regards,
Christian.
>
> if (list_empty(&svms->deferred_range_list))
> return;
More information about the amd-gfx mailing list