[PATCH] drm/amdkfd: improve performance with XNACK enable
James Zhu
jamesz at amd.com
Thu May 8 17:25:51 UTC 2025
On 2025-05-08 11:20, James Zhu wrote:
>
> On 2025-05-08 10:50, Christian König wrote:
>> On 5/8/25 16:46, James Zhu wrote:
>>> When XNACK on, hang or low performance is observed with some test
>>> cases.
>>> The restoring page process has unexpected stuck during
>>> evicting/restoring
>>> if some bo's flag has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED setting:
>>> 1. when xnack on, retry pagefault will invoke restoring pages process
>>> 2. A. if there is enough VRAM space, simply migrating pages from ram
>>> to vram
>>> B. if there is no enough VRAM space left, searching resource LRU
>>> list, and
>>> scheduling a new eviction work queue to evict LRU bo from
>>> vram to ram
>>> first, then resume restoring pages process, or waiting for
>>> eviction
>>> timeout and try to schedule evicting next LRU bo
>>> 3. for case 2B, if bo has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED setting,
>>> queue eviction will be triggered.So restoring work queue will be
>>> scheduled.
>>> 4. step 1, restoring pages process will hold one mm->mmap_lock's
>>> read until
>>> restoring pages is completed
>>> step 2B, evictiion work queue process will hold one
>>> mm->mmap_lock's read
>>> until evicting bo is completed
>>> step 3, restoring work queue process is trying to acquire one
>>> mm->mmap_lock's
>>> write after the above two mm->mmap_lock's read are released, and
>>> in the
>>> meantime which will block all following mm->mmap_lock's read
>>> request.
>>> 5. in step 2, if the first eviction bo's size is big enough for step 1
>>> restoring pages request, everything is fine. if not, which means
>>> that the
>>> mm->mmap_lock's read step 1 won't be release right the way. In
>>> step 3, first
>>> eviction bo's restoring work queue will compete for
>>> mm->mmap_lock's write,
>>> the second and following LRU bo's evictiion work queue will be
>>> blocked by
>>> tring to acquire mm->mmap_lock's read until timeout. All
>>> restoring pages
>>> process will be stuck here.
>>> Using down_write_trylock to replace mmap_write_lock will help not
>>> block the
>>> second and following evictiion work queue process.
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 72be6e152e88..5f6ed70559b7 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1794,7 +1794,11 @@ svm_range_list_lock_and_flush_work(struct
>>> svm_range_list *svms,
>>> {
>>> retry_flush_work:
>>> flush_work(&svms->deferred_list_work);
>>> - mmap_write_lock(mm);
>>> + while (true) {
>>> + if (down_write_trylock(&(mm->mmap_lock)))
>>> + break;
>>> + schedule();
>>> + }
>> Oh, stuff like that is usually an absolutely clear NAK from upstream.
>>
>> As far as I know that is not something we can do so easily.
>>
>> Would it be possible to wait for progress instead of calling
>> schedule() here?
>
> [JZ] At 1st beginning, I am thinking adding sync with restoring pages
> done.
>
> but the original restoring work design philosophy is blindly scheduled
> after certain delay.
>
> the changes with sync may take more time and risk. I would like Felix
> and Philip give comments
>
> if there is efficient and safe way to fix it. Thanks!
[JZ] BTW, in worse case, mmap_write_lock will fall into
rwsem_down_write_slowpath(), schedule_preempt_disabled() and schedule();
>
>>
>> Regards,
>> Christian.
>>
>>> if (list_empty(&svms->deferred_range_list))
>>> return;
More information about the amd-gfx
mailing list