[PATCH] drm/amdkfd: Fix some issues at userptr buffer validation process.

Chen, Xiaogang xiaogang.chen at amd.com
Fri Apr 14 03:27:37 UTC 2023


On 4/13/2023 3:08 PM, Felix Kuehling wrote:
> On 2023-04-12 at 02:14, Xiaogang.Chen wrote:
>> From: Xiaogang Chen<xiaogang.chen at amd.com>
>>
>> Notice userptr buffer restore process has following issues:
>>
>> 1: amdgpu_ttm_tt_get_user_pages can fail (-EFAULT). If it fails we should
>> not mark the buffer valid (mem->invalid = 0). In this case mem has no hmm
>> range or user_pages associated with it.
>
> We don't want to suspend the process indefinitely when this happens. 
> This can happen if usermode calls munmap before unregistering the 
> userptr. What we want to happen in this case is, the process should 
> resume. If it accesses the virtual address, it will result in a page 
> fault, which alerts the application to its mistake. If it doesn't 
> access the virtual address, then there is no harm.
>
> It's a good catch that there is no useful hmm_range in this case to 
> check validity, so we should not warn about it in 
> confirm_valid_user_pages_locked.
>
I am not sure why you said "suspend the process indefinitely". When mem 
(kgd_mem) has no hmm_range because amdgpu_ttm_tt_get_user_pages failed, the 
patch does not mark it valid (it leaves mem->invalid != 0) and keeps it on 
userptr_inval_list. The process is not suspended.

Yes, in this customer application case amdgpu_ttm_tt_get_user_pages failed at 
vma_lookup; I think the application unmapped its buffer before unregistering 
it from KFD. This patch handles an amdgpu_ttm_tt_get_user_pages failure in 
general: do not mark the buffer valid (leave mem->invalid != 0), keep it on 
userptr_inval_list, and then, in confirm_valid_user_pages_locked, check 
whether mem has an hmm range associated before the WARN. A sketch of that 
flow is below.

>> 2: An mmu notifier can run concurrently and update
>> mem->range->notifier->invalidate_seq, but not mem->range->notifier_seq. That
>> leaves mem->range->notifier_seq stale when mem is on
>> process_info->userptr_inval_list and amdgpu_amdkfd_restore_userptr_worker
>> got interrupted. At the next rescheduled attempt we use the stale
>> mem->range->notifier_seq to compare with mem->range->notifier->invalidate_seq.
>
> amdgpu_hmm_range_get_pages updates mem->range->notifier_seq with the 
> current mem->range->notifier->invalidate_seq. If an eviction happens 
> after this, there is a collision and the range needs to be 
> revalidated. I think when you say "mem->range->notifier_seq is stale", 
> it means there was a collision. When this happens, mem->invalid should 
> be set to true at the same time. So confirm_valid_user_pages_locked 
> should not complain because mem->invalid and 
> amdgpu_ttm_tt_get_user_pages_done should agree that the range is invalid.
>
Yes, "mem->range->notifier_seq is stale" means it is different from 
mem->range->notifier_seq. It is caused by mmu notifier happened 
concurrently on same buffer that is still in restore process. For this 
case the patch update mem->range->notifier_seq:

+            if (mem->range)
+                mem->range->notifier_seq = 
mem->range->notifier->invalidate_seq;

Then restore process will skip confirm_valid_user_pages_locked, jump to 
next scheduled attempt: "goto unlock_notifier_out".
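The stale sequence number matters because the validity check in
confirm_valid_user_pages_locked boils down to a sequence comparison. Roughly,
and only as a sketch of the idea (amdgpu_ttm_tt_get_user_pages_done reaches
this through the hmm helpers, not literally this function):

    /* Sketch: the pages from the last amdgpu_hmm_range_get_pages() are only
     * usable if no invalidation bumped the notifier sequence since then.
     * mmu_interval_read_retry() returns true when a retry is needed, i.e.
     * when the notifier's invalidate_seq no longer matches the sampled seq.
     */
    static bool pages_still_valid(struct hmm_range *range)
    {
        return !mmu_interval_read_retry(range->notifier, range->notifier_seq);
    }

That comparison produces the valid/invalid answer that
confirm_valid_user_pages_locked then checks against mem->invalid.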

> "At next rescheduled next attempt we use stale 
> mem->range->notifier_seq": This is not really stale. The notifier_seq 
> indicates whether the pages returned by the last call to 
> amdgpu_hmm_range_get_pages are still valid. If it's "stale", it means 
> an invalidation (evict_userptr) happened and we need to 
> amdgpu_hmm_range_get_pages again. In theory, if an invalidation 
> happened since the last call, then mem->invalid should also be true. 
> So again, the sequence numbers and mem->invalid should agree and there 
> should be no warning.
>
When an invalidation (evict_userptr) happens concurrently, the restore 
process schedules the next attempt. mem->invalid is set by evict_userptr, and 
the sequence numbers are updated as well. Both agree, and with this patch we 
do not see the WARN.
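For reference, the eviction side looks roughly like this; this is a
simplified sketch from my reading of the driver, with the queue quiescing
and the exact restore delay ("delay" is just a placeholder here) left out:

    /* Sketch: the MMU notifier eviction path marks the BO invalid and
     * records the new notifier sequence under the same notifier_lock that
     * the restore worker takes, which is why the two normally agree.
     */
    mutex_lock(&process_info->notifier_lock);
    mmu_interval_set_seq(mni, cur_seq);   /* record the new notifier seq */
    mem->invalid++;                       /* BO now needs revalidation   */
    if (++process_info->evicted_bos == 1) {
        /* first eviction: quiesce the queues and kick the restore worker */
        schedule_delayed_work(&process_info->restore_userptr_work, delay);
    }
    mutex_unlock(&process_info->notifier_lock);
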
> The warning messages printed in confirm_valid_user_pages_locked 
> indicate that there is a mismatch between the sequence numbers and 
> mem->invalid. As I understand it, such a mismatch should be 
> impossible. Unless there are some bad assumptions in the code. I 
> haven't figured out what those bad assumptions are yet. Other than the 
> case for -EFAULT you pointed out above.
>
From my debugging of this customer app, the warnings are due to not handling 
the case where amdgpu_hmm_range_get_pages returns -EFAULT while an mmu 
notifier runs on the same buffer concurrently. That led us to use a mem with 
no hmm range associated and a stale mem->range->notifier_seq in the following 
operations. With this patch there are no warning messages and I do not see 
other errors.

Xiaogang

> Regards,
>   Felix
>
>
>>
>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>> ---
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +++++++++++++++----
>>   1 file changed, 37 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 7b1f5933ebaa..6881f1b0844c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>>               ret = -EAGAIN;
>>               goto unlock_out;
>>           }
>> -        mem->invalid = 0;
>> +         /* set mem valid if mem has hmm range associated */
>> +        if (mem->range)
>> +            mem->invalid = 0;
>>       }
>>     unlock_out:
>> @@ -2576,16 +2578,28 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
>>       list_for_each_entry_safe(mem, tmp_mem,
>>                    &process_info->userptr_inval_list,
>>                    validate_list.head) {
>> -        bool valid = amdgpu_ttm_tt_get_user_pages_done(
>> -                mem->bo->tbo.ttm, mem->range);
>> +        /* Only check mem with hmm range associated */
>> +        bool valid;
>>
>> -        mem->range = NULL;
>> -        if (!valid) {
>> -            WARN(!mem->invalid, "Invalid BO not marked invalid");
>> +        if (mem->range) {
>> +            valid = amdgpu_ttm_tt_get_user_pages_done(
>> +                    mem->bo->tbo.ttm, mem->range);
>> +
>> +            mem->range = NULL;
>> +            if (!valid) {
>> +                WARN(!mem->invalid, "Invalid BO not marked invalid");
>> +                ret = -EAGAIN;
>> +                continue;
>> +            }
>> +        } else
>> +            /* keep mem without hmm range at userptr_inval_list */
>> +            continue;
>> +
>> +        if (mem->invalid) {
>> +            WARN(1, "Valid BO is marked invalid");
>>               ret = -EAGAIN;
>>               continue;
>>           }
>> -        WARN(mem->invalid, "Valid BO is marked invalid");
>>             list_move_tail(&mem->validate_list.head,
>>                      &process_info->userptr_valid_list);
>> @@ -2644,8 +2658,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>        * reference counting inside KFD will handle this case.
>>        */
>>       mutex_lock(&process_info->notifier_lock);
>> -    if (process_info->evicted_bos != evicted_bos)
>> +    if (process_info->evicted_bos != evicted_bos) {
>> +        /* An mmu notifier interrupted amdgpu_amdkfd_restore_userptr_worker.
>> +         * Before rescheduling the next attempt, update the stale
>> +         * mem->range->notifier_seq of BOs on userptr_inval_list.
>> +         */
>> +        struct kgd_mem *mem, *tmp_mem;
>> +
>> +        list_for_each_entry_safe(mem, tmp_mem,
>> +                &process_info->userptr_inval_list,
>> +                validate_list.head) {
>> +
>> +            if (mem->range)
>> +            if (mem->range)
>> +                mem->range->notifier_seq = mem->range->notifier->invalidate_seq;
>> +        }
>> +
>>           goto unlock_notifier_out;
>> +    }
>>
>>       if (confirm_valid_user_pages_locked(process_info)) {
>>           WARN(1, "User pages unexpectedly invalid");

