[PATCH] drm/amdkfd: Fix some issues at userptr buffer validation process.

Tue Apr 18 23:17:56 UTC 2023

On 2023-04-13 23:27, Chen, Xiaogang wrote:
>
> On 4/13/2023 3:08 PM, Felix Kuehling wrote:
>> Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen:
>>> From: Xiaogang Chen<xiaogang.chen at amd.com>
>>>
>>> Notice userptr buffer restore process has following issues:
>>>
>>> 1: amdgpu_ttm_tt_get_user_pages can fail(-EFAULT). If it failed we 
>>> should not set
>>> it valid(mem->invalid = 0). In this case mem has no associated hmm 
>>> range or user_pages
>>> associated.
>>
>> We don't want to suspend the process indefinitely when this happens. 
>> This can happen if usermode calls munmap before unregistering the 
>> userptr. What we want to happen in this case is, the process should 
>> resume. If it accesses the virtual address, it will result in a page 
>> fault, which alerts the application to its mistake. If it doesn't 
>> access the virtual address, then there is no harm.
>>
>> It's a good catch that there is no useful hmm_range in this case to 
>> check validity, so we should not warn about it in 
>> confirm_valid_user_pages_locked.
>>
> Not sure why you said "suspend the process indefinitely". When 
> mem(kgd_mem) has no hmm_range due to amdgpu_ttm_tt_get_user_pages fail 
> the patch does not mark it valid( set mem->invalid != 0) and keep it 
> at userptr_inval_list. The process has not been suspended.

User mode queues are stopped. Until the queues are restarted, the 
process is effectively suspended (for GPU execution). If invalid userptr 
mappings cause restore to fail, that means, the GPU queues will remain 
stopped. That's what I mean with "suspend the process indefinitely".

>
> Yes, in this customer application case amdgpu_ttm_tt_get_user_pages 
> failed at vma_lookup. I think it unmap its buffer before unregister 
> from KFD. This patch handles amdgpu_ttm_tt_get_user_pages in general: 
> not mark it valid(mem->invalid != 0), keep it at userptr_inval_list, 
> then at confirm_valid_user_pages_locked, check if mem has hmm range 
> associated before WARN.

I think it would be easier to just mark it as valid. mem->invalid == 0 
means, it's safe to resume the user mode queues. For userptrs without a 
valid VMA this is the case as the corresponding page table entries have 
been invalidated (V=0).

>
>>> 2: mmu notifier can happen concurrently and update 
>>> mem->range->notifier->invalidate_seq,
>>> but not mem->range->notifier_seq. That causes 
>>> mem->range->notifier_seq stale
>>> when mem is in process_info->userptr_inval_list and 
>>> amdgpu_amdkfd_restore_userptr_worker
>>> got interrupted. At next rescheduled next attempt we use stale 
>>> mem->range->notifier_seq
>>> to compare with mem->range->notifier->invalidate_seq.
>>
>> amdgpu_hmm_range_get_pages updates mem->range->notifier_seq with the 
>> current mem->range->notifier->invalidate_seq. If an eviction happens 
>> after this, there is a collision and the range needs to be 
>> revalidated. I think when you say "mem->range->notifier_seq is 
>> stale", it means there was a collision. When this happens, 
>> mem->invalid should be set to true at the same time. So 
>> confirm_valid_user_pages_locked should not complain because 
>> mem->invalid and amdgpu_ttm_tt_get_user_pages_done should agree that 
>> the range is invalid.
>>
> Yes, "mem->range->notifier_seq is stale" means it is different from 
> mem->range->notifier_seq. It is caused by mmu notifier happened 
> concurrently on same buffer that is still in restore process. For this 
> case the patch update mem->range->notifier_seq:
>
> +            if (mem->range)
> +                mem->range->notifier_seq = 
> mem->range->notifier->invalidate_seq;

You should not update mem->range->notifier_seq without also getting an 
up-to-date page address list. Matching sequence numbers indicate that 
your page list is up to date. If you just update the sequence number, 
you're basically lying to yourself.

You need to call amdgpu_hmm_range_get_pages to get an updated page list 
and update the mem->range->notifier_seq at the same time. There is no 
need to do this in more than one place. That's in update_invalid_user_pages.

>
> Then restore process will skip confirm_valid_user_pages_locked, jump 
> to next scheduled attempt: "goto unlock_notifier_out".
>
>> "At next rescheduled next attempt we use stale 
>> mem->range->notifier_seq": This is not really stale. The notifier_seq 
>> indicates whether the pages returned by the last call to 
>> amdgpu_hmm_range_get_pages are still valid. If it's "stale", it means 
>> an invalidation (evict_userptr) happened and we need to 
>> amdgpu_hmm_range_get_pages again. In theory, if an invalidation 
>> happened since the last call, then mem->invalid should also be true. 
>> So again, the sequence numbers and mem->invalid should agree and 
>> there should be no warning.
>>
> When invalidation (evict_userptr) happen concurrently restore process 
> will schedule next attempt. The mem->invalid is set to true by 
> evict_userptr, also the sequence numbers. Both agree and with this 
> patch we do not see WARN.

Why do they disagree without this patch? I think what you did there 
(updating the sequence number without getting updated pages) is 
incorrect. If the sequence number and mem->invalid are updated together 
under the same lock, they should always agree. There should be no need 
to mess with the sequence numbers after the fact.

Regards,
   Felix

>> The warning messages printed in confirm_valid_user_pages_locked 
>> indicate that there is a mismatch between the sequence numbers and 
>> mem->invalid. As I understand it, such a mismatch should be 
>> impossible. Unless there are some bad assumptions in the code. I 
>> haven't figured out what those bad assumptions are yet. Other than 
>> the case for -EFAULT you pointed out above.
>>
> From my debugging this customer app the warnings is due to we did not 
> handle well if amdgpu_hmm_range_get_pages return -EFAULT and mmu 
> notifier happen on same buffer concurrently. That lead we use mem 
> without hmm range associated and stale mem->range->notifier_seq for 
> following operations. With this patch there is no warning messages and 
> not see other errors.
>
> Xiaogang
>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>> ---
>>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 
>>> +++++++++++++++----
>>>   1 file changed, 37 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 7b1f5933ebaa..6881f1b0844c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct 
>>> amdkfd_process_info *process_info,
>>>               ret = -EAGAIN;
>>>               goto unlock_out;
>>>           }
>>> -        mem->invalid = 0;
>>> +         /* set mem valid if mem has hmm range associated */
>>> +        if (mem->range)
>>> +            mem->invalid = 0;
>>>       }
>>>     unlock_out:
>>> @@ -2576,16 +2578,28 @@ static int 
>>> confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
>>>       list_for_each_entry_safe(mem, tmp_mem,
>>>                    &process_info->userptr_inval_list,
>>>                    validate_list.head) {
>>> -        bool valid = amdgpu_ttm_tt_get_user_pages_done(
>>> -                mem->bo->tbo.ttm, mem->range);
>>> +        /* Only check mem with hmm range associated */
>>> +        bool valid;
>>>   -        mem->range = NULL;
>>> -        if (!valid) {
>>> -            WARN(!mem->invalid, "Invalid BO not marked invalid");
>>> +        if (mem->range) {
>>> +            valid = amdgpu_ttm_tt_get_user_pages_done(
>>> +                    mem->bo->tbo.ttm, mem->range);
>>> +
>>> +            mem->range = NULL;
>>> +            if (!valid) {
>>> +                WARN(!mem->invalid, "Invalid BO not marked invalid");
>>> +                ret = -EAGAIN;
>>> +                continue;
>>> +            }
>>> +        } else
>>> +            /* keep mem without hmm range at userptr_inval_list */
>>> +            continue;
>>> +
>>> +        if (mem->invalid) {
>>> +            WARN(1, "Valid BO is marked invalid");
>>>               ret = -EAGAIN;
>>>               continue;
>>>           }
>>> -        WARN(mem->invalid, "Valid BO is marked invalid");
>>>             list_move_tail(&mem->validate_list.head,
>>> &process_info->userptr_valid_list);
>>> @@ -2644,8 +2658,23 @@ static void 
>>> amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>>        * reference counting inside KFD will handle this case.
>>>        */
>>>       mutex_lock(&process_info->notifier_lock);
>>> -    if (process_info->evicted_bos != evicted_bos)
>>> +    if (process_info->evicted_bos != evicted_bos) {
>>> +        /* mmu notifier interrupted 
>>> amdgpu_amdkfd_restore_userptr_worker
>>> +         * before reschedule next attempt update stale 
>>> mem->range->notifier_seq
>>> +         * inside userptr_inval_list
>>> +         */
>>> +        struct kgd_mem *mem, *tmp_mem;
>>> +
>>> +        list_for_each_entry_safe(mem, tmp_mem,
>>> +                &process_info->userptr_inval_list,
>>> +                validate_list.head) {
>>> +
>>> +            if (mem->range)
>>> +                mem->range->notifier_seq = 
>>> mem->range->notifier->invalidate_seq;
>>> +        }
>>> +
>>>           goto unlock_notifier_out;
>>> +    }
>>>         if (confirm_valid_user_pages_locked(process_info)) {
>>>           WARN(1, "User pages unexpectedly invalid");