[PATCH] drm/amdkfd: Fix some issues in the userptr buffer validation process.
Chen, Xiaogang
xiaogang.chen at amd.com
Thu Apr 20 05:28:15 UTC 2023
On 4/18/2023 6:17 PM, Felix Kuehling wrote:
>
> On 2023-04-13 23:27, Chen, Xiaogang wrote:
>>
>> On 4/13/2023 3:08 PM, Felix Kuehling wrote:
>>> Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen:
>>>> From: Xiaogang Chen<xiaogang.chen at amd.com>
>>>>
>>>> Notice the userptr buffer restore process has the following issues:
>>>>
>>>> 1: amdgpu_ttm_tt_get_user_pages can fail (-EFAULT). If it fails we
>>>> should not mark the mem valid (mem->invalid = 0); in that case the
>>>> mem has no hmm range or user_pages associated with it.
>>>
>>> We don't want to suspend the process indefinitely when this happens.
>>> This can happen if usermode calls munmap before unregistering the
>>> userptr. What we want to happen in this case is, the process should
>>> resume. If it accesses the virtual address, it will result in a page
>>> fault, which alerts the application to its mistake. If it doesn't
>>> access the virtual address, then there is no harm.
>>>
>>> It's a good catch that there is no useful hmm_range in this case to
>>> check validity, so we should not warn about it in
>>> confirm_valid_user_pages_locked.
>>>
>> Not sure why you said "suspend the process indefinitely". When a
>> mem (kgd_mem) has no hmm_range because amdgpu_ttm_tt_get_user_pages
>> failed, the patch does not mark it valid (mem->invalid stays != 0) and
>> keeps it on userptr_inval_list. The process is not suspended.
>
> User mode queues are stopped. Until the queues are restarted, the
> process is effectively suspended (for GPU execution). If invalid
> userptr mappings cause restore to fail, that means, the GPU queues
> will remain stopped. That's what I mean with "suspend the process
> indefinitely".
>
>
My understanding is that your concern is the restore process rescheduling
itself indefinitely. In confirm_valid_user_pages_locked the deciding factor
is whether the mem has an hmm range associated. If it does and mem->invalid
is true, I reschedule the next attempt. If the mem has no hmm range it is
kept on the invalid list and does not trigger a reschedule.
>>
>> Yes, in this customer application case amdgpu_ttm_tt_get_user_pages
>> failed at vma_lookup. I think the application unmapped its buffer before
>> unregistering it from KFD. This patch handles amdgpu_ttm_tt_get_user_pages
>> failure in general: do not mark the mem valid (mem->invalid != 0), keep
>> it on userptr_inval_list, then in confirm_valid_user_pages_locked check
>> whether the mem has an hmm range associated before issuing a WARN.
>
> I think it would be easier to just mark it as valid. mem->invalid == 0
> means, it's safe to resume the user mode queues. For userptrs without
> a valid VMA this is the case as the corresponding page table entries
> have been invalidated (V=0).
>
I keep mem->invalid != 0 because the mem has no hmm range associated and
stays on the invalid list. I think that keeps its status consistent.
>
>>
>>>> 2: An mmu notifier can fire concurrently and update
>>>> mem->range->notifier->invalidate_seq, but not
>>>> mem->range->notifier_seq. That leaves mem->range->notifier_seq stale
>>>> when the mem is on process_info->userptr_inval_list and
>>>> amdgpu_amdkfd_restore_userptr_worker got interrupted. On the next
>>>> rescheduled attempt we then compare the stale
>>>> mem->range->notifier_seq with mem->range->notifier->invalidate_seq.
>>>
>>> amdgpu_hmm_range_get_pages updates mem->range->notifier_seq with the
>>> current mem->range->notifier->invalidate_seq. If an eviction happens
>>> after this, there is a collision and the range needs to be
>>> revalidated. I think when you say "mem->range->notifier_seq is
>>> stale", it means there was a collision. When this happens,
>>> mem->invalid should be set to true at the same time. So
>>> confirm_valid_user_pages_locked should not complain because
>>> mem->invalid and amdgpu_ttm_tt_get_user_pages_done should agree that
>>> the range is invalid.
>>>
>> Yes, "mem->range->notifier_seq is stale" means it is different from
>> mem->range->notifier->invalidate_seq. It is caused by an mmu notifier
>> firing concurrently on the same buffer while it is still in the restore
>> process. For this case the patch updates mem->range->notifier_seq:
>>
>> +			if (mem->range)
>> +				mem->range->notifier_seq = mem->range->notifier->invalidate_seq;
>
> You should not update mem->range->notifier_seq without also getting an
> up-to-date page address list. Matching sequence numbers indicate that
> your page list is up to date. If you just update the sequence number,
> you're basically lying to yourself.
>
> You need to call amdgpu_hmm_range_get_pages to get an updated page
> list and update the mem->range->notifier_seq at the same time. There
> is no need to do this in more than one place. That's in
> update_invalid_user_pages.
>
>
Ok, it may be redundant. In the next revision I will remove it and depend
on the next scheduled attempt to update mem->range->notifier_seq via
amdgpu_ttm_tt_get_user_pages.
>>
>> Then the restore process will skip confirm_valid_user_pages_locked and
>> jump to the next scheduled attempt: "goto unlock_notifier_out".
>>
>>> "At next rescheduled next attempt we use stale
>>> mem->range->notifier_seq": This is not really stale. The
>>> notifier_seq indicates whether the pages returned by the last call
>>> to amdgpu_hmm_range_get_pages are still valid. If it's "stale", it
>>> means an invalidation (evict_userptr) happened and we need to
>>> call amdgpu_hmm_range_get_pages again. In theory, if an invalidation
>>> happened since the last call, then mem->invalid should also be true.
>>> So again, the sequence numbers and mem->invalid should agree and
>>> there should be no warning.
>>>
>> When an invalidation (evict_userptr) happens concurrently, the restore
>> process will schedule the next attempt. mem->invalid is set to true by
>> evict_userptr, and the sequence numbers are updated as well. Both agree,
>> and with this patch we do not see the WARN.
>
> Why do they disagree without this patch? I think what you did there
> (updating the sequence number without getting updated pages) is
> incorrect. If the sequence number and mem->invalid are updated
> together under the same lock, they should always agree. There should
> be no need to mess with the sequence numbers after the fact.
>
I did not mean that mem->invalid and the sequence number disagree. I mean
that mem->range->notifier_seq and mem->range->notifier->invalidate_seq
disagree. We can update mem->range->notifier_seq on the next attempt.
> Regards,
> Felix
>
>
>>> The warning messages printed in confirm_valid_user_pages_locked
>>> indicate that there is a mismatch between the sequence numbers and
>>> mem->invalid. As I understand it, such a mismatch should be
>>> impossible. Unless there are some bad assumptions in the code. I
>>> haven't figured out what those bad assumptions are yet. Other than
>>> the case for -EFAULT you pointed out above.
>>>
>> From my debugging of this customer app, the warnings come from not
>> handling the case where amdgpu_hmm_range_get_pages returns -EFAULT and
>> an mmu notifier fires on the same buffer concurrently. That led us to
>> use a mem with no hmm range associated and a stale
>> mem->range->notifier_seq in subsequent operations. With this patch
>> there are no warning messages and I see no other errors.
>>
>> Xiaogang
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>>
>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>>> ---
>>>>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++++++++++++++----
>>>>  1 file changed, 37 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 7b1f5933ebaa..6881f1b0844c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>>>>  			ret = -EAGAIN;
>>>>  			goto unlock_out;
>>>>  		}
>>>> -		mem->invalid = 0;
>>>> +		/* set mem valid if mem has hmm range associated */
>>>> +		if (mem->range)
>>>> +			mem->invalid = 0;
>>>>  	}
>>>>
>>>>  unlock_out:
>>>> @@ -2576,16 +2578,28 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
>>>>  	list_for_each_entry_safe(mem, tmp_mem,
>>>>  				 &process_info->userptr_inval_list,
>>>>  				 validate_list.head) {
>>>> -		bool valid = amdgpu_ttm_tt_get_user_pages_done(
>>>> -				mem->bo->tbo.ttm, mem->range);
>>>> +		/* Only check mem with hmm range associated */
>>>> +		bool valid;
>>>> -		mem->range = NULL;
>>>> -		if (!valid) {
>>>> -			WARN(!mem->invalid, "Invalid BO not marked invalid");
>>>> +		if (mem->range) {
>>>> +			valid = amdgpu_ttm_tt_get_user_pages_done(
>>>> +					mem->bo->tbo.ttm, mem->range);
>>>> +
>>>> +			mem->range = NULL;
>>>> +			if (!valid) {
>>>> +				WARN(!mem->invalid, "Invalid BO not marked invalid");
>>>> +				ret = -EAGAIN;
>>>> +				continue;
>>>> +			}
>>>> +		} else
>>>> +			/* keep mem without hmm range at userptr_inval_list */
>>>> +			continue;
>>>> +
>>>> +		if (mem->invalid) {
>>>> +			WARN(1, "Valid BO is marked invalid");
>>>>  			ret = -EAGAIN;
>>>>  			continue;
>>>>  		}
>>>> -		WARN(mem->invalid, "Valid BO is marked invalid");
>>>>  		list_move_tail(&mem->validate_list.head,
>>>>  			       &process_info->userptr_valid_list);
>>>> @@ -2644,8 +2658,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>>>  	 * reference counting inside KFD will handle this case.
>>>>  	 */
>>>>  	mutex_lock(&process_info->notifier_lock);
>>>> -	if (process_info->evicted_bos != evicted_bos)
>>>> +	if (process_info->evicted_bos != evicted_bos) {
>>>> +		/* mmu notifier interrupted amdgpu_amdkfd_restore_userptr_worker;
>>>> +		 * before rescheduling the next attempt, update the stale
>>>> +		 * mem->range->notifier_seq of each mem on userptr_inval_list
>>>> +		 */
>>>> +		struct kgd_mem *mem, *tmp_mem;
>>>> +
>>>> +		list_for_each_entry_safe(mem, tmp_mem,
>>>> +					 &process_info->userptr_inval_list,
>>>> +					 validate_list.head) {
>>>> +
>>>> +			if (mem->range)
>>>> +				mem->range->notifier_seq = mem->range->notifier->invalidate_seq;
>>>> +		}
>>>> +
>>>>  		goto unlock_notifier_out;
>>>> +	}
>>>>  	if (confirm_valid_user_pages_locked(process_info)) {
>>>>  		WARN(1, "User pages unexpectedly invalid");
More information about the amd-gfx mailing list