[PATCH] drm/amdkfd: Fix some issues in the userptr buffer validation process.
Chen, Xiaogang
xiaogang.chen at amd.com
Thu Apr 20 05:28:15 UTC 2023
On 4/18/2023 6:17 PM, Felix Kuehling wrote:
>
> On 2023-04-13 23:27, Chen, Xiaogang wrote:
>>
>> On 4/13/2023 3:08 PM, Felix Kuehling wrote:
>>> Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen:
>>>> From: Xiaogang Chen<xiaogang.chen at amd.com>
>>>>
>>>> Notice the userptr buffer restore process has the following issues:
>>>>
>>>> 1: amdgpu_ttm_tt_get_user_pages can fail (-EFAULT). If it fails we
>>>> should not mark the mem valid (mem->invalid = 0); in that case the
>>>> mem has no hmm range or user_pages associated with it.
>>>
>>> We don't want to suspend the process indefinitely when this happens.
>>> This can happen if usermode calls munmap before unregistering the
>>> userptr. What we want to happen in this case is, the process should
>>> resume. If it accesses the virtual address, it will result in a page
>>> fault, which alerts the application to its mistake. If it doesn't
>>> access the virtual address, then there is no harm.
>>>
>>> It's a good catch that there is no useful hmm_range in this case to
>>> check validity, so we should not warn about it in
>>> confirm_valid_user_pages_locked.
>>>
>> Not sure why you said "suspend the process indefinitely". When a
>> mem (kgd_mem) has no hmm_range because amdgpu_ttm_tt_get_user_pages
>> failed, the patch does not mark it valid (mem->invalid stays != 0) and
>> keeps it on userptr_inval_list. The process is not suspended.
>
> User mode queues are stopped. Until the queues are restarted, the
> process is effectively suspended (for GPU execution). If invalid
> userptr mappings cause restore to fail, that means, the GPU queues
> will remain stopped. That's what I mean with "suspend the process
> indefinitely".
>
>
My understanding is that your concern is the restore process rescheduling
itself indefinitely. In confirm_valid_user_pages_locked the deciding factor
is whether the mem has an hmm range associated. If it does and mem->invalid
is true, I reschedule the next attempt. If the mem has no hmm range it is
kept on the invalid list and does not trigger a reschedule.
>>
>> Yes, in this customer application case amdgpu_ttm_tt_get_user_pages
>> failed at vma_lookup. I think the application unmapped its buffer before
>> unregistering it from KFD. This patch handles amdgpu_ttm_tt_get_user_pages
>> failure in general: do not mark the mem valid (mem->invalid != 0), keep
>> it on userptr_inval_list, then in confirm_valid_user_pages_locked check
>> whether the mem has an hmm range associated before issuing a WARN.
>
> I think it would be easier to just mark it as valid. mem->invalid == 0
> means, it's safe to resume the user mode queues. For userptrs without
> a valid VMA this is the case as the corresponding page table entries
> have been invalidated (V=0).
>
I keep mem->invalid != 0 because the mem has no hmm range associated and
stays on the invalid list. I think that keeps its status consistent.
>
>>
>>>> 2: An mmu notifier can fire concurrently and update
>>>> mem->range->notifier->invalidate_seq, but not
>>>> mem->range->notifier_seq. That leaves mem->range->notifier_seq stale
>>>> when the mem is on process_info->userptr_inval_list and
>>>> amdgpu_amdkfd_restore_userptr_worker got interrupted. On the next
>>>> rescheduled attempt we then compare the stale
>>>> mem->range->notifier_seq with mem->range->notifier->invalidate_seq.
>>>
>>> amdgpu_hmm_range_get_pages updates mem->range->notifier_seq with the
>>> current mem->range->notifier->invalidate_seq. If an eviction happens
>>> after this, there is a collision and the range needs to be
>>> revalidated. I think when you say "mem->range->notifier_seq is
>>> stale", it means there was a collision. When this happens,
>>> mem->invalid should be set to true at the same time. So
>>> confirm_valid_user_pages_locked should not complain because
>>> mem->invalid and amdgpu_ttm_tt_get_user_pages_done should agree that
>>> the range is invalid.
>>>
>> Yes, "mem->range->notifier_seq is stale" means it is different from
>> mem->range->notifier->invalidate_seq. It is caused by an mmu notifier
>> firing concurrently on the same buffer while it is still in the restore
>> process. For this case the patch updates mem->range->notifier_seq:
>>
>> +			if (mem->range)
>> +				mem->range->notifier_seq = mem->range->notifier->invalidate_seq;
>
> You should not update mem->range->notifier_seq without also getting an
> up-to-date page address list. Matching sequence numbers indicate that
> your page list is up to date. If you just update the sequence number,
> you're basically lying to yourself.
>
> You need to call amdgpu_hmm_range_get_pages to get an updated page
> list and update the mem->range->notifier_seq at the same time. There
> is no need to do this in more than one place. That's in
> update_invalid_user_pages.
>
>
Ok, it may be redundant. In the next revision I will remove it and depend
on the next scheduled attempt to update mem->range->notifier_seq via
amdgpu_ttm_tt_get_user_pages.
>>
>> Then the restore process will skip confirm_valid_user_pages_locked and
>> jump to the next scheduled attempt: "goto unlock_notifier_out".
>>
>>> "At next rescheduled next attempt we use stale
>>> mem->range->notifier_seq": This is not really stale. The
>>> notifier_seq indicates whether the pages returned by the last call
>>> to amdgpu_hmm_range_get_pages are still valid. If it's "stale", it
>>> means an invalidation (evict_userptr) happened and we need to
>>> call amdgpu_hmm_range_get_pages again. In theory, if an invalidation
>>> happened since the last call, then mem->invalid should also be true.
>>> So again, the sequence numbers and mem->invalid should agree and
>>> there should be no warning.
>>>
>> When an invalidation (evict_userptr) happens concurrently, the restore
>> process will schedule the next attempt. mem->invalid is set to true by
>> evict_userptr, and the sequence numbers are updated as well. Both agree,
>> and with this patch we do not see the WARN.
>
> Why do they disagree without this patch? I think what you did there
> (updating the sequence number without getting updated pages) is
> incorrect. If the sequence number and mem->invalid are updated
> together under the same lock, they should always agree. There should
> be no need to mess with the sequence numbers after the fact.
>
I did not mean that mem->invalid and the sequence number disagree. I mean
that mem->range->notifier_seq and mem->range->notifier->invalidate_seq
disagree. We can update mem->range->notifier_seq on the next attempt.
> Regards,
> Felix
>
>
>>> The warning messages printed in confirm_valid_user_pages_locked
>>> indicate that there is a mismatch between the sequence numbers and
>>> mem->invalid. As I understand it, such a mismatch should be
>>> impossible. Unless there are some bad assumptions in the code. I
>>> haven't figured out what those bad assumptions are yet. Other than
>>> the case for -EFAULT you pointed out above.
>>>
>> From my debugging of this customer app, the warnings come from not
>> handling the case where amdgpu_hmm_range_get_pages returns -EFAULT and
>> an mmu notifier fires on the same buffer concurrently. That led us to
>> use a mem with no hmm range associated and a stale
>> mem->range->notifier_seq in subsequent operations. With this patch
>> there are no warning messages and I see no other errors.
>>
>> Xiaogang
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>>
>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>>> ---
>>>>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++++++++++++++----
>>>>  1 file changed, 37 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 7b1f5933ebaa..6881f1b0844c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>>>>  			ret = -EAGAIN;
>>>>  			goto unlock_out;
>>>>  		}
>>>> -		mem->invalid = 0;
>>>> +		/* set mem valid if mem has hmm range associated */
>>>> +		if (mem->range)
>>>> +			mem->invalid = 0;
>>>>  	}
>>>>
>>>>  unlock_out:
>>>> @@ -2576,16 +2578,28 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
>>>>  	list_for_each_entry_safe(mem, tmp_mem,
>>>>  				 &process_info->userptr_inval_list,
>>>>  				 validate_list.head) {
>>>> -		bool valid = amdgpu_ttm_tt_get_user_pages_done(
>>>> -				mem->bo->tbo.ttm, mem->range);
>>>> +		/* Only check mem with hmm range associated */
>>>> +		bool valid;
>>>> -		mem->range = NULL;
>>>> -		if (!valid) {
>>>> -			WARN(!mem->invalid, "Invalid BO not marked invalid");
>>>> +		if (mem->range) {
>>>> +			valid = amdgpu_ttm_tt_get_user_pages_done(
>>>> +					mem->bo->tbo.ttm, mem->range);
>>>> +
>>>> +			mem->range = NULL;
>>>> +			if (!valid) {
>>>> +				WARN(!mem->invalid, "Invalid BO not marked invalid");
>>>> +				ret = -EAGAIN;
>>>> +				continue;
>>>> +			}
>>>> +		} else
>>>> +			/* keep mem without hmm range at userptr_inval_list */
>>>> +			continue;
>>>> +
>>>> +		if (mem->invalid) {
>>>> +			WARN(1, "Valid BO is marked invalid");
>>>>  			ret = -EAGAIN;
>>>>  			continue;
>>>>  		}
>>>> -		WARN(mem->invalid, "Valid BO is marked invalid");
>>>>  		list_move_tail(&mem->validate_list.head,
>>>>  			       &process_info->userptr_valid_list);
>>>> @@ -2644,8 +2658,23 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>>>>  	 * reference counting inside KFD will handle this case.
>>>>  	 */
>>>>  	mutex_lock(&process_info->notifier_lock);
>>>> -	if (process_info->evicted_bos != evicted_bos)
>>>> +	if (process_info->evicted_bos != evicted_bos) {
>>>> +		/* mmu notifier interrupted amdgpu_amdkfd_restore_userptr_worker;
>>>> +		 * before rescheduling the next attempt, update the stale
>>>> +		 * mem->range->notifier_seq of each mem on userptr_inval_list
>>>> +		 */
>>>> +		struct kgd_mem *mem, *tmp_mem;
>>>> +
>>>> +		list_for_each_entry_safe(mem, tmp_mem,
>>>> +					 &process_info->userptr_inval_list,
>>>> +					 validate_list.head) {
>>>> +
>>>> +			if (mem->range)
>>>> +				mem->range->notifier_seq = mem->range->notifier->invalidate_seq;
>>>> +		}
>>>> +
>>>>  		goto unlock_notifier_out;
>>>> +	}
>>>>  	if (confirm_valid_user_pages_locked(process_info)) {
>>>>  		WARN(1, "User pages unexpectedly invalid");
More information about the amd-gfx mailing list