[PATCH] drm/amdgpu: check vm bo eviction valuable at last

Qiang Yu yuq825 at gmail.com
Mon Feb 21 03:28:20 UTC 2022


On Fri, Feb 18, 2022 at 6:24 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 18.02.22 um 11:16 schrieb Qiang Yu:
> > [SNIP]
> >>> If amdgpu_vm_ready() uses the evicting flag, it's still not equivalent to
> >>> checking whether the VM is idle: evicting == true -> VM idle,
> >>> evicting == false -> VM may be idle or busy.
> >> Yeah, but why should that be relevant?
> >>
> >> amdgpu_vm_ready() returns whether we can do page table updates or not.
> >> Whether the VM is idle or not is only relevant for eviction.
> >>
> >> In other words any CS or page table update makes the VM busy, but that
> >> only affects if the VM can be evicted or not.
> >>
> > My point is: we can't use amdgpu_vm_ready() to replace vm_is_busy(), so
> > currently we update vm even when vm is busy. So why not use:
Sorry, should be "vm is idle".

> > if (!amdgpu_vm_ready() || vm_is_busy()) return;
> > in amdgpu_gem_va_update_vm()? As you mentioned, we prefer not to
> > update the VM when it's idle.
>
> Because updating the VM while it is busy is perfectly fine, we do it all
> the time.
>
Yeah, as above, my typo.

> We should just not update it when it is already idle and was considered
> for eviction.
"and", not "or"?

> In this situation it usually makes sense to keep it idle and
> postpone the update until the next command submission.
>
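To make the distinction concrete, here is a minimal sketch of the update decision described above, written against a hypothetical, simplified VM struct (sketch_vm, sketch_vm_ready() and sketch_should_update_now() are made-up names, not amdgpu code). It only models the flag-based check; whether an additional idle check belongs in amdgpu_gem_va_update_vm() is the open question Qiang raises further down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified VM state; not struct amdgpu_vm. */
struct sketch_vm {
	bool busy;     /* page tables still referenced by unsignaled fences */
	bool evicting; /* eviction was considered; only ever set while idle */
};

/* Flag-based readiness, in the spirit of the suggested amdgpu_vm_ready(). */
static bool sketch_vm_ready(const struct sketch_vm *vm)
{
	return !vm->evicting;
}

/*
 * Returns true to update the page tables now, false to postpone the
 * update to the next command submission. Updating a busy VM is fine;
 * only an idle VM that was already considered for eviction is left alone.
 */
static bool sketch_should_update_now(const struct sketch_vm *vm)
{
	/* Model invariant: evicting implies idle. */
	assert(!(vm->evicting && vm->busy));
	return sketch_vm_ready(vm);
}
```

In this model an idle VM that was never considered for eviction is still updated immediately.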
> >>>>> Then we can keep the evicting flag accurate (after solving your
> >>>>> concern with this patch that eviction may still fail later, by further
> >>>>> delaying the flag update until eviction succeeds).
> >>>> That won't work. See, we need to mark the VM as evicted before we
> >>>> actually evict it, because otherwise somebody could use the VM in
> >>>> parallel and add another fence to it.
> >>>>
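A rough single-threaded model of why the flag has to be set before the eviction itself, assuming a hypothetical sketch_vm with a fence counter. In the real driver this is serialized by the VM's eviction_lock; here a callback simply stands in for a submission racing with the eviction. All names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not amdgpu API. */
struct sketch_vm {
	bool evicting;
	int fences; /* fences attached to the page-table BO */
};

/* Submission path: refuses to add work once eviction has been decided. */
static bool sketch_try_add_fence(struct sketch_vm *vm)
{
	if (vm->evicting)
		return false;
	vm->fences++;
	return true;
}

/* Stand-in for a CS that runs in parallel with the eviction. */
static void sketch_racing_submit(struct sketch_vm *vm)
{
	(void)sketch_try_add_fence(vm);
}

/* Eviction path: publish the decision first, then evict. */
static bool sketch_evict(struct sketch_vm *vm,
			 void (*concurrent)(struct sketch_vm *))
{
	if (vm->fences != 0)
		return false;        /* busy, not a candidate */
	vm->evicting = true;     /* mark first ... */
	if (concurrent)
		concurrent(vm);      /* ... so a racing user sees the flag */
	assert(vm->fences == 0); /* nobody managed to add a fence */
	return true;
}
```

If the flag were only set after the move, the racing submission would succeed and the evicted page tables would gain a new fence.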
> >>> I see, making this fully accurate would cost too much, e.g. holding the
> >>> eviction_lock during eviction. But just delaying it in
> >>> amdgpu_ttm_bo_eviction_valuable() could avoid most false-positive cases.
> >> Partially correct. Another fundamental problem is that we can't hold the
> >> eviction lock because that would result in lock inversion and potential
> >> deadlock.
> >>
> >> We could set the flag later on, but as I said before, setting the
> >> evicted flag when the VM is already idle is a desired effect.
> >>
> > As above, this confuses me: we can explicitly check whether the VM is
> > idle when the user updates the VM, so why bother embedding it implicitly
> > in the evicting flag?
>
> Well, as I said, whether the VM is idle or not is irrelevant for the update.
>
> To summarize the rules once more:
> 1. When VM page tables are used by CS or page table updates, the VM is
> considered busy, i.e. not idle.
>
> 2. When we want to evict a VM it must be idle. As soon as we have considered
> this, we should set the evicted flag to make sure it stays idle as much
> as possible.
>
> 3. When we want to update the page tables we just need to check if the
> VM is idle or not.
>
But currently we don't directly check whether the VM is idle in amdgpu_gem_va_update_vm().
If the VM BO has not been considered for eviction, the VM could be either idle or busy.

I just want to confirm whether the fix should only change amdgpu_vm_ready()
to use the evicting flag, or, besides using the evicting flag, also check
vm_idle() in amdgpu_gem_va_update_vm().

Regards,
Qiang

> 4. When a CS happens we have no choice but to make the VM busy
> again and do all postponed page table updates.
>
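The four rules can be sketched as a small state model. All names are hypothetical and the fence counter is only a stand-in for the reservation object; rule 3 is modelled here as checking the evicted flag, which is how the rest of the thread reads it (an assumption, since the rule itself says "idle").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four rules; not the real amdgpu structures. */
struct sketch_vm {
	int active_fences; /* rule 1: > 0 means busy */
	bool evicted;      /* rule 2: set once eviction was considered */
	int postponed;     /* page-table updates deferred to the next CS */
};

static bool sketch_vm_idle(const struct sketch_vm *vm)
{
	return vm->active_fences == 0;
}

/* Rule 2: only an idle VM may be evicted; mark it right away. */
static bool sketch_vm_consider_eviction(struct sketch_vm *vm)
{
	if (!sketch_vm_idle(vm))
		return false;
	vm->evicted = true;
	return true;
}

/* Rule 3: an update on an evicted VM is postponed; otherwise it runs
 * now and makes the VM busy (rule 1). */
static void sketch_vm_update(struct sketch_vm *vm)
{
	assert(!(vm->evicted && vm->active_fences > 0));
	if (vm->evicted) {
		vm->postponed++;
		return;
	}
	vm->active_fences++;
}

/* Rule 4: a CS makes the VM busy again and flushes postponed updates. */
static void sketch_vm_cs(struct sketch_vm *vm)
{
	vm->evicted = false;
	vm->active_fences += 1 + vm->postponed;
	vm->postponed = 0;
}

/* All outstanding work on the VM has finished. */
static void sketch_vm_signal_all(struct sketch_vm *vm)
{
	vm->active_fences = 0;
}
```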
Anyway,

> Regards,
> Christian.
>
> >
> > Checking whether the VM is idle requires holding the resv lock. Reading
> > your patch that added the evicting flag, its purpose is to update the VM
> > without the resv lock. But the user VM ops in amdgpu_gem_va_update_vm()
> > do hold the resv lock, so the difference only shows up when
> > amdgpu_vm_bo_update_mapping() is called from svm_range_(un)map_to_gpu().
> > So is embedding the VM idle state in the evicting flag meant to make
> > svm_range_(un)map_to_gpu() also do nothing when the VM is idle?
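A sketch of the locking difference being asked about, under the assumption that a plain bool can stand in for holding the page-table BO's resv lock and that a C11 atomic can stand in for the evicting flag (the driver protects the flag with eviction_lock instead). The names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: resv_held stands in for holding the BO's
 * reservation lock; it is not a real lock. */
struct sketch_vm {
	bool resv_held;
	int unsignaled_fences; /* protected by the resv lock */
	atomic_bool evicting;  /* readable without the resv lock */
};

/* Idle check: needs the resv lock, like inspecting the reservation fences. */
static bool sketch_vm_idle_locked(const struct sketch_vm *vm)
{
	assert(vm->resv_held && "resv lock must be held");
	return vm->unsignaled_fences == 0;
}

/* Flag check: usable from paths that do not hold the resv lock,
 * such as the SVM map/unmap path mentioned above. */
static bool sketch_vm_ready_unlocked(struct sketch_vm *vm)
{
	return !atomic_load(&vm->evicting);
}
```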
>
>
>
> >
> > Regards,
> > Qiang
> >
> >> Regards,
> >> Christian.
> >>
> >>>>>>>>>>>> What we should rather do is to fix amdgpu_vm_ready() to take a look at
> >>>>>>>>>>>> the flag instead of the linked list.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Christian.
> >>>>>>>>>>>>
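To illustrate the suggested change to amdgpu_vm_ready(), here is a hypothetical model comparing a list-based check with a flag-based one. evicted_count stands in for the evicted linked list Christian mentions; the point is only when each check starts reporting "not ready".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: evicted_count != 0 means the evicted list
 * is not empty. */
struct sketch_vm {
	int evicted_count;
	bool evicting;
};

/* List-based check: only notices eviction once a BO is on the list,
 * i.e. after it was actually moved out. */
static bool sketch_vm_ready_list(const struct sketch_vm *vm)
{
	return vm->evicted_count == 0;
}

/* Flag-based check: notices eviction as soon as it was decided. */
static bool sketch_vm_ready_flag(const struct sketch_vm *vm)
{
	return !vm->evicting;
}

/* Eviction in two steps: decide (sets the flag), then move (adds to list). */
static void sketch_decide_eviction(struct sketch_vm *vm)
{
	vm->evicting = true;
}

static void sketch_finish_eviction(struct sketch_vm *vm)
{
	assert(vm->evicting && "eviction must be decided first");
	vm->evicted_count++;
}
```

Between the two steps the list-based check still reports the VM as ready, which is the window the flag closes.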
> >>>>>>>>>>>>> Signed-off-by: Qiang Yu <qiang.yu at amd.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>>         drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 85 ++++++++++++++-----------
> >>>>>>>>>>>>>         1 file changed, 47 insertions(+), 38 deletions(-)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> index 5a32ee66d8c8..88a27911054f 100644
> >>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> @@ -1306,45 +1306,11 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
> >>>>>>>>>>>>>             return flags;
> >>>>>>>>>>>>>         }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -/*
> >>>>>>>>>>>>> - * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>>>>>>>>>>>> - * object.
> >>>>>>>>>>>>> - *
> >>>>>>>>>>>>> - * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>>>>>>>>>>>> - * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>>>>>>>>>>>> - * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>>>>>>>>>>>> - * used to clean out a memory space.
> >>>>>>>>>>>>> - */
> >>>>>>>>>>>>> -static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> -                                         const struct ttm_place *place)
> >>>>>>>>>>>>> +static bool amdgpu_ttm_mem_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> +                                          const struct ttm_place *place)
> >>>>>>>>>>>>>         {
> >>>>>>>>>>>>>             unsigned long num_pages = bo->resource->num_pages;
> >>>>>>>>>>>>>             struct amdgpu_res_cursor cursor;
> >>>>>>>>>>>>> -     struct dma_resv_list *flist;
> >>>>>>>>>>>>> -     struct dma_fence *f;
> >>>>>>>>>>>>> -     int i;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> -     /* Swapout? */
> >>>>>>>>>>>>> -     if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>>>>>>>>>>>> -             return true;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> -     if (bo->type == ttm_bo_type_kernel &&
> >>>>>>>>>>>>> -         !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
> >>>>>>>>>>>>> -             return false;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> -     /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>>>>>>>>>>>> -      * If true, then return false as any KFD process needs all its BOs to
> >>>>>>>>>>>>> -      * be resident to run successfully
> >>>>>>>>>>>>> -      */
> >>>>>>>>>>>>> -     flist = dma_resv_shared_list(bo->base.resv);
> >>>>>>>>>>>>> -     if (flist) {
> >>>>>>>>>>>>> -             for (i = 0; i < flist->shared_count; ++i) {
> >>>>>>>>>>>>> -                     f = rcu_dereference_protected(flist->shared[i],
> >>>>>>>>>>>>> -                             dma_resv_held(bo->base.resv));
> >>>>>>>>>>>>> -                     if (amdkfd_fence_check_mm(f, current->mm))
> >>>>>>>>>>>>> -                             return false;
> >>>>>>>>>>>>> -             }
> >>>>>>>>>>>>> -     }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>             switch (bo->resource->mem_type) {
> >>>>>>>>>>>>>             case AMDGPU_PL_PREEMPT:
> >>>>>>>>>>>>> @@ -1377,10 +1343,53 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>>                     return false;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>             default:
> >>>>>>>>>>>>> -             break;
> >>>>>>>>>>>>> +             return ttm_bo_eviction_valuable(bo, place);
> >>>>>>>>>>>>>             }
> >>>>>>>>>>>>> +}
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -     return ttm_bo_eviction_valuable(bo, place);
> >>>>>>>>>>>>> +/*
> >>>>>>>>>>>>> + * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>>>>>>>>>>>> + * object.
> >>>>>>>>>>>>> + *
> >>>>>>>>>>>>> + * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>>>>>>>>>>>> + * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>>>>>>>>>>>> + * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>>>>>>>>>>>> + * used to clean out a memory space.
> >>>>>>>>>>>>> + */
> >>>>>>>>>>>>> +static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> +                                         const struct ttm_place *place)
> >>>>>>>>>>>>> +{
> >>>>>>>>>>>>> +     struct dma_resv_list *flist;
> >>>>>>>>>>>>> +     struct dma_fence *f;
> >>>>>>>>>>>>> +     int i;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +     /* Swapout? */
> >>>>>>>>>>>>> +     if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>>>>>>>>>>>> +             return true;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +     /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>>>>>>>>>>>> +      * If true, then return false as any KFD process needs all its BOs to
> >>>>>>>>>>>>> +      * be resident to run successfully
> >>>>>>>>>>>>> +      */
> >>>>>>>>>>>>> +     flist = dma_resv_shared_list(bo->base.resv);
> >>>>>>>>>>>>> +     if (flist) {
> >>>>>>>>>>>>> +             for (i = 0; i < flist->shared_count; ++i) {
> >>>>>>>>>>>>> +                     f = rcu_dereference_protected(flist->shared[i],
> >>>>>>>>>>>>> +                             dma_resv_held(bo->base.resv));
> >>>>>>>>>>>>> +                     if (amdkfd_fence_check_mm(f, current->mm))
> >>>>>>>>>>>>> +                             return false;
> >>>>>>>>>>>>> +             }
> >>>>>>>>>>>>> +     }
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +     /* Check by different mem type. */
> >>>>>>>>>>>>> +     if (!amdgpu_ttm_mem_eviction_valuable(bo, place))
> >>>>>>>>>>>>> +             return false;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +     /* VM bo should be checked at last because it will mark VM evicting. */
> >>>>>>>>>>>>> +     if (bo->type == ttm_bo_type_kernel)
> >>>>>>>>>>>>> +             return amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo));
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +     return true;
> >>>>>>>>>>>>>         }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
>
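A simplified model of why the patch moves the VM check to the end: the VM check marks the VM as evicting as a side effect, so if a later check rejects the BO, the flag is left set for nothing. The swapout and KFD checks are omitted, all names are hypothetical, and mem_ok stands in for the result of amdgpu_ttm_mem_eviction_valuable().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the check ordering in the patch. */
struct sketch_bo {
	bool is_vm_bo;     /* stands in for bo->type == ttm_bo_type_kernel */
	bool mem_ok;       /* result of the mem-type specific checks */
	bool vm_idle;
	bool *vm_evicting; /* flag owned by the VM; set as a side effect */
};

/* Like amdgpu_vm_evictable(): on success it marks the VM as evicting. */
static bool sketch_vm_evictable(struct sketch_bo *bo)
{
	assert(bo->vm_evicting != 0);
	if (!bo->vm_idle)
		return false;
	*bo->vm_evicting = true;
	return true;
}

/* Order before the patch: the side-effecting VM check runs first. */
static bool sketch_valuable_old(struct sketch_bo *bo)
{
	if (bo->is_vm_bo && !sketch_vm_evictable(bo))
		return false;
	return bo->mem_ok;
}

/* Order after the patch: side-effect-free checks first, VM check last. */
static bool sketch_valuable_new(struct sketch_bo *bo)
{
	if (!bo->mem_ok)
		return false;
	if (bo->is_vm_bo)
		return sketch_vm_evictable(bo);
	return true;
}
```

With the old order a VM BO rejected by the mem-type checks still leaves its VM marked as evicting; with the new order the flag is only set when the BO is actually accepted.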

