[PATCH] drm/amdgpu: check vm bo eviction valuable at last
Qiang Yu
yuq825 at gmail.com
Mon Feb 21 03:28:20 UTC 2022
On Fri, Feb 18, 2022 at 6:24 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 18.02.22 um 11:16 schrieb Qiang Yu:
> > [SNIP]
> >>> If amdgpu_vm_ready() use evicting flag, it's still not equivalent to check
> >>> vm idle: true -> vm idle, false -> vm may be idle or busy.
> >> Yeah, but why should that be relevant?
> >>
> >> The amdgpu_vm_ready() return if we can do page table updates or not. If
> >> the VM is idle or not is only relevant for eviction.
> >>
> >> In other words any CS or page table update makes the VM busy, but that
> >> only affects if the VM can be evicted or not.
> >>
> > My point is: we can't use amdgpu_vm_ready() to replace vm_is_busy(), so
> > currently we update vm even when vm is busy. So why not use:
Sorry, should be "vm is idle".
> > if (!amdgpu_vm_ready() || vm_is_busy()) return;
> > in amdgpu_gem_va_update_vm(), as you mentioned we prefer to not
> > update vm when it's idle.
>
> Because updating the VM while it is busy is perfectly fine, we do it all
> the time.
>
Yeah, as above, my typo.
> We should just not update it when it is already idle and was considered
> for eviction.
"and", not "or"?
> In this situation it makes most of the time sense to keep
> it idle and postpone the update till the next command submission.
>
> >>>>> Then we can keep the evicting flag accurate (after solving your
> >>>>> concern for this patch that eviction may fail latter by further delay
> >>>>> the flag update after eviction success).
> >>>> That won't work. See we need to mark the VM as evicted before we
> >>>> actually evict them because otherwise somebody could use the VM in
> >>>> parallel and add another fence to it.
> >>>>
> >>> I see, make this too accurate should cost too much like holding the
> >>> eviction_lock when eviction. But just delay it in
> >>> amdgpu_ttm_bo_eviction_valuable()
> >>> could avoid most false positive case.
> >> Partially correct. Another fundamental problem is that we can't hold the
> >> eviction lock because that would result in lock inversion and potential
> >> deadlock.
> >>
> >> We could set the flag later on, but as I said before that when we set
> >> the evicted flag when the VM is already idle is a desired effect.
> >>
> > As above, this confuse me as we can explicitly check vm idle when
> > user update vm, why bother to embed it in evicting flag implicitly?
>
> Well as I said it's irrelevant for the update if the VM is idle or not.
>
> To summarize the rules once more:
> 1. When VM page tables are used by CS or page tables updates it is
> considered busy, e.g. not idle.
>
> 2. When we want to evict a VM it must be idle. As soon as we considered
> this we should set the evicted flag to make sure to keep it idle as much
> as possible.
>
> 3. When we want to update the page tables we just need to check if the
> VM is idle or not.
>
But now we does not check vm idle directly in amdgpu_gem_va_update_vm().
If VM bo has not been considered for eviction, it could be either idle or busy.
Just want to confirm if the fix should be only change amdgpu_vm_ready()
to use evicting flag or besides using evicting flag, also check vm_idle() in
amdgpu_gem_va_update_vm().
Regards,
Qiang
> 4. When a CS happens we don't have another chance and make the VM busy
> again. And do all postponed page table updates.
>
Anyway,
> Regards,
> Christian.
>
> >
> > Check vm idle need to hold resv lock. Read your patch for adding
> > evicting flag is to update vm without resv lock. But user vm ops in
> > amdgpu_gem_va_update_vm() do hold the resv lock, so the difference
> > happens when calling amdgpu_vm_bo_update_mapping() from
> > svm_range_(un)map_to_gpu(). So embed vm idle in evicting flag
> > is for svm_range_(un)map_to_gpu() also do nothing when vm idle?
>
>
>
> >
> > Regards,
> > Qiang
> >
> >> Regards,
> >> Christian.
> >>
> >>> Regards,
> >>> Qiang
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> Regards,
> >>>>> Qiang
> >>>>>
> >>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Regards,
> >>>>>>> Qiang
> >>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Qiang
> >>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Christian.
> >>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Qiang
> >>>>>>>>>>>
> >>>>>>>>>>>> What we should rather do is to fix amdgpu_vm_ready() to take a look at
> >>>>>>>>>>>> the flag instead of the linked list.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Christian.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Qiang Yu <qiang.yu at amd.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 85 ++++++++++++++-----------
> >>>>>>>>>>>>> 1 file changed, 47 insertions(+), 38 deletions(-)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> index 5a32ee66d8c8..88a27911054f 100644
> >>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>>>>>>>>>>>> @@ -1306,45 +1306,11 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
> >>>>>>>>>>>>> return flags;
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -/*
> >>>>>>>>>>>>> - * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>>>>>>>>>>>> - * object.
> >>>>>>>>>>>>> - *
> >>>>>>>>>>>>> - * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>>>>>>>>>>>> - * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>>>>>>>>>>>> - * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>>>>>>>>>>>> - * used to clean out a memory space.
> >>>>>>>>>>>>> - */
> >>>>>>>>>>>>> -static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> - const struct ttm_place *place)
> >>>>>>>>>>>>> +static bool amdgpu_ttm_mem_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> + const struct ttm_place *place)
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>> unsigned long num_pages = bo->resource->num_pages;
> >>>>>>>>>>>>> struct amdgpu_res_cursor cursor;
> >>>>>>>>>>>>> - struct dma_resv_list *flist;
> >>>>>>>>>>>>> - struct dma_fence *f;
> >>>>>>>>>>>>> - int i;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> - /* Swapout? */
> >>>>>>>>>>>>> - if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>>>>>>>>>>>> - return true;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> - if (bo->type == ttm_bo_type_kernel &&
> >>>>>>>>>>>>> - !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
> >>>>>>>>>>>>> - return false;
> >>>>>>>>>>>>> -
> >>>>>>>>>>>>> - /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>>>>>>>>>>>> - * If true, then return false as any KFD process needs all its BOs to
> >>>>>>>>>>>>> - * be resident to run successfully
> >>>>>>>>>>>>> - */
> >>>>>>>>>>>>> - flist = dma_resv_shared_list(bo->base.resv);
> >>>>>>>>>>>>> - if (flist) {
> >>>>>>>>>>>>> - for (i = 0; i < flist->shared_count; ++i) {
> >>>>>>>>>>>>> - f = rcu_dereference_protected(flist->shared[i],
> >>>>>>>>>>>>> - dma_resv_held(bo->base.resv));
> >>>>>>>>>>>>> - if (amdkfd_fence_check_mm(f, current->mm))
> >>>>>>>>>>>>> - return false;
> >>>>>>>>>>>>> - }
> >>>>>>>>>>>>> - }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> switch (bo->resource->mem_type) {
> >>>>>>>>>>>>> case AMDGPU_PL_PREEMPT:
> >>>>>>>>>>>>> @@ -1377,10 +1343,53 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> return false;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> default:
> >>>>>>>>>>>>> - break;
> >>>>>>>>>>>>> + return ttm_bo_eviction_valuable(bo, place);
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>> +}
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - return ttm_bo_eviction_valuable(bo, place);
> >>>>>>>>>>>>> +/*
> >>>>>>>>>>>>> + * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
> >>>>>>>>>>>>> + * object.
> >>>>>>>>>>>>> + *
> >>>>>>>>>>>>> + * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
> >>>>>>>>>>>>> + * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
> >>>>>>>>>>>>> + * it can find space for a new object and by ttm_bo_force_list_clean() which is
> >>>>>>>>>>>>> + * used to clean out a memory space.
> >>>>>>>>>>>>> + */
> >>>>>>>>>>>>> +static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> >>>>>>>>>>>>> + const struct ttm_place *place)
> >>>>>>>>>>>>> +{
> >>>>>>>>>>>>> + struct dma_resv_list *flist;
> >>>>>>>>>>>>> + struct dma_fence *f;
> >>>>>>>>>>>>> + int i;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> + /* Swapout? */
> >>>>>>>>>>>>> + if (bo->resource->mem_type == TTM_PL_SYSTEM)
> >>>>>>>>>>>>> + return true;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> + /* If bo is a KFD BO, check if the bo belongs to the current process.
> >>>>>>>>>>>>> + * If true, then return false as any KFD process needs all its BOs to
> >>>>>>>>>>>>> + * be resident to run successfully
> >>>>>>>>>>>>> + */
> >>>>>>>>>>>>> + flist = dma_resv_shared_list(bo->base.resv);
> >>>>>>>>>>>>> + if (flist) {
> >>>>>>>>>>>>> + for (i = 0; i < flist->shared_count; ++i) {
> >>>>>>>>>>>>> + f = rcu_dereference_protected(flist->shared[i],
> >>>>>>>>>>>>> + dma_resv_held(bo->base.resv));
> >>>>>>>>>>>>> + if (amdkfd_fence_check_mm(f, current->mm))
> >>>>>>>>>>>>> + return false;
> >>>>>>>>>>>>> + }
> >>>>>>>>>>>>> + }
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> + /* Check by different mem type. */
> >>>>>>>>>>>>> + if (!amdgpu_ttm_mem_eviction_valuable(bo, place))
> >>>>>>>>>>>>> + return false;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> + /* VM bo should be checked at last because it will mark VM evicting. */
> >>>>>>>>>>>>> + if (bo->type == ttm_bo_type_kernel)
> >>>>>>>>>>>>> + return amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo));
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> + return true;
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
>
More information about the amd-gfx
mailing list