[PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

Tvrtko Ursulin tursulin at ursulin.net
Fri Jun 14 16:06:31 UTC 2024


On 14/06/2024 10:53, Christian König wrote:
> 
>>>           if (domain & abo->preferred_domains & 
>>> AMDGPU_GEM_DOMAIN_VRAM &&
>>> -            !(adev->flags & AMD_IS_APU))
>>> -            places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>> +            !(adev->flags & AMD_IS_APU)) {
>>> +            /*
>>> +             * When GTT is just an alternative to VRAM make sure 
>>> that we
>>> +             * only use it as fallback and still try to fill up VRAM 
>>> first.
>>> +            */
>>> +            if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
>>> +                places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>> +
>>> +            /*
>>> +             * Enable GTT when the threshold of moved bytes is
>>> +             * reached. This prevents any non essential buffer move
>>> +             * when the links are already saturated.
>>> +             */
>>> +            places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
>>> +        }
>>
>> For the APU case I *think* this works, but for discrete I am not sure 
>> yet.
> 
> Agree, APUs are basically already fine as they are. VRAM is just used so 
> that it isn't wasted there.

Well yeah it works, but because re-validation is broken so it cannot hit 
the broken migration budget. ;)

>> As a side note and disclaimer, the TTM "resource compatible" logic has 
>> a half-life of about one week in my brain until I need to almost 
>> re-figure it all out. I don't know if it just me, but I find it really 
>> non-intuitive and almost like double, triple, or even quadruple 
>> negation way of thinking about things.
> 
> Yeah I was also going back and forth between the different approaches 
> multiple times and just ended up in this implementation because it 
> seemed to do what I wanted to have.
> 
> It's certainly not very intuitive what's going on here.
> 
>>
>> It is not helping that with this proposal you set threshold on just 
>> one of the possible object placements which further increases the 
>> asymmetry. For me intuitive thing would be that thresholds apply to 
>> the act of changing the current placement directly. Not indirectly via 
>> playing with one of the placement flags dynamically.
> 
> Interesting idea, how would the handling then be? Currently we have only 
> the stages - 'don't evict' and 'evict'. Should we make it something more 
> like 'don't move', 'move', 'evict' ?

Intuitively I would think "don't move" aligns with the "out of migration 
budget" concept.

Since in this patch you add move_threshold to ttm_operation_context 
could it simply be used as the overall criteria if it is set?

In a way like:

  1. If the current placement is from the list of userspace supplied 
valid ones, and
  2. Migration limit has been set, and
  3. It is spent.

-> Then just don't migrate, return "all is good" from ttm_bo_validate.

Though I am not sure at the moment how that would interact with the 
amdgpu_evict_flags and placements userspace did not specify.

>> Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
>> TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it will 
>> be considered compatible while under the migration budget?
>>
>> (Side note, the fact both flags are set I also find very difficult to 
>> mentally model.)
>>
>> Say a buffer was evicted to GTT already. What then brings it back to 
>> VRAM?
>>
>> The first subsequent ttm_bo_validate pass (!evicting) says GTT is fine 
>> (applicable) while ctx->bytes_moved < ctx->move_threshold, no? Isn't 
>> that the opposite of what would be required and causes nothing to be 
>> migrated back in? What am I missing?
> 
> The flag says that GTT is fine when ctx->bytes_moved >= 
> ctx->move_threshold. The logic is exactly inverted to what you described.
> 
> This way a BO will be moved back into VRAM as long as bytes moved 
> doesn't exceed the threshold.

I'm afraid I need to sketch it out... If buffer is currently in GTT and 
placements are VRAM+GTT.

ttm_bo_validate(evicting=false)

1st iteration:
res=GTT != place=VRAM
    continue

2nd iteration:
res=GTT == place=GTT+FALLBACK+THRESHOLD

ttm_place_applicable(GTT)
   moved < threshold
     return true

Buffer stays in GTT while under migration budget -> wrong, no? Or am I 
still confused?

Regards,

Tvrtko

> Setting both flags has the effect of saying: It's ok that the BO stays 
> in GTT when you either above the move threshold or would have to evict 
> something.
> 
> Regards,
> Christian.
> 
>>
>> Regards,
>>
>> Tvrtko
> 


More information about the amd-gfx mailing list