[PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

Tvrtko Ursulin tursulin at ursulin.net
Fri Jun 28 08:13:10 UTC 2024


Hey Christian,

Any thoughts on the below reply? Did I get it wrong or I found a 
legitimate issue?

Regards,

Tvrtko

On 14/06/2024 17:06, Tvrtko Ursulin wrote:
> 
> On 14/06/2024 10:53, Christian König wrote:
>>
>>>>           if (domain & abo->preferred_domains & 
>>>> AMDGPU_GEM_DOMAIN_VRAM &&
>>>> -            !(adev->flags & AMD_IS_APU))
>>>> -            places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>>> +            !(adev->flags & AMD_IS_APU)) {
>>>> +            /*
>>>> +             * When GTT is just an alternative to VRAM make sure 
>>>> that we
>>>> +             * only use it as fallback and still try to fill up 
>>>> VRAM first.
>>>> +            */
>>>> +            if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
>>>> +                places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>>> +
>>>> +            /*
>>>> +             * Enable GTT when the threshold of moved bytes is
>>>> +             * reached. This prevents any non essential buffer move
>>>> +             * when the links are already saturated.
>>>> +             */
>>>> +            places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
>>>> +        }
>>>
>>> For the APU case I *think* this works, but for discrete I am not sure 
>>> yet.
>>
>> Agree, APUs are basically already fine as they are. VRAM is just used 
>> so that it isn't wasted there.
> 
> Well yeah it works, but because re-validation is broken so it cannot hit 
> the broken migration budget. ;)
> 
>>> As a side note and disclaimer, the TTM "resource compatible" logic 
>>> has a half-life of about one week in my brain until I need to almost 
>>> re-figure it all out. I don't know if it just me, but I find it 
>>> really non-intuitive and almost like double, triple, or even 
>>> quadruple negation way of thinking about things.
>>
>> Yeah I was also going back and forth between the different approaches 
>> multiple times and just ended up in this implementation because it 
>> seemed to do what I wanted to have.
>>
>> It's certainly not very intuitive what's going on here.
>>
>>>
>>> It is not helping that with this proposal you set threshold on just 
>>> one of the possible object placements which further increases the 
>>> asymmetry. For me intuitive thing would be that thresholds apply to 
>>> the act of changing the current placement directly. Not indirectly 
>>> via playing with one of the placement flags dynamically.
>>
>> Interesting idea, how would the handling then be? Currently we have 
>> only the stages - 'don't evict' and 'evict'. Should we make it 
>> something more like 'don't move', 'move', 'evict' ?
> 
> Intuitively I would think "don't move" aligns with the "out of migration 
> budget" concept.
> 
> Since in this patch you add move_threshold to ttm_operation_context 
> could it simply be used as the overall criteria if it is set?
> 
> In a way like:
> 
>   1. If the current placement is from the list of userspace supplied 
> valid ones, and
>   2. Migration limit has been set, and
>   3. It is spent.
> 
> -> Then just don't migrate, return "all is good" from ttm_bo_validate.
> 
> Though I am not sure at the moment how that would interact with the 
> amdgpu_evict_flags and placements userspace did not specify.
> 
>>> Anyway, lets see.. So you set TTM_PL_FLAG_MOVE_THRESHOLD and 
>>> TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it 
>>> will be considered compatible while under the migration budget?
>>>
>>> (Side note, the fact both flags are set I also find very difficult to 
>>> mentally model.)
>>>
>>> Say a buffer was evicted to GTT already. What then brings it back to 
>>> VRAM?
>>>
>>> The first subsequent ttm_bo_validate pass (!evicting) says GTT is 
>>> fine (applicable) while ctx->bytes_moved < ctx->move_threshold, no? 
>>> Isn't that the opposite of what would be required and causes nothing 
>>> to be migrated back in? What am I missing?
>>
>> The flag says that GTT is fine when ctx->bytes_moved >= 
>> ctx->move_threshold. The logic is exactly inverted to what you described.
>>
>> This way a BO will be moved back into VRAM as long as bytes moved 
>> doesn't exceed the threshold.
> 
> I'm afraid I need to sketch it out... If buffer is currently in GTT and 
> placements are VRAM+GTT.
> 
> ttm_bo_validate(evicting=false)
> 
> 1st iteration:
> res=GTT != place=VRAM
>     continue
> 
> 2nd iteration:
> res=GTT == place=GTT+FALLBACK+THRESHOLD
> 
> ttm_place_applicable(GTT)
>    moved < threshold
>      return true
> 
> Buffer stays in GTT while under migration budget -> wrong, no? Or am I 
> still confused?
> 
> Regards,
> 
> Tvrtko
> 
>> Setting both flags has the effect of saying: It's ok that the BO stays 
>> in GTT when you either above the move threshold or would have to evict 
>> something.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>


More information about the amd-gfx mailing list