[PATCH 5/6] drm/amdgpu: always enable move threshold for BOs
Tvrtko Ursulin
tursulin at ursulin.net
Fri Jun 28 08:13:10 UTC 2024
Hey Christian,
Any thoughts on the reply below? Did I get it wrong, or have I found a
legitimate issue?
Regards,
Tvrtko
On 14/06/2024 17:06, Tvrtko Ursulin wrote:
>
> On 14/06/2024 10:53, Christian König wrote:
>>
>>>> if (domain & abo->preferred_domains &
>>>> AMDGPU_GEM_DOMAIN_VRAM &&
>>>> - !(adev->flags & AMD_IS_APU))
>>>> - places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>>> + !(adev->flags & AMD_IS_APU)) {
>>>> + /*
>>>> + * When GTT is just an alternative to VRAM make sure that we
>>>> + * only use it as fallback and still try to fill up VRAM first.
>>>> + */
>>>> + if (abo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT)
>>>> + places[c].flags |= TTM_PL_FLAG_FALLBACK;
>>>> +
>>>> + /*
>>>> + * Enable GTT when the threshold of moved bytes is
>>>> + * reached. This prevents any non essential buffer move
>>>> + * when the links are already saturated.
>>>> + */
>>>> + places[c].flags |= TTM_PL_FLAG_MOVE_THRESHOLD;
>>>> + }
>>>
>>> For the APU case I *think* this works, but for discrete I am not sure
>>> yet.
>>
>> Agree, APUs are basically already fine as they are. VRAM is just used
>> so that it isn't wasted there.
>
> Well yeah it works, but only because re-validation is broken, so it
> cannot hit the broken migration budget. ;)
>
>>> As a side note and disclaimer, the TTM "resource compatible" logic
>>> has a half-life of about one week in my brain until I need to almost
>>> re-figure it all out. I don't know if it is just me, but I find it
>>> really non-intuitive and almost like double, triple, or even
>>> quadruple negation way of thinking about things.
>>
>> Yeah I was also going back and forth between the different approaches
>> multiple times and just ended up in this implementation because it
>> seemed to do what I wanted to have.
>>
>> It's certainly not very intuitive what's going on here.
>>
>>>
>>> It is not helping that with this proposal you set threshold on just
>>> one of the possible object placements which further increases the
>>> asymmetry. For me intuitive thing would be that thresholds apply to
>>> the act of changing the current placement directly. Not indirectly
>>> via playing with one of the placement flags dynamically.
>>
>> Interesting idea, how would the handling then be? Currently we have
>> only the stages - 'don't evict' and 'evict'. Should we make it
>> something more like 'don't move', 'move', 'evict' ?
>
> Intuitively I would think "don't move" aligns with the "out of migration
> budget" concept.
>
> Since in this patch you add move_threshold to ttm_operation_context
> could it simply be used as the overall criteria if it is set?
>
> In a way like:
>
> 1. If the current placement is from the list of userspace supplied
> valid ones, and
> 2. Migration limit has been set, and
> 3. It is spent.
>
> -> Then just don't migrate, return "all is good" from ttm_bo_validate.
>
> Though I am not sure at the moment how that would interact with the
> amdgpu_evict_flags and placements userspace did not specify.
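
A minimal sketch of the three conditions above, written as a standalone C model rather than TTM code. Every name in it (sketch_mem, sketch_ctx, sketch_skip_migration) is hypothetical, and treating a move_threshold of 0 as "no limit set" is an assumption, not something the patch states. It only shows the early-out decision and ignores eviction and placements userspace did not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory types for illustration; not the TTM placement constants. */
enum sketch_mem { SKETCH_MEM_SYSTEM, SKETCH_MEM_GTT, SKETCH_MEM_VRAM };

/* Hypothetical stand-in for the two counters discussed in the thread. */
struct sketch_ctx {
	uint64_t bytes_moved;
	uint64_t move_threshold; /* assumption: 0 means no limit was set */
};

/*
 * Returns true when validation could report "all is good" without
 * migrating: the current placement is one of the allowed ones, a limit
 * has been set, and that limit is already spent.
 */
static bool sketch_skip_migration(enum sketch_mem current,
				  const enum sketch_mem *allowed,
				  size_t n_allowed,
				  const struct sketch_ctx *ctx)
{
	bool in_list = false;

	assert(allowed != NULL && ctx != NULL);

	/* 1. Is the current placement in the userspace supplied list? */
	for (size_t i = 0; i < n_allowed; i++) {
		if (allowed[i] == current)
			in_list = true;
	}
	if (!in_list)
		return false;

	/* 2. Has a migration limit been set at all? */
	if (ctx->move_threshold == 0)
		return false;

	/* 3. Is the budget spent? */
	return ctx->bytes_moved >= ctx->move_threshold;
}
```

In this model a buffer sitting in GTT with placements VRAM+GTT is left alone only once the budget is spent; under budget the normal validation path would still run and could move it.
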
>
>>> Anyway, let's see... So you set TTM_PL_FLAG_MOVE_THRESHOLD and
>>> TTM_PL_FLAG_FALLBACK on the GTT placement, with the logic that it
>>> will be considered compatible while under the migration budget?
>>>
>>> (Side note, the fact both flags are set I also find very difficult to
>>> mentally model.)
>>>
>>> Say a buffer was evicted to GTT already. What then brings it back to
>>> VRAM?
>>>
>>> The first subsequent ttm_bo_validate pass (!evicting) says GTT is
>>> fine (applicable) while ctx->bytes_moved < ctx->move_threshold, no?
>>> Isn't that the opposite of what would be required and causes nothing
>>> to be migrated back in? What am I missing?
>>
>> The flag says that GTT is fine when ctx->bytes_moved >=
>> ctx->move_threshold. The logic is exactly inverted to what you described.
>>
>> This way a BO will be moved back into VRAM as long as bytes moved
>> doesn't exceed the threshold.
>
> I'm afraid I need to sketch it out... If buffer is currently in GTT and
> placements are VRAM+GTT.
>
> ttm_bo_validate(evicting=false)
>
> 1st iteration:
> res=GTT != place=VRAM
> continue
>
> 2nd iteration:
> res=GTT == place=GTT+FALLBACK+THRESHOLD
>
> ttm_place_applicable(GTT)
> moved < threshold
> return true
>
> Buffer stays in GTT while under migration budget -> wrong, no? Or am I
> still confused?
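
The trace above can be modelled as a small standalone C sketch that compares the two readings of the threshold check. It is a toy model, not the TTM implementation: all toy_* names and the flag bit value are hypothetical, and the FALLBACK flag is deliberately left out so that only the threshold comparison differs between the two predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are not the TTM definitions. */
enum toy_mem { TOY_MEM_GTT, TOY_MEM_VRAM };

#define TOY_PL_FLAG_MOVE_THRESHOLD (1u << 0) /* hypothetical bit value */

struct toy_place {
	enum toy_mem mem_type;
	unsigned int flags;
};

struct toy_ctx {
	uint64_t bytes_moved;
	uint64_t move_threshold;
};

typedef bool (*toy_applicable_fn)(const struct toy_place *,
				  const struct toy_ctx *);

/* Reading A, as in the trace: the place applies while under budget. */
static bool toy_applicable_under_budget(const struct toy_place *place,
					const struct toy_ctx *ctx)
{
	if (!(place->flags & TOY_PL_FLAG_MOVE_THRESHOLD))
		return true;
	return ctx->bytes_moved < ctx->move_threshold;
}

/* Reading B, as in the reply: the place applies once the budget is spent. */
static bool toy_applicable_budget_spent(const struct toy_place *place,
					const struct toy_ctx *ctx)
{
	if (!(place->flags & TOY_PL_FLAG_MOVE_THRESHOLD))
		return true;
	return ctx->bytes_moved >= ctx->move_threshold;
}

/*
 * Mirrors the non-evicting walk from the trace: skip places whose memory
 * type differs from the current resource, then ask the predicate.
 * Returns true if the current resource counts as compatible, meaning the
 * buffer stays where it is and no move is attempted.
 */
static bool toy_stays_put(enum toy_mem res, const struct toy_place *places,
			  size_t n, const struct toy_ctx *ctx,
			  toy_applicable_fn applicable)
{
	assert(places != NULL && ctx != NULL);

	for (size_t i = 0; i < n; i++) {
		if (places[i].mem_type != res)
			continue;
		if (applicable(&places[i], ctx))
			return true;
	}
	return false;
}
```

With bytes_moved below move_threshold, a buffer currently in GTT stays put under reading A and gets a move attempt under reading B. Once the budget is spent the two results swap, so which comparison the real check uses decides whether buffers migrate back to VRAM while budget remains.
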
>
> Regards,
>
> Tvrtko
>
>> Setting both flags has the effect of saying: It's ok that the BO stays
>> in GTT when you are either above the move threshold or would have to
>> evict something.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>