[PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed Oct 23 07:38:54 UTC 2024
On 22/10/2024 17:24, Christian König wrote:
> Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy):
>> [Public]
>>
>>>> +static uint32_t fold_memtype(uint32_t memtype) {
>>> In general please add prefixes to even static functions, e.g.
>>> amdgpu_vm_ or
>>> amdgpu_bo_.
>>>
>>>> + /* Squash private placements into 'cpu' to keep the legacy
>>>> userspace view.
>>> */
>>>> + switch (mem_type) {
>>>> + case TTM_PL_VRAM:
>>>> + case TTM_PL_TT:
>>>> + return memtype
>>>> + default:
>>>> + return TTM_PL_SYSTEM;
>>>> + }
>>>> +}
>>>> +
>>>> +static uint32_t bo_get_memtype(struct amdgpu_bo *bo) {
>>> That whole function belongs into amdgpu_bo.c
>> Do you mean bo_get_memtype or fold_memtype? I debated whether
>> bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since
>> it's using fold_memtype and only useful for memory stats because of
>> folding the private placements I just left them here together with the
>> other mem stats code.
>>
>> I can move it to amdgpu_bo.c make it return the memtype verbatim and
>> just fold it when I do the accounting.
>
> I think that folding GDS, GWS and OA into system is also a bug. We
> should really not doing that.
>
> Just wanted to point out for this round that the code to query the
> current placement from a BO should probably go into amdgpu_bo.c and not
> amdgpu_vm.c
>
>>
>>>> + struct ttm_resource *res = bo->tbo.resource;
>>>> + const uint32_t domain_to_pl[] = {
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = TTM_PL_SYSTEM,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_GTT)] = TTM_PL_TT,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_GDS)] = AMDGPU_PL_GDS,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_GWS)] = AMDGPU_PL_GWS,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_OA)] = AMDGPU_PL_OA,
>>>> + [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] =
>>> AMDGPU_PL_DOORBELL,
>>>> + };
>>>> + uint32_t domain;
>>>> +
>>>> + if (res)
>>>> + return fold_memtype(res->mem_type);
>>>> +
>>>> + /*
>>>> + * If no backing store use one of the preferred domain for basic
>>>> + * stats. We take the MSB since that should give a reasonable
>>>> + * view.
>>>> + */
>>>> + BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM <
>>> TTM_PL_SYSTEM);
>>>> + domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
>>>> + if (drm_WARN_ON_ONCE(&adev->ddev,
>>>> + domain == 0 || --domain >=
>>>> ARRAY_SIZE(domain_to_pl)))
>>> It's perfectly legal to create a BO without a placement. That one
>>> just won't have a
>>> backing store.
>>>
>> This is lifted from the previous change I'm rebasing onto. I think
>> what it’s trying to do is if the BO doesn't have a placement, use the
>> "biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of
>> accounting. Previously we just ignore BOs that doesn't have a
>> placement. I guess there's argument for going with either approaches.
>
> I was not arguing, I'm simply pointing out a bug. It's perfectly valid
> for bo->preferred_domains to be 0.
>
> So the following WARN_ON() that no bit is set is incorrect.
>
>>
>>>> + return 0;
>>>> + return fold_memtype(domain_to_pl[domain])
>>> That would need specular execution mitigation if I'm not completely
>>> mistaken.
>>>
>>> Better use a switch/case statement.
>>>
>> Do you mean change the array indexing to a switch statement?
>
> Yes.
Did you mean array_index_nospec? Domain is not a direct userspace input
and is calculated from the mask which sanitized to allowed values prior
to this call. So I *think* switch is an overkill but don't mind it
either. Just commenting FWIW.
Regards,
Tvrtko
More information about the amd-gfx
mailing list