[PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime

Thu Nov 14 09:06:33 UTC 2024

Hi,

On 13/11/2024 17:30, Tvrtko Ursulin wrote:
> 
> On 13/11/2024 17:01, Li, Yunxiang (Teddy) wrote:
>> [Public]
>>
>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>> Sent: Wednesday, November 13, 2024 9:22
>>> Am 13.11.24 um 14:53 schrieb Li, Yunxiang (Teddy):
>>>> [Public]
>>>>
>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Sent: Wednesday, November 13, 2024 6:39 Am 13.11.24 um 11:25 schrieb
>>>>> Tvrtko Ursulin:
>>>>>> On 13/11/2024 08:49, Christian König wrote:
>>>>>>> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
>>>>>>>> [SNIP]
>>>>>>>>>> +   size = sign * amdgpu_bo_size(bo);
>>>>>>>>>> +   res = bo->tbo.resource;
>>>>>>>>>> +   type = res ? res->mem_type :
>>>>>>>>>> amdgpu_bo_get_preferred_placement(bo);
>>>>>>>>> Again, it's a clear NAK from my side to do stuff like that.
>>>>>>>>>
>>>>>>>>> When there isn't any backing store the BO should *not* be
>>>>>>>>> accounted to anything.
>>>>>>>> I don't have a preference either way, but I think it should be a
>>>>>>>> separate discussion to properly define what drm-total- means.
>>>>>> Total must show the total size of all BOs which exist even if they
>>>>>> don't currently have a backing store. That's how drm-usage-stats.rst
>>>>>> defines the field and that is how all the other drivers work.
>>>>> In that case we should only look at the preferred placement and not
>>>>> the backing store at all.
>>>>>
>>>>> But that makes the total identical to the requested value, doesn't it?
>>>> Yes, the issue is not which BO needs to be counted but where they 
>>>> should be
>>> counted. This gets more complicated if we consider BOs to prefer 
>>> multiple
>>> placements.
>>>>
>>>> IMO it makes sense to have drm-total- to work like the legacy amd- 
>>>> requested-
>>> where we look at BO's preferred placement. For multiple preferred 
>>> placements we
>>> say that the implementation needs to pick one of them to avoid double 
>>> counting, but
>>> which one is up to the implementation as long as it's done in a 
>>> consistent manner.
>>> Does that sound reasonable?
>>>
>>> Yeah that works for me. Just don't look at both bo- 
>>> >preferred_placement and bo-
>>>> tbo.resource because that will certainly be inconsistent in some use 
>>>> cases.
>>
>> oof, from the commit message i915/xe is doing the exact opposite, BO 
>> gets counted in the totals for all the possible(preferred?) regions.
> 
> Which commit message? I was doing that early during i915 patch 
> development but stopped in v2:
> 
> commit 968853033d8aa4dbb80fbafa6f5d9b6a0ea21272
> Author: Tvrtko Ursulin <tursulin at ursulin.net>
> Date:   Tue Nov 7 10:18:06 2023 +0000
> 
>      drm/i915: Implement fdinfo memory stats printing
> 
>      Use the newly added drm_print_memory_stats helper to show memory
>      utilisation of our objects in drm/driver specific fdinfo output.
> 
>      To collect the stats we walk the per memory regions object lists
>      and accumulate object size into the respective drm_memory_stats
>      categories.
> 
>      v2:
>       * Only account against the active region.
> 
> ^^^ THIS ^^^
> 
>       * Use DMA_RESV_USAGE_BOOKKEEP when testing for active. (Tejas)
> 
>      v3:
>       * Update commit text. (Aravind)
>       * Update to use memory regions uabi names.
> 
> In code that would be here:
> 
> static void
> obj_meminfo(struct drm_i915_gem_object *obj,
>          struct drm_memory_stats stats[INTEL_REGION_UNKNOWN])
> {
>      const enum intel_region_id id = obj->mm.region ?
>                      obj->mm.region->id : INTEL_REGION_SMEM;
> 
> So either active region or SMEM if no backing store. Maybe that should 
> be improved too. Grr (to myself).
> 
> I don't see xe is counting total against all regions either, apart that 
> maybe it has potential null ptr deref?
> 
> static void bo_meminfo(struct xe_bo *bo,
>                 struct drm_memory_stats stats[TTM_NUM_MEM_TYPES])
> {
>      u64 sz = bo->size;
>      u32 mem_type = bo->ttm.resource->mem_type;
> 
> Or is bo->ttm.resource always present in xe? Adding Matt according to 
> git blame.

Right, we shouldn't currently have a way of seeing NULL resource here in 
xe, at least if we are holding the bo lock.

> 
> Regards,
> 
> Tvrtko
> 
>>
>> Teddy