[PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime
Matthew Auld
matthew.auld at intel.com
Thu Nov 14 09:06:33 UTC 2024
Hi,
On 13/11/2024 17:30, Tvrtko Ursulin wrote:
>
> On 13/11/2024 17:01, Li, Yunxiang (Teddy) wrote:
>> [Public]
>>
>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>> Sent: Wednesday, November 13, 2024 9:22
>>> Am 13.11.24 um 14:53 schrieb Li, Yunxiang (Teddy):
>>>> [Public]
>>>>
>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Sent: Wednesday, November 13, 2024 6:39 Am 13.11.24 um 11:25 schrieb
>>>>> Tvrtko Ursulin:
>>>>>> On 13/11/2024 08:49, Christian König wrote:
>>>>>>> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
>>>>>>>> [SNIP]
>>>>>>>>>> + size = sign * amdgpu_bo_size(bo);
>>>>>>>>>> + res = bo->tbo.resource;
>>>>>>>>>> + type = res ? res->mem_type :
>>>>>>>>>> amdgpu_bo_get_preferred_placement(bo);
>>>>>>>>> Again, it's a clear NAK from my side to do stuff like that.
>>>>>>>>>
>>>>>>>>> When there isn't any backing store the BO should *not* be
>>>>>>>>> accounted to anything.
>>>>>>>> I don't have a preference either way, but I think it should be a
>>>>>>>> separate discussion to properly define what drm-total- means.
>>>>>> Total must show the total size of all BOs which exist even if they
>>>>>> don't currently have a backing store. That's how drm-usage-stats.rst
>>>>>> defines the field and that is how all the other drivers work.
>>>>> In that case we should only look at the preferred placement and not
>>>>> the backing store at all.
>>>>>
>>>>> But that makes the total identical to the requested value, doesn't it?
>>>> Yes, the issue is not which BO needs to be counted but where they
>>>> should be
>>> counted. This gets more complicated if we consider BOs to prefer
>>> multiple
>>> placements.
>>>>
>>>> IMO it makes sense to have drm-total- to work like the legacy amd-
>>>> requested-
>>> where we look at BO's preferred placement. For multiple preferred
>>> placements we
>>> say that the implementation needs to pick one of them to avoid double
>>> counting, but
>>> which one is up to the implementation as long as it's done in a
>>> consistent manner.
>>> Does that sound reasonable?
>>>
>>> Yeah that works for me. Just don't look at both bo-
>>> >preferred_placement and bo-
>>>> tbo.resource because that will certainly be inconsistent in some use
>>>> cases.
>>
>> oof, from the commit message i915/xe is doing the exact opposite, BO
>> gets counted in the totals for all the possible(preferred?) regions.
>
> Which commit message? I was doing that early during i915 patch
> development but stopped in v2:
>
> commit 968853033d8aa4dbb80fbafa6f5d9b6a0ea21272
> Author: Tvrtko Ursulin <tursulin at ursulin.net>
> Date: Tue Nov 7 10:18:06 2023 +0000
>
> drm/i915: Implement fdinfo memory stats printing
>
> Use the newly added drm_print_memory_stats helper to show memory
> utilisation of our objects in drm/driver specific fdinfo output.
>
> To collect the stats we walk the per memory regions object lists
> and accumulate object size into the respective drm_memory_stats
> categories.
>
> v2:
> * Only account against the active region.
>
> ^^^ THIS ^^^
>
> * Use DMA_RESV_USAGE_BOOKKEEP when testing for active. (Tejas)
>
> v3:
> * Update commit text. (Aravind)
> * Update to use memory regions uabi names.
>
> In code that would be here:
>
> static void
> obj_meminfo(struct drm_i915_gem_object *obj,
> struct drm_memory_stats stats[INTEL_REGION_UNKNOWN])
> {
> const enum intel_region_id id = obj->mm.region ?
> obj->mm.region->id : INTEL_REGION_SMEM;
>
> So either active region or SMEM if no backing store. Maybe that should
> be improved too. Grr (to myself).
>
> I don't see xe is counting total against all regions either, apart that
> maybe it has potential null ptr deref?
>
> static void bo_meminfo(struct xe_bo *bo,
> struct drm_memory_stats stats[TTM_NUM_MEM_TYPES])
> {
> u64 sz = bo->size;
> u32 mem_type = bo->ttm.resource->mem_type;
>
> Or is bo->ttm.resource always present in xe? Adding Matt according to
> git blame.
Right, we shouldn't currently have a way of seeing NULL resource here in
xe, at least if we are holding the bo lock.
>
> Regards,
>
> Tvrtko
>
>>
>> Teddy
More information about the amd-gfx
mailing list