[PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime
Li, Yunxiang (Teddy)
Yunxiang.Li at amd.com
Thu Nov 14 15:52:37 UTC 2024
[AMD Official Use Only - AMD Internal Distribution Only]
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Sent: Wednesday, November 13, 2024 12:31
> On 13/11/2024 17:01, Li, Yunxiang (Teddy) wrote:
> > [Public]
> >
> >> From: Koenig, Christian <Christian.Koenig at amd.com>
> >> Sent: Wednesday, November 13, 2024 9:22 Am 13.11.24 um 14:53 schrieb
> >> Li, Yunxiang (Teddy):
> >>> [Public]
> >>>
> >>>> From: Koenig, Christian <Christian.Koenig at amd.com>
> >>>> Sent: Wednesday, November 13, 2024 6:39 Am 13.11.24 um 11:25
> >>>> schrieb Tvrtko Ursulin:
> >>>>> On 13/11/2024 08:49, Christian König wrote:
> >>>>>> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
> >>>>>>> [SNIP]
> >>>>>>>>> + size = sign * amdgpu_bo_size(bo);
> >>>>>>>>> + res = bo->tbo.resource;
> >>>>>>>>> + type = res ? res->mem_type :
> >>>>>>>>> amdgpu_bo_get_preferred_placement(bo);
> >>>>>>>> Again, it's a clear NAK from my side to do stuff like that.
> >>>>>>>>
> >>>>>>>> When there isn't any backing store the BO should *not* be
> >>>>>>>> accounted to anything.
> >>>>>>> I don't have a preference either way, but I think it should be a
> >>>>>>> separate discussion to properly define what drm-total- means.
> >>>>> Total must show the total size of all BOs which exist even if they
> >>>>> don't currently have a backing store. That's how
> >>>>> drm-usage-stats.rst defines the field and that is how all the other drivers
> work.
> >>>> In that case we should only look at the preferred placement and not
> >>>> the backing store at all.
> >>>>
> >>>> But that makes the total identical to the requested value, doesn't it?
> >>> Yes, the issue is not which BO needs to be counted but where they
> >>> should be
> >> counted. This gets more complicated if we consider BOs to prefer
> >> multiple placements.
> >>>
> >>> IMO it makes sense to have drm-total- to work like the legacy
> >>> amd-requested-
> >> where we look at BO's preferred placement. For multiple preferred
> >> placements we say that the implementation needs to pick one of them
> >> to avoid double counting, but which one is up to the implementation as long as
> it's done in a consistent manner.
> >> Does that sound reasonable?
> >>
> >> Yeah that works for me. Just don't look at both
> >> bo->preferred_placement and bo-
> >>> tbo.resource because that will certainly be inconsistent in some use cases.
> >
> > oof, from the commit message i915/xe is doing the exact opposite, BO gets
> counted in the totals for all the possible(preferred?) regions.
>
> Which commit message? I was doing that early during i915 patch development but
> stopped in v2:
>
> commit 968853033d8aa4dbb80fbafa6f5d9b6a0ea21272
> Author: Tvrtko Ursulin <tursulin at ursulin.net>
> Date: Tue Nov 7 10:18:06 2023 +0000
>
> drm/i915: Implement fdinfo memory stats printing
>
> Use the newly added drm_print_memory_stats helper to show memory
> utilisation of our objects in drm/driver specific fdinfo output.
>
> To collect the stats we walk the per memory regions object lists
> and accumulate object size into the respective drm_memory_stats
> categories.
>
> v2:
> * Only account against the active region.
>
> ^^^ THIS ^^^
>
> * Use DMA_RESV_USAGE_BOOKKEEP when testing for active. (Tejas)
>
> v3:
> * Update commit text. (Aravind)
> * Update to use memory regions uabi names.
>
> In code that would be here:
>
> static void
> obj_meminfo(struct drm_i915_gem_object *obj,
> struct drm_memory_stats stats[INTEL_REGION_UNKNOWN]) {
> const enum intel_region_id id = obj->mm.region ?
> obj->mm.region->id :
> INTEL_REGION_SMEM;
>
> So either active region or SMEM if no backing store. Maybe that should be
> improved too. Grr (to myself).
>
> I don't see xe is counting total against all regions either, apart that maybe it has
> potential null ptr deref?
>
> static void bo_meminfo(struct xe_bo *bo,
> struct drm_memory_stats stats[TTM_NUM_MEM_TYPES]) {
> u64 sz = bo->size;
> u32 mem_type = bo->ttm.resource->mem_type;
>
> Or is bo->ttm.resource always present in xe? Adding Matt according to git blame.
>
> Regards,
>
> Tvrtko
Thanks for the background, yeah it was mentioned in the xe commit, and the i915 changes looked very similar so I thought I should probably mention here. Now I dug around the mailing list and seems there was some confusion and the wrong commit message got merged.
https://lists.freedesktop.org/archives/intel-xe/2023-August/010885.html
More information about the amd-gfx
mailing list