[PATCH 17/17] amdgpu: add support for memory cgroups

Shakeel Butt shakeel.butt at linux.dev
Thu Jul 3 20:06:06 UTC 2025


On Thu, Jul 03, 2025 at 08:15:09PM +0200, Christian König wrote:
> On 03.07.25 19:58, Shakeel Butt wrote:
> > On Thu, Jul 03, 2025 at 12:53:44PM +1000, David Airlie wrote:
> >> On Thu, Jul 3, 2025 at 2:03 AM Shakeel Butt <shakeel.butt at linux.dev> wrote:
> >>>
> >>> On Mon, Jun 30, 2025 at 02:49:36PM +1000, Dave Airlie wrote:
> >>>> From: Dave Airlie <airlied at redhat.com>
> >>>>
> >>>> This adds support for adding an obj cgroup to a buffer object,
> >>>> and passing in the placement flags to make sure it's accounted
> >>>> properly.
> >>>>
> >>>> Signed-off-by: Dave Airlie <airlied at redhat.com>
> >>>> ---
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  2 ++
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++++++++----
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  2 ++
> >>>>  4 files changed, 14 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>>> index e5e33a68d935..d250183bab03 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>>> @@ -198,6 +198,7 @@ static void amdgpu_gem_object_free(struct drm_gem_object *gobj)
> >>>>       struct amdgpu_bo *aobj = gem_to_amdgpu_bo(gobj);
> >>>>
> >>>>       amdgpu_hmm_unregister(aobj);
> >>>> +     obj_cgroup_put(aobj->tbo.objcg);
> >>>>       ttm_bo_put(&aobj->tbo);
> >>>>  }
> >>>>
> >>>> @@ -225,6 +226,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
> >>>>       bp.domain = initial_domain;
> >>>>       bp.bo_ptr_size = sizeof(struct amdgpu_bo);
> >>>>       bp.xcp_id_plus1 = xcp_id_plus1;
> >>>> +     bp.objcg = get_obj_cgroup_from_current();
> >>>
> >>> In what context is this function called? Is it the same for
> >>> ttm_pool_alloc_page()? Is remote charging happening in
> >>> ttm_pool_alloc_page()?
> >>>
> >>
> >> No, this function is called from userspace ioctl paths that allocate
> >> GPU objects (GEM objects).
> >>
> >> The objects are lazily allocated, so this might not trigger any pages
> >> being bound to the object until it is populated, either via mapping +
> >> page faults or by being used in a GPU command submission, which is
> >> when ttm_pool_alloc_page() happens.
> >>
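
(So the objcg reference is pinned in the ioctl path at create time, while
the actual pages only get charged later when TTM populates the BO. Just to
make the discussion concrete, here is a rough sketch of what population-time
charging against that pinned objcg could look like -- illustrative only, not
the actual TTM changes in this series, using the existing
obj_cgroup_charge()/obj_cgroup_uncharge() helpers and the tbo->objcg field
this series adds:)

#include <linux/gfp.h>
#include <linux/memcontrol.h>
#include <drm/ttm/ttm_bo.h>

/*
 * Illustrative only, not the actual TTM changes in this series: charge
 * the backing pages at population time against the objcg pinned at
 * create time (the tbo->objcg field added by this series), instead of
 * against whoever happens to touch the BO first.
 */
static int example_charge_bo_pages(struct ttm_buffer_object *tbo,
                                   unsigned long num_pages)
{
        if (!tbo->objcg)
                return 0;       /* accounting disabled / root cgroup */

        return obj_cgroup_charge(tbo->objcg, GFP_KERNEL,
                                 num_pages << PAGE_SHIFT);
}

static void example_uncharge_bo_pages(struct ttm_buffer_object *tbo,
                                      unsigned long num_pages)
{
        if (tbo->objcg)
                obj_cgroup_uncharge(tbo->objcg, num_pages << PAGE_SHIFT);
}
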
> > 
> > For the mapping + page fault or GPU command submission, can there be a
> > case where 'current' is not in the same cgroup as the task which has
> > called amdgpu_gem_object_create() through the ioctl? Can the allocation
> > happen in a kthread, a workqueue, or irq context?
> 
> Yes, in some use cases that is actually the most common way of ending up doing the memory allocation.
> 
> The background is that the first one who touches it actually does the allocation.

Do you mean that a task in cgroup A does amdgpu_gem_object_create() and
then the actual allocation can happen in a task in cgroup B?
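
If so, that is effectively remote charging. A minimal sketch of what it
could look like on the memcg side, assuming the allocation path can reach
the objcg pinned at create time and redirects __GFP_ACCOUNT charging with
set_active_memcg() (the helper below is made up, not something this series
does):

#include <linux/gfp.h>
#include <linux/memcontrol.h>
#include <linux/rcupdate.h>
#include <linux/sched/mm.h>

/*
 * Hypothetical helper (not part of this series): make the allocation
 * done by whoever first touches the BO get charged to the cgroup that
 * created it, using the objcg pinned at create time.
 */
static struct page *example_alloc_bo_page(struct obj_cgroup *objcg, gfp_t gfp)
{
        struct mem_cgroup *memcg = NULL, *old;
        struct page *page;

        if (objcg) {
                rcu_read_lock();
                memcg = obj_cgroup_memcg(objcg);    /* objcg -> memcg */
                if (!css_tryget(&memcg->css))
                        memcg = NULL;
                rcu_read_unlock();
        }

        old = set_active_memcg(memcg);  /* redirect __GFP_ACCOUNT charging */
        page = alloc_page(gfp | __GFP_ACCOUNT);
        set_active_memcg(old);

        mem_cgroup_put(memcg);
        return page;
}

The point being that the charge target follows the BO rather than 'current'
at fault/CS time, which would also cover the kthread and workqueue cases.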

> 
> BTW: It might be a good idea to limit not only the amount of memory you have actually allocated, but also how much you wanted to allocate.

Do you mean accounting and limiting the reservations? Something like
what hugetlb cgroup provides?
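
For reference, hugetlb cgroup implements its reservation limits with
separate 'rsvd' page counters that are charged when the reservation is
made, not when the pages are faulted in. Something in the same spirit for
BO sizes could look roughly like the sketch below -- purely illustrative,
the names are made up and no such counter exists in memcg today:

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/page_counter.h>

/*
 * Purely illustrative, nothing like this exists in memcg today: charge
 * the full requested BO size into a dedicated "reservation" counter at
 * create time and release it at free time, independent of how much of
 * the BO ever gets populated.
 */
static int example_reserve_bo(struct page_counter *rsvd, u64 size)
{
        struct page_counter *fail;

        if (!page_counter_try_charge(rsvd, PAGE_ALIGN(size) >> PAGE_SHIFT,
                                     &fail))
                return -ENOMEM;
        return 0;
}

static void example_unreserve_bo(struct page_counter *rsvd, u64 size)
{
        page_counter_uncharge(rsvd, PAGE_ALIGN(size) >> PAGE_SHIFT);
}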

