[PATCH] drm/amdgpu: Use the slab allocator to reduce job allocation fragmentation
Liang, Prike
Prike.Liang at amd.com
Tue May 14 08:13:00 UTC 2024
[AMD Official Use Only - AMD Internal Distribution Only]
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Friday, May 10, 2024 5:31 PM
> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Use the slab allocator to reduce job
> allocation fragmentation
>
> Am 10.05.24 um 10:11 schrieb Prike Liang:
> > Using kzalloc() results in about 50% memory fragmentation, therefore
> > use the slab allocator to reduce memory fragmentation.
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 26 ++++++++++++++++++++-----
> > drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 1 +
> > 3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index ea14f1c8f430..3de1b42291b6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -3040,6 +3040,7 @@ static void __exit amdgpu_exit(void)
> > amdgpu_fence_slab_fini();
> > mmu_notifier_synchronize();
> > amdgpu_xcp_drv_release();
> > + amdgpu_job_slab_fini();
> > }
> >
> > module_init(amdgpu_init);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index e4742b65032d..8327bf017a0e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -31,6 +31,8 @@
> > #include "amdgpu_trace.h"
> > #include "amdgpu_reset.h"
> >
> > +static struct kmem_cache *amdgpu_job_slab;
> > +
> > static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> > {
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > @@ -101,10 +103,19 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> > if (num_ibs == 0)
> > return -EINVAL;
> >
> > - *job = kzalloc(struct_size(*job, ibs, num_ibs), GFP_KERNEL);
> > - if (!*job)
> > + amdgpu_job_slab = kmem_cache_create("amdgpu_job",
> > + struct_size(*job, ibs, num_ibs), 0,
> > + SLAB_HWCACHE_ALIGN, NULL);
>
> Well, you are declaring a global slab cache for a dynamic job size, then
> trying to set it up in the job allocation function, which can be called
> concurrently with different numbers of IBs.
>
> To sum it up, this is completely racy and will go boom immediately in testing.
> As far as I can see, this suggestion is just utter nonsense.
>
> Regards,
> Christian.
>
Hi, Christian
The num_ibs value is calculated as 1 in amdgpu_cs_p1_ib(), so amdgpu_cs_pass1() always passes num_ibs == 1 into amdgpu_job_alloc(). Moreover, num_ibs is only set in amdgpu_cs_p1_ib() and should have no chance of being overridden by the user-space driver. I also checked a few GL and Vulkan applications and did not find multiple IBs within a single amdgpu job submission.
If there are still concerns about the IB array size of the allocated amdgpu_job object, we could remove the ibs member and decouple the IBs from the job object. The IBs could then be exported and passed explicitly through a new interface such as amdgpu_cs_patch_ibs(struct amdgpu_cs_parser *p, struct amdgpu_job *job, struct amdgpu_ib *ib).
Regarding this patch, using kmem_cache_zalloc() instead of kzalloc() saves about 448 bytes of memory for each allocated amdgpu_job object, while the allocation itself takes almost the same time, so there should be no performance side effect. If the idea sounds sensible, I will rework the patch so that the job slab is created once during the driver probe/init path; a minimal sketch of that rework is below.
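For illustration, a minimal sketch of that rework, assuming the job size stays fixed because the CS path only ever embeds a single IB (the init helper name and its exact call site are assumptions for this sketch, not the final patch):

/* Sketch: create the cache once at module init so that concurrent
 * amdgpu_job_alloc() callers never race on cache creation. The object
 * size assumes exactly one embedded IB per job.
 */
static struct kmem_cache *amdgpu_job_slab;

int amdgpu_job_slab_init(void)
{
	amdgpu_job_slab = kmem_cache_create("amdgpu_job",
					    sizeof(struct amdgpu_job) +
					    sizeof(struct amdgpu_ib),
					    0, SLAB_HWCACHE_ALIGN, NULL);
	return amdgpu_job_slab ? 0 : -ENOMEM;
}

void amdgpu_job_slab_fini(void)
{
	kmem_cache_destroy(amdgpu_job_slab);
}

With that in place, amdgpu_job_slab_init() would be called from amdgpu_init() next to the existing fence slab setup, amdgpu_job_slab_fini() stays in amdgpu_exit() as in the patch above, and amdgpu_job_alloc() shrinks to a plain kmem_cache_zalloc(amdgpu_job_slab, GFP_KERNEL) with no cache creation or destruction on the allocation path.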
Thanks,
Prike
> > + if (!amdgpu_job_slab) {
> > + DRM_ERROR("create amdgpu_job cache failed\n");
> > return -ENOMEM;
> > + }
> >
> > + *job = kmem_cache_zalloc(amdgpu_job_slab, GFP_KERNEL);
> > + if (!*job) {
> > + kmem_cache_destroy(amdgpu_job_slab);
> > + return -ENOMEM;
> > + }
> > /*
> > * Initialize the scheduler to at least some ring so that we always
> > * have a pointer to adev.
> > @@ -138,7 +149,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
> > if (r) {
> > if (entity)
> > drm_sched_job_cleanup(&(*job)->base);
> > - kfree(*job);
> > + kmem_cache_free(amdgpu_job_slab, *job);
> > }
> >
> > return r;
> > @@ -179,6 +190,11 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
> > amdgpu_ib_free(ring->adev, &job->ibs[i], f);
> > }
> >
> > +void amdgpu_job_slab_fini(void)
> > +{
> > + kmem_cache_destroy(amdgpu_job_slab);
> > +}
> > +
> > static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> > {
> > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -189,7 +205,7 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
> >
> > /* only put the hw fence if has embedded fence */
> > if (!job->hw_fence.ops)
> > - kfree(job);
> > + kmem_cache_free(amdgpu_job_slab, job);
> > else
> > dma_fence_put(&job->hw_fence);
> > }
> > @@ -221,7 +237,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
> > dma_fence_put(job->gang_submit);
> >
> > if (!job->hw_fence.ops)
> > - kfree(job);
> > + kmem_cache_free(amdgpu_job_slab, job);
> > else
> > dma_fence_put(&job->hw_fence);
> > }
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index a963a25ddd62..4491d5f9c74d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -103,5 +103,6 @@ int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring,
> > struct dma_fence **fence);
> >
> > void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
> > *sched);
> > +void amdgpu_job_slab_fini(void);
> >
> > #endif