[PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

Mon May 26 11:27:28 UTC 2025

On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> On 5/26/25 11:34, Philipp Stanner wrote:
> > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich
> > > > wrote:
> > > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König
> > > > > wrote:
> > > > > > It turned out that we can actually massively optimize here.
> > > > > > 
> > > > > > The previous code was horrible inefficient since it
> > > > > > constantly
> > > > > > released
> > > > > > and re-acquired the lock of the xarray and started each
> > > > > > iteration from the
> > > > > > base of the array to avoid concurrent modification which in
> > > > > > our
> > > > > > case
> > > > > > doesn't exist.
> > > > > > 
> > > > > > Additional to that the xas_find() and xas_store() functions
> > > > > > are
> > > > > > explicitly
> > > > > > made in a way so that you can efficiently check entries and
> > > > > > if
> > > > > > you don't
> > > > > > find a match store a new one at the end or replace existing
> > > > > > ones.
> > > > > > 
> > > > > > So use xas_for_each()/xa_store() instead of
> > > > > > xa_for_each()/xa_alloc().
> > > > > > It's a bit more code, but should be much faster in the end.
> > > > > 
> > > > > This commit message does neither explain the motivation of
> > > > > the
> > > > > commit nor what it
> > > > > does. It describes what instead belongs into the changelog
> > > > > between versions.
> > > > 
> > > > Sorry, this is wrong. I got confused, the commit message is
> > > > perfectly fine. :)
> > > > 
> > > > The rest still applies though.
> > > > 
> > > > > Speaking of versioning of the patch series, AFAIK there were
> > > > > previous versions,
> > > > > but this series was sent as a whole new series -- why?
> > > > > 
> > > > > Please resend with a proper commit message, version and
> > > > > changelog. Thanks!
> > > 
> > > 
> > > Well Philip asked to remove the changelog. I'm happy to bring it
> > > back, but yeah...
> > 
> > No no no no :D
> > 
> > Philipp asked for the changelog to be removed *from the git commit
> > message*; because it doesn't belong / isn't useful there.
> > 
> > If there's a cover letter, the changelog should be in the cover
> > letter.
> > If there's no cover letter, it should be between the ---
> > separators:
> 
> I can live with that, just clearly state what you want.

Sure thing:

 * Patches and patch series's should contain their version identifier
   within the square brackets [PATCH v3]. git format-patch -v3 does
   that automatically.
 * Changelog should be as described above
 * Ideally, cover letters always contain the full changelog, v2, v3 and
   so on, so that new readers get a sense of the evolution of the
   series.

> 
> For DRM the ask is often to keep the changelog in the commit message
> or remove it entirely.

Yup, I've seen that a few times. I think we, the DRM community, should
stop that. It's just not useful and makes the commit messages larger,
both for the human reader while scrolling, as for the hard drive
regarding storage size

Thx
P.

> 
> Regards,
> Christian.
> 
> > 
> > 
> > Signed-off-by: Gordon Freeman <freeman at blackmesa.org>
> > Reviewed-by: Alyx Vance <alyx at vance.edu>
> > ---
> > Changes in v2:
> >   - Provide more docu for crowbar-alloc-function.
> >   - Use NULL pointers for reserved xarray entries
> > ---
> > <DIFF>
> > 
> > 
> > P.
> > 
> > 
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > > 
> > > > > > Signed-off-by: Christian König <christian.koenig at amd.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > > > ++++++++++++++++++--------
> > > > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > index f7118497e47a..cf200b1b643e 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > >  int drm_sched_job_add_dependency(struct drm_sched_job
> > > > > > *job,
> > > > > >   struct dma_fence *fence)
> > > > > >  {
> > > > > > + XA_STATE(xas, &job->dependencies, 0);
> > > > > >   struct dma_fence *entry;
> > > > > > - unsigned long index;
> > > > > > - u32 id = 0;
> > > > > > - int ret;
> > > > > >  
> > > > > >   if (!fence)
> > > > > >   return 0;
> > > > > > @@ -883,24 +881,37 @@ int
> > > > > > drm_sched_job_add_dependency(struct
> > > > > > drm_sched_job *job,
> > > > > >   * This lets the size of the array of deps scale with
> > > > > > the number of
> > > > > >   * engines involved, rather than the number of BOs.
> > > > > >   */
> > > > > > - xa_for_each(&job->dependencies, index, entry) {
> > > > > > + xa_lock(&job->dependencies);
> > > > > > + xas_for_each(&xas, entry, ULONG_MAX) {
> > > > > >   if (entry->context != fence->context)
> > > > > >   continue;
> > > > > >  
> > > > > >   if (dma_fence_is_later(fence, entry)) {
> > > > > >   dma_fence_put(entry);
> > > > > > - xa_store(&job->dependencies, index,
> > > > > > fence, GFP_KERNEL);
> > > > > > + xas_store(&xas, fence);
> > > > > >   } else {
> > > > > >   dma_fence_put(fence);
> > > > > >   }
> > > > > > - return 0;
> > > > > > + xa_unlock(&job->dependencies);
> > > > > > + return xas_error(&xas);
> > > > > >   }
> > > > > >  
> > > > > > - ret = xa_alloc(&job->dependencies, &id, fence,
> > > > > > xa_limit_32b, GFP_KERNEL);
> > > > > > - if (ret != 0)
> > > > > > +retry:
> > > > > > + entry = xas_store(&xas, fence);
> > > > > > + xa_unlock(&job->dependencies);
> > > > > > +
> > > > > > + /* There shouldn't be any concurrent add, so no need
> > > > > > to loop again */
> > > > > 
> > > > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > > > allocated memory in the
> > > > > XA_STATE not the xarray. Hence, I think we should remove the
> > > > > comment.
> > > > > 
> > > > > > + if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > > > + xa_lock(&job->dependencies);
> > > > > > + goto retry;
> > > > > 
> > > > > Please don't use a goto here, if we would have failed to
> > > > > allocate
> > > > > memory here,
> > > > > this would be an endless loop until we succeed eventually. It
> > > > > would be equal to:
> > > > > 
> > > > > while (!ptr) {
> > > > > ptr = kmalloc();
> > > > > }
> > > > > 
> > > > > Instead just take the lock and call xas_store() again.
> > > > > 
> > > > > > + }
> > > > > > +
> > > > > > + if (xas_error(&xas))
> > > > > >   dma_fence_put(fence);
> > > > > > + else
> > > > > > + WARN_ON(entry);
> > > > > 
> > > > > Please don't call WARN_ON() here, this isn't fatal, we only
> > > > > need
> > > > > to return the
> > > > > error code.
> > > 
> > 
>