[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

Philipp Stanner pstanner at redhat.com
Tue Oct 15 07:11:56 UTC 2024


On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:
> 
> On 14/10/2024 12:32, Philipp Stanner wrote:
> > Hi,
> > 
> > On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > 
> > > In FIFO mode we can avoid dropping the lock only to immediately
> > > re-acquire it by adding a new drm_sched_rq_update_fifo_locked()
> > > helper.
> > > 
> > 
> > Please write detailed commit messages, as described here [1].
> >     1. Describe the problem: current state and why it's bad.
> >     2. Then, describe in imperative (present tense) form what the
> >        commit does about the problem.
> 
> Both pieces of info are already there:
> 
> 1. Drops the lock to immediately re-acquire it.
> 2. We avoid that by adding a locked helper.
> > Optionally, in between there can be information about why it's
> > solved this way and not another, etc.
> > 
> > Applies to the other patches, too.
> > 
> > 
> > [1]
> > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> 
> Thanks, I am new here and did not know this.
> 
> Seriously, let's not be too blindly strict about this because IMO it
> can get ridiculous.
> 
> One example where I previously accommodated your request is patch 3/5
> from this series:
> 
> """
> Current kerneldoc for struct drm_sched_rq incompletely documents what
> fields are protected by the lock.
> 
> This is not good because it is misleading.
> 
> Lets fix it by listing all the elements which are protected by the
> lock.
> """
> 
> While this was the original commit text you weren't happy with:
> 
> """
> drm/sched: Re-order struct drm_sched_rq members for clarity
> 
> Lets re-order the members to make it clear which are protected by the
> lock and at the same time document it via kerneldoc.
> """
> 
> I maintain the original text was passable.
> 
> On top, this was just a respin to accommodate the merge process. All
> approvals were done and dusted a couple of weeks ago, so asking for
> yet another respin over such trivial objections is not great.

I understand that you're unhappy, but please understand the position
I'm coming from. As you know, in the time since you sent these patches
as part of a different series (and since I reviewed them), I have been
entrusted with co-maintaining this piece of shared infrastructure.

And since you've worked on it a bit now, I suppose you also know that
the GPU Scheduler is arguably in quite bad shape: it has far too little
documentation, has leaks, possibly race conditions, parts *where the
locking rules are unclear*, and is probably only fully understood by a
small handful of people. I would also argue that it is a *very*
complicated piece of software.

So I might be, or appear to be, a bit pedantic, but I'm not doing this
to terrorize you; I'm doing it because I want this thing to become well
documented, understandable, and bisectable. Working towards a
canonical, idiot-proof commit style is one measure that will help with
that.

I want to offer you the following: I can be more relaxed about things
universally recognized as trivial (comment changes, struct member
reordering). But when something like a lock is touched in any way, we
shall document that in the commit message as canonically as possible,
so that someone less experienced who has just bisected to the commit
immediately understands what has been done (or rather: what was
supposed to be done).
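
To use this very patch as an example (a rough sketch of how I read the
hunks quoted below, with the surrounding logic omitted, not a suggested
final wording): drm_sched_entity_push_job() goes from taking the lock,
dropping it, and then re-taking it inside the helper, to doing
everything under one critical section:

	/* Before: drm_sched_rq_update_fifo() re-acquires entity->rq_lock itself. */
	spin_lock(&entity->rq_lock);
	drm_sched_rq_add_entity(rq, entity);
	spin_unlock(&entity->rq_lock);

	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
		drm_sched_rq_update_fifo(entity, submit_ts);

	/* After: the caller keeps holding entity->rq_lock and calls the _locked helper. */
	spin_lock(&entity->rq_lock);
	drm_sched_rq_add_entity(rq, entity);

	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
		drm_sched_rq_update_fifo_locked(entity, submit_ts);

	spin_unlock(&entity->rq_lock);

Spelling a before/after out like that in the commit message is exactly
the kind of thing that helps whoever lands on the commit via bisect.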

Greetings
P.



> 
> Regards,
> 
> Tvrtko
> 
> > > v2:
> > >   * Remove drm_sched_rq_update_fifo() altogether. (Christian)
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > Cc: Christian König <christian.koenig at amd.com>
> > > Cc: Alex Deucher <alexander.deucher at amd.com>
> > > Cc: Luben Tuikov <ltuikov89 at gmail.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Cc: Philipp Stanner <pstanner at redhat.com>
> > > Reviewed-by: Christian König <christian.koenig at amd.com>
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_entity.c | 13 +++++++++----
> > >   drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
> > >   include/drm/gpu_scheduler.h              |  2 +-
> > >   3 files changed, 13 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index 2951fcc2e6b1..b72cba292839 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> > >  		struct drm_sched_job *next;
> > >  
> > >  		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > > -		if (next)
> > > -			drm_sched_rq_update_fifo(entity, next->submit_ts);
> > > +		if (next) {
> > > +			spin_lock(&entity->rq_lock);
> > > +			drm_sched_rq_update_fifo_locked(entity,
> > > +							next->submit_ts);
> > > +			spin_unlock(&entity->rq_lock);
> > > +		}
> > >  	}
> > >  
> > >  	/* Jobs and entities might have different lifecycles. Since we're
> > > @@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > >  		sched = rq->sched;
> > >  
> > >  		drm_sched_rq_add_entity(rq, entity);
> > > -		spin_unlock(&entity->rq_lock);
> > >  
> > >  		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > > -			drm_sched_rq_update_fifo(entity, submit_ts);
> > > +			drm_sched_rq_update_fifo_locked(entity, submit_ts);
> > > +
> > > +		spin_unlock(&entity->rq_lock);
> > >  
> > >  		drm_sched_wakeup(sched);
> > >  	}
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index e32b0f7d7e94..bbd1630407e4 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
> > >  	}
> > >  }
> > >  
> > > -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> > > +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts)
> > 
> > Since you touch the function name / signature already, would you
> > mind writing a small doc string that also mentions the locking
> > requirements, or the lack thereof?
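
(Since we're at it in this respin, to spell out what I meant here:
something along the lines of the sketch below would be plenty. The
exact wording is of course up to you; the description lines are just my
reading of what the function does, so treat them as assumptions.)

	/**
	 * drm_sched_rq_update_fifo_locked - re-queue an entity in the FIFO rb tree
	 * @entity: scheduler entity
	 * @ts: submission timestamp of the next job, used as the sort key
	 *
	 * The caller must hold @entity->rq_lock; @entity->rq->lock is taken
	 * internally to protect the rb tree.
	 */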
> > 
> > >  {
> > >  	/*
> > >  	 * Both locks need to be grabbed, one to protect from entity->rq change
> > >  	 * for entity from within concurrent drm_sched_entity_select_rq and the
> > >  	 * other to update the rb tree structure.
> > >  	 */
> > 
> > It seems to me that the comment above is now out of date, no?
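
(To make that concrete: after this change only entity->rq->lock is
taken inside the function, while entity->rq_lock is merely asserted to
be held by the caller, so the "Both locks need to be grabbed" wording
no longer matches. Something roughly like the following would describe
the new behaviour; again just a sketch, not a demand for any particular
wording:)

	/*
	 * The caller must hold entity->rq_lock to protect against a concurrent
	 * entity->rq change from drm_sched_entity_select_rq(); entity->rq->lock
	 * is taken here to protect the rb tree itself.
	 */
	lockdep_assert_held(&entity->rq_lock);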
> > 
> > 
> > Thx for your efforts,
> > P.
> > 
> > > -	spin_lock(&entity->rq_lock);
> > > +	lockdep_assert_held(&entity->rq_lock);
> > > +
> > >  	spin_lock(&entity->rq->lock);
> > >  
> > >  	drm_sched_rq_remove_fifo_locked(entity);
> > > @@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> > >  		      drm_sched_entity_compare_before);
> > >  
> > >  	spin_unlock(&entity->rq->lock);
> > > -	spin_unlock(&entity->rq_lock);
> > >  }
> > >   
> > >   /**
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index e9f075f51db3..3658a6cb048e 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> > >  void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > >  				struct drm_sched_entity *entity);
> > >  
> > > -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
> > > +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts);
> > >  
> > >  int drm_sched_entity_init(struct drm_sched_entity *entity,
> > >  			  enum drm_sched_priority priority,
> > 
> 


