[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

Philipp Stanner pstanner at redhat.com
Tue Oct 15 14:00:56 UTC 2024


On Tue, 2024-10-15 at 14:14 +0100, Tvrtko Ursulin wrote:
> 
> On 15/10/2024 12:38, Philipp Stanner wrote:
> > On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 15/10/2024 08:11, Philipp Stanner wrote:
> > > > On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 14/10/2024 12:32, Philipp Stanner wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:
> > > > > > > From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > > > > > 
> > > > > > > In FIFO mode We can avoid dropping the lock only to
> > > > > > > immediately
> > > > > > > re-
> > > > > > > acquire
> > > > > > > by adding a new drm_sched_rq_update_fifo_locked() helper.
> > > > > > > 
> > > > > > 
> > > > > > Please write detailed commit messages, as described here
> > > > > > [1].
> > > > > >       1. Describe the problem: current state and why it's
> > > > > > bad.
> > > > > >       2. Then, describe in imperative (present tense) form
> > > > > > what
> > > > > > the
> > > > > > commit
> > > > > >          does about the problem.
> > > > > 
> > > > > Both pieces of info are already there:
> > > > > 
> > > > > 1. Drops the lock to immediately re-acquire it.
> > > > > 2. We avoid that by by adding a locked helper.
> > > > > > Optionally, in between can be information about why it's
> > > > > > solved
> > > > > > this
> > > > > > way and not another etc.
> > > > > > 
> > > > > > Applies to the other patches, too.
> > > > > > 
> > > > > > 
> > > > > > [1]
> > > > > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> > > > > 
> > > > > Thanks I am new here and did not know this.
> > > > > 
> > > > > Seriosuly, lets not be too blindly strict about this because
> > > > > it
> > > > > can
> > > > > get
> > > > > IMO ridiculous.
> > > > > 
> > > > > One example when I previously accomodated your request is
> > > > > patch
> > > > > 3/5
> > > > > from
> > > > > this series:
> > > > > 
> > > > > """
> > > > > Current kerneldoc for struct drm_sched_rq incompletely
> > > > > documents
> > > > > what
> > > > > fields are protected by the lock.
> > > > > 
> > > > > This is not good because it is misleading.
> > > > > 
> > > > > Lets fix it by listing all the elements which are protected
> > > > > by
> > > > > the
> > > > > lock.
> > > > > """
> > > > > 
> > > > > While this was the original commit text you weren't happy
> > > > > with:
> > > > > 
> > > > > """
> > > > > drm/sched: Re-order struct drm_sched_rq members for clarity
> > > > > 
> > > > > Lets re-order the members to make it clear which are
> > > > > protected by
> > > > > the
> > > > > lock
> > > > > and at the same time document it via kerneldoc.
> > > > > """
> > > > > 
> > > > > I maintain the original text was passable.
> > > > > 
> > > > > On top, this was just a respin to accomodate the merge
> > > > > process.
> > > > > All
> > > > > approvals were done and dusted couple weeks or so ago so
> > > > > asking
> > > > > for
> > > > > yet
> > > > > another respin for such trivial objections is not great.
> > > > 
> > > > I understand that you're unhappy, but please understand the
> > > > position
> > > > I'm coming from. As you know, since you sent these patches
> > > > within a
> > > > different series (and, thus, since I reviewed them), I was
> > > > trusted
> > > > with
> > > > co-maintaining this piece of shared infrastructure.
> > > > 
> > > > And since you've worked on it a bit now, I suppose you also
> > > > know
> > > > that
> > > > the GPU Scheduler is arguably in quite a bad shape, has far too
> > > > little
> > > > documentation, has leaks, maybe race conditions, parts *where
> > > > the
> > > > locking rules are unclear* and is probably only fully
> > > > understood by
> > > > a
> > > > small hand full of people. I also argue that this is a *very*
> > > > complicated piece of software.
> > > 
> > > We already went over that and agreed. Not least I agreed the base
> > > is
> > > shaky since few years  ago. :)
> > > 
> > > Btw if things align, I hope you will at some point see a follow
> > > up
> > > series from me which makes some significant simplifications and
> > > improvements at the same time.
> > 
> > Cool, good to hear!
> > (Would be even cooler if simplifications and improvements can be
> > delivered through separate patch series to be easier to review
> > etc.)
> 
> Yes, when I spot something I pull it ahead and/or standalone when it 
> makes sense. But it is early days and a big job.
> 
> > > > So I might be or appear to be a bit pedantic, but I'm not doing
> > > > that to
> > > > terrorize you, but because I want this thing to become well
> > > > documented,
> > > > understandable, and bisectable. Working towards a canonical,
> > > > idiot-
> > > > proof commit style is one measure that will help with that.
> > > > 
> > > > I want to offer you the following: I can be more relaxed with
> > > > things
> > > > universally recognized as trivial (comment changes, struct
> > > > member
> > > > reordering) – but when something like a lock is touched in any
> > > > way,
> > > > we
> > > > shall document that in the commit message as canonically as
> > > > possible,
> > > > so someone who's less experienced and just bisected the commit
> > > > immediately understands what has been done (or rather: was
> > > > supposed
> > > > to
> > > > be done).
> > > 
> > > So how would you suggest to expand this commit text so it doesn't
> > > read
> > > too self-repeating?
> > 
> > My issue with this particular commit message is mainly that it
> > doesn't
> > make it obvious what the patch is supposed to do. So one can make
> > it
> > quicker and better to review by detailing it a bit more, so the
> > reviewer then can compare commit message vs. what the code does. It
> > seems to me for example that the actual optimization is being done
> > in
> > drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to
> > be
> > ported, too, for correctness
> 
> "It seems" aka the commit title says so. ;)
> 
> > Another small thing that might be cool is something that makes it a
> > bit
> > more obvious that this is an optimization, not a fix.
> > 
> > So I would probably write:
> > 
> > "So far, drm_sched_rq_update_fifo() automatically takes
> > drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is
> > inefficient because that lock is then taken, released and retaken
> > in
> > drm_sched_entity_push_job().
> > 
> > Improve performance by moving the locking out of
> > drm_sched_rq_update_fifo()."
> > 
> > Not that much longer but makes it far more obvious what shall be
> > achieved where :]
> 
> How about this:
> 
> """
> In FIFO mode (which is the default), both drm_sched_entity_push_job()
> and drm_sched_rq_update_fifo(), where the latter calls the former,
> are 
> currently taking and releasing the same entity->rq_lock.
> 
> We can avoid that design inelegance, and also have a miniscule 
> efficiency improvement on the idle submit path, by introducing a new 
> drm_sched_rq_update_fifo_locked() helper and pulling up the lock
> taking 
> to its callers.
> """

That looks good to me

P.


> 
> > (Let me read through the other patches briefly. Then we should be
> > good
> > with v2 of this series.. or would it be v3 then?)
> 
> Depends how you count. By unique series titles or by fundamental
> content.
> 
> Regards,
> 
> Tvrtko
> 



More information about the dri-devel mailing list