[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Tue Oct 15 13:14:35 UTC 2024


On 15/10/2024 12:38, Philipp Stanner wrote:
> On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote:
>>
>> On 15/10/2024 08:11, Philipp Stanner wrote:
>>> On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 14/10/2024 12:32, Philipp Stanner wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>>>
>>>>>> In FIFO mode We can avoid dropping the lock only to
>>>>>> immediately
>>>>>> re-
>>>>>> acquire
>>>>>> by adding a new drm_sched_rq_update_fifo_locked() helper.
>>>>>>
>>>>>
>>>>> Please write detailed commit messages, as described here [1].
>>>>>       1. Describe the problem: current state and why it's bad.
>>>>>       2. Then, describe in imperative (present tense) form what
>>>>> the
>>>>> commit
>>>>>          does about the problem.
>>>>
>>>> Both pieces of info are already there:
>>>>
>>>> 1. Drops the lock to immediately re-acquire it.
>>>> 2. We avoid that by by adding a locked helper.
>>>>> Optionally, in between can be information about why it's solved
>>>>> this
>>>>> way and not another etc.
>>>>>
>>>>> Applies to the other patches, too.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
>>>>
>>>> Thanks I am new here and did not know this.
>>>>
>>>> Seriosuly, lets not be too blindly strict about this because it
>>>> can
>>>> get
>>>> IMO ridiculous.
>>>>
>>>> One example when I previously accomodated your request is patch
>>>> 3/5
>>>> from
>>>> this series:
>>>>
>>>> """
>>>> Current kerneldoc for struct drm_sched_rq incompletely documents
>>>> what
>>>> fields are protected by the lock.
>>>>
>>>> This is not good because it is misleading.
>>>>
>>>> Lets fix it by listing all the elements which are protected by
>>>> the
>>>> lock.
>>>> """
>>>>
>>>> While this was the original commit text you weren't happy with:
>>>>
>>>> """
>>>> drm/sched: Re-order struct drm_sched_rq members for clarity
>>>>
>>>> Lets re-order the members to make it clear which are protected by
>>>> the
>>>> lock
>>>> and at the same time document it via kerneldoc.
>>>> """
>>>>
>>>> I maintain the original text was passable.
>>>>
>>>> On top, this was just a respin to accomodate the merge process.
>>>> All
>>>> approvals were done and dusted couple weeks or so ago so asking
>>>> for
>>>> yet
>>>> another respin for such trivial objections is not great.
>>>
>>> I understand that you're unhappy, but please understand the
>>> position
>>> I'm coming from. As you know, since you sent these patches within a
>>> different series (and, thus, since I reviewed them), I was trusted
>>> with
>>> co-maintaining this piece of shared infrastructure.
>>>
>>> And since you've worked on it a bit now, I suppose you also know
>>> that
>>> the GPU Scheduler is arguably in quite a bad shape, has far too
>>> little
>>> documentation, has leaks, maybe race conditions, parts *where the
>>> locking rules are unclear* and is probably only fully understood by
>>> a
>>> small hand full of people. I also argue that this is a *very*
>>> complicated piece of software.
>>
>> We already went over that and agreed. Not least I agreed the base is
>> shaky since few years  ago. :)
>>
>> Btw if things align, I hope you will at some point see a follow up
>> series from me which makes some significant simplifications and
>> improvements at the same time.
> 
> Cool, good to hear!
> (Would be even cooler if simplifications and improvements can be
> delivered through separate patch series to be easier to review etc.)

Yes, when I spot something I pull it ahead and/or standalone when it 
makes sense. But it is early days and a big job.

>>> So I might be or appear to be a bit pedantic, but I'm not doing
>>> that to
>>> terrorize you, but because I want this thing to become well
>>> documented,
>>> understandable, and bisectable. Working towards a canonical, idiot-
>>> proof commit style is one measure that will help with that.
>>>
>>> I want to offer you the following: I can be more relaxed with
>>> things
>>> universally recognized as trivial (comment changes, struct member
>>> reordering) – but when something like a lock is touched in any way,
>>> we
>>> shall document that in the commit message as canonically as
>>> possible,
>>> so someone who's less experienced and just bisected the commit
>>> immediately understands what has been done (or rather: was supposed
>>> to
>>> be done).
>>
>> So how would you suggest to expand this commit text so it doesn't
>> read
>> too self-repeating?
> 
> My issue with this particular commit message is mainly that it doesn't
> make it obvious what the patch is supposed to do. So one can make it
> quicker and better to review by detailing it a bit more, so the
> reviewer then can compare commit message vs. what the code does. It
> seems to me for example that the actual optimization is being done in
> drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be
> ported, too, for correctness

"It seems" aka the commit title says so. ;)

> Another small thing that might be cool is something that makes it a bit
> more obvious that this is an optimization, not a fix.
> 
> So I would probably write:
> 
> "So far, drm_sched_rq_update_fifo() automatically takes
> drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is
> inefficient because that lock is then taken, released and retaken in
> drm_sched_entity_push_job().
> 
> Improve performance by moving the locking out of
> drm_sched_rq_update_fifo()."
> 
> Not that much longer but makes it far more obvious what shall be
> achieved where :]

How about this:

"""
In FIFO mode (which is the default), both drm_sched_entity_push_job() 
and drm_sched_rq_update_fifo(), where the latter calls the former, are 
currently taking and releasing the same entity->rq_lock.

We can avoid that design inelegance, and also have a miniscule 
efficiency improvement on the idle submit path, by introducing a new 
drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking 
to its callers.
"""

> (Let me read through the other patches briefly. Then we should be good
> with v2 of this series.. or would it be v3 then?)

Depends how you count. By unique series titles or by fundamental content.

Regards,

Tvrtko


More information about the dri-devel mailing list