[PATCH drm-misc-next v3] drm/sched: implement dynamic job-flow control

Tue Oct 31 13:20:50 UTC 2023

Hi Danilo,

sorry for splitting up the mail thread. I had to fetch this mail from my 
spam folder for some reason.

Am 30.10.23 um 18:56 schrieb Danilo Krummrich:
> Hi Christian,
>
> [SNIP]
>>> And yes, we can live with the overhead of making jobs
>>> slightly bigger than they actually are, thus potentially delaying their
>>> execution. It's just that I don't quite understand the rational behind
>>> this conservatism, as I don't really see what negative impact this extra
>>> ->update_job_credits() call in the credit checking path has, other than
>>> the slight overhead of an if-check for drivers that don't need it.
>>  From experience it showed that we should not make the scheduler more
>> complicated than necessary. And I still think that the ring buffers only
>> need to be filled enough to keep the hardware busy.
> Right, and this callback contributes to exactly that.
>
> I don't really think there is much to worry about in terms of introducing more
> complexity. The implementation behind this callback is fairly trivial - it's
> simply called right before we check whether a job fits on the ring, to fetch
> the job's actual size.
>
> I would agree if the implementation of that would be bulky, tricky and asking
> for a compromise. But, since it actually is simple and straight forward I really
> think that if we implement some kind of dynamic job-flow control it should be
> implemented as acurate as possible rather than doing it half-baked.

Yeah, I see the intention here and can perfectly relate to it it's just 
that I have prioritize other things.

Adding this callback allows for the driver to influence the job 
submission and while this might seems useful now I'm just to much of a 
burned child to do stuff like this without having a really good reason 
for it.

>> If this here has some measurable positive effect then yeah we should
>> probably do it, but as long as it's only nice to have I have some objections
>> to that.
> Can't answer this, since Nouveau doesn't support native fence waits. However, I
> guess it depends on how many native fences a job carries. So, if we'd have two
> jobs with each of them carrying a lot of native fences, but not a lot of actual
> work, I can very well imagine that over-accounting can have a measureable
> impact.

What I can imagine as well is things like the hardware or firmware is 
looking at the fullness of the ring buffer to predict how much pending 
work there is.

> I just wonder if we really want to ask for real measurements given that the
> optimization is rather trivial and cheap and we already have enough evidence
> that it makes sense conceptually.

Well that's the point I disagree on, this callback isn't trivial. If 
(for example) the driver returns a value larger than initially estimated 
for the job we can run into an endless loop.

It's just one more thing which can go boom. At bare minimum we should 
check that the value is always decreasing.

Christian.

>
> - Danilo
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>
>>> Boris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231031/ef4a0e00/attachment.htm>