[Nouveau] [PATCH] drm/sched: Convert the GPU scheduler to variable number of run-queues

Dave Airlie airlied at gmail.com
Wed Nov 1 02:37:18 UTC 2023


On Wed, 1 Nov 2023 at 11:46, Luben Tuikov <ltuikov89 at gmail.com> wrote:
>
> On 2023-10-31 09:33, Danilo Krummrich wrote:
> >
> > On 10/26/23 19:25, Luben Tuikov wrote:
> >> On 2023-10-26 12:39, Danilo Krummrich wrote:
> >>> On 10/23/23 05:22, Luben Tuikov wrote:
> >>>> The GPU scheduler has now a variable number of run-queues, which are set up at
> >>>> drm_sched_init() time. This way, each driver announces how many run-queues it
> >>>> requires (supports) per each GPU scheduler it creates. Note, that run-queues
> >>>> correspond to scheduler "priorities", thus if the number of run-queues is set
> >>>> to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
> >>>> i.e. single "priority". If a driver further sets a single entity per
> >>>> run-queue, then this creates a 1-to-1 correspondence between a scheduler and
> >>>> a scheduled entity.
> >>>
> >>> Generally, I'm fine with this patch and how it replaces / generalizes the single
> >>> entity approach.
> >>
> >> Great!
> >>
> >>> However, I'm not quite sure how to properly use this. What is a driver supposed to
> >>> do, which previously took advantage of DRM_SCHED_POLICY_SINGLE_ENTITY?
> >>>
> >>> Is it supposed to call drm_sched_init() with num_rqs=1? If so, what's the correct way
> >>
> >> Yes, you call drm_sched_init() with num_rqs set to 1.
> >>
> >>> to initialize the drm_sched_entity then? Calling drm_sched_entity_init() with priority=0?
> >>
> >> Yes, with priority set to 0.
> >>
> >> One unfortunate fact I noticed when doing this patch is that the numerical values
> >> assigned to enum drm_sched_priority is that the names to values are upside down.
> >> Instead of min being 0, normal:1, high:2, kernel:3, it should've been kernel:0 (highest),
> >> high:1, normal:2, low:4, and so on.
> >>
> >> The reason for this is absolutely clear: if you had a single priority, it would be
> >> 0, the kernel, one, highest one. This is similar to how lanes in a highway are counted:
> >> you always have lane 1. Similarly to nice(1) and kernel priorities...
> >>
> >>> Any other priority consequently faults in drm_sched_job_arm().
> >>
> >> drm_sched_job_arm() faults on !ENTITY, but the "priority" is just
> >> assigned to s_priority:
> >>      job->s_priority = entity->priority;
> >>
> >>
> >>> While I might sound like a broken record (sorry for that), I really think everything
> >>> related to Matt's series needs documentation, as in:
> >>
> >> Yes, I agree.
> >
> > Great! Do you plan to send a subsequent patch adding some documentation for this one? I
> > think it'd be good to get all the above documented.
>
> A lot of this would be the magic sauce of drivers and hardware--as we've seen with Xe,
> and it would be presumptuous of me to write down to the detail of what and how this
> and that should be used.

Nope it wouldn't be. Please feel free to persume how drivers might
work in the form of documentation.

At some point the scheduler needs to be documented and so far two
maintainers have avoided doing so, and it's causing no end of
problems.

Write documentation, this goes for Xe scheduler patches, Danilo's work.

When someone asks you for docs, consider it a blocker on getting stuff
merged, because this stuff isn't obvious.

Dave.


More information about the Nouveau mailing list