[Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Jan 10 12:19:35 UTC 2023
On 10/01/2023 11:28, Tvrtko Ursulin wrote:
>
>
> On 09/01/2023 17:27, Jason Ekstrand wrote:
>
> [snip]
>
>> >>> AFAICT it proposes to have 1:1 between *userspace* created
>> contexts (per
>> >>> context _and_ engine) and drm_sched. I am not sure avoiding
>> invasive changes
>> >>> to the shared code is in the spirit of the overall idea and
>> instead
>> >>> opportunity should be used to look at way to refactor/improve
>> drm_sched.
>>
>>
>> Maybe? I'm not convinced that what Xe is doing is an abuse at all or
>> really needs to drive a re-factor. (More on that later.) There's
>> only one real issue which is that it fires off potentially a lot of
>> kthreads. Even that's not that bad given that kthreads are pretty
>> light and you're not likely to have more kthreads than userspace
>> threads which are much heavier. Not ideal, but not the end of the
>> world either. Definitely something we can/should optimize but if we
>> went through with Xe without this patch, it would probably be mostly ok.
>>
>> >> Yes, it is 1:1 *userspace* engines and drm_sched.
>> >>
>> >> I'm not really prepared to make large changes to DRM scheduler
>> at the
>> >> moment for Xe as they are not really required nor does Boris
>> seem they
>> >> will be required for his work either. I am interested to see
>> what Boris
>> >> comes up with.
>> >>
>> >>> Even on the low level, the idea to replace drm_sched threads
>> with workers
>> >>> has a few problems.
>> >>>
>> >>> To start with, the pattern of:
>> >>>
>> >>> while (not_stopped) {
>> >>> keep picking jobs
>> >>> }
>> >>>
>> >>> Feels fundamentally in disagreement with workers (while
>> obviously fits
>> >>> perfectly with the current kthread design).
>> >>
>> >> The while loop breaks and worker exists if no jobs are ready.
>>
>>
>> I'm not very familiar with workqueues. What are you saying would fit
>> better? One scheduling job per work item rather than one big work item
>> which handles all available jobs?
>
> Yes and no, it indeed IMO does not fit to have a work item which is
> potentially unbound in runtime. But it is a bit moot conceptual mismatch
> because it is a worst case / theoretical, and I think due more
> fundamental concerns.
>
> If we have to go back to the low level side of things, I've picked this
> random spot to consolidate what I have already mentioned and perhaps
> expand.
>
> To start with, let me pull out some thoughts from workqueue.rst:
>
> """
> Generally, work items are not expected to hog a CPU and consume many
> cycles. That means maintaining just enough concurrency to prevent work
> processing from stalling should be optimal.
> """
>
> For unbound queues:
> """
> The responsibility of regulating concurrency level is on the users.
> """
>
> Given the unbound queues will be spawned on demand to service all queued
> work items (more interesting when mixing up with the system_unbound_wq),
> in the proposed design the number of instantiated worker threads does
> not correspond to the number of user threads (as you have elsewhere
> stated), but pessimistically to the number of active user contexts. That
> is the number which drives the maximum number of not-runnable jobs that
> can become runnable at once, and hence spawn that many work items, and
> in turn unbound worker threads.
>
> Several problems there.
>
> It is fundamentally pointless to have potentially that many more threads
> than the number of CPU cores - it simply creates a scheduling storm.
To make matters worse, if I follow the code correctly, all these per
user context worker thread / work items end up contending on the same
lock or circular buffer, both are one instance per GPU:
guc_engine_run_job
-> submit_engine
a) wq_item_append
-> wq_wait_for_space
-> msleep
b) xe_guc_ct_send
-> guc_ct_send
-> mutex_lock(&ct->lock);
-> later a potential msleep in h2g_has_room
Regards,
Tvrtko
More information about the Intel-gfx
mailing list