[PATCH 1/8] drm/sched: Convert drm scheduler to use a work queue rather than kthread

Christian König christian.koenig at amd.com
Thu Aug 3 14:56:07 UTC 2023


On 03.08.23 at 16:43, Matthew Brost wrote:
> On Thu, Aug 03, 2023 at 11:11:13AM +0100, Tvrtko Ursulin wrote:
>> On 01/08/2023 21:50, Matthew Brost wrote:
>> [SNIP]
>>>    	sched->ops = ops;
>>>    	sched->hw_submission_limit = hw_submission;
>>>    	sched->name = name;
>>> +	sched->run_wq = run_wq ? : system_wq;
>> I still think it is not nice to implicitly move everyone over to the shared
>> system wq. Maybe even more so now with one-at-a-time execution, since the
>> effect on latency can be even greater.
>>
> No one that has a stake in this has pushed back that I can recall. I'm open
> to feedback from stakeholders (maintainers of drivers that use the drm
> scheduler).

No objections to using the system_wq here. Drivers can still pass in 
their own or simply use system_highpri_wq instead.

In addition to that, the system_wq isn't single threaded; it will create 
as many worker threads as needed to fully utilize all CPUs.
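As a rough illustration of that option (only a sketch; the exact 
drm_sched_init() signature with the new run_wq parameter is assumed from 
the hunk quoted above), a driver could allocate its own ordered workqueue 
and hand it to the scheduler at init time:

    /* Driver-private, strictly ordered submission workqueue. */
    struct workqueue_struct *wq;

    wq = alloc_ordered_workqueue("my-drm-sched", 0);
    if (!wq)
            return -ENOMEM;

    /*
     * Passing NULL instead would fall back to system_wq, as per the
     * "run_wq ? : system_wq" hunk above; system_highpri_wq works too.
     */
    ret = drm_sched_init(&sched, &ops, wq, ...);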

>   The i915 didn't use the DRM scheduler the last time I looked.
> Has that changed?
>   
>> Have you considered kthread_work as a backend? Maybe it would work to have
>> callers pass in a kthread_worker they create, or provide a drm_sched helper
>> to create one, which would then be passed to drm_sched_init.
>>
>> That would enable per driver kthread_worker, or per device, or whatever
>> granularity each driver would want/need/desire.
>>
>> driver init:
>> struct drm_sched_worker = drm_sched_create_worker(...);
>>
>> queue/whatever init:
>> drm_sched_init(.., worker, ...);
>>
> This idea doesn't seem to work for a variety of reasons. I will type them out
> if needed, but I'm not going to spend time on this unless someone with a
> stake raises this as an issue.

Agree completely. kthread_work is for real-time workers, IIRC.
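For completeness, a rough sketch of what the kthread_worker alternative 
outlined above might look like; drm_sched_create_worker() does not exist, 
and the names below are purely illustrative:

    /* Driver init: one dedicated kernel thread per driver/device. */
    struct kthread_worker *worker;

    worker = kthread_create_worker(0, "drm-sched/%s", dev_name(dev));
    if (IS_ERR(worker))
            return PTR_ERR(worker);

    /*
     * Scheduler init: each drm_gpu_scheduler would queue its work items
     * on the shared worker instead of an ordinary workqueue.
     */
    kthread_init_work(&sched->work_run, drm_sched_run_job_work);
    kthread_queue_work(worker, &sched->work_run);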

>   
>> You could create one inside drm_sched_init if not passed in, which would
>> keep the behaviour for existing drivers more similar - they would still have
>> a 1:1 kthread context for their exclusive use.
>>
> Part of the idea of a work queue is that a user can't directly create a
> kthread via an IOCTL (XE_EXEC_QUEUE_CREATE). What you are suggesting exposes
> this issue.

Yeah, preventing that is indeed a very good idea.

>
>> And I *think* self-re-arming would be less problematic latency-wise, since
>> kthread_worker consumes everything queued without relinquishing control, and
>> the execution context would be guaranteed not to be shared with random system
>> stuff.
>>
> So this is essentially so we can use a loop? That seems like a lot of effort for
> what is pure speculation. Again, if a stakeholder raises an issue we can
> address it then.

Instead of a loop, what you usually do in the worker is submit one 
item (if possible) and then re-queue yourself if there is more work to do.

This way you give others a chance to run as well and/or cancel the work, etc.
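Schematically, such a work function could look like the following; the 
work_run field and the drm_sched_pick_job()/drm_sched_work_pending() 
helpers are illustrative, not taken from the patch:

    static void drm_sched_run_job_work(struct work_struct *w)
    {
            struct drm_gpu_scheduler *sched =
                    container_of(w, struct drm_gpu_scheduler, work_run);
            struct drm_sched_job *job;

            /* Submit at most one job per invocation (fence handling
             * omitted for brevity). */
            job = drm_sched_pick_job(sched);
            if (job)
                    sched->ops->run_job(job);

            /* Re-queue ourselves if there is more to do; this gives other
             * work items a chance to run and keeps the work cancellable
             * between jobs. */
            if (drm_sched_work_pending(sched))
                    queue_work(sched->run_wq, &sched->work_run);
    }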

Christian.

>
> Matt
>
>> Regards,
>>
>> Tvrtko
>>


