[RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Apr 4 17:29:25 UTC 2023
On 04/04/2023 14:52, Matthew Brost wrote:
> On Tue, Apr 04, 2023 at 10:43:03AM +0100, Tvrtko Ursulin wrote:
>>
>> On 04/04/2023 01:22, Matthew Brost wrote:
>>> Hello,
>>>
>>> As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
>>> have been asked to merge our common DRM scheduler patches first as well
>>> as develop a common solution for long running workloads with the DRM
>>> scheduler. This RFC series is our first attempt at doing this. We
>>> welcome any and all feedback.
>>>
>>> This can be thought of as 4 parts detailed below.
>>>
>>> - DRM scheduler changes for 1 to 1 relationship between scheduler and
>>> entity (patches 1-3)
>>>
>>> In Xe all of the scheduling of jobs is done by a firmware scheduler (the
>>> GuC), which is a new paradigm WRT the DRM scheduler and presents
>>> several problems, as the DRM scheduler was originally designed to
>>> schedule jobs on hardware queues. The main problem is that the DRM
>>> scheduler expects the submission order of jobs to be the completion
>>> order of jobs even across multiple entities. This assumption falls apart
>>> with a firmware scheduler, as a firmware scheduler has no concept of jobs
>>> and jobs can complete out of order. A novel solution was originally
>>> thought of by Faith during the initial prototype of Xe: create a 1 to 1
>>> relationship between scheduler and entity. I believe the AGX driver [3]
>>> is using this approach and Boris may use this approach as well for the
>>> Mali driver [4].
>>>
>>> To support a 1 to 1 relationship we move the main execution function
>>> from a kthread to a work queue and add a new scheduling mode which
>>> bypasses code in the DRM scheduler which isn't needed in a 1 to 1
>>> relationship. The new scheduling mode should unify all drivers' usage
>>> with a 1 to 1 relationship and can be thought of as using the scheduler
>>> as a dependency / in-flight job tracker rather than a true scheduler.
>>
>> Once you add capability for a more proper 1:1 via
>> DRM_SCHED_POLICY_SINGLE_ENTITY, do you still have further need to replace
>> kthreads with a wq?
>>
>> Or in other words, what purpose does offloading the job picking code to
>> a separate execution context serve? Could it be done directly in the 1:1
>> mode, leaving the kthread setup for N:M?
>>
>
> Addressed the other two in my reply to Christian...
>
> For this one, basically the concept of a single entity point IMO is a
> very good concept which I'd like to keep. But the most important reason
> is that the main execution thread (now worker) is kicked when a dependency
> for a job is resolved; dependencies are dma-fences signaled via a
> callback, and these callbacks can be signaled in IRQ contexts. We
> absolutely do not want to enter the backend in an IRQ context for a
> variety of reasons.

Sounds like a fair enough requirement, but if drivers are not
comfortable with the wq conversion, it is probably possible to introduce
some vfuncs for the 1:1 case which would allow scheduler users to override
the scheduler wakeup and select a special "pick one job" path. That
could allow 1:1 users to do their thing, leaving the rest as is. I mean you
already have the special single entity scheduler, you'd just need to add
some more specialization on the init, wake up, etc. paths.
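
A minimal user-space sketch of that vfunc idea follows. It is a toy model,
not drm/sched code: every name in it (toy_sched, toy_sched_ops,
pick_one_job and so on) is hypothetical and only illustrates how an
optional ops table could let a 1:1 user override the wakeup and job
picking while users without overrides keep the existing path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the drm/sched API. */
struct toy_job {
	int id;
	struct toy_job *next;
};

struct toy_sched;

/* Optional per-scheduler overrides, in the spirit of the suggested vfuncs. */
struct toy_sched_ops {
	void (*wakeup)(struct toy_sched *sched);
	struct toy_job *(*pick_one_job)(struct toy_sched *sched);
};

struct toy_sched {
	const struct toy_sched_ops *ops; /* NULL means "leave the rest as is" */
	struct toy_job *head, *tail;     /* FIFO of the single entity */
	bool thread_woken;               /* stands in for waking the kthread */
	bool work_pending;               /* stands in for a queued work item */
};

static void toy_push_job(struct toy_sched *sched, struct toy_job *job)
{
	assert(job != NULL);
	job->next = NULL;
	if (sched->tail)
		sched->tail->next = job;
	else
		sched->head = job;
	sched->tail = job;
}

/* Generic wakeup: dispatch to the override if the user supplied one. */
static void toy_sched_wakeup(struct toy_sched *sched)
{
	if (sched->ops && sched->ops->wakeup)
		sched->ops->wakeup(sched);
	else
		sched->thread_woken = true;
}

/* 1:1 override: only mark work as pending, never run the backend here. */
static void toy_single_wakeup(struct toy_sched *sched)
{
	sched->work_pending = true;
}

/* 1:1 override: with one entity there is nothing to select between. */
static struct toy_job *toy_single_pick(struct toy_sched *sched)
{
	struct toy_job *job = sched->head;

	if (job) {
		sched->head = job->next;
		if (!sched->head)
			sched->tail = NULL;
	}
	return job;
}

static const struct toy_sched_ops toy_single_entity_ops = {
	.wakeup = toy_single_wakeup,
	.pick_one_job = toy_single_pick,
};
```

A scheduler created with ops == NULL takes the default branch in
toy_sched_wakeup(), which is the "leaving the rest as is" part; only a
scheduler initialized with toy_single_entity_ops sees the specialized
paths.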

And I will mention once more that I find a wq item with a loop such as:

	while (!READ_ONCE(sched->pause_run_wq)) {
	...

a bit dodgy. If you piggyback on any system_wq it smells of system-wide
starvation, so for me any proposal with an option to use a system shared
wq is a no go.
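
To illustrate the concern, here is a toy user-space model with a single
worker draining a shared FIFO of work items. It is a sketch under strong
assumptions: real workqueues have more concurrency than this, and all
names (toy_wq, toy_sched_loop, toy_sched_one_shot, toy_latency) are
hypothetical. It compares a work item that loops until its jobs are
drained with one that handles a single job and re-queues itself, and
measures when an unrelated item queued behind it gets to run.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WQ_SLOTS 8

struct toy_work {
	void (*fn)(struct toy_work *work);
	int jobs_left;   /* pending jobs, used by the scheduler items */
	int ran_at_tick; /* tick at which the unrelated item ran, -1 if never */
};

/* One shared queue with one worker; each processed job costs one tick. */
static struct {
	struct toy_work *slots[TOY_WQ_SLOTS];
	size_t head, count;
	int tick;
} wq;

static void toy_queue_work(struct toy_work *work)
{
	assert(wq.count < TOY_WQ_SLOTS);
	wq.slots[(wq.head + wq.count) % TOY_WQ_SLOTS] = work;
	wq.count++;
}

static void toy_run_all(void)
{
	while (wq.count) {
		struct toy_work *work = wq.slots[wq.head];

		wq.head = (wq.head + 1) % TOY_WQ_SLOTS;
		wq.count--;
		work->fn(work);
	}
}

/* Looping item: keeps the worker until every job has been handled. */
static void toy_sched_loop(struct toy_work *work)
{
	while (work->jobs_left) {
		work->jobs_left--;
		wq.tick++;
	}
}

/* Bounded item: handles one job, then goes to the back of the queue. */
static void toy_sched_one_shot(struct toy_work *work)
{
	if (!work->jobs_left)
		return;
	work->jobs_left--;
	wq.tick++;
	if (work->jobs_left)
		toy_queue_work(work);
}

/* Unrelated user of the shared queue: just records when it got to run. */
static void toy_other_fn(struct toy_work *work)
{
	work->ran_at_tick = wq.tick;
}

/* Returns the tick at which the unrelated item ran behind the scheduler. */
static int toy_latency(void (*sched_fn)(struct toy_work *), int jobs)
{
	struct toy_work sched = { .fn = sched_fn, .jobs_left = jobs };
	struct toy_work other = { .fn = toy_other_fn, .ran_at_tick = -1 };

	wq.head = 0;
	wq.count = 0;
	wq.tick = 0;
	toy_queue_work(&sched);
	toy_queue_work(&other);
	toy_run_all();
	return other.ran_at_tick;
}
```

In this model the unrelated item waits for every scheduler job when the
looping variant is used, and for only one job with the self re-queueing
variant. A dedicated wq avoids the interference with other users
altogether.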

Regards,

Tvrtko

>> Apart from those design level questions, a low level open IMO is still
>> that the default fallback of using the system_wq has the potential to
>> affect latency for other drivers. But that's for those driver owners to
>> approve.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> - Generic messaging interface for DRM scheduler
>>>
>>> The idea is to be able to communicate with the submission backend via
>>> in-band (relative to the main execution function) messages. Messages are
>>> backend defined and flexible enough for any use case. In Xe we use these
>>> messages to clean up entities, set properties for entities, and suspend /
>>> resume execution of an entity [5]. I suspect other drivers can leverage
>>> this messaging concept too, as it is a convenient way to avoid races in
>>> the backend.
>>>
>>> - Support for using TDR for all error paths of a scheduler / entity
>>>
>>> Fix a few races / bugs, add function to dynamically set the TDR timeout.
>>>
>>> - Annotate dma-fences for long running workloads.
>>>
>>> The idea here is to use dma-fences only as sync points within the
>>> scheduler and never export them for long running workloads. By
>>> annotating these fences as long running we ensure that these dma-fences
>>> are never used in a way that breaks the dma-fence rules. A benefit of
>>> this approach is the scheduler can still safely flow control the
>>> execution ring buffer via the job limit without breaking the dma-fence
>>> rules.
>>>
>>> Again, this is a first draft and we are looking forward to feedback.
>>>
>>> Enjoy - Matt
>>>
>>> [1] https://gitlab.freedesktop.org/drm/xe/kernel
>>> [2] https://patchwork.freedesktop.org/series/112188/
>>> [3] https://patchwork.freedesktop.org/series/114772/
>>> [4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
>>> [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031
>>>
>>> Matthew Brost (8):
>>>   drm/sched: Convert drm scheduler to use a work queue rather than
>>>     kthread
>>>   drm/sched: Move schedule policy to scheduler / entity
>>>   drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
>>>   drm/sched: Add generic scheduler message interface
>>>   drm/sched: Start run wq before TDR in drm_sched_start
>>>   drm/sched: Submit job before starting TDR
>>>   drm/sched: Add helper to set TDR timeout
>>>   drm/syncobj: Warn on long running dma-fences
>>>
>>> Thomas Hellström (2):
>>>   dma-buf/dma-fence: Introduce long-running completion fences
>>>   drm/sched: Support long-running sched entities
>>>
>>> drivers/dma-buf/dma-fence.c | 142 +++++++---
>>> drivers/dma-buf/dma-resv.c | 5 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 14 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +-
>>> drivers/gpu/drm/drm_syncobj.c | 5 +-
>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 +-
>>> drivers/gpu/drm/lima/lima_sched.c | 5 +-
>>> drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +-
>>> drivers/gpu/drm/msm/msm_ringbuffer.c | 5 +-
>>> drivers/gpu/drm/panfrost/panfrost_job.c | 5 +-
>>> drivers/gpu/drm/scheduler/sched_entity.c | 127 +++++++--
>>> drivers/gpu/drm/scheduler/sched_fence.c | 6 +-
>>> drivers/gpu/drm/scheduler/sched_main.c | 278 +++++++++++++++-----
>>> drivers/gpu/drm/v3d/v3d_sched.c | 25 +-
>>> include/drm/gpu_scheduler.h | 130 +++++++--
>>> include/linux/dma-fence.h | 60 ++++-
>>> 16 files changed, 649 insertions(+), 184 deletions(-)
>>>