[Intel-xe] [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Apr 4 17:29:25 UTC 2023


On 04/04/2023 14:52, Matthew Brost wrote:
> On Tue, Apr 04, 2023 at 10:43:03AM +0100, Tvrtko Ursulin wrote:
>>
>> On 04/04/2023 01:22, Matthew Brost wrote:
>>> Hello,
>>>
>>> As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
>>> have been asked to merge our common DRM scheduler patches first as well
>>> as develop a common solution for long running workloads with the DRM
>>> scheduler. This RFC series is our first attempt at doing this. We
>>> welcome any and all feedback.
>>>
>>> This can be thought of as 4 parts, detailed below.
>>>
>>> - DRM scheduler changes for 1 to 1 relationship between scheduler and
>>> entity (patches 1-3)
>>>
>>> In Xe all of the scheduling of jobs is done by a firmware scheduler (the
>>> GuC), which is a new paradigm WRT the DRM scheduler and presents
>>> several problems, as the DRM scheduler was originally designed to
>>> schedule jobs on hardware queues. The main problem is that the DRM
>>> scheduler expects the submission order of jobs to be the completion
>>> order of jobs, even across multiple entities. This assumption falls
>>> apart with a firmware scheduler, as a firmware scheduler has no concept
>>> of jobs and jobs can complete out of order. A novel solution was
>>> originally thought of by Faith during the initial prototyping of Xe:
>>> create a 1 to 1 relationship between scheduler and entity. I believe
>>> the AGX driver [3] is using this approach and Boris may use this
>>> approach as well for the Mali driver [4].
>>>
>>> To support a 1 to 1 relationship we move the main execution function
>>> from a kthread to a work queue and add a new scheduling mode which
>>> bypasses code in the DRM scheduler which isn't needed in a 1 to 1
>>> relationship. The new scheduling mode should unify all drivers' usage
>>> with a 1 to 1 relationship and can be thought of as using the scheduler
>>> as a dependency / inflight job tracker rather than a true scheduler.
>>
>> Once you add the capability for a more proper 1:1 via
>> DRM_SCHED_POLICY_SINGLE_ENTITY, do you still have further need to
>> replace kthreads with a wq?
>>
>> Or in other words, what purpose does the offloading of the job picking
>> code to a separate execution context serve? Could it be done directly
>> in the 1:1 mode, leaving the kthread setup for N:M?
>>
> 
> Addressed the other two on my reply to Christian...
> 
> For this one, basically the concept of a single entry point IMO is a
> very good concept which I'd like to keep. But the most important reason
> is that the main execution thread (now a worker) is kicked when a
> dependency for a job is resolved; dependencies are dma-fences signaled
> via a callback, and these callbacks can fire in IRQ context. We
> absolutely do not want to enter the backend in an IRQ context for a
> variety of reasons.

Sounds like a fair enough requirement, but if drivers will not be 
comfortable with the wq conversion, it is probably possible to introduce 
some vfuncs for the 1:1 case which would allow scheduler users to 
override the scheduler wakeup and select a special "pick one job" path. 
That could allow 1:1 users to do their thing, leaving the rest as is. I 
mean you already have the special single entity scheduler, you'd just 
need to add some more specialization on the init, wake up, etc. paths 
(roughly sketched below).
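
Something along these lines perhaps, purely as an illustration of the 
shape I have in mind (the ops struct and hook name are hypothetical, not 
an actual proposal):

	/*
	 * Hypothetical driver-overridable hook for the 1:1 /
	 * DRM_SCHED_POLICY_SINGLE_ENTITY case. Called instead of waking
	 * the scheduler kthread when a job's dependencies have signaled.
	 * May be invoked from IRQ context, so a driver which cannot
	 * tolerate that would defer to its own worker here.
	 */
	struct drm_sched_single_entity_ops {
		void (*job_ready)(struct drm_gpu_scheduler *sched);
	};

The default implementation could then keep the existing kthread wakeup, 
while Xe plugs in a handler which queues its own work item, so the wq 
versus kthread question stays contained in the driver.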

And I will mention once more that I find a wq item with a loop such as:

	while (!READ_ONCE(sched->pause_run_wq)) {
	...

A bit dodgy. If you piggyback on any system_wq it smells of system wide 
starvation, so for me any proposal with an option to use a system shared 
wq is a no go.
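
If the wq route does stay, I would at least expect every scheduler 
instance to allocate its own dedicated queue rather than ever falling 
back to system_wq. A minimal sketch, reusing the run_wq naming from the 
snippet above, with an illustrative flag choice:

	/*
	 * Per-scheduler ordered queue so one driver's submission work
	 * cannot starve, or be starved by, unrelated system_wq users.
	 */
	sched->run_wq = alloc_ordered_workqueue("drm-sched-%s",
						WQ_HIGHPRI, sched->name);
	if (!sched->run_wq)
		return -ENOMEM;

That keeps the execution context private to the scheduler instance and 
sidesteps the system wide starvation concern entirely.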

Regards,

Tvrtko


>> Apart from those design level questions, a low level open issue IMO is
>> still that the default fallback of using the system_wq has the
>> potential to affect latency for other drivers. But that's for those
>> driver owners to approve.
>>
>> Regards,
>>
>> Tvrtko
>>
>>> - Generic messaging interface for DRM scheduler
>>>
>>> The idea is to be able to communicate with the submission backend via
>>> in-band (relative to the main execution function) messages. Messages
>>> are backend defined and flexible enough for any use case. In Xe we use
>>> these messages to clean up entities, set properties for entities, and
>>> suspend / resume execution of an entity [5]. I suspect other drivers
>>> can leverage this messaging concept too as it is a convenient way to
>>> avoid races in the backend.
>>>
>>> - Support for using TDR for all error paths of a scheduler / entity
>>>
>>> Fix a few races / bugs and add a function to dynamically set the TDR
>>> timeout.
>>>
>>> - Annotate dma-fences for long running workloads.
>>>
>>> The idea here is to use dma-fences only as sync points within the
>>> scheduler and never export them for long running workloads. By
>>> annotating these fences as long running we ensure that these dma-fences
>>> are never used in a way that breaks the dma-fence rules. A benefit of
>>> this approach is that the scheduler can still safely flow control the
>>> execution ring buffer via the job limit without breaking the dma-fence
>>> rules.
>>>
>>> Again, this is a first draft and we are looking forward to feedback.
>>>
>>> Enjoy - Matt
>>>
>>> [1] https://gitlab.freedesktop.org/drm/xe/kernel
>>> [2] https://patchwork.freedesktop.org/series/112188/
>>> [3] https://patchwork.freedesktop.org/series/114772/
>>> [4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
>>> [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031
>>>
>>> Matthew Brost (8):
>>>     drm/sched: Convert drm scheduler to use a work queue rather than
>>>       kthread
>>>     drm/sched: Move schedule policy to scheduler / entity
>>>     drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
>>>     drm/sched: Add generic scheduler message interface
>>>     drm/sched: Start run wq before TDR in drm_sched_start
>>>     drm/sched: Submit job before starting TDR
>>>     drm/sched: Add helper to set TDR timeout
>>>     drm/syncobj: Warn on long running dma-fences
>>>
>>> Thomas Hellström (2):
>>>     dma-buf/dma-fence: Introduce long-running completion fences
>>>     drm/sched: Support long-running sched entities
>>>
>>>    drivers/dma-buf/dma-fence.c                 | 142 +++++++---
>>>    drivers/dma-buf/dma-resv.c                  |   5 +
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 +-
>>>    drivers/gpu/drm/drm_syncobj.c               |   5 +-
>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c     |   5 +-
>>>    drivers/gpu/drm/lima/lima_sched.c           |   5 +-
>>>    drivers/gpu/drm/msm/adreno/adreno_device.c  |   6 +-
>>>    drivers/gpu/drm/msm/msm_ringbuffer.c        |   5 +-
>>>    drivers/gpu/drm/panfrost/panfrost_job.c     |   5 +-
>>>    drivers/gpu/drm/scheduler/sched_entity.c    | 127 +++++++--
>>>    drivers/gpu/drm/scheduler/sched_fence.c     |   6 +-
>>>    drivers/gpu/drm/scheduler/sched_main.c      | 278 +++++++++++++++-----
>>>    drivers/gpu/drm/v3d/v3d_sched.c             |  25 +-
>>>    include/drm/gpu_scheduler.h                 | 130 +++++++--
>>>    include/linux/dma-fence.h                   |  60 ++++-
>>>    16 files changed, 649 insertions(+), 184 deletions(-)
>>>

