[Intel-xe] [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Danilo Krummrich
dakr at redhat.com
Wed Aug 16 11:30:50 UTC 2023
Hi Matt,
On 8/11/23 04:31, Matthew Brost wrote:
> In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> seems a bit odd but let us explain the reasoning below.
>
> 1. In XE the submission order from multiple drm_sched_entity is not
> guaranteed to match the completion order, even if targeting the same
> hardware engine. This is because in XE we have a firmware scheduler, the
> GuC, which is allowed to reorder, timeslice, and preempt submissions. If
> using a shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR
> falls apart as the TDR expects submission order == completion order. Using
> a dedicated drm_gpu_scheduler per drm_sched_entity solves this problem.
>
> 2. In XE submissions are done via programming a ring buffer (circular
> buffer). A drm_gpu_scheduler provides a limit on the number of jobs; if
> that limit is set to RING_SIZE / MAX_SIZE_PER_JOB, we get flow control on
> the ring for free.
In XE, where does the limitation of MAX_SIZE_PER_JOB come from?
In Nouveau we currently do have such a limitation as well, but it is
derived from the RING_SIZE, hence RING_SIZE / MAX_SIZE_PER_JOB would
always be 1. However, I think most jobs won't actually utilize the whole
ring.
Given that, it seems like it would be better to let the scheduler keep
track of empty ring "slots" instead, such that the scheduler can decide
whether a subsequent job will still fit on the ring and, if not,
re-evaluate once a previous job has finished. Of course, each submitted job
would be required to carry the number of slots it requires on the ring.
What do you think of implementing this as an alternative flow control
mechanism? Implementation-wise, this could be a union with the existing
hw_submission_limit.
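To make the idea concrete, here is a minimal userspace sketch of slot-based
flow control. It is only a model under stated assumptions: struct
ring_credits, struct job, and the ring_*() helpers are hypothetical names
invented for this illustration and are not part of drm_sched or any driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a ring with a fixed number of slots. */
struct ring_credits {
	unsigned int total;	/* total slots available on the ring */
	unsigned int in_use;	/* slots held by jobs still in flight */
};

/* Hypothetical job: carries the number of ring slots it needs. */
struct job {
	unsigned int slots;
};

/* Would this job still fit on the ring right now? */
static bool ring_can_fit(const struct ring_credits *rc, const struct job *job)
{
	return job->slots <= rc->total - rc->in_use;
}

/* Reserve slots for the job; returns false if it has to wait. */
static bool ring_submit(struct ring_credits *rc, const struct job *job)
{
	if (!ring_can_fit(rc, job))
		return false;
	rc->in_use += job->slots;
	return true;
}

/* Release the job's slots once it has finished, so waiters can be re-evaluated. */
static void ring_complete(struct ring_credits *rc, const struct job *job)
{
	assert(job->slots <= rc->in_use);
	rc->in_use -= job->slots;
}
```

With an 8-slot ring, a 5-slot job is accepted, a following 4-slot job has to
wait, and it fits once the 5-slot job completes. A plain job-count limit
corresponds to the special case where every job requires exactly one slot.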
- Danilo
>
> A problem with this design is that a drm_gpu_scheduler currently uses a
> kthread for submission / job cleanup. This doesn't scale if a large
> number of drm_gpu_schedulers are used. To work around the scaling issue,
> use a worker rather than a kthread for submission / job cleanup.
>
> v2:
> - (Rob Clark) Fix msm build
> - Pass in run work queue
> v3:
> - (Boris) don't have loop in worker
> v4:
> - (Tvrtko) break out submit ready, stop, start helpers into own patch
>
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>