[Nouveau] [PATCH drm-misc-next] drm/nouveau: sched: avoid job races between entities

Faith Ekstrand faith at gfxstrand.net
Fri Aug 11 17:53:12 UTC 2023


On Thu, Aug 10, 2023 at 8:06 PM Danilo Krummrich <dakr at redhat.com> wrote:

> If a sched job depends on a dma-fence from a job from the same GPU
> scheduler instance, but a different scheduler entity, the GPU scheduler
> does only wait for the particular job to be scheduled, rather than for
> the job to fully complete. This is due to the GPU scheduler assuming
> that there is a scheduler instance per ring. However, the current
> implementation, in order to avoid arbitrary amounts of kthreads, has a
> single scheduler instance while scheduler entities represent rings.
>
> As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all
> out-fences in order to force the scheduler to wait for full job
> completion for dependent jobs from different entities and same scheduler
> instance.
>
> There is some work in progress [1] to address the issues of firmware
> schedulers; once it is in-tree the scheduler topology in Nouveau should
> be re-worked accordingly.
>
> [1]
> https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/
>
> Signed-off-by: Danilo Krummrich <dakr at redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> index 3424a1bf6af3..88217185e0f3 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> @@ -292,6 +292,28 @@ nouveau_job_submit(struct nouveau_job *job)
>         if (job->sync)
>                 done_fence = dma_fence_get(job->done_fence);
>
> +       /* If a sched job depends on a dma-fence from a job from the same GPU
> +        * scheduler instance, but a different scheduler entity, the GPU
> +        * scheduler does only wait for the particular job to be scheduled,
>

s/does only wait/only waits/

Reviewed-by: Faith Ekstrand <faith.ekstrand at collabora.com>

> +        * rather than for the job to fully complete. This is due to the GPU
> +        * scheduler assuming that there is a scheduler instance per ring.
> +        * However, the current implementation, in order to avoid arbitrary
> +        * amounts of kthreads, has a single scheduler instance while scheduler
> +        * entities represent rings.
> +        *
> +        * As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all
> +        * out-fences in order to force the scheduler to wait for full job
> +        * completion for dependent jobs from different entities and same
> +        * scheduler instance.
> +        *
> +        * There is some work in progress [1] to address the issues of firmware
> +        * schedulers; once it is in-tree the scheduler topology in Nouveau
> +        * should be re-worked accordingly.
> +        *
> +        * [1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/
> +        */
> +       set_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &job->done_fence->flags);
> +
>         if (job->ops->armed_submit)
>                 job->ops->armed_submit(job);
>
>
> base-commit: 68132cc6d1bcbc78ade524c6c6c226de42139f0e
> --
> 2.41.0
>
>