[RFC 0/4] DRM scheduler fixes, or not, or incorrect kind

Philipp Stanner pstanner at redhat.com
Mon Sep 9 08:47:35 UTC 2024


Hi,

On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> 
> In a recent conversation with Christian there was a thought that
> drm_sched_entity_modify_sched() should start using the entity-
> >rq_lock to be
> safe against job submission and simultaneous priority changes.

There are also FIXMEs in gpu_scheduler.h that might be related.

> 
> The kerneldoc accompanying that function however is a bit unclear to
> me. For
> instance is amdgpu simply doing it wrongly by not serializing the two
> in the
> driver? Or is the comment referring to some other race condition than
> which is
> of concern in this series?
> 
> To cut the long story short, first three patches try to fix this race
> in three
> places I *think* can manifest in different ways.
> 
> Last patch is a trivial optimisation I spotted can be easily done.

I took a look and at least to me it doesn't appear to be that trivial,
mostly because it takes two locks.

Would you mind branching that out as a separate patch so that the
series would 100% address bugs?

P.	

> 
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: Luben Tuikov <ltuikov89 at gmail.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> 
> Tvrtko Ursulin (4):
>   drm/sched: Add locking to drm_sched_entity_modify_sched
>   drm/sched: Always wake up correct scheduler in
>     drm_sched_entity_push_job
>   drm/sched: Always increment correct scheduler score
>   drm/sched: Optimise drm_sched_entity_push_job
> 
>  drivers/gpu/drm/scheduler/sched_entity.c | 17 ++++++++++++-----
>  drivers/gpu/drm/scheduler/sched_main.c   | 21 ++++++++++++++-------
>  include/drm/gpu_scheduler.h              |  1 +
>  3 files changed, 27 insertions(+), 12 deletions(-)
> 



More information about the dri-devel mailing list