[RFC 0/4] DRM scheduler fixes, or not, or incorrect kind

Tvrtko Ursulin tursulin at igalia.com
Fri Sep 6 18:06:14 UTC 2024


From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>

In a recent conversation with Christian there was a thought that
drm_sched_entity_modify_sched() should start using the entity->rq_lock to be
safe against job submission and simultaneous priority changes.

The kerneldoc accompanying that function is, however, a bit unclear to me. For
instance, is amdgpu simply doing it wrong by not serializing the two in the
driver? Or is the comment referring to a race condition other than the one this
series is concerned with?

To cut a long story short, the first three patches try to fix this race in three
places where I *think* it can manifest in different ways.
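
As a rough illustration of the kind of race in question, here is a minimal
userspace sketch. It is not the kernel code: the toy_* names, the struct layouts
and the use of pthreads are all assumptions made for the example. The idea shown
is that the "modify" path updates the selected scheduler under the entity lock,
and the "push" path samples it once under the same lock and uses that one sample
for both the score increment and the wake-up target.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the real DRM scheduler structures. */
struct toy_sched {
	atomic_int score;		/* load metric, bumped per queued job */
};

struct toy_entity {
	pthread_mutex_t rq_lock;	/* protects 'sched' */
	struct toy_sched *sched;	/* currently selected scheduler */
};

static void toy_entity_init(struct toy_entity *e, struct toy_sched *s)
{
	pthread_mutex_init(&e->rq_lock, NULL);
	e->sched = s;
}

/* Analogue of switching schedulers, e.g. after a priority change. */
static void toy_entity_modify_sched(struct toy_entity *e, struct toy_sched *s)
{
	assert(s != NULL);

	pthread_mutex_lock(&e->rq_lock);
	e->sched = s;
	pthread_mutex_unlock(&e->rq_lock);
}

/*
 * Analogue of job submission: read the scheduler once under the lock and
 * account the job against that same scheduler. The returned pointer is the
 * one a caller would wake up, rather than re-reading e->sched unlocked.
 */
static struct toy_sched *toy_entity_push_job(struct toy_entity *e)
{
	struct toy_sched *s;

	pthread_mutex_lock(&e->rq_lock);
	s = e->sched;
	atomic_fetch_add(&s->score, 1);
	pthread_mutex_unlock(&e->rq_lock);

	return s;
}
```

Without the lock in toy_entity_modify_sched(), a concurrent push could see one
scheduler when incrementing the score and a different one when deciding whom to
wake up, which is the flavour of inconsistency the first three patches target.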

The last patch is a trivial optimisation which I spotted could easily be done.

Cc: Christian König <christian.koenig at amd.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Cc: Luben Tuikov <ltuikov89 at gmail.com>
Cc: Matthew Brost <matthew.brost at intel.com>

Tvrtko Ursulin (4):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in
    drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 17 ++++++++++++-----
 drivers/gpu/drm/scheduler/sched_main.c   | 21 ++++++++++++++-------
 include/drm/gpu_scheduler.h              |  1 +
 3 files changed, 27 insertions(+), 12 deletions(-)

-- 
2.46.0


