[PATCH v3] drm/sched: Call drm_sched_fence_set_parent() from drm_sched_fence_scheduled()

Boris Brezillon boris.brezillon at collabora.com
Mon Jun 26 07:47:12 UTC 2023


On Fri, 23 Jun 2023 14:37:57 -0400
Luben Tuikov <luben.tuikov at amd.com> wrote:

> On 2023-06-23 04:03, Boris Brezillon wrote:
> > On Fri, 23 Jun 2023 09:52:04 +0200
> > Boris Brezillon <boris.brezillon at collabora.com> wrote:
> >   
> >> Drivers that can delegate waits to the firmware/GPU pass the scheduled
> >> fence to drm_sched_job_add_dependency(), and issue wait commands to
> >> the firmware/GPU at job submission time. For this to be possible, they
> >> need all their 'native' dependencies to have a valid parent since this
> >> is where the actual HW fence information are encoded.
> >>
> >> In drm_sched_main(), we currently call drm_sched_fence_set_parent()
> >> after drm_sched_fence_scheduled(), leaving a short period of time
> >> during which the job depending on this fence can be submitted.
> >>
> >> Since setting parent and signaling the fence are two things that are
> >> kinda related (you can't have a parent if the job hasn't been scheduled),
> >> it probably makes sense to pass the parent fence to
> >> drm_sched_fence_scheduled() and let it call drm_sched_fence_set_parent()
> >> before it signals the scheduled fence.
> >>
> >> Here is a detailed description of the race we are fixing here:
> >>
> >> Thread A				Thread B
> >>
> >> - calls drm_sched_fence_scheduled()
> >> - signals s_fence->scheduled which
> >>   wakes up thread B
> >>
> >> 					- entity dep signaled, checking
> >> 					  the next dep
> >> 					- no more deps waiting
> >> 					- entity is picked for job
> >> 					  submission by drm_gpu_scheduler
> >> 					- run_job() is called
> >> 					- run_job() tries to
> >> 					  collect native fence info from
> >> 					  s_fence->parent, but it's
> >> 					  NULL =>
> >> 					  BOOM, we can't do our native
> >> 					  wait
> >>
> >> - calls drm_sched_fence_set_parent()
> >>
> >> v2:
> >> * Fix commit message
> >>
> >> v3:
> >> * Add a detailed description of the race to the commit message
> >> * Add Luben's R-b
> >>  
> > 
> > FYI, I didn't put a Fixes tag because the various moves/modifications
> > that happened on this file will make it hard to backport anyway, and no
> > one complained about it so far. But if we want to have one, it would
> > probably be:
> > 
> > Fixes: 754ce0fa55c4 ("drm/amd: add parent for sched fence")
> >   
> 
> I agree with your assessment--the race fix doesn't seem to be pointing to
> or introduced by one particular change. Plus that fixes change is from 2016...
> So, we're good to go as is.

Queued to drm-misc-fixes.



More information about the dri-devel mailing list