[PATCH v3] drm/sched: Call drm_sched_fence_set_parent() from drm_sched_fence_scheduled()

Luben Tuikov luben.tuikov at amd.com
Fri Jun 23 18:37:57 UTC 2023


On 2023-06-23 04:03, Boris Brezillon wrote:
> On Fri, 23 Jun 2023 09:52:04 +0200
> Boris Brezillon <boris.brezillon at collabora.com> wrote:
> 
>> Drivers that can delegate waits to the firmware/GPU pass the scheduled
>> fence to drm_sched_job_add_dependency(), and issue wait commands to
>> the firmware/GPU at job submission time. For this to be possible, they
>> need all their 'native' dependencies to have a valid parent since this
>> is where the actual HW fence information are encoded.
>>
>> In drm_sched_main(), we currently call drm_sched_fence_set_parent()
>> after drm_sched_fence_scheduled(), leaving a short period of time
>> during which the job depending on this fence can be submitted.
>>
>> Since setting parent and signaling the fence are two things that are
>> kinda related (you can't have a parent if the job hasn't been scheduled),
>> it probably makes sense to pass the parent fence to
>> drm_sched_fence_scheduled() and let it call drm_sched_fence_set_parent()
>> before it signals the scheduled fence.
>>
>> Here is a detailed description of the race we are fixing here:
>>
>> Thread A				Thread B
>>
>> - calls drm_sched_fence_scheduled()
>> - signals s_fence->scheduled which
>>   wakes up thread B
>>
>> 					- entity dep signaled, checking
>> 					  the next dep
>> 					- no more deps waiting
>> 					- entity is picked for job
>> 					  submission by drm_gpu_scheduler
>> 					- run_job() is called
>> 					- run_job() tries to
>> 					  collect native fence info from
>> 					  s_fence->parent, but it's
>> 					  NULL =>
>> 					  BOOM, we can't do our native
>> 					  wait
>>
>> - calls drm_sched_fence_set_parent()
>>
>> v2:
>> * Fix commit message
>>
>> v3:
>> * Add a detailed description of the race to the commit message
>> * Add Luben's R-b
>>
> 
> FYI, I didn't put a Fixes tag because the various moves/modifications
> that happened on this file will make it hard to backport anyway, and no
> one complained about it so far. But if we want to have one, it would
> probably be:
> 
> Fixes: 754ce0fa55c4 ("drm/amd: add parent for sched fence")
> 

I agree with your assessment--the race fix doesn't seem to be pointing to
or introduced by one particular change. Plus that fixes change is from 2016...
So, we're good to go as is.
-- 
Regards,
Luben



More information about the dri-devel mailing list