[PATCH v3] drm/sched: Call drm_sched_fence_set_parent() from drm_sched_fence_scheduled()
Luben Tuikov
luben.tuikov at amd.com
Fri Jun 23 18:37:57 UTC 2023
On 2023-06-23 04:03, Boris Brezillon wrote:
> On Fri, 23 Jun 2023 09:52:04 +0200
> Boris Brezillon <boris.brezillon at collabora.com> wrote:
>
>> Drivers that can delegate waits to the firmware/GPU pass the scheduled
>> fence to drm_sched_job_add_dependency(), and issue wait commands to
>> the firmware/GPU at job submission time. For this to be possible, they
>> need all their 'native' dependencies to have a valid parent since this
>> is where the actual HW fence information are encoded.
>>
>> In drm_sched_main(), we currently call drm_sched_fence_set_parent()
>> after drm_sched_fence_scheduled(), leaving a short period of time
>> during which the job depending on this fence can be submitted.
>>
>> Since setting parent and signaling the fence are two things that are
>> kinda related (you can't have a parent if the job hasn't been scheduled),
>> it probably makes sense to pass the parent fence to
>> drm_sched_fence_scheduled() and let it call drm_sched_fence_set_parent()
>> before it signals the scheduled fence.
>>
>> Here is a detailed description of the race we are fixing here:
>>
>> Thread A Thread B
>>
>> - calls drm_sched_fence_scheduled()
>> - signals s_fence->scheduled which
>> wakes up thread B
>>
>> - entity dep signaled, checking
>> the next dep
>> - no more deps waiting
>> - entity is picked for job
>> submission by drm_gpu_scheduler
>> - run_job() is called
>> - run_job() tries to
>> collect native fence info from
>> s_fence->parent, but it's
>> NULL =>
>> BOOM, we can't do our native
>> wait
>>
>> - calls drm_sched_fence_set_parent()
>>
>> v2:
>> * Fix commit message
>>
>> v3:
>> * Add a detailed description of the race to the commit message
>> * Add Luben's R-b
>>
>
> FYI, I didn't put a Fixes tag because the various moves/modifications
> that happened on this file will make it hard to backport anyway, and no
> one complained about it so far. But if we want to have one, it would
> probably be:
>
> Fixes: 754ce0fa55c4 ("drm/amd: add parent for sched fence")
>
I agree with your assessment--the race fix doesn't seem to be pointing to
or introduced by one particular change. Plus that fixes change is from 2016...
So, we're good to go as is.
--
Regards,
Luben
More information about the dri-devel
mailing list