[PATCH v6] drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
Luben Tuikov
luben.tuikov at amd.com
Wed Jun 21 13:56:40 UTC 2023
On 2023-06-19 03:19, Boris Brezillon wrote:
> drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
> from the dependency array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
>
> In theory, this wait shouldn't be needed because resources should have
> their users registered to the dma_resv object, thus guaranteeing that
> future jobs wanting to access these resources wait on all the previous
> users (depending on the access type, of course). But we want to keep
> these explicit waits in the kill entity path just in case.
>
> Let's make sure we keep all dependencies in the array in
> drm_sched_job_dependency(), so we can iterate over the array and wait
> in drm_sched_entity_kill_jobs_cb().
>
> We also make sure we wait on drm_sched_fence::finished if we were
> originally asked to wait on drm_sched_fence::scheduled. In that case,
> we assume the intent was to delegate the wait to the firmware/GPU or
> rely on the pipelining done at the entity/scheduler level, but when
> killing jobs, we really want to wait for completion not just scheduling.
>
> v6:
> - Back to v4 implementation
> - Add Christian's R-b
>
> v5:
> - Flag deps on which we should only wait for the scheduled event
> at insertion time
>
> v4:
> - Fix commit message
> - Fix a use-after-free bug
>
> v3:
> - Always wait for drm_sched_fence::finished fences in
> drm_sched_entity_kill_jobs_cb() when we see a sched_fence
>
> v2:
> - Don't evict deps in drm_sched_job_dependency()
Hmm, why is this in reverse chronological order?
It's very confusing.
>
> Signed-off-by: Boris Brezillon <boris.brezillon at collabora.com>
> Suggested-by: "Christian König" <christian.koenig at amd.com>
> Reviewed-by: "Christian König" <christian.koenig at amd.com>
These three lines would usually come after the CCs.
Regards,
Luben
> Cc: Frank Binns <frank.binns at imgtec.com>
> Cc: Sarah Walker <sarah.walker at imgtec.com>
> Cc: Donald Robson <donald.robson at imgtec.com>
> Cc: Luben Tuikov <luben.tuikov at amd.com>
> Cc: David Airlie <airlied at gmail.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal at linaro.org>
> Cc: "Christian König" <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 41 +++++++++++++++++++-----
> 1 file changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 68e807ae136a..ec41d82d0141 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -176,16 +176,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> {
> struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
> finish_cb);
> - int r;
> + unsigned long index;
>
> dma_fence_put(f);
>
> /* Wait for all dependencies to avoid data corruptions */
> - while (!xa_empty(&job->dependencies)) {
> - f = xa_erase(&job->dependencies, job->last_dependency++);
> - r = dma_fence_add_callback(f, &job->finish_cb,
> - drm_sched_entity_kill_jobs_cb);
> - if (!r)
> + xa_for_each(&job->dependencies, index, f) {
> + struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> +
> + if (s_fence && f == &s_fence->scheduled) {
> + /* The dependencies array had a reference on the scheduled
> + * fence, and the finished fence refcount might have
> + * dropped to zero. Use dma_fence_get_rcu() so we get
> + * a NULL fence in that case.
> + */
> + f = dma_fence_get_rcu(&s_fence->finished);
> +
> + /* Now that we have a reference on the finished fence,
> + * we can release the reference the dependencies array
> + * had on the scheduled fence.
> + */
> + dma_fence_put(&s_fence->scheduled);
> + }
> +
> + xa_erase(&job->dependencies, index);
> + if (f && !dma_fence_add_callback(f, &job->finish_cb,
> + drm_sched_entity_kill_jobs_cb))
> return;
>
> dma_fence_put(f);
> @@ -415,8 +431,17 @@ static struct dma_fence *
> drm_sched_job_dependency(struct drm_sched_job *job,
> struct drm_sched_entity *entity)
> {
> - if (!xa_empty(&job->dependencies))
> - return xa_erase(&job->dependencies, job->last_dependency++);
> + struct dma_fence *f;
> +
> + /* We keep the fence around, so we can iterate over all dependencies
> + * in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
> + * before killing the job.
> + */
> + f = xa_load(&job->dependencies, job->last_dependency);
> + if (f) {
> + job->last_dependency++;
> + return dma_fence_get(f);
> + }
>
> if (job->sched->ops->prepare_job)
> return job->sched->ops->prepare_job(job, entity);
More information about the dri-devel
mailing list