[RFC PATCH] drm/sched: Wait for the currently popped dependency in kill_jobs_cb()

Boris Brezillon boris.brezillon at collabora.com
Thu Jun 8 07:05:12 UTC 2023


On Thu,  8 Jun 2023 08:55:51 +0200
Boris Brezillon <boris.brezillon at collabora.com> wrote:

> If I understand correctly, drm_sched_entity_kill_jobs_cb() is supposed
> to wait on all the external dependencies (those added to
> drm_sched_job::dependencies) before signaling the job finished fence.
> This is done this way to prevent jobs depending on these canceled jobs
> from considering the resources they want to access as ready, when
> they're actually still used by other components, thus leading to
> potential data corruptions.
> 
> The problem is, the kill_jobs logic is omitting the last fence popped
> from the dependencies array that was waited upon before
> drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
> so we're basically waiting for all dependencies except one.
> 
> This is an attempt at fixing that, but also an opportunity to make sure
> I understood the drm_sched_entity_kill(), because I'm not 100% sure if

               ^ the drm_sched_entity_kill() logic correctly, ...

> skipping this currently popped dependency was intentional or not. I can't
> see a good reason why we'd want to skip that one, but I might be missing
> something.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon at collabora.com>
> Cc: Frank Binns <frank.binns at imgtec.com>
> Cc: Sarah Walker <sarah.walker at imgtec.com>
> Cc: Donald Robson <donald.robson at imgtec.com>
> Cc: Luben Tuikov <luben.tuikov at amd.com>
> Cc: David Airlie <airlied at gmail.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal at linaro.org>
> Cc: "Christian König" <christian.koenig at amd.com>
> ---
> Stumbled across this issue while working on native dependency support
> with Donald (which will be posted soon). Flagged as RFC, because I'm
> not sure this is legit, and also not sure we want to fix it this way.
> I tried re-using drm_sched_entity::dependency, but it's a bit of a mess
> because of the asynchronousity of the wait, and the fact we use
> drm_sched_entity::dependency to know if we have a clear_dep()
> callback registered, so we can easily reset it without removing the

                          ^ we can't ...

> callback.


More information about the dri-devel mailing list