[PATCH 1/1] drm/lima: implement the file flush callback

Qiang Yu yuq825 at gmail.com
Thu Apr 10 13:56:42 UTC 2025


On Thu, Apr 10, 2025 at 8:35 PM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 10.04.25 um 11:33 schrieb Qiang Yu:
> > On Tue, Apr 8, 2025 at 11:48 PM Erico Nunes <nunes.erico at gmail.com> wrote:
> >> With this callback implemented, a terminating application will wait for
> >> the sched entity to be flushed out to the hardware and cancel all other
> >> pending jobs before destroying its context.
> > We do flush when file release in lima_ctx_mgr_fini. Why do we wait here
> > in flush? What's the difference?
>
> Waiting for submissions when you release the file descriptor is actually a bad idea since that can prevent SIGKILL from acting immediately. For example the OOM killer absolutely doesn't like that and eventually calls panic().
>
> Flush is called either manually, on process termination or when you send a SIGTERM. This should then wait for any I/O to complete.
>
> The idea is now that you can then still send a SIGKILL to abort waiting for the I/O as well and so get pending GPU operations not submitted to the HW.
>
> The DRM scheduler helps doing that by providing the drm_sched_entity_flush() and drm_sched_entity_fini() functions.
>
> When there is still pending work when drm_sched_entity_fini() is called a callback to kill it is installed and the job just freed instead of executed.
>
So the difference is we can always wait in flush, but better not in
release when SIGKILL?

> >
> >> This prevents applications with multiple contexts from running into a
> >> race condition between running tasks and context destroy when
> >> terminating.
> >>
If implementing flush callback fix the problem, it must not be when SIGKILL.
Could you describe the problem and how this fix solves it? As I don't understand
how the above difference could fix a race bug.

> >> Signed-off-by: Erico Nunes <nunes.erico at gmail.com>
> >> ---
> >>  drivers/gpu/drm/lima/lima_ctx.c | 18 ++++++++++++++++++
> >>  drivers/gpu/drm/lima/lima_ctx.h |  1 +
> >>  drivers/gpu/drm/lima/lima_drv.c | 17 ++++++++++++++++-
> >>  3 files changed, 35 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/lima/lima_ctx.c b/drivers/gpu/drm/lima/lima_ctx.c
> >> index 0e668fc1e0f9..e8fb5788ca69 100644
> >> --- a/drivers/gpu/drm/lima/lima_ctx.c
> >> +++ b/drivers/gpu/drm/lima/lima_ctx.c
> >> @@ -100,3 +100,21 @@ void lima_ctx_mgr_fini(struct lima_ctx_mgr *mgr)
> >>         xa_destroy(&mgr->handles);
> >>         mutex_destroy(&mgr->lock);
> >>  }
> >> +
> >> +long lima_ctx_mgr_flush(struct lima_ctx_mgr *mgr, long timeout)
> >> +{
> >> +       struct lima_ctx *ctx;
> >> +       unsigned long id;
> >> +
> >> +       mutex_lock(&mgr->lock);
> >> +       xa_for_each(&mgr->handles, id, ctx) {
> >> +               for (int i = 0; i < lima_pipe_num; i++) {
> >> +                       struct lima_sched_context *context = &ctx->context[i];
> >> +                       struct drm_sched_entity *entity = &context->base;
> >> +
> >> +                       timeout = drm_sched_entity_flush(entity, timeout);
> >> +               }
> >> +       }
> >> +       mutex_unlock(&mgr->lock);
> >> +       return timeout;
> >> +}
> >> diff --git a/drivers/gpu/drm/lima/lima_ctx.h b/drivers/gpu/drm/lima/lima_ctx.h
> >> index 5b1063ce968b..ff133db6ae4c 100644
> >> --- a/drivers/gpu/drm/lima/lima_ctx.h
> >> +++ b/drivers/gpu/drm/lima/lima_ctx.h
> >> @@ -30,5 +30,6 @@ struct lima_ctx *lima_ctx_get(struct lima_ctx_mgr *mgr, u32 id);
> >>  void lima_ctx_put(struct lima_ctx *ctx);
> >>  void lima_ctx_mgr_init(struct lima_ctx_mgr *mgr);
> >>  void lima_ctx_mgr_fini(struct lima_ctx_mgr *mgr);
> >> +long lima_ctx_mgr_flush(struct lima_ctx_mgr *mgr, long timeout);
> >>
> >>  #endif
> >> diff --git a/drivers/gpu/drm/lima/lima_drv.c b/drivers/gpu/drm/lima/lima_drv.c
> >> index 11ace5cebf4c..08169b0d9c28 100644
> >> --- a/drivers/gpu/drm/lima/lima_drv.c
> >> +++ b/drivers/gpu/drm/lima/lima_drv.c
> >> @@ -254,7 +254,22 @@ static const struct drm_ioctl_desc lima_drm_driver_ioctls[] = {
> >>         DRM_IOCTL_DEF_DRV(LIMA_CTX_FREE, lima_ioctl_ctx_free, DRM_RENDER_ALLOW),
> >>  };
> >>
> >> -DEFINE_DRM_GEM_FOPS(lima_drm_driver_fops);
> >> +static int lima_drm_driver_flush(struct file *filp, fl_owner_t id)
> >> +{
> >> +       struct drm_file *file = filp->private_data;
> >> +       struct lima_drm_priv *priv = file->driver_priv;
> >> +       long timeout = MAX_WAIT_SCHED_ENTITY_Q_EMPTY;
> >> +
> >> +       timeout = lima_ctx_mgr_flush(&priv->ctx_mgr, timeout);
> >> +
> >> +       return timeout >= 0 ? 0 : timeout;
> >> +}
> >> +
> >> +static const struct file_operations lima_drm_driver_fops = {
> >> +       .owner = THIS_MODULE,
> >> +       .flush = lima_drm_driver_flush,
> >> +       DRM_GEM_FOPS,
> >> +};
> >>
> >>  /*
> >>   * Changelog:
> >> --
> >> 2.49.0
> >>
>


More information about the dri-devel mailing list