[PATCH v2] drm/xe: Use RCU free for exec queue, not GuC exec queue
Matthew Brost
matthew.brost at intel.com
Mon Jun 16 09:53:52 UTC 2025
On Mon, Jun 16, 2025 at 09:35:14AM +0100, Tvrtko Ursulin wrote:
>
> On 16/06/2025 08:27, Matthew Brost wrote:
> > The exec queue name is used in dma-fence timelines, not the GuC exec queue.
> > Therefore, the exec queue itself requires RCU freeing rather than the
> > GuC exec queue.
> >
> > Fixes: 6bd90e700b42 ("drm/xe: Make dma-fences compliant with the safe access rules")
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_exec_queue.c | 7 ++++++-
> > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> > drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 --
> > drivers/gpu/drm/xe/xe_guc_submit.c | 7 +------
> > 4 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index fee22358cc09..d8c3e983ffb6 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -47,7 +47,11 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
> > if (q->xef)
> > xe_file_put(q->xef);
> > - kfree(q);
> > + /*
> > + * RCU free due sched being exported via DRM scheduler fences
> > + * (timeline name).
> > + */
> > + kfree_rcu(q, rcu);
> > }
> > static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> > @@ -82,6 +86,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> > INIT_LIST_HEAD(&q->multi_gt_link);
> > INIT_LIST_HEAD(&q->hw_engine_group_link);
> > INIT_LIST_HEAD(&q->pxp.link);
> > + init_rcu_head(&q->rcu);
> > q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us;
> > q->sched_props.preempt_timeout_us =
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index cc1cffb5c87f..a17fe852fc0d 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -43,6 +43,8 @@ struct xe_exec_queue {
> > /** @gt: GT structure this exec queue can submit to */
> > struct xe_gt *gt;
> > + /** @rcu: For safe freeing of exported dma fences */
> > + struct rcu_head rcu;
> > /**
> > * @hwe: A hardware of the same class. May (physical engine) or may not
> > * (virtual engine) be where jobs actual engine up running. Should never
> > diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > index a3f421e2adc0..4c39f01e4f52 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > @@ -20,8 +20,6 @@ struct xe_exec_queue;
> > struct xe_guc_exec_queue {
> > /** @q: Backpointer to parent xe_exec_queue */
> > struct xe_exec_queue *q;
> > - /** @rcu: For safe freeing of exported dma fences */
> > - struct rcu_head rcu;
> > /** @sched: GPU scheduler for this xe_exec_queue */
> > struct xe_gpu_scheduler sched;
> > /** @entity: Scheduler entity for this xe_exec_queue */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index df7a5a4eec74..7170e78e5b8e 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1299,11 +1299,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
> > xe_sched_entity_fini(&ge->entity);
> > xe_sched_fini(&ge->sched);
> > - /*
> > - * RCU free due sched being exported via DRM scheduler fences
> > - * (timeline name).
> > - */
> > - kfree_rcu(ge, rcu);
> > + kfree(ge);
> > xe_exec_queue_fini(q);
>
> Hmm but before it gets to the name it needs to dereference the drm sched,
> and drm sched is embedded in the guc queue.
>
> So as soon as kfree(ge) runs here, drm_sched_fence_get_timeline_name() can
> oops on fence->sched dereference.
>
> Name is then one derefence further - fence->sched->name. Which indeed points
> to xe_exec_queue, but because that one is destroyed second, kfree_rcu on the
> guc queue adds the appropriate guarantee of one RCU period before signaling
> and ability to enter fence->get_timeline_name().
>
> In summary how I had it still looks correct to me. Does it explode somehow
Yea, I think you are right. No exploding. I was working on something
else and misconstrued what the RCU was protecting - the scheduler not
the name.
Matt
> or what prompted you to think it was wrong?
>
> Regards,
>
> Tvrtko
>
> > xe_pm_runtime_put(guc_to_xe(guc));
> > }
> > @@ -1486,7 +1482,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> > q->guc = ge;
> > ge->q = q;
> > - init_rcu_head(&ge->rcu);
> > init_waitqueue_head(&ge->suspend_wait);
> > for (i = 0; i < MAX_STATIC_MSG_TYPE; ++i)
>
More information about the Intel-xe
mailing list