[PATCH v2] drm/xe: Use RCU free for exec queue, not GuC exec queue

Matthew Brost matthew.brost at intel.com
Mon Jun 16 09:53:52 UTC 2025


On Mon, Jun 16, 2025 at 09:35:14AM +0100, Tvrtko Ursulin wrote:
> 
> On 16/06/2025 08:27, Matthew Brost wrote:
> > The exec queue name is used in dma-fence timelines, not the GuC exec queue.
> > Therefore, the exec queue itself requires RCU freeing rather than the
> > GuC exec queue.
> > 
> > Fixes: 6bd90e700b42 ("drm/xe: Make dma-fences compliant with the safe access rules")
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_exec_queue.c           | 7 ++++++-
> >   drivers/gpu/drm/xe/xe_exec_queue_types.h     | 2 ++
> >   drivers/gpu/drm/xe/xe_guc_exec_queue_types.h | 2 --
> >   drivers/gpu/drm/xe/xe_guc_submit.c           | 7 +------
> >   4 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > index fee22358cc09..d8c3e983ffb6 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > @@ -47,7 +47,11 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
> >   	if (q->xef)
> >   		xe_file_put(q->xef);
> > -	kfree(q);
> > +	/*
> > +	 * RCU free due sched being exported via DRM scheduler fences
> > +	 * (timeline name).
> > +	 */
> > +	kfree_rcu(q, rcu);
> >   }
> >   static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> > @@ -82,6 +86,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> >   	INIT_LIST_HEAD(&q->multi_gt_link);
> >   	INIT_LIST_HEAD(&q->hw_engine_group_link);
> >   	INIT_LIST_HEAD(&q->pxp.link);
> > +	init_rcu_head(&q->rcu);
> >   	q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us;
> >   	q->sched_props.preempt_timeout_us =
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index cc1cffb5c87f..a17fe852fc0d 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -43,6 +43,8 @@ struct xe_exec_queue {
> >   	/** @gt: GT structure this exec queue can submit to */
> >   	struct xe_gt *gt;
> > +	/** @rcu: For safe freeing of exported dma fences */
> > +	struct rcu_head rcu;
> >   	/**
> >   	 * @hwe: A hardware of the same class. May (physical engine) or may not
> >   	 * (virtual engine) be where jobs actual engine up running. Should never
> > diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > index a3f421e2adc0..4c39f01e4f52 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > @@ -20,8 +20,6 @@ struct xe_exec_queue;
> >   struct xe_guc_exec_queue {
> >   	/** @q: Backpointer to parent xe_exec_queue */
> >   	struct xe_exec_queue *q;
> > -	/** @rcu: For safe freeing of exported dma fences */
> > -	struct rcu_head rcu;
> >   	/** @sched: GPU scheduler for this xe_exec_queue */
> >   	struct xe_gpu_scheduler sched;
> >   	/** @entity: Scheduler entity for this xe_exec_queue */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index df7a5a4eec74..7170e78e5b8e 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -1299,11 +1299,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
> >   	xe_sched_entity_fini(&ge->entity);
> >   	xe_sched_fini(&ge->sched);
> > -	/*
> > -	 * RCU free due sched being exported via DRM scheduler fences
> > -	 * (timeline name).
> > -	 */
> > -	kfree_rcu(ge, rcu);
> > +	kfree(ge);
> >   	xe_exec_queue_fini(q);
> 
> Hmm but before it gets to the name it needs to dereference the drm sched,
> and drm sched is embedded in the guc queue.
> 
> So as soon as kfree(ge) runs here, drm_sched_fence_get_timeline_name() can
> oops on fence->sched dereference.
> 
> Name is then one derefence further - fence->sched->name. Which indeed points
> to xe_exec_queue, but because that one is destroyed second, kfree_rcu on the
> guc queue adds the appropriate guarantee of one RCU period before signaling
> and ability to enter fence->get_timeline_name().
> 
> In summary how I had it still looks correct to me. Does it explode somehow

Yea, I think you are right. No exploding. I was working on something
else and misconstrued what the RCU was protecting - the scheduler not
the name.

Matt

> or what prompted you to think it was wrong?
> 
> Regards,
> 
> Tvrtko
> 
> >   	xe_pm_runtime_put(guc_to_xe(guc));
> >   }
> > @@ -1486,7 +1482,6 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> >   	q->guc = ge;
> >   	ge->q = q;
> > -	init_rcu_head(&ge->rcu);
> >   	init_waitqueue_head(&ge->suspend_wait);
> >   	for (i = 0; i < MAX_STATIC_MSG_TYPE; ++i)
> 


More information about the Intel-xe mailing list