[PATCH] drm/xe: Fix missing runtime outer protection for ggtt_remove_node

Fri May 31 16:36:10 UTC 2024

On Fri, May 31, 2024 at 12:31:34PM -0400, Rodrigo Vivi wrote:
> On Fri, May 31, 2024 at 04:15:31PM +0000, Matthew Brost wrote:
> > On Fri, May 31, 2024 at 12:02:05PM -0400, Rodrigo Vivi wrote:
> > > Defer the ggtt node removal to a thread if runtime_pm is not active.
> > > 
> > > The ggtt node removal can be called from multiple places, including
> > > places where we cannot protect with outer callers and places we are
> > > within other locks. So, try to grab the runtime reference if the
> > > device is already active, otherwise defer the removal to a separate
> > > thread from where we are sure we can wake the device up.
> > > 
> > > Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > Cc: Francois Dugast <francois.dugast at intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_ggtt.c | 56 ++++++++++++++++++++++++++++++++----
> > >  1 file changed, 51 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > index b01a670fecb8..d63bf1a744b5 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > @@ -443,16 +443,14 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > >  	return __xe_ggtt_insert_bo_at(ggtt, bo, 0, U64_MAX);
> > >  }
> > >  
> > > -void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > > -			 bool invalidate)
> > > +static void ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > > +			     bool invalidate)
> > >  {
> > >  	struct xe_device *xe = tile_to_xe(ggtt->tile);
> > >  	bool bound;
> > >  	int idx;
> > >  
> > >  	bound = drm_dev_enter(&xe->drm, &idx);
> > > -	if (bound)
> > > -		xe_pm_runtime_get_noresume(xe);
> > >  
> > >  	mutex_lock(&ggtt->lock);
> > >  	if (bound)
> > > @@ -467,10 +465,58 @@ void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > >  	if (invalidate)
> > >  		xe_ggtt_invalidate(ggtt);
> > >  
> > > -	xe_pm_runtime_put(xe);
> > >  	drm_dev_exit(idx);
> > >  }
> > >  
> > > +struct remove_node_work {
> > > +	struct work_struct work;
> > > +	struct xe_ggtt *ggtt;
> > > +	struct drm_mm_node *node;
> > > +	bool invalidate;
> > > +};
> > > +
> > > +static void ggtt_remove_node_work_func(struct work_struct *work)
> > > +{
> > > +	struct remove_node_work *remove_node = container_of(work, struct remove_node_work, work);
> > > +	struct xe_device *xe = tile_to_xe(remove_node->ggtt->tile);
> > > +
> > > +	xe_pm_runtime_get(xe);
> > > +	ggtt_remove_node(remove_node->ggtt, remove_node->node, remove_node->invalidate);
> > > +	xe_pm_runtime_put(xe);
> > > +
> > > +	kfree(remove_node);
> > > +}
> > > +
> > > +static void ggtt_queue_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > > +				   bool invalidate)
> > > +{
> > > +	struct remove_node_work *remove_node;
> > > +
> > > +	remove_node = kmalloc(sizeof(*remove_node), GFP_KERNEL);
> > 
> > Are we sure this code cannot be in an atomic context or in the path of a
> > dma-fence? If either of these is true, then we cannot allocate
> > memory here.
> 
> not sure tbh
> 

Me neither; answering this needs some deeper thought. Given that this is
a fairly deep layer, I'd say it is conceivable that the latter (being in
the path of a dma-fence) is true or becomes true at some point. The
former (atomic context) likely isn't, given that we use mutexes in this
path. I'd say it's best to design this with that in mind.

> > Alternatively, we could use GFP_ATOMIC or preallocate
> > 'remove_node_work' as part of the initial GGTT node allocation. The
> > latter requires a bit more memory, but GGTT allocations are heavyweight
> > objects, and using a bit more memory seems fine to me.
> 
> I had thought about simply going with GFP_ATOMIC.
> 
> The pre-allocation doesn't work unless we encapsulate the drm_mm_node
> in an xe_mm_node with the removal info in it.
> 

I was thinking of an xe_mm_node subclass.
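
A minimal userspace sketch of the subclass idea, not driver code. It
assumes a hypothetical struct xe_mm_node that embeds the drm_mm_node and
carries the removal info, so the removal path never has to allocate. The
struct drm_mm_node here is a stand-in with only two fields, and the
helper names (xe_mm_node_alloc, to_xe_mm_node, xe_mm_node_mark_deferred)
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the real struct drm_mm_node; only what this model needs. */
struct drm_mm_node {
	unsigned long start;
	unsigned long size;
};

/*
 * Hypothetical subclass: the removal info lives next to the node, so it
 * is allocated once, at insert time, where sleeping is allowed.
 */
struct xe_mm_node {
	struct drm_mm_node base;
	bool deferred;		/* removal has been handed to a worker */
	bool invalidate;	/* argument saved for the deferred removal */
};

/* Insert-time allocation; calloc() models a GFP_KERNEL allocation. */
static struct xe_mm_node *xe_mm_node_alloc(unsigned long start,
					   unsigned long size)
{
	struct xe_mm_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->base.start = start;
	n->base.size = size;
	return n;
}

/* Upcast from the embedded node back to the subclass. */
static struct xe_mm_node *to_xe_mm_node(struct drm_mm_node *node)
{
	assert(node);
	return container_of(node, struct xe_mm_node, base);
}

/* Removal path: only records state, no allocation that could fail. */
static void xe_mm_node_mark_deferred(struct drm_mm_node *node,
				     bool invalidate)
{
	struct xe_mm_node *n = to_xe_mm_node(node);

	n->deferred = true;
	n->invalidate = invalidate;
}
```

Callers that only hold a struct drm_mm_node pointer can still reach the
removal info through to_xe_mm_node(), which is what makes the
pre-allocation workable.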

> > Also if we do the
> > latter, maybe just add the node to a list and kick a dedicated work item
> > which processes all nodes on the list.
> 
> The list with the single worker also sounds like an elegant solution
> here, to process all the removals in the same way. But for simplicity,
> if GFP_ATOMIC works I would prefer to go with that, since it keeps the
> worker minimal and has a 1:1 work:item mapping.
> 

I'll leave GFP_ATOMIC vs. xe_mm_node subclass to you.
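
For comparison, a minimal userspace sketch of the list plus single
worker variant, again not driver code. The types and helpers
(deferred_node, deferred_list, deferred_list_add, deferred_list_drain)
are hypothetical. It is single-threaded, so the locking a real list
would need is left out, and folding the per-node invalidate flags into
one invalidation per drain is an assumption of this sketch, not
something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node removal info, preallocated with the node. */
struct deferred_node {
	struct deferred_node *next;
	bool invalidate;
	bool removed;
};

/* Hypothetical per-GGTT list of nodes waiting for the worker. */
struct deferred_list {
	struct deferred_node *head;
	bool work_pending;
};

/*
 * Removal path: link the node, no allocation. Returns true when the
 * caller must kick the worker, i.e. no drain was already pending.
 */
static bool deferred_list_add(struct deferred_list *l,
			      struct deferred_node *n, bool invalidate)
{
	bool kick;

	assert(l && n);
	kick = !l->work_pending;
	n->invalidate = invalidate;
	n->next = l->head;
	l->head = n;
	l->work_pending = true;
	return kick;
}

/*
 * Worker body: in the driver this is where one runtime PM reference
 * would be held around the whole loop. Returns the number of nodes
 * processed and reports whether any of them asked for invalidation.
 */
static unsigned int deferred_list_drain(struct deferred_list *l,
					bool *invalidated)
{
	struct deferred_node *n = l->head;
	unsigned int count = 0;
	bool need_inval = false;

	l->head = NULL;
	l->work_pending = false;

	while (n) {
		struct deferred_node *next = n->next;

		n->next = NULL;
		n->removed = true;	/* models the actual node removal */
		if (n->invalidate)
			need_inval = true;
		count++;
		n = next;
	}

	*invalidated = need_inval;
	return count;
}
```

The trade-off against 1:1 work items is one extra list and its lock in
exchange for a single wakeup covering any number of queued removals.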

> > 
> > > +	if (!remove_node)
> > > +		return;
> > > +
> > > +	INIT_WORK(&remove_node->work, ggtt_remove_node_work_func);
> > > +	remove_node->ggtt = ggtt;
> > > +	remove_node->node = node;
> > > +	remove_node->invalidate = invalidate;
> > > +
> > > +	queue_work(system_unbound_wq, &remove_node->work);
> > 
> > I think we need to be careful with system wq usage. Recently we have had
> > two bugs [1][2] exposed in 6.9 in which we deadlocked by using system
> > wqs. I think it is likely safer to use a driver dedicated queue here.
> 
> ouch! probably good to create a dedicated wq for xe_ggtt so we don't
> interfere with anything else.
>

+1 or use an existing Xe wq. Again I'll leave this to you.

Matt

> > 
> > Other than these questions, the design of the patch (try grabbing a PM
> > reference; if we can't, defer to a worker) LGTM.
> > 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/series/133210/
> > [2] https://patchwork.freedesktop.org/patch/586095/?series=131904&rev=1
> > 
> > > +}
> > > +
> > > +void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
> > > +			 bool invalidate)
> > > +{
> > > +	struct xe_device *xe = tile_to_xe(ggtt->tile);
> > > +
> > > +	if (xe_pm_runtime_get_if_active(xe)) {
> > > +		ggtt_remove_node(ggtt, node, invalidate);
> > > +		xe_pm_runtime_put(xe);
> > > +	} else {
> > > +		ggtt_queue_remove_node(ggtt, node, invalidate);
> > > +	}
> > > +}
> > > +
> > >  void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
> > >  {
> > >  	if (XE_WARN_ON(!bo->ggtt_node.size))
> > > -- 
> > > 2.45.1
> > >