[PATCH v2 2/2] drm/xe: Add mutex locking to devcoredump
Matthew Brost
matthew.brost at intel.com
Fri Nov 22 02:01:56 UTC 2024
On Thu, Nov 21, 2024 at 05:25:10PM -0800, John Harrison wrote:
> On 11/21/2024 15:44, Matthew Brost wrote:
> > On Thu, Nov 21, 2024 at 02:55:42PM -0800, John.C.Harrison at Intel.com wrote:
> > > From: John Harrison <John.C.Harrison at Intel.com>
> > >
> > > There are now multiple places that can trigger a coredump. Some of
> > > which can happen in parallel. There is already a check against
> > > capturing multiple dumps sequentially, but without locking it is not
> > > guaranteed to work against concurrent dumps. And if two dumps do happen
> > > in parallel, they can end up doing Bad Things such as one call stack
> > > freeing the data the other call stack is still processing, which leads
> > > to a crashed kernel.
> > >
> > > Further, it is possible for the DRM timeout to expire and trigger a
> > > free of the capture while a user is still reading that capture out
> > > through sysfs. Again leading to dodgy pointer problems.
> > >
> > > So, add a mutex lock around the capture, read and free functions to
> > > prevent interference.
> > >
> > > v2: Swap tiny scope spin_lock for larger scope mutex and fix
> > > kernel-doc comment (review feedback from Matthew Brost)
> > >
> > > Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_devcoredump.c | 26 +++++++++++++++++++++--
> > > drivers/gpu/drm/xe/xe_devcoredump_types.h | 4 +++-
> > > 2 files changed, 27 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > index dd48745a8a46..0621754ddfd2 100644
> > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > @@ -202,21 +202,29 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
> > > if (!coredump)
> > > return -ENODEV;
> > > + mutex_lock(&coredump->lock);
> > > +
> > I'll just explain my reclaim comment in the prior rev here.
> >
> > 'coredump->lock' is in the path of reclaim as it can be taken from the TDR,
> > which signals dma-fences. This is why most of the devcoredump core uses
> > GFP_ATOMIC to capture smaller state which could be lost quickly. We also
> So the reason string allocation in patch #1 should also use GFP_ATOMIC
> rather than GFP_KERNEL?
>
Yes.
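
Something along these lines, I'd expect - the exact call in patch #1 may
differ and the field name here is made up, but the point is that any
allocation reachable from the TDR / dma-fence signalling path must not
enter reclaim:

	/* Reason string is captured from the signalling path, so no
	 * GFP_KERNEL here - kvasprintf() with GFP_ATOMIC instead.
	 */
	va_start(varg, fmt);
	ss->reason = kvasprintf(GFP_ATOMIC, fmt, varg);
	va_end(varg);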
> > have a worker, ss->work, which opportunistically captures larger VM state
> > with GFP_KERNEL. The worker is not in the path of reclaim. Thus you cannot
> > flush the worker under 'coredump->lock' without getting potential
> > deadlocks. With proper annotations, lockdep will complain.
> Okay, that makes sense now. Was forgetting the captures are from the TDR /
> dma-fence paths which are reclaim requirements. Doh!
>
> >
> > e.g.
> >
> > We should do this on driver load:
> >
> > fs_reclaim_acquire();
> > might_lock();
> > fs_reclaim_recalim();
> I assume this should be fs_reclaim_release()?
>
Yes, typo. Got a little distracted typing this.
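
For completeness, the corrected pattern (this matches what the existing
primelockdep() helpers in the driver do; fs_reclaim_acquire/release come
from <linux/sched/mm.h>):

	if (IS_ENABLED(CONFIG_LOCKDEP)) {
		/* Tell lockdep at init time that coredump->lock can be
		 * taken inside the reclaim path, so any inversion is
		 * reported immediately rather than only on a rare hang.
		 */
		fs_reclaim_acquire(GFP_KERNEL);
		might_lock(&coredump->lock);
		fs_reclaim_release(GFP_KERNEL);
	}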
> I see three separate instances of a local primelockdep() helper function to
> do this, two of which do a might_lock() and one which does an actual
> lock/unlock (plus another which does a lock_map_acquire/release, but I
> assume that is very different). Plus another instance of the construct that
> is just inline with the rest of the init function. The helper versions all
> have a check against CONFIG_LOCKDEP but the unrolled version does not. Seems
> like we should have a generically accessible helper function for this? Maybe
A helper might be a good idea.
> even as a wrapper around drmm_mutex_init itself? Although the xe_ggtt.c and
> xe_migrate.c copies are not using the drmm version of mutex init. Should
> they be?
>
Yes, all mutexes in Xe likely should use drmm_mutex_init. A reclaim-priming
version isn't a bad idea either, given all drivers in DRM use dma-fences
and likely have mutexes that should be primed with reclaim. IIRC priming
with reclaim was a bit of a hack actually; using
dma_fence_begin_signalling() / dma_fence_end_signalling() is really what
we likely want to do, but that annotation had some odd weaknesses which
would give false lockdep positives. Thomas may have fixed this recently
though. If you post a common drmm function, I think the correct
annotation could be sorted out on dri-devel.
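
Roughly what I have in mind for the common helper - the name and the
exact lockdep priming are placeholders, to be bikeshedded on dri-devel:

/* Hypothetical helper: managed mutex init plus reclaim priming.
 * Whether this should prime against fs_reclaim or use the dma-fence
 * signalling annotation is the open question.
 */
static inline int drmm_mutex_init_reclaim(struct drm_device *drm,
					  struct mutex *lock)
{
	int err;

	err = drmm_mutex_init(drm, lock);
	if (err)
		return err;

	if (IS_ENABLED(CONFIG_LOCKDEP)) {
		fs_reclaim_acquire(GFP_KERNEL);
		might_lock(lock);
		fs_reclaim_release(GFP_KERNEL);
	}

	return 0;
}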
Matt
> John.
>
> >
> > Our upper layers should do this too, but may have gaps. Regardless, priming
> > lockdep is a good practice and self-documenting.
> >
> > > ss = &coredump->snapshot;
> > > /* Ensure delayed work is captured before continuing */
> > > flush_work(&ss->work);
> > So this is where the mutex should be locked.
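
To spell out the ordering I mean (sketch, not the posted diff): flush the
worker first, then take the lock, so the worker is never flushed while
holding a lock that sits in the reclaim path, e.g.

	ss = &coredump->snapshot;

	/* Ensure delayed work is captured before continuing */
	flush_work(&ss->work);

	mutex_lock(&coredump->lock);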
> >
> > > - if (!ss->read.buffer)
> > > + if (!ss->read.buffer) {
> > > + mutex_unlock(&coredump->lock);
> > > return -ENODEV;
> > > + }
> > > - if (offset >= ss->read.size)
> > > + if (offset >= ss->read.size) {
> > > + mutex_unlock(&coredump->lock);
> > > return 0;
> > > + }
> > > byte_copied = count < ss->read.size - offset ? count :
> > > ss->read.size - offset;
> > > memcpy(buffer, ss->read.buffer + offset, byte_copied);
> > > + mutex_unlock(&coredump->lock);
> > > +
> > > return byte_copied;
> > > }
> > > @@ -228,6 +236,8 @@ static void xe_devcoredump_free(void *data)
> > > if (!data || !coredump_to_xe(coredump))
> > > return;
> > > + mutex_lock(&coredump->lock);
> > > +
> > > cancel_work_sync(&coredump->snapshot.work);
> > Likewise, lock the mutex here.
> >
> > Matt
> >
> > > xe_devcoredump_snapshot_free(&coredump->snapshot);
> > > @@ -238,6 +248,8 @@ static void xe_devcoredump_free(void *data)
> > > coredump->captured = false;
> > > drm_info(&coredump_to_xe(coredump)->drm,
> > > "Xe device coredump has been deleted.\n");
> > > +
> > > + mutex_unlock(&coredump->lock);
> > > }
> > > static void devcoredump_snapshot(struct xe_devcoredump *coredump,
> > > @@ -312,8 +324,11 @@ void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const cha
> > > struct xe_devcoredump *coredump = &xe->devcoredump;
> > > va_list varg;
> > > + mutex_lock(&coredump->lock);
> > > +
> > > if (coredump->captured) {
> > > drm_dbg(&xe->drm, "Multiple hangs are occurring, but only the first snapshot was taken\n");
> > > + mutex_unlock(&coredump->lock);
> > > return;
> > > }
> > > @@ -332,6 +347,7 @@ void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const cha
> > > dev_coredumpm_timeout(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
> > > xe_devcoredump_read, xe_devcoredump_free,
> > > XE_COREDUMP_TIMEOUT_JIFFIES);
> > > + mutex_unlock(&coredump->lock);
> > > }
> > > static void xe_driver_devcoredump_fini(void *arg)
> > > @@ -343,6 +359,12 @@ static void xe_driver_devcoredump_fini(void *arg)
> > > int xe_devcoredump_init(struct xe_device *xe)
> > > {
> > > + int err;
> > > +
> > > + err = drmm_mutex_init(&xe->drm, &xe->devcoredump.lock);
> > > + if (err)
> > > + return err;
> > > +
> > > return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm);
> > > }
> > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > index e6234e887102..1a1d16a96b2d 100644
> > > --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
> > > @@ -80,7 +80,9 @@ struct xe_devcoredump_snapshot {
> > > * for reading the information.
> > > */
> > > struct xe_devcoredump {
> > > - /** @captured: The snapshot of the first hang has already been taken. */
> > > + /** @lock: protects access to entire structure */
> > > + struct mutex lock;
> > > + /** @captured: The snapshot of the first hang has already been taken */
> > > bool captured;
> > > /** @snapshot: Snapshot is captured at time of the first crash */
> > > struct xe_devcoredump_snapshot snapshot;
> > > --
> > > 2.47.0
> > >
>