[PATCH v2 2/2] drm/xe: Add mutex locking to devcoredump

Fri Nov 22 21:00:17 UTC 2024

On 11/21/2024 18:01, Matthew Brost wrote:
> On Thu, Nov 21, 2024 at 05:25:10PM -0800, John Harrison wrote:
>> On 11/21/2024 15:44, Matthew Brost wrote:
>>> On Thu, Nov 21, 2024 at 02:55:42PM -0800, John.C.Harrison at Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>
>>>> There are now multiple places that can trigger a coredump. Some of
>>>> which can happen in parallel. There is already a check against
>>>> capturing multiple dumps sequentially, but without locking it doesn't
>>>> guarantee to work against concurrent dumps. And if two dumps do happen
>>>> in parallel, they can end up doing Bad Things such as one call stack
>>>> freeing the data the other call stack is still processing. Which leads
>>>> to a crashed kernel.
>>>>
>>>> Further, it is possible for the DRM timeout to expire and trigger a
>>>> free of the capture while a user is still reading that capture out
>>>> through sysfs. Again leading to dodgy pointer problems.
>>>>
>>>> So, add a mutex lock around the capture, read and free functions to
>>>> prevent interference.
>>>>
>>>> v2: Swap tiny scope spin_lock for larger scope mutex and fix
>>>> kernel-doc comment (review feedback from Matthew Brost)
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_devcoredump.c       | 26 +++++++++++++++++++++--
>>>>    drivers/gpu/drm/xe/xe_devcoredump_types.h |  4 +++-
>>>>    2 files changed, 27 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>> index dd48745a8a46..0621754ddfd2 100644
>>>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>> @@ -202,21 +202,29 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>>>>    	if (!coredump)
>>>>    		return -ENODEV;
>>>> +	mutex_lock(&coredump->lock);
>>>> +
>>> I'll just explain my reclaim comment in the prior rev here.
>>>
>>> 'coredump->lock' is in the path of reclaim as it can be taken from the TDR
>>> which signals dma-fences. This is why most of the devcoredump core uses
>>> GFP_ATOMIC to capture smaller state which could be lost quickly. We also
>> So the reason string allocation in patch #1 should also use GFP_ATOMIC
>> rather than GFP_KERNEL?
>>
> Yes.
>
>>> have a worker, ss->work, which opportunistically captures larger VM state w/
>>> GFP_KERNEL. The worker is not in the path of reclaim. Thus you cannot flush
>>> the worker under 'coredump->lock' without getting potential deadlocks.
>>> With proper annotations lockdep will complain.
>> Okay, that makes sense now. Was forgetting the captures are from the TDR /
>> dma-fence paths which are reclaim requirements. Doh!
>>
>>> e.g.
>>>
>>> We should do this on driver load:
>>>
>>> fs_reclaim_acquire();
>>> might_lock();
>>> fs_reclaim_recalim();
>> I assume this should be fs_reclaim_release()?
>>
> Yes, typo. Got a little distracted typing this.
>
>> I see three separate instances of a local primelockdep() helper function to
>> do this, two which do a might_lock() and one which does an actual
>> lock/unlock (plus another which does a lock_map_acquire/release, but I
>> assume that is very different). Plus another instance of the construct that
>> is just inline with the rest of the init function. The helper versions all
>> have a check against CONFIG_LOCKDEP but the unrolled version does not. Seems
>> like we should have a generically accessible helper function for this? Maybe
> A helper might be a good idea.
>
>> even as a wrapper around drmm_mutex_init itself? Although the xe_ggtt.c and
>> xe_migrate.c copies are not using the drmm version of mutex init. Should
>> they be?
>>
> Yes, all mutexes in Xe likely should use drmm_mutex_init. A prime
> reclaim version isn't a bad idea either given all drivers in DRM use
> dma-fences and likely have mutexes that should be primed with reclaim.
>
> IIRC priming with reclaim was a bit of a hack actually; using
> dma_fence_begin_signaling/end is really what we likely want to do but
> that annotation had some odd weakness which would give false lockdep
> positives. Thomas may have fixed this recently though. If you post a
> common drmm function, I think the correct annotation could be sorted out
> on dri-devel.
Are you thinking this would be a drmm_mutex_init_reclaim(dev, lock) 
function/macro at the end of drm_managed.h? Or should it still be a 
separate drmm_mutex_prep_for_reclaim() function to be called after init 
and in some other reclaim specific header?
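
To make the first option concrete, here is a rough sketch of what I have
in mind. The name drmm_mutex_init_reclaim() is hypothetical, nothing like
it exists today. I am assuming it stays a macro, as drmm_mutex_init() is,
so that each call site keeps its own lockdep class. The #ifndef __KERNEL__
part is just userspace stand-ins so the snippet builds on its own; the
stand-in names stub_init_err, reclaim_depth and the 'primed' field are
made up for that purpose and are not kernel fields.

```c
#include <assert.h>

/*
 * Sketch only: drmm_mutex_init_reclaim() is a hypothetical name, not an
 * existing DRM API. In-tree, the real definitions would come from
 * <drm/drm_managed.h>, <linux/sched/mm.h> and <linux/lockdep.h>.
 */
#ifndef __KERNEL__
/* Userspace stand-ins (assumptions, for illustration only). */
struct drm_device { int unused; };
struct mutex { int primed; };
#define GFP_KERNEL 0

static int stub_init_err;	/* set non-zero to simulate an init failure */
static int reclaim_depth;	/* tracks the fake fs_reclaim nesting */

#define drmm_mutex_init(dev, lock) \
	((void)(dev), (lock)->primed = 0, stub_init_err)

static void fs_reclaim_acquire(int gfp)
{
	(void)gfp;
	reclaim_depth++;
}

static void fs_reclaim_release(int gfp)
{
	(void)gfp;
	assert(reclaim_depth > 0);
	reclaim_depth--;
}

/* Stand-in: record whether the lock was "seen" while inside reclaim. */
#define might_lock(lock) ((lock)->primed = (reclaim_depth > 0))
#endif

/*
 * Initialise a managed mutex and, on success, tell lockdep that the lock
 * may be taken in the reclaim path. Evaluates to 0 or a negative errno.
 */
#define drmm_mutex_init_reclaim(dev, lock) ({				\
	int __err = drmm_mutex_init(dev, lock);				\
									\
	if (!__err) {							\
		fs_reclaim_acquire(GFP_KERNEL);				\
		might_lock(lock);					\
		fs_reclaim_release(GFP_KERNEL);				\
	}								\
	__err;								\
})
```

With that, xe_devcoredump_init() would call
drmm_mutex_init_reclaim(&xe->drm, &xe->devcoredump.lock) in place of
drmm_mutex_init(). A CONFIG_LOCKDEP guard like the existing local helpers
have could be added around the priming if wanted.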

John.

>
> Matt
>
>
>> John.
>>
>>> Our upper layers should do this too but may have gaps. Regardless, priming
>>> lockdep is a good practice and self-documenting.
>>>
>>>>    	ss = &coredump->snapshot;
>>>>    	/* Ensure delayed work is captured before continuing */
>>>>    	flush_work(&ss->work);
>>> So this is where the mutex should be locked.
>>>
>>>> -	if (!ss->read.buffer)
>>>> +	if (!ss->read.buffer) {
>>>> +		mutex_unlock(&coredump->lock);
>>>>    		return -ENODEV;
>>>> +	}
>>>> -	if (offset >= ss->read.size)
>>>> +	if (offset >= ss->read.size) {
>>>> +		mutex_unlock(&coredump->lock);
>>>>    		return 0;
>>>> +	}
>>>>    	byte_copied = count < ss->read.size - offset ? count :
>>>>    		ss->read.size - offset;
>>>>    	memcpy(buffer, ss->read.buffer + offset, byte_copied);
>>>> +	mutex_unlock(&coredump->lock);
>>>> +
>>>>    	return byte_copied;
>>>>    }
>>>> @@ -228,6 +236,8 @@ static void xe_devcoredump_free(void *data)
>>>>    	if (!data || !coredump_to_xe(coredump))
>>>>    		return;
>>>> +	mutex_lock(&coredump->lock);
>>>> +
>>>>    	cancel_work_sync(&coredump->snapshot.work);
>>> Likewise, lock the mutex here.
>>>
>>> Matt
>>>
>>>>    	xe_devcoredump_snapshot_free(&coredump->snapshot);
>>>> @@ -238,6 +248,8 @@ static void xe_devcoredump_free(void *data)
>>>>    	coredump->captured = false;
>>>>    	drm_info(&coredump_to_xe(coredump)->drm,
>>>>    		 "Xe device coredump has been deleted.\n");
>>>> +
>>>> +	mutex_unlock(&coredump->lock);
>>>>    }
>>>>    static void devcoredump_snapshot(struct xe_devcoredump *coredump,
>>>> @@ -312,8 +324,11 @@ void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const cha
>>>>    	struct xe_devcoredump *coredump = &xe->devcoredump;
>>>>    	va_list varg;
>>>> +	mutex_lock(&coredump->lock);
>>>> +
>>>>    	if (coredump->captured) {
>>>>    		drm_dbg(&xe->drm, "Multiple hangs are occurring, but only the first snapshot was taken\n");
>>>> +		mutex_unlock(&coredump->lock);
>>>>    		return;
>>>>    	}
>>>> @@ -332,6 +347,7 @@ void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const cha
>>>>    	dev_coredumpm_timeout(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
>>>>    			      xe_devcoredump_read, xe_devcoredump_free,
>>>>    			      XE_COREDUMP_TIMEOUT_JIFFIES);
>>>> +	mutex_unlock(&coredump->lock);
>>>>    }
>>>>    static void xe_driver_devcoredump_fini(void *arg)
>>>> @@ -343,6 +359,12 @@ static void xe_driver_devcoredump_fini(void *arg)
>>>>    int xe_devcoredump_init(struct xe_device *xe)
>>>>    {
>>>> +	int err;
>>>> +
>>>> +	err = drmm_mutex_init(&xe->drm, &xe->devcoredump.lock);
>>>> +	if (err)
>>>> +		return err;
>>>> +
>>>>    	return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm);
>>>>    }
>>>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump_types.h b/drivers/gpu/drm/xe/xe_devcoredump_types.h
>>>> index e6234e887102..1a1d16a96b2d 100644
>>>> --- a/drivers/gpu/drm/xe/xe_devcoredump_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_devcoredump_types.h
>>>> @@ -80,7 +80,9 @@ struct xe_devcoredump_snapshot {
>>>>     * for reading the information.
>>>>     */
>>>>    struct xe_devcoredump {
>>>> -	/** @captured: The snapshot of the first hang has already been taken. */
>>>> +	/** @lock: protects access to entire structure */
>>>> +	struct mutex lock;
>>>> +	/** @captured: The snapshot of the first hang has already been taken */
>>>>    	bool captured;
>>>>    	/** @snapshot: Snapshot is captured at time of the first crash */
>>>>    	struct xe_devcoredump_snapshot snapshot;
>>>> -- 
>>>> 2.47.0
>>>>