[Intel-xe] [PATCH 3/4] drm/xe: Add vm snapshot mutex for easily taking a vm snapshot during devcoredump

Tue Oct 17 10:15:27 UTC 2023

Hey,

On 2023-10-13 21:21, Matthew Brost wrote:
> On Fri, Oct 13, 2023 at 05:21:41PM +0200, maarten.lankhorst at linux.intel.com wrote:
>> From: Maarten Lankhorst <dev at lankhorst.se>
>>
>> Signed-off-by: Maarten Lankhorst <dev at lankhorst.se>
>> ---
>>   drivers/gpu/drm/xe/xe_vm.c       | 7 +++++++
>>   drivers/gpu/drm/xe/xe_vm_types.h | 5 +++++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index da006249147e..544d998293d3 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1163,7 +1163,9 @@ static int xe_vm_insert_vma(struct xe_vm *vm, struct xe_vma *vma)
>>   	xe_assert(vm->xe, xe_vma_vm(vma) == vm);
>>   	lockdep_assert_held(&vm->lock);
>>   
>> +	mutex_lock(&vm->snap_mutex);
> 
> I can't say I'm a fan of adding a new lock. Can you explain why this is
> needed? e.g. Why does the exisiting &vm->lock not work? We have lockdep
> annotations in both xe_vm_insert_vma & xe_vm_remove_vma that vm->lock is
> held. As is, the capture should be able to take vm->lock in write mode
> and this is safe.
We literally cannot take that lock at the time of crash capture. We are 
in a signaling context, so
lock vm->lock and then wait for signaling

would hang on
wait for signaling (snapshot context) locking vm->lock.

The snap_mutex only protects the walk of the VM, so it has very specific 
requirements. In the non-snapshot case, it will not ever be a contended 
lock.

> Side note the annotations should probably be in write mode for
> xe_vm_insert_vma & xe_vm_remove_vma and the capture should be able to
> take this in read mode. We should double check on this.

Cheers,
~Maarten