[PATCH 19/21] drm/xe/eudebug: Implement vm_bind_op discovery

Matthew Brost matthew.brost at intel.com
Sat Jul 27 04:39:46 UTC 2024


On Fri, Jul 26, 2024 at 05:08:16PM +0300, Mika Kuoppala wrote:
> Follow the vm bind, vm_bind op sequence for
> discovery process of a vm with the vmas it has.
> Send events for ops and attach metadata if available.
> 
> v2: Fix bad op ref seqno (Christoph)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_eudebug.c | 40 +++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> index f0bec8c8bba1..a80ffb64df15 100644
> --- a/drivers/gpu/drm/xe/xe_eudebug.c
> +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> @@ -3070,6 +3070,42 @@ void xe_eudebug_debug_metadata_destroy(struct xe_file *xef, struct xe_debug_meta
>  	xe_eudebug_event_put(d, debug_metadata_destroy_event(d, xef, m));
>  }
>  
> +static int vm_discover_binds(struct xe_eudebug *d, struct xe_vm *vm)
> +{
> +	struct drm_gpuva *va;
> +	unsigned int num_ops = 0, send_ops = 0;
> +	u64 ref_seqno = 0;
> +	int err;
> +
> +	/* Currently only vm_bind_ioctl inserts vma's */
> +	drm_gpuvm_for_each_va(va, &vm->gpuvm)

This is not safe without at least vm->lock in read mode.

This function is called with xef->vm.lock & xe->files.lock held, and we
likely don't want to create a locking dependency chain between these
locks and vm->lock either.

Hmm... How about this: when discovery_work_fn is called, wait for
in-flight IOCTLs to complete and pause new IOCTLs until discovery is
done. We should be able to hook into xe_drm_ioctl to do this. Then we
wouldn't need the xef->vm.lock & xe->files.lock locks either.

Also, to be clear, xef->exec_queue.lock & xef->vm.lock are not designed
to be held while doing anything underneath except looking something up
in the xarray and taking a reference to the object. Any other use is
abusing these locks and asking for trouble.

e.g. we do things like this:

	mutex_lock(&xef->exec_queue.lock);
	q = xa_load(&xef->exec_queue.xa, id);
	if (q)
		xe_exec_queue_get(q);
	mutex_unlock(&xef->exec_queue.lock);

If you need to use xa_for_each, I'd expect it to look like this:

mutex_lock(&xef->exec_queue.lock);
xa_for_each(&xef->exec_queue.xa, i, q) {
	xe_exec_queue_get(q);
	mutex_unlock(&xef->exec_queue.lock);

	/* Do something with exec queue */

	xe_exec_queue_put(q);
	mutex_lock(&xef->exec_queue.lock);
}
mutex_unlock(&xef->exec_queue.lock);

It is perfectly fine in the above loop for an exec queue to be removed
from the xa; xa_for_each is safe against that. The lock protects the
lookup and reference, nothing more.

Does this make sense? Feel free to ping me directly to discuss further.

Matt

> +		num_ops++;
> +
> +	if (!num_ops)
> +		return 0;
> +
> +	err = vm_bind_event(d, vm, num_ops, &ref_seqno);
> +	if (err)
> +		return err;
> +
> +	drm_gpuvm_for_each_va(va, &vm->gpuvm) {
> +		struct xe_vma *vma = container_of(va, struct xe_vma, gpuva);
> +
> +		if (send_ops >= num_ops)
> +			break;
> +
> +		err = vm_bind_op(d, vm, DRM_XE_EUDEBUG_EVENT_CREATE, ref_seqno,
> +				 xe_vma_start(vma), xe_vma_size(vma),
> +				 &vma->debug_metadata);
> +		if (err)
> +			return err;
> +
> +		send_ops++;
> +	}
> +
> +	return num_ops == send_ops ? 0 : -EINVAL;
> +}
> +
>  static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
>  {
>  	struct xe_debug_metadata *m;
> @@ -3095,6 +3131,10 @@ static int discover_client(struct xe_eudebug *d, struct xe_file *xef)
>  		err = vm_create_event(d, xef, vm);
>  		if (err)
>  			break;
> +
> +		err = vm_discover_binds(d, vm);
> +		if (err)
> +			break;
>  	}
>  	mutex_unlock(&xef->vm.lock);
>  
> -- 
> 2.34.1
> 

