[PATCH] drm/xe: Return -ENOBUFS if a kmalloc fails which is tied to an array of binds

Sat Jul 20 23:14:36 UTC 2024

On Sun, Jul 21, 2024 at 12:34:27AM +0530, Ghimiray, Himal Prasad wrote:
> 
> 
> On 19-07-2024 22:53, Matthew Brost wrote:
> > The size of an array of binds is directly tied to several kmalloc in the
> > KMD, thus making these kmalloc more likely to fail. Return -ENOBUFS in
> > the case of these failures.
> > 
> > The expected UMD behavior upon returning -ENOBUFS is to split an array
> > of binds into a series of single binds.
> 
> Would it be appropriate to have some doc/guidelines in the form of drm_err
> or kernel doc regarding expected behavior from UMD if the ioctl returns a
> -ENOBUFS error ?
> 

Yes, this on the todo list as part of error handling cleanup for both
exec and bind IOCTLs. I think kernel doc should go in xe_drm.h with a
list of errno returned and expected UMD actions. Eventually I'd like to
get this in place for all IOCTLs. I was going to work on getting exec
and bind IOCTLs fixed up in the next couple of weeks (we have an
internal doc of required changes) to have it ready for when Thomas is
back (2 more weeks).

I made this change as we already have -ENOBUFS implemented in a
different failure point for array of binds (BB being to large, see
xe_migrate.c) so might as well just finish up this error code to make it
complete as it is fairly simple change. Also Mesa has a MR [1] inflight
to handle -ENOBUFs situations.

Matt

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30276 

> > 
> > Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_vm.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 3fde2c8292ad..b715883f40d8 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -718,7 +718,7 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
> >   		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
> >   }
> > -static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
> > +static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
> >   {
> >   	int i;
> > @@ -731,7 +731,7 @@ static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
> >   				      sizeof(*vops->pt_update_ops[i].ops),
> >   				      GFP_KERNEL);
> >   		if (!vops->pt_update_ops[i].ops)
> > -			return -ENOMEM;
> > +			return array_of_binds ? -ENOBUFS : -ENOMEM;
> >   	}
> >   	return 0;
> > @@ -824,7 +824,7 @@ int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
> >   			goto free_ops;
> >   	}
> > -	err = xe_vma_ops_alloc(&vops);
> > +	err = xe_vma_ops_alloc(&vops, false);
> >   	if (err)
> >   		goto free_ops;
> > @@ -871,7 +871,7 @@ struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma, u8 tile_ma
> >   	if (err)
> >   		return ERR_PTR(err);
> > -	err = xe_vma_ops_alloc(&vops);
> > +	err = xe_vma_ops_alloc(&vops, false);
> >   	if (err) {
> >   		fence = ERR_PTR(err);
> >   		goto free_ops;
> > @@ -2765,7 +2765,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> >   					   sizeof(struct drm_xe_vm_bind_op),
> >   					   GFP_KERNEL | __GFP_ACCOUNT);
> >   		if (!*bind_ops)
> > -			return -ENOMEM;
> > +			return args->num_binds > 1 ? -ENOBUFS : -ENOMEM;
> >   		err = __copy_from_user(*bind_ops, bind_user,
> >   				       sizeof(struct drm_xe_vm_bind_op) *
> > @@ -3104,7 +3104,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		goto unwind_ops;
> >   	}
> > -	err = xe_vma_ops_alloc(&vops);
> > +	err = xe_vma_ops_alloc(&vops, args->num_binds > 1);
> >   	if (err)
> >   		goto unwind_ops;