[PATCH] drm/xe: Fix exec IOCTL long running exec queue ring full condition

Fri Jan 5 03:32:26 UTC 2024

On Thu, Jan 04, 2024 at 01:55:48PM -0800, Welty, Brian wrote:
> 
> On 1/4/2024 12:09 AM, Matthew Brost wrote:
> > The intent is to return -EWOULDBLOCK to the user if a long running exec
> > queue is full during the exec IOCTL. However, -EWOULDBLOCK aliases to
> > -EAGAIN, so the exec IOCTL enters its internal retry loop instead of
> > returning to the user. Fix this by ensuring the retry loop is broken
> > when returning -EWOULDBLOCK.
> > 
> > Fixes: 8ae8a2e8dd21 ("drm/xe: Long running job update")
> > Reported-by: Sai Gowtham Ch <sai.gowtham.ch at intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_exec.c | 7 ++++---
> >   1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
> > index d30c0d0689bc..c68e1bd15e6a 100644
> > --- a/drivers/gpu/drm/xe/xe_exec.c
> > +++ b/drivers/gpu/drm/xe/xe_exec.c
> > @@ -115,7 +115,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   	struct xe_sched_job *job;
> >   	struct dma_fence *rebind_fence;
> >   	struct xe_vm *vm;
> > -	bool write_locked;
> > +	bool write_locked, skip_eagain = false;
> >   	ktime_t end = 0;
> >   	int err = 0;
> > @@ -227,7 +227,8 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   	}
> >   	if (xe_exec_queue_is_lr(q) && xe_exec_queue_ring_full(q)) {
> > -		err = -EWOULDBLOCK;
> > +		err = -EWOULDBLOCK;	/* Aliased to -EAGAIN */
> > +		skip_eagain = true;
> 
> Would using a different error code be cleaner and also avoid confusion
> in user space? For example, -EBUSY here?
> 
> But if you feel strongly about using EWOULDBLOCK, the fix looks good to me.

After quite a bit of review, I think we landed on EWOULDBLOCK because it
tells user space that it should retry the IOCTL.

Matt

> Reviewed-by: Brian Welty <brian.welty at intel.com>
> 
> >   		goto err_exec;
> >   	}
> > @@ -337,7 +338,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >   		up_write(&vm->lock);
> >   	else
> >   		up_read(&vm->lock);
> > -	if (err == -EAGAIN)
> > +	if (err == -EAGAIN && !skip_eagain)
> >   		goto retry;
> >   err_syncs:
> >   	for (i = 0; i < num_syncs; i++)
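
For readers following the thread, the control flow the patch changes can be sketched in isolation. This is a minimal user-space model, not the driver code: struct fake_queue, fake_exec_once() and fake_exec_ioctl() are hypothetical names made up for illustration, and only the skip_eagain flag and the final retry check mirror the diff above. It assumes a Linux errno.h, where EWOULDBLOCK and EAGAIN share a value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an exec queue; not a real xe structure. */
struct fake_queue {
	bool long_running;       /* models xe_exec_queue_is_lr() */
	bool ring_full;          /* models xe_exec_queue_ring_full() */
	int transient_failures;  /* passes that fail with a genuine -EAGAIN */
	int attempts;            /* number of passes through the exec path */
};

/* One pass through the (modelled) exec path. */
static int fake_exec_once(struct fake_queue *q, bool *skip_eagain)
{
	q->attempts++;

	if (q->long_running && q->ring_full) {
		/* Must reach the caller, so mark it as non-retryable. */
		*skip_eagain = true;
		return -EWOULDBLOCK;	/* aliased to -EAGAIN on Linux */
	}

	if (q->transient_failures > 0) {
		q->transient_failures--;
		return -EAGAIN;		/* genuine "try again" condition */
	}

	return 0;
}

/* Modelled IOCTL entry point with the retry check from the patch. */
static int fake_exec_ioctl(struct fake_queue *q)
{
	bool skip_eagain = false;
	int err;

retry:
	err = fake_exec_once(q, &skip_eagain);
	if (err == -EAGAIN && !skip_eagain)
		goto retry;

	return err;
}
```

In this model a full long running queue returns -EWOULDBLOCK after a single pass, while a genuine -EAGAIN is still retried internally until it clears. Without the flag, the first case would loop for as long as the ring stays full, because the two error values cannot be told apart at the retry check.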