[PATCH] xf86drm.c: add counter for ioctl restarting

Fri Apr 13 07:45:34 PDT 2012

On Fri, Apr 13, 2012 at 03:42:16PM +0200, Daniel Vetter wrote:
> On Fri, Apr 13, 2012 at 05:26:42PM +0400, Anton V. Boyarshinov wrote:
> > In some cases the ioctl->alarm->ioctl loop can be infinite:
> > ioctl(7, 0x40086482, 0xbfb62738)        = ? ERESTARTSYS (To be restarted)
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > sigreturn()                             = ? (mask now [])
> > ioctl(7, 0x40086482, 0xbfb62738)        = ? ERESTARTSYS (To be restarted)
> > and forever.
> > 
> > It seems that limiting ioctl restarting to some reasonable number of tries
> > is a dirty but working way to prevent Xorg lockups.
> > 
> > Signed-off-by: Anton V. Boyarshinov <boyarsh at altlinux.org>
> > ---
> >  xf86drm.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/xf86drm.c b/xf86drm.c
> > index 6ea068f..9663f21 100644
> > --- a/xf86drm.c
> > +++ b/xf86drm.c
> > @@ -162,10 +162,11 @@ int
> >  drmIoctl(int fd, unsigned long request, void *arg)
> >  {
> >      int	ret;
> > +    int count=0;
> >  
> >      do {
> >  	ret = ioctl(fd, request, arg);
> > -    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
> > +    } while (ret == -1 && (errno == EINTR || errno == EAGAIN) && ++count < 100 );
> 
> We rely on restarting after signals when blocking for the gpu; a busy gpu
> plus mouse wiggling can easily reach that limit.
> 
> NACKed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> Obviously if we have a dead gpu, we need to break out of this loop. But
> detecting a dead gpu (and returning an appropriate error like EIO) is the
> kernel's job.

The problem with EINTR is that someone else could be poking at the
device at the same time, causing the restarted ioctl to wait some
more and be interrupted by a signal again. Or the hardware itself
could be responsible for this problem.

I ran into this issue once with a "wait for vblank" ioctl,
which had a fixed relative timeout value. So every time I would
start to wait, a signal would arrive causing the vblank to be
missed. The restarted ioctl would then start waiting using the
original timeout, allowing the signal to interrupt it again.

Any syscall which has a relative timeout should be designed so that
the timeout gets updated by the kernel when interrupted, so that the
originally specified timeout really is the total timeout.
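
To illustrate the idea, here is a minimal userspace sketch, not the
kernel-side fix itself. It uses poll() as a stand-in for any call with
a relative timeout; remaining_ms() and poll_total_timeout() are
hypothetical names made up for this example. The deadline is computed
once, and every restart after EINTR only gets the time that is left:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical helper: milliseconds left until `deadline`
 * (CLOCK_MONOTONIC), clamped at 0 once the deadline has passed. */
static int remaining_ms(const struct timespec *deadline)
{
    struct timespec now;
    long long ms;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ms = (long long)(deadline->tv_sec - now.tv_sec) * 1000
       + (deadline->tv_nsec - now.tv_nsec) / 1000000;
    return ms > 0 ? (int)ms : 0;
}

/* Hypothetical wrapper: poll() where timeout_ms is the total budget.
 * A restart after a signal does not get the full timeout again. */
static int poll_total_timeout(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    struct timespec deadline;
    int ret;

    assert(timeout_ms >= 0);

    /* Turn the relative timeout into an absolute deadline, once. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    do {
        /* Once the budget is used up this passes 0, so poll()
         * returns immediately instead of blocking again. */
        ret = poll(fds, nfds, remaining_ms(&deadline));
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

With this shape a steady stream of signals can still cause restarts,
but the total wait stays bounded by the caller's timeout, so the loop
cannot spin forever the way the fixed relative timeout did.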

-- 
Ville Syrjälä
Intel OTC