[Intel-gfx] [PATCH] dma-buf: Update docs for SYNC ioctl

Wed Mar 23 15:42:23 UTC 2016

On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote:
> Hi
> 
> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
> >> My question was rather about why we do this? Semantics for EINTR are
> >> well defined, and with SA_RESTART (default on linux) user-space can
> >> ignore it. However, looping on EAGAIN is very uncommon, and it is not
> >> at all clear why it is needed?
> >>
> >> Returning an error to user-space makes sense if user-space has a
> >> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
> >> operation helps user-space at all? As someone without insight into the
> >> driver implementation, it is hard to tell why.. Any hints?
> >
> > The reason we return EAGAIN is to workaround a deadlock we face when
> > blocking on the GPU holding the struct_mutex (inside the client's
> > process), but the GPU is dead. As our locking is very, very coarse we
> > cannot restart the GPU without acquiring the struct_mutex being held by
> > the client so we wake the client up and tell them the resource they are
> > waiting on (the flush of the object from the GPU into the CPU domain) is
> > temporarily unavailable. If they try to immediately wait upon the ioctl
> > again, they are blocked waiting for the reset to occur before they may
> > complete their flush. There are a few other possible deadlocks that are
> > also avoided with EAGAIN (again, the issue is more or less the lack of
> > fine grained locking).
> 
> ...so you hijacked EAGAIN for all DRM ioctls just for a driver
> workaround?

No, we utilized the fact that EAGAIN was already enshrined by libdrm as
the defacto mechanism for repeating the ioctl in order to repeat the
ioctl for a driver workaround.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre