[Intel-gfx] [PATCH] dma-buf: Update docs for SYNC ioctl

Wed Mar 23 15:32:59 UTC 2016

Hi

On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
>> My question was rather about why we do this? Semantics for EINTR are
>> well defined, and with SA_RESTART (default on linux) user-space can
>> ignore it. However, looping on EAGAIN is very uncommon, and it is not
>> at all clear why it is needed?
>>
>> Returning an error to user-space makes sense if user-space has a
>> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
>> operation helps user-space at all? As someone without insight into the
>> driver implementation, it is hard to tell why.. Any hints?
>
> The reason we return EAGAIN is to workaround a deadlock we face when
> blocking on the GPU holding the struct_mutex (inside the client's
> process), but the GPU is dead. As our locking is very, very coarse we
> cannot restart the GPU without acquiring the struct_mutex being held by
> the client so we wake the client up and tell them the resource they are
> waiting on (the flush of the object from the GPU into the CPU domain) is
> temporarily unavailable. If they try to immediately wait upon the ioctl
> again, they are blocked waiting for the reset to occur before they may
> complete their flush. There are a few other possible deadlocks that are
> also avoided with EAGAIN (again, the issue is more or less the lack of
> fine grained locking).

...so you hijacked EAGAIN for all DRM ioctls just for a driver
workaround? EAGAIN is universally used to signal the caller about a
blocking resource. It is very much linked to O_NONBLOCK. Why not use
EBUSY, ECANCELED, ECOMM, EDEADLOCK, EIO, EL3RST, ...

Anyhow, I guess that ship has sailed. But just mentioning EAGAIN in a
kernel-doc is way to vague for user-space to figure out they should
loop on it.

Thanks
David