[Xcb] [PATCH] Handle EAGAIN errno from poll(2) or select(2)

Josh Triplett josh at joshtriplett.org
Sun Aug 23 00:32:16 PDT 2015

On Sat, Aug 22, 2015 at 07:11:41PM -0700, Jeremy Sequoia wrote:
> > On Aug 22, 2015, at 18:43, Josh Triplett <josh at joshtriplett.org> wrote:
> >> On Sat, Aug 22, 2015 at 10:52:17AM -0700, Jeremy Huddleston Sequoia wrote:
> >>> On Aug 22, 2015, at 10:30, Josh Triplett <josh at joshtriplett.org> wrote:
> >>> On Sat, Aug 22, 2015 at 02:33:46AM -0700, Jeremy Huddleston Sequoia wrote:
> >>>>> On Aug 20, 2015, at 09:21, Josh Triplett <josh at joshtriplett.org> wrote:
> >>>>> 
> >>>>> On Thu, Aug 20, 2015 at 12:18:41AM -0700, Jeremy Sequoia wrote:
> >>>>>> Yeah, I thought about sleeping before retrying in the EAGAIN case to
> >>>>>> avoid a possible busy loop.  I can do that if you prefer.
> >>>>>> 
> >>>>>> As I indicated in the commit message, there is no known fallout from
> >>>>>> the lack of EAGAIN handling.  There is no behavioral problem.  Indeed
> >>>>>> the only time someone should ever get back EAGAIN from poll or select
> >>>>>> on darwin is under resource pressure, and it's likely the user would
> >>>>>> have bigger concerns than this at that point.
> >>>>>> 
> >>>>>> I just happened to notice this while tracing code to figure out why
> >>>>>> someone on stackoverflow was seeing recv() of the DISPLAY socket
> >>>>>> erring out with EAGAIN and then hanging.
> >>>>> 
> >>>>> If Darwin/OSX returns EAGAIN to a blocking call under *any*
> >>>>> circumstances, including "resource pressure", that's a serious bug.
> >>>>> Don't work around it in XCB or any other library, *especially* because
> >>>>> no other platform should behave the same way.  EAGAIN means "The socket
> >>>>> is marked nonblocking and the receive operation would block, or a
> >>>>> receive timeout had been set and the timeout expired before data was
> >>>>> received."  
> >>>> 
> >>>> No, that is not what EAGAIN means.  From SUSv4 at http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> >>>> 
> >>>> """
> >>>> The poll() function shall fail if:
> >>>> 
> >>>> [EAGAIN]
> >>>> The allocation of internal data structures failed but a subsequent request may succeed.
> >>>> ...
> >>>> """
> >>> 
> >>> Ah, I see; I'd forgotten that the spec actually allows EAGAIN and
> >>> EWOULDBLOCK to be different.  EWOULDBLOCK definitely has the semantics I
> >>> had in mind and that the Linux manpage documents; from
> >>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03
> >>> 
> >>>> Operation would block. An operation on a socket marked as non-blocking has encountered a situation such as no data available that otherwise would have caused the function to suspend execution.
> >>> 
> >>> But sure enough, for EAGAIN it says "Resource temporarily unavailable.
> >>> This is a temporary condition and later calls to the same routine may
> >>> complete normally."  So if an implementation ignores the spec language
> >>> saying "A conforming implementation may assign the same values for
> >>> [EWOULDBLOCK] and [EAGAIN]." and makes them separate, EAGAIN can indeed
> >>> mean the kernel is making its internal problems the application's
> >>> problems and requiring the application to try again.  Sigh.
> >>> 
> >>>>> A blocking call with no timeout should never return EAGAIN;
> >>>>> it should either block or return some fatal error.
> >>>> 
> >>>> Not according to UNIX.
> >>> 
> >>> s/EAGAIN/EWOULDBLOCK/ and the statement holds.
> >> 
> >> Yep!
> >> 
> >>>>> Libraries should *definitely* not have to include "wait a bit and try
> >>>>> again" logic; that's the kernel's job.
> >>> 
> >>> I stand by this statement, but evidently the spec allows this particular
> >>> bit of ridiculosity.  Personally, I'd argue that if the kernel has a
> >>> resource allocation failure, it should be returning -ENOMEM.
> >> 
> >> I agree, but sadly nobody consulted either you or me when writing the SUS.
> >> 
> >>> Could I talk you into adding an "EAGAIN != EWOULDBLOCK && " before
> >>> checking for EAGAIN?  That way, the "retry immediately on EAGAIN" logic
> >>> will only run on platforms where EAGAIN *doesn't* have the same meaning
> >>> as EWOULDBLOCK's "this is non-blocking and would block".  On platforms
> >>> that define those two identically, the extra logic will constant-fold
> >>> away.
> >> 
> >> They won't constant fold because we're not checking for EWOULDBLOCK
> >> because it doesn't really make sense in this case.  I don't think any
> >> implementation of poll(2) or select(2) would return EWOULDBLOCK
> >> because it doesn't really make sense to have non-blocking
> >> implementations of those syscalls.  The whole point of those syscalls
> >> is to block until data is available.
> > 
> > That's not what I mean.  "EAGAIN != EWOULDBLOCK" constant-folds into 0
> > on a system where those two are equal.  So, something like "if (EAGAIN
> > != EWOULDBLOCK && errno == EAGAIN) { loop and try again }" will fold
> > away to nothing except on a system that has EAGAIN as a separate error
> > from EWOULDBLOCK, which conveniently matches those systems where
> > retrying on EAGAIN makes sense.
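A minimal C sketch of the guard proposed above, for illustration only; `should_retry_poll` is a hypothetical name, not code from XCB or from the patch. On a platform where EAGAIN and EWOULDBLOCK share a value, `EAGAIN != EWOULDBLOCK` is a constant 0, so the compiler can fold the whole expression to 0 and the retry branch disappears.

```c
#include <errno.h>

/* Hypothetical helper illustrating the proposed check. The first operand is
 * a compile-time constant: 0 where EAGAIN == EWOULDBLOCK, 1 otherwise. */
static int should_retry_poll(int err)
{
    return EAGAIN != EWOULDBLOCK && err == EAGAIN;
}
```

As Jeremy's reply points out, Darwin gives the two names the same value, so this guard would switch the retry off on exactly the platform that needs it.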
> I'm not sure how you are concluding that this has anything to do with
> whether or not EAGAIN and EWOULDBLOCK are the same value, but that is
> not the case.
> POSIX allows compliant implementations to define those two errnos to
> the same value and it also defines the conditions in which poll(2) can
> return EAGAIN.  There's nothing about the first which has any bearing
> on the second.
> For example, darwin defines the two errnos to the same value, and I
> think most Linux and BSDs do the same, but we still have to deal with
> the possibility of poll EAGAINing.
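A small compile-time check of the point made here, assuming only <errno.h>; the function name is hypothetical. It reports only whether the platform aliases the two errno names. When they are aliased, an errno of EAGAIN from poll() is numerically indistinguishable from EWOULDBLOCK, so the value alone cannot say which condition occurred.

```c
#include <errno.h>

/* Hypothetical helper: returns 1 if this platform gives EAGAIN and
 * EWOULDBLOCK the same value (POSIX permits this but does not require it). */
static int eagain_aliases_ewouldblock(void)
{
#if EAGAIN == EWOULDBLOCK
    return 1;
#else
    return 0;
#endif
}
```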

Sigh.  Apparently I was still underestimating how unusual an
implementation can be and still technically comply with the spec.  I had
assumed that if an implementation was going to use EAGAIN as a special
"try again later, my internal failure is now your problem" value, it
wouldn't simultaneously use the same errno value (under the name
EWOULDBLOCK) to mean "you asked me not to block so I didn't".  But if
Darwin equates the two errno values, then that check won't work.

Is this issue limited to poll() and select(), or can Darwin also return
EAGAIN from functions that can return EWOULDBLOCK if called on a
non-blocking file descriptor that isn't ready?

> This is also on the slow path, so I'm not sure it is worth making use
> of platform specific knowledge instead of coding to the standard.  If
> you prefer, I can keep the EAGAIN bits out of the select(2) path and
> keep them only in the poll(2) path.

Standards-compliant or not, it's *odd* behavior, and not particularly
sensible.  I was trying to find a way to have this not affect systems
other than those with the problem.  However, I'm now out of ideas for
how to do so, so go ahead and apply them, to both select and poll if
both can spuriously return EAGAIN.

For the benefit of Linux developers who are used to an entirely
different meaning of EAGAIN, please do include a comment next to the
conditional, specifically explaining that it has nothing to do with
non-blocking descriptors in this case, that Darwin was observed to
return it from poll or select when it fails to allocate kernel-internal
resources, and that the spec allows it (citing
http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html )
and says that a subsequent call may succeed, hence the retry.  That way,
nobody will come across that line of the source and get confused about
why poll is returning EAGAIN when
http://man7.org/linux/man-pages/man2/poll.2.html doesn't mention EAGAIN.
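A minimal sketch of the poll(2) retry carrying the kind of comment requested above. `wait_readable_poll` is a hypothetical helper rather than the actual XCB patch, and it assumes a single descriptor waited on with no timeout (-1).

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: block until fd is readable (or hung up).
 * Returns 0 when ready, -1 on an invalid descriptor or fatal error. */
static int wait_readable_poll(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ret = poll(&pfd, 1, -1);
        if (ret > 0)
            return (pfd.revents & POLLNVAL) ? -1 : 0;
        if (ret < 0 && errno == EINTR)
            continue;
        /* EAGAIN here has nothing to do with non-blocking descriptors.
         * Darwin has been observed to return it from poll() when it fails
         * to allocate kernel-internal resources. POSIX allows this and says
         * a subsequent call may succeed, hence the retry; see
         * http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
         * (The Linux poll(2) man page does not list EAGAIN.) */
        if (ret < 0 && errno == EAGAIN)
            continue;
        return -1;
    }
}
```

The retry is immediate. Earlier in the thread, sleeping before retrying was mentioned as a way to avoid a possible busy loop; that would be a small change confined to the EAGAIN branch.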

I've also submitted a request to the Linux man-pages project to add a
portability note about this.

- Josh Triplett
