Spinning in _XReply

Tue Mar 15 17:19:12 PDT 2011

Short version: I can't maintain Xlib's correctness conditions and also
allow error handlers to call _XReply. In various places, Xlib assumes
that it processes all responses from the server in the order the server
generated those responses. If the server has generated an error and then
another response, and while processing the error we issue a request, the
existing response must be processed before the new request's reply can
be processed.

So I don't see anything we can do except require that callers meet the
documented requirements for XSetErrorHandler. I did try to find a kludge
to let existing apps survive this bug, but anything I could think of
probably breaks multi-threaded clients.

However, I couldn't find any reason for the Display lock to be held
while calling the user-supplied error handler, so I've pushed a patch to
revert that much, at least, to the pre-XCB behavior.

Detailed responses to various parts of this thread follow.

On Sat, Jan 29, 2011 at 09:54:37AM -0800, Jeremy Huddleston wrote:
> 2) emacs' error handler seems bugged.  Is it legal to call XSync()
> within the error handler?  It certainly seems like it shouldn't.  Did
> we used to actually support this with the xtrans version of libX11?

I have to pawn this off as a bug in callers.

> Why are we spinning in _XReply rather than entering a deadlock here?
> It seems to me that we shouldn't wake from this ConditionWait because
> the _XReply we're nested in won't ever ConditionBroadcast()... so why
> is he seeing us spin through _XReply?  It's almost as if this
> ConditionWait() is a no-op:

Indeed, ConditionWait is a no-op unless the application has called
XInitThreads. And I bet Emacs is single-threaded.

On Sat, Jan 29, 2011 at 09:24:42PM -0600, Pat Kane wrote:
> I just took a quick look at xcb_io.c to see what _XReply is doing,
> and I doubt that it is reentrant.
> The only way to break out of that "while(1)" loop is if "req == current".
> 
> I wonder why the  test at the top:
>        if(dpy->flags & XlibDisplayIOError)
> 		return 0;
> did not work.

Because that flag is set only on I/O errors, not on X protocol errors.
But you had me going for a minute. :-)

On Tue, Feb 22, 2011 at 10:14:07AM +0100, Julien Cristau wrote:
> Not as far as I can tell.  From XSetErrorHandler(3):
> 
>        Because this condition is not assumed to be fatal, it is acceptable for
>        your error handler to return; the returned value is ignored.  However,
>        the error handler should not call any functions (directly or indi‐
>        rectly) on the display that will generate protocol requests or that
>        will look for input events.
> 
> That's not a new requirement.

Thanks for digging that up, Julien.

On Wed, Feb 23, 2011 at 10:26:21AM +1300, xmail at karlt.net wrote:
> Rami Ylimäki writes:
> > Today I found out that even Mozilla code is
> > suffering from this problem
> > (http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsX11ErrorHandler.cpp).
> >
> > Adding Karl into CC. It looks like nsX11ErrorHandler.cpp is also
> > calling Xlib functions resulting in protocol requests, which is
> > not allowed.
> 
> Yes, you and Julian are right that clients should not generate
> protocol requests from within an error handler, and we should
> address this in our client.
> 
> Our handler makes the error fatal but the goal was to get as much
> information as practical abort the error for error reporting.
> Raw request code numbers are harder to interpret as they depend on
> which extensions the server is using.
> 
> Although wrong, our code has been working well enough for us to get
> useful information.  If someone is able to point out a reason why
> this now works less successfully than previously, then that would
> bump up the priority at our end.
> 
> The information we want is already on display->ext_procs->codes
> but "This structure is private to the library."  I'm not aware of
> a public API to get this information from Xlib.

Eh, that hasn't stopped anyone else who's using Xlib. :-/

One thing you could do if you don't want to use private APIs is to call
XListExtensions (and XQueryExtension, if you need event and error
numbers) when you open the Display. The extension list won't change over
the lifetime of the connection, and then you have all the data you need
if your error handler does get called.

Otherwise, screw it, I'd just include Xlibint.h and dig out
ext_procs->codes.

On Wed, Feb 23, 2011 at 12:15:47PM +0200, Rami Ylimäki wrote:
> By superficially looking at the Xlib code, the behavior in older
> Xlib version (1.3.3) is different depending on whether Xlib is
> configured --with-xcb or --without-xcb. When Xcb is disabled, error
> handler is called with display unlocked and _XReply seems to be
> prepared to handle requests from error handler. With Xcb, which is
> the default now with Xlib 1.4, the display remains locked during
> error handling and nested requests aren't allowed.

Since I've concluded that lock handling was a bug, I've fixed it.

On Fri, Mar 04, 2011 at 12:20:12PM +1300, Karl Tomlinson wrote:
> > http://lists.x.org/archives/xorg-devel/2011-February/019533.html
> > does seem to indicate another potential issue (breaking from the
> > loop only when "req == current").  I guess that hasn't bitten us
> > because we don't return to the first _XReply.
> 
> That seems to have been the situation until
> http://cgit.freedesktop.org/xorg/lib/libX11/commit/?id=933aee1d5c53b0cc7d608011a29188b594c8d70b
> 
> I see that we now spin even without XInitThreads.
> That is version 1.3.4 --with-xcb.

I'm not sure what this means. Do you mean you spin due to re-entering
_XReply from _XError, or because of something involving the "req ==
current" condition? In the latter case, please provide more information,
as I don't know of any bugs there.

On Fri, Mar 04, 2011 at 10:32:58AM +0200, Rami Ylimäki wrote:
> If XInitThreads is used, the call to XSynchronize in the error
> handler is also problematic and I don't think it's possible to
> determine whether the Xlib connection was synchronous or not from
> the error handler with enabled locks.

With the patch I just pushed, I think calling XSynchronize should once
again be safe from an error handler.

Jamey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.x.org/archives/xorg-devel/attachments/20110315/0561f9c5/attachment.pgp>