[Xcb] Invalid XIDs

Barton C Massey bart at cs.pdx.edu
Sat Jan 24 17:42:04 PST 2009


Wow.  Thanks hugely for the very detailed report.  Yes, this
could be your problem.

For those playing along at home, the code in question is
this:

  uint32_t xcb_generate_id(xcb_connection_t *c)
  {
      uint32_t ret;
      if(c->has_error)
	  return -1;
      pthread_mutex_lock(&c->xid.lock);
      if(c->xid.last == c->xid.max)
      {
	  xcb_xc_misc_get_xid_range_reply_t *range;
	  range = xcb_xc_misc_get_xid_range_reply(c,
                    xcb_xc_misc_get_xid_range(c), 0);
	  if(!range)
	  {
	      pthread_mutex_unlock(&c->xid.lock);
	      return -1;
	  }
	  c->xid.last = range->start_id;
	  c->xid.max = range->start_id + (range->count - 1) * c->xid.inc;
	  free(range);
      }
      ret = c->xid.last | c->xid.base;
      c->xid.last += c->xid.inc;
      pthread_mutex_unlock(&c->xid.lock);
      return ret;
  }


Note that once xid.last becomes greater than xid.max
xcb_generate_id() will never again request a new xid block
but will happily keep allocating bogus xids, which is
apparently what has happened to you.

Aside from the obvious way this could happen (something
elsewhere setting xid.last deliberately or accidentally),
there's at least one code path that could do it here, namely
if range->count comes back as 1 (or 0, but presumably that
would be illegal).  In that case, xid.last == xid.max and
then we increment xid.last.  I think the - 1 in this line

  c->xid.max = range->start_id + (range->count - 1) * c->xid.inc;

is just a bug as the code stands.  The invariant the code
maintains is that outside of the function xid.last (bad
name) contains the next xid to be returned.  Thus, in
addition to being potentially buggy the code in question
"wastes" an xid out of every block.

Why don't you try this code instead?  I changed the
invariant so that outside of the function xid.last really is
the xid allocated on the last call and xid.max really is the
maximum xid to be allocated in this block.  I also changed
the block allocation test and added a couple of assertions
so that if this code fails it will fail early.

  uint32_t xcb_generate_id(xcb_connection_t *c)
  {
      uint32_t ret;
      if(c->has_error)
	  return -1;
      pthread_mutex_lock(&c->xid.lock);
      if(c->xid.last >= c->xid.max)
      {
	  xcb_xc_misc_get_xid_range_reply_t *range;
	  assert(c->xid.last == c->xid.max);
	  range = xcb_xc_misc_get_xid_range_reply(c,
                    xcb_xc_misc_get_xid_range(c), 0);
	  if(!range)
	  {
	      pthread_mutex_unlock(&c->xid.lock);
	      return -1;
	  }
	  assert(range->count > 0);
	  c->xid.last = range->start_id;
	  c->xid.max = range->start_id + (range->count - 1) * c->xid.inc;
	  free(range);
      } else {
	  c->xid.last += c->xid.inc;
      }
      ret = c->xid.last | c->xid.base;
      pthread_mutex_unlock(&c->xid.lock);
      return ret;
  }

Let us know how that works for you---if it solves your
problem or at least doesn't break anything I'll go ahead and
commit a patch.

	Bart

In message <961fa2b40901240459i24d73193v1743716f8238951 at mail.gmail.com> you wrote:
> On Ubuntu 8.10, using packages as shipped, my firefox crashes every
> few days with a BadIDChoice X error, and I've been trying to figure
> out why.  One hypothesis was that this was due to the xlib race
> condition fixed in
>   http://cgit.freedesktop.org/xorg/lib/libX11/commit/?id=cc19618d2eb3ed92a0b574aee26a7da8b4aed5d2
> but I've been informed that firefox makes all X calls from a single
> thread (and some tracing seems to confirm that), so that can't be it.
> 
> Catching it in gdb while running with --sync, though, I have noticed
> something that looks odd down in XCB's guts.  At the time of the
> crash, xlib has basically just done (in _XIDHandler):
>   dpy->next = xcb_generate_id(dpy->xcb->connection);
>   SyncHandle();
> and the SyncHandle() is where the error is detected.  According to
> gdb, dpy->next here is 57206503.
> 
> At the same time, dpy->xcb->connection->xid.max is 57206499 (and .last
> is 57206504), i.e., it looks like xcb_generate_id has overrun our XID
> range without noticing.
> 
> 1) Does this explain the firefox crashes?  (I'm assuming yes.)
> 2) Any ideas how how we could be getting into this state?  I'm staring
> at the code and don't see how it's possible...
> 
> libxcb version "1.1-1.1", with xlib "2:1.1.5-2ubuntu1.1"; I have a
> core file if that's useful for further debugging (I can also provide
> it on request, but be warned that it's 250 MB compressed, and needs
> the Ubuntu debug packages to be useful).  Firefox bug is
> https://bugzilla.mozilla.org/show_bug.cgi?id=458092 .
> 
> -- Nathaniel
> _______________________________________________
> Xcb mailing list
> Xcb at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/xcb


More information about the Xcb mailing list