[Xcb] Question about xcb performance and threading support

Wed Feb 18 11:39:27 PST 2009

In message <200902180002.26052.gambas at users.sourceforge.net> you wrote:
> > At 1234876657 time_t, Benoît Minisini wrote:
> > > I am currently using valgrind / callgrind &
> > > kcachegrind with the Gambas IDE to try to understand
> > > why QT4.4 is slower than QT3.
> > >
> > > I noticed in the Gambas/QT4 kcachegrind output that pthread_mutex_lock
> > > and pthread_mutex_unlock take respectively 1.41% and 1.52% of the
> > > execution time.

> Actually most of this time is taken by two functions from
> xcb-xlib: xcb_xlib_lock() and xcb_xlib_unlock().

Xlib and XCB are structured in such a way that
pthread_mutex_lock() and pthread_mutex_unlock() should be
no-ops if you use -lpthread-stubs instead of -lpthread in
the compilation.  Thus, you could get rid of this overhead
entirely by compiling QT4.4 in this fashion.  Of course,
then the cool multi-threaded features of XCB would go away.

> Here is a screenshot from kcachegrind of the xcb/xcb-xlib
> part of my test.

> Approximately 2/3 of the _xcb_lock_io() and
> _xcb_unlock_io() calls come from xcb_xlib_lock() and
> xcb_xlib_unlock(). The rest being the effective stuff
> (xcb_send_request, xcb_poll_from_event...)

Is QT4.4 compiled against the latest version of XCB with the
handoff patches?  These may mostly eliminate this overhead.

> So it seems that a lot of time is consumed in locking the
> display from the XLib.

You should also be cautious about these profile results that
are trying to capture the work percentage from 800K calls to
small functions.  There is significant potential for
sampling and accumulation error here.
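
As a rough illustration of the accumulation problem, here is a
minimal sketch in Python (chosen for brevity, not because it
matches what callgrind does internally).  The names tiny(),
time_whole_loop() and time_each_call() are made up for this
example, and tiny() is only a stand-in for a small function such
as a lock wrapper.  It times the same calls two ways: once around
the whole loop, and once per call with the samples summed.

```python
import time


def tiny():
    """Hypothetical stand-in for a very small function."""
    return None


def time_whole_loop(calls):
    """One measurement around all calls (includes loop overhead)."""
    start = time.perf_counter()
    for _ in range(calls):
        tiny()
    return time.perf_counter() - start


def time_each_call(calls):
    """One measurement per call, summed.

    Every sample also captures part of the timer's own cost, so the
    measurement overhead is added once per call.
    """
    total = 0.0
    for _ in range(calls):
        start = time.perf_counter()
        tiny()
        total += time.perf_counter() - start
    return total


if __name__ == "__main__":
    calls = 800_000  # same order of magnitude as the profile discussed
    whole = time_whole_loop(calls)
    summed = time_each_call(calls)
    print(f"whole loop: {whole:.3f}s  summed per call: {summed:.3f}s")
```

The two figures describe the same work but will generally
disagree, and the gap comes from the measurement rather than from
tiny() itself.  A percentage of around 1.5% attributed to a
function this small deserves the same suspicion.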

	Bart