[RFC] Making tracing/debugging of client requests easier

Wed Jun 16 04:29:27 PDT 2010

Hi,

This is a second version of an RFC that was originally discussed
internally at Nokia. Some ideas in the original RFC were found useful
and based on the feedback it makes sense to send a corrected RFC
version to this upstream list also.

We'd like to make debugging problems in the X server and its clients
easier. Most importantly, we want to make it easy to determine the
process names of clients. To accomplish this, we suggest some minor
modifications to the XRES extension and the X server tracing code.

1. Tracing client requests with another client

It's often necessary to trace which requests clients are issuing in
order to detect sub-optimal behaviour. For example, it might be
necessary to monitor clients to ensure that they use DRI2 rather than
core protocol drawing requests.

Currently it's possible to monitor client requests with the XRECORD
extension. For example, something like "cnee --record --request-range
55-77" does the job. However, XRECORD only publishes the XID of the
client that issued a particular request. We'd like to map this XID to
a PID, since in practice all clients on our system are local.

Therefore we propose adding a new request to the XRES extension for
mapping a client XID to a PID. The library function would look
something like this:

   typedef enum {
     XRES_UNKNOWN_ID,
     XRES_LOCAL_PID
   } XResClientIdType;

   typedef long XResClientIdValue;

   typedef struct {
     XResClientIdType type;
     XResClientIdValue value;
   } XResClientId;

   Status XResQueryClientId(Display *dpy, XID client, XResClientId *id);

XResClientId is some generic type capable of holding a PID. On Linux,
a PID would be returned if the client doing the request and the client
specified by XID are both local. The PID can then be mapped to a
process name on Linux using /proc/PID/cmdline. Otherwise some
alternative ID making sense for remote clients is returned (or just an
unknown ID indicating unsuccessful query).
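
The /proc lookup mentioned above could be sketched roughly as follows.
This is only an illustration: pidToName and ownName are hypothetical
names, not part of XRES or the X server, and the code assumes a Linux
/proc filesystem. Note that it returns argv[0], which is usually but
not necessarily the executable path.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of XRES or the X server: copy argv[0]
 * from /proc/<pid>/cmdline into name. Returns 1 on success, 0 if the
 * file cannot be read or is empty. Linux-specific. */
int pidToName(pid_t pid, char *name, size_t len)
{
    char path[64];
    FILE *f;
    size_t n;

    assert(name != NULL && len > 0);

    snprintf(path, sizeof(path), "/proc/%ld/cmdline", (long) pid);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;

    /* Arguments in cmdline are NUL-separated, so once the buffer is
     * terminated, reading name as a C string yields only argv[0]. */
    n = fread(name, 1, len - 1, f);
    fclose(f);
    name[n] = '\0';
    return n > 0;
}

/* Convenience wrapper: name of the calling process. */
int ownName(char *name, size_t len)
{
    return pidToName(getpid(), name, len);
}
```

A tool such as xres or cnee could call something like pidToName() on
the PID returned by the proposed query and print the result next to
the client XID.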

Some existing clients using XRES iterate over the window hierarchy
searching for _NET_WM_PID, but that approach is too slow and
unreliable for finding out client PIDs. At least the xres and cnee
clients would be able to take advantage of this new request and make
it easier to identify local clients.

2. Determining client process names in X server

Sometimes it's useful to restrict debugging to some specific client
when investigating problems on the X server side. For example, your
video driver might have a problem in a busy execution path used by a
lot of different clients. However, there might be only one client that
exposes the problem, and only if other clients such as the window
manager are running as well.

The X server should provide some helper functions to easily determine
which client is currently being served. Below is an example list of
helper functions that have been found useful when debugging problems.

   /* mark served client in Dispatch */
   void setServedClient(ClientPtr client);

   /* determine served client somewhere deeper
    * where ClientPtr isn't directly available */
   ClientPtr getServedClient(void);

   /* determine local process name from /proc/PID/cmdline */
   pid_t getClientPid(ClientPtr);
   Bool getPidName(pid_t pid, int len, char *name);
   Bool getClientName(ClientPtr client, int len, char *name);

   /* use in debugging code to make client based breakpoints
    * example:
    *   if (isServedClient("/usr/bin/problem-app"))
    *      add break point;
    */
   Bool isServedClient(const char *name);

SELinuxLabelClient already contains code that could be used to
implement these functions. However, the XSELINUX extension is not
necessarily available, so the functionality should be moved to a more
generic place where it can be used even without XSELINUX. The next
section proposes where to define and call these helper functions in
the X server.
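
To make the intent of the served-client helpers more concrete, here is
a minimal standalone sketch. DemoClient is a stand-in for the server's
real client record, the demo* names are hypothetical, and the cached
name field is an assumption about how the process name would be
stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the server's client record. The real ClientPtr is
 * defined in the X server sources and has many more fields. */
typedef struct {
    int  index;
    char name[64];              /* hypothetical cached process name */
} DemoClient;

/* NULL while no request is being dispatched. */
static DemoClient *servedClient;

/* Would be called from Dispatch before handling a request. */
void demoSetServedClient(DemoClient *client)
{
    servedClient = client;
}

/* Usable deeper in the call chain where no client pointer is passed. */
DemoClient *demoGetServedClient(void)
{
    return servedClient;
}

/* Nonzero if the client being served has the given process name. */
int demoIsServedClient(const char *name)
{
    assert(name != NULL);
    return servedClient != NULL && strcmp(servedClient->name, name) == 0;
}
```

In driver code, a check such as demoIsServedClient("/usr/bin/problem-app")
would then guard a breakpoint or a debug print.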

3. Collecting client information in X server

Whenever a client connects to the X server, its PID and process name
(for local clients) would be determined by the functions described in
the previous section and stored in a private data structure. This
would make it easy to quickly determine a sensible name for any
connected client for debugging purposes. Having a map from client XIDs
to PIDs would also make the implementation of the new XRES query
described in the first section fast.

In request dispatch, the opcodes of requests would be saved to a
circular buffer. Each request in the circular buffer would be linked
to the client name for easier identification of request
sources. Linking requests to client process names also has the
advantage that the process name still makes sense even after the
client has disconnected. This request buffer would be maintained so
that it'd be a little bit easier to determine recent requests and
originating client names from a core dump when a crash occurs. The
buffer would give at least some chance of guessing how to reproduce
rarely occurring crashes. This buffer would also make it trivial to
determine the currently served client for client based debugging.
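
A request history of this kind could look roughly like the sketch
below. All names and the buffer size are hypothetical. As one possible
way of linking a request to its client name, the sketch copies the
name into each record so that it stays valid after the client has
disconnected; a real implementation might share the name strings
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_HISTORY_SIZE 8      /* hypothetical; real buffer would be larger */

typedef struct {
    unsigned char opcode;       /* major opcode of the request */
    char client_name[32];       /* copied, so it outlives the client */
} ReqRecord;

typedef struct {
    ReqRecord records[REQ_HISTORY_SIZE];
    unsigned long count;        /* total number of requests recorded */
} ReqHistory;

/* Record one request, overwriting the oldest entry once full. */
void reqHistoryAdd(ReqHistory *h, unsigned char opcode, const char *name)
{
    ReqRecord *r;

    assert(h != NULL && name != NULL);
    r = &h->records[h->count % REQ_HISTORY_SIZE];
    r->opcode = opcode;
    strncpy(r->client_name, name, sizeof(r->client_name) - 1);
    r->client_name[sizeof(r->client_name) - 1] = '\0';
    h->count++;
}

/* Return the record 'age' requests back (0 = most recent), or NULL
 * if that request was never recorded or has been overwritten. */
const ReqRecord *reqHistoryGet(const ReqHistory *h, unsigned long age)
{
    assert(h != NULL);
    if (age >= h->count || age >= REQ_HISTORY_SIZE)
        return NULL;
    return &h->records[(h->count - 1 - age) % REQ_HISTORY_SIZE];
}
```

With a zero-initialized ReqHistory kept in a global variable, the most
recent requests can be read straight from a core dump, and
reqHistoryGet(&h, 0) names the client currently being served.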

In order to collect that information, you need to add calls to pretty
much the same places that currently hold the DTRACE calls. For this
reason, I propose that we add generic callbacks to those locations
(client connect/disconnect/request), so that the DTRACE calls and the
calls to functions described in this RFC can be moved to their own
extension, which then registers to the generic callbacks. In other
words, we'd create a new internal trace/debug X server extension and
move tracing/debugging related helper code there. The extension would
contain useful helper functions to aid debugging so that people
wouldn't have to maintain so many local hacks and patches (determining
process name of a client being a prime example of code that is sitting
in different forms on the machines of several developers).
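
The generic callback idea can be illustrated with a small standalone
sketch. The names below (TraceEvent, traceRegister, traceEmit) are
hypothetical and are not the server's existing callback API; the
sketch only shows how connect/disconnect/request hook points could
notify whichever tracing code has registered.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EV_CONNECT, EV_DISCONNECT, EV_REQUEST, EV_COUNT } TraceEvent;

typedef void (*TraceCallback)(TraceEvent ev, int client_index, void *data);

#define MAX_TRACE_CALLBACKS 4   /* hypothetical fixed limit per event */

static struct {
    TraceCallback fn;
    void *data;
} hooks[EV_COUNT][MAX_TRACE_CALLBACKS];

static int hookCount[EV_COUNT];

/* Register fn for one event type. Returns 1 on success, 0 if full. */
int traceRegister(TraceEvent ev, TraceCallback fn, void *data)
{
    assert(ev < EV_COUNT && fn != NULL);
    if (hookCount[ev] == MAX_TRACE_CALLBACKS)
        return 0;
    hooks[ev][hookCount[ev]].fn = fn;
    hooks[ev][hookCount[ev]].data = data;
    hookCount[ev]++;
    return 1;
}

/* Called at the hook points (client connect/disconnect/request). */
void traceEmit(TraceEvent ev, int client_index)
{
    int i;

    assert(ev < EV_COUNT);
    for (i = 0; i < hookCount[ev]; i++)
        hooks[ev][i].fn(ev, client_index, hooks[ev][i].data);
}

/* Example subscriber: counts events in the counter passed as data. */
void countEvents(TraceEvent ev, int client_index, void *data)
{
    (void) ev;
    (void) client_index;
    ++*(unsigned long *) data;
}
```

A trace/debug extension would register its subscribers once at
initialization, and the dispatch code would only need the traceEmit()
calls.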

4. Summary

Create an internal X server extension for tracing/debugging code. Make
it possible to map a client XID to its PID on the client side. Make it
easy to determine a client's process name on the X server side.
Collect a recent history of requests and originating client names, both
to determine the currently served client and to give more insight into
crashes.

Thanks for any comments,
     Rami