RFC - libglvnd and GLXVND vendor enumeration to facilitate GLX multi-vendor PRIME GPU offload

Fri Feb 8 18:19:35 UTC 2019

(I'll omit EGL and Vulkan for the moment, both for the sake of focus and
because those APIs already have programmatic ways to enumerate and select
GPUs.  That said, some of what we decide here for GLX may be worth
leveraging for other APIs.)

Today, GLX implementations loaded into the X server register themselves
on a per-screen basis; GLXVND in the server dispatches GLX requests to
the registered vendor per screen; and libglvnd determines the client-side
vendor library to use by querying the per-screen GLX_VENDOR_NAMES_EXT
string from the X server (e.g., "mesa" or "nvidia").

The GLX_VENDOR_NAMES_EXT string can be overridden within libglvnd
through the __GLX_VENDOR_LIBRARY_NAME environment variable, though I
don't believe that is used much currently.
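
To make the current behavior concrete, here is a rough sketch of the
client-side selection described above.  It is an illustration only, not
libglvnd's actual code (which is C): the function names are made up, and
I'm assuming the GLX_VENDOR_NAMES_EXT string is a space-separated list
from which the first entry is taken.

```python
import os


def pick_vendor_name(server_vendor_names, environ=None):
    """Return the vendor name the client would use for one X screen.

    server_vendor_names is the per-screen GLX_VENDOR_NAMES_EXT string
    reported by the X server (assumed here to be space-separated).
    """
    environ = os.environ if environ is None else environ

    # The environment variable, when set, overrides the server's answer.
    override = environ.get("__GLX_VENDOR_LIBRARY_NAME")
    if override:
        return override

    names = server_vendor_names.split()
    return names[0] if names else None


def vendor_library_filename(vendor):
    # Follows the libGLX_${vendor}.so pattern; a real soname may also
    # carry a version suffix.
    return f"libGLX_{vendor}.so"
```

With no override, pick_vendor_name("mesa", {}) yields "mesa"; with
__GLX_VENDOR_LIBRARY_NAME=nvidia in the environment it yields "nvidia"
regardless of what the server reported.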

To enable GLX to be used in a multi-vendor PRIME GPU offload environment,
it seems there are several desirable user-visible behaviors:

* By default, users should get the same behavior we have today (i.e.,
  the GLX implementation used within the client and the server, for an X
  screen, is dictated by the X driver of the X screen).

* The user should be able to request a different GLX vendor for use on a
  per-process basis through either an environment variable (potentially
  reusing __GLX_VENDOR_LIBRARY_NAME) or possibly a future application
  profile mechanism in libglvnd.

* To make configuration optionally more "portable", the selection override
  mechanism should be able to refer to more generic names like
  "performance" or "battery", and those generic names should be mapped
  to specific GPUs/vendors on a per-system basis.

* To make configuration optionally more explicit, the selection override
  mechanism should be able to distinguish between individual GPUs by
  using hardware specific identifiers such as PCI BusID-based names like
  what DRI_PRIME currently honors (e.g., "pci-0000_03_00_0").

Do those behaviors seem reasonable?
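
As a sketch of what the three kinds of names in the list above could look
like to the selection code: generic names are resolved through a
per-system table first, and whatever remains is treated as either a
hardware-based name or a vendor name.  The table contents, the function
name, and the "pci-" prefix test are all assumptions for illustration;
nothing here says where such a table would live.

```python
# Hypothetical per-system mapping from generic names to specific targets.
GENERIC_NAMES = {
    "performance": "pci-0000_01_00_0",
    "battery": "pci-0000_03_00_0",
}


def classify_override(name, generic_names=GENERIC_NAMES):
    """Return (kind, value), where kind is "hardware" or "vendor".

    Generic names such as "performance" are replaced by the target the
    per-system table gives them, and that target is then classified.
    """
    if name in generic_names:
        # Resolve one level only, so a bad table cannot loop forever.
        return classify_override(generic_names[name], {})

    if name.startswith("pci-"):
        return ("hardware", name)

    return ("vendor", name)
```

So "nvidia" stays a vendor name, "pci-0000_03_00_0" is a hardware-based
name, and "performance" becomes whichever of the two the local table
maps it to.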

If so, it seems like there are two general directions we could take to
implement that infrastructure in client-side libglvnd and GLXVND within
the X server, if the user or application profile requests a particular
vendor, either by vendor name (e.g., "mesa"/"nvidia"), functional
name (e.g., "battery"/"performance"), or hardware-based name (e.g.,
"pci-0000_03_00_0"/"pci-0000_01_00_0"):

(1) If configured for PRIME GPU offloading (environment variable or
    application profile), client-side libglvnd could load the possible
    libGLX_${vendor}.so libraries it finds, and call into each to
    find which vendor (and possibly which GPU) matches the specified
    string. Once a vendor is selected, the vendor library could optionally
    tell the X server which GLX vendor to use server-side for this
    client connection.

(2) The GLX implementations within the X server could, when registering
    with GLXVND, tell GLXVND which screens they can support for PRIME
    GPU offloading.  That list could be queried by client-side libglvnd,
    and then used to interpret __GLX_VENDOR_LIBRARY_NAME and pick the
    corresponding vendor library to load.  Client-side would tell the X
    server which GLX vendor to use server-side for this client connection.
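
To contrast the two directions, here is a toy model of each.  The
VendorLibrary class stands in for a loaded libGLX_${vendor}.so, and its
matches() method is a hypothetical entry point, not an existing libglvnd
or vendor API.  For (2), I'm assuming the server-provided list can be
flattened into a name-to-vendor table; the actual shape of that list is
an open question.

```python
from dataclasses import dataclass, field


@dataclass
class VendorLibrary:
    """Stand-in for a loaded client-side vendor library (direction 1)."""
    vendor: str
    names: set = field(default_factory=set)  # extra strings it recognizes

    def matches(self, requested):
        # Hypothetical query: "do you recognize this string?"
        return requested == self.vendor or requested in self.names


def select_client_side(requested, libraries):
    """Direction (1): ask every vendor library found on the system."""
    for lib in libraries:
        if lib.matches(requested):
            return lib.vendor
    return None


def select_from_server_list(requested, server_offload_table):
    """Direction (2): consult a table built from server-side registration.

    Only the chosen vendor library then needs to be loaded in the client.
    """
    return server_offload_table.get(requested)
```

The practical difference shows up in what has to be loaded: (1) needs
every candidate library in the client process before it can answer,
while (2) answers from data the server already holds.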

In either direction, if the user-requested string is a hardware-based
name ("pci-0000_03_00_0"), the GLX vendor library presumably needs to be
told which GPU was requested, so that the vendor implementation can use
the right one (in the case that the vendor supports multiple GPUs in
the system).
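
For reference, a hardware-based name of this form could be decoded into
a PCI address roughly as follows before being handed to the vendor
library.  I'm assuming the fields are hexadecimal domain, bus, device,
and function, in that order; the PciAddress type and parse_pci_name()
are names invented for this sketch.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Assumed layout: pci-<domain:4 hex>_<bus:2 hex>_<device:2 hex>_<function>
_PCI_NAME = re.compile(
    r"^pci-([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$"
)


def parse_pci_name(name):
    """Return a PciAddress for a "pci-..." name, or None if it is not one."""
    m = _PCI_NAME.match(name)
    if m is None:
        return None
    domain, bus, device, function = (int(g, 16) for g in m.groups())
    return PciAddress(domain, bus, device, function)
```

Anything that does not match (a vendor name such as "nvidia", say)
returns None, so the caller can fall back to the other name types.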

But, both (1) and (2) are really just points on a continuum.  I suppose
the more general question is: how much of the implementation should go
in the server and how much should go in the client?

At one extreme, the client could do nearly all the work (with the
practical downside of potentially loading multiple vendor libraries in
order to interpret __GLX_VENDOR_LIBRARY_NAME).

At the other extreme, the server could do nearly all the work of
generating the possible __GLX_VENDOR_LIBRARY_NAME strings (with the
practical downside of each server-side GLX vendor needing to enumerate
the GPUs it can drive, in order to generate the hardware-specific
identifiers).

I'm not sure where on that spectrum it makes the most sense to land,
and I'm curious what others think.

Thanks,
- Andy