RFC - libglvnd and GLXVND vendor enumeration to facilitate GLX multi-vendor PRIME GPU offload

Wed Feb 13 18:19:38 UTC 2019

On 02/08/2019 11:19 AM, Andy Ritger wrote:
> (I'll omit EGL and Vulkan for the moment, for the sake of focus, and those
> APIs have programmatic ways to enumerate and select GPUs.  Though, some
> of what we decide here for GLX we may want to leverage for other APIs.)
>
>
> Today, GLX implementations loaded into the X server register themselves
> on a per-screen basis, GLXVND in the server dispatches GLX requests to
> the registered vendor per screen, and libglvnd determines the client-side
> vendor library to use by querying the per-screen GLX_VENDOR_NAMES_EXT
> string from the X server (e.g., "mesa" or "nvidia").
>
> The GLX_VENDOR_NAMES_EXT string can be overridden within libglvnd
> through the __GLX_VENDOR_LIBRARY_NAME environment variable, though I
> don't believe that is used much currently.
>
> To enable GLX to be used in a multi-vendor PRIME GPU offload environment,
> it seems there are several desirable user-visible behaviors:
>
> * By default, users should get the same behavior we have today (i.e.,
>    the GLX implementation used within the client and the server, for an X
>    screen, is dictated by the X driver of the X screen).
>
> * The user should be able to request a different GLX vendor for use on a
>    per-process basis through either an environment variable (potentially
>    reusing __GLX_VENDOR_LIBRARY_NAME) or possibly a future application
>    profile mechanism in libglvnd.
>
> * To make configuration optionally more "portable", the selection override
>    mechanism should be able to refer to more generic names like
>    "performance" or "battery", and those generic names should be mapped
>    to specific GPUs/vendors on a per-system basis.
>
> * To make configuration optionally more explicit, the selection override
>    mechanism should be able to distinguish between individual GPUs by
>    using hardware specific identifiers such as PCI BusID-based names like
>    what DRI_PRIME currently honors (e.g., "pci-0000_03_00_0").
>
> Do those behaviors seem reasonable?
>
> If so, it seems like there are two general directions we could take to
> implement that infrastructure in client-side libglvnd and GLXVND within
> the X server, if the user or application profile requests a particular
> vendor, either by vendor name (e.g., "mesa"/"nvidia"), functional
> name (e.g., "battery"/"performance"), or hardware-based name (e.g.,
> "pci-0000_03_00_0"/"pci-0000_01_00_0"):
>
> (1) If configured for PRIME GPU offloading (environment variable or
>      application profile), client-side libglvnd could load the possible
>      libGLX_${vendor}.so libraries it finds, and call into each to
>      find which vendor (and possibly which GPU) matches the specified
>      string. Once a vendor is selected, the vendor library could optionally
>      tell the X server which GLX vendor to use server-side for this
>      client connection.
>
> (2) The GLX implementations within the X server could, when registering
>      with GLXVND, tell GLXVND which screens they can support for PRIME
>      GPU offloading.  That list could be queried by client-side libglvnd,
>      and then used to interpret __GLX_VENDOR_LIBRARY_NAME and pick the
>      corresponding vendor library to load.  Client-side would tell the X
>      server which GLX vendor to use server-side for this client connection.
>
> In either direction, if the user-requested string is a hardware-based
> name ("pci-0000_03_00_0"), the GLX vendor library presumably needs to be
> told that GPU, so that the vendor implementation can use the right GPU
> (in the case that the vendor supports multiple GPUs in the system).
>
> But, both (1) and (2) are really just points on a continuum.  I suppose
> the more general question is: how much of the implementation should go
> in the server and how much should go in the client?
>
> At one extreme, the client could do nearly all the work (with the
> practical downside of potentially loading multiple vendor libraries in
> order to interpret __GLX_VENDOR_LIBRARY_NAME).
>
> At the other extreme, the server could do nearly all the work of
> generating the possible __GLX_VENDOR_LIBRARY_NAME strings (with the
> practical downside of each server-side GLX vendor needing to enumerate
> the GPUs it can drive, in order to generate the hardware-specific
> identifiers).
>
> I'm not sure where on that spectrum it makes the most sense to land,
> and I'm curious what others think.
>
> Thanks,
> - Andy
>

For a more concrete example, this is what I've been working on for a 
client-based interface:
https://github.com/kbrenneman/libglvnd/tree/libglx-gpu-offloading

For this design, I've tried to keep the interface as simple as possible 
and to impose as few requirements or assumptions as possible. The basic 
idea behind it is that the only thing that a GLX application has to care 
about is calling GLX functions, and the only thing that libglvnd has to 
care about is forwarding those functions to the correct vendor library.

The general design is this:
* Libglvnd gets a list of alternate vendor libraries from an app profile 
(config file, environment variable, whatever)
* For each vendor in that list, libglvnd will load the vendor and call a 
new callback function, which asks the vendor to set up offloading. This 
call applies to the whole display, so the vendor can do all of its 
display initialization here.
* If that callback succeeds, then libglvnd calls a second function to 
check which screens the vendor actually supports offloading on. Libglvnd 
assigns the vendor to those screens.
* For any remaining screens, libglvnd will use its current selection logic.
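
The steps above can be sketched roughly as follows. This is Python for brevity, not the actual C interface: the names `Vendor`, `setup_offload`, `offload_screens`, `load_vendor` and `default_vendor_for` are hypothetical stand-ins, not the functions in the branch linked above.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    """Hypothetical stand-in for a loaded client-side vendor library."""
    name: str
    can_offload: bool = True
    screens: set = field(default_factory=set)

    def setup_offload(self, display, data):
        # First callback: whole-display initialization for offloading.
        # 'data' is the optional vendor-specific string from the profile.
        return self.can_offload

    def offload_screens(self, display):
        # Second callback: the screens this vendor can offload on.
        return self.screens


def assign_vendors(display, num_screens, profile, load_vendor, default_vendor_for):
    """Return {screen: vendor name} for one display.

    profile is a list of (vendor_name, vendor_data) pairs; load_vendor
    returns a Vendor, or None if the library cannot be loaded.
    """
    assignment = {}
    for name, data in profile:
        vendor = load_vendor(name)
        if vendor is None or not vendor.setup_offload(display, data):
            continue  # try the next vendor in the list
        for screen in vendor.offload_screens(display):
            if 0 <= screen < num_screens:
                assignment[screen] = vendor.name
        # Only one offload vendor initializes the whole display, so stop
        # at the first one whose setup callback succeeds.
        break
    # Remaining screens fall back to the existing selection logic.
    for screen in range(num_screens):
        assignment.setdefault(screen, default_vendor_for(display, screen))
    return assignment
```

With a two-screen display where the offload vendor only reports screen 1, screen 0 keeps whatever the normal per-screen lookup returns.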

The entire interface is defined in an extension to the libglvnd GLX 
vendor library interface. That means the interface itself is entirely 
client-side, but a driver is free to use whatever combination of client- 
and server-side logic it wants. For example, a driver can implement 
device enumeration in the client vendor library (like Mesa does), or it 
could do that in the server and communicate the results back to the client.

You could also have multiple client vendor libraries that all work
with a single server-side library, or even a client vendor library that
doesn't have a server-side counterpart at all.

The profile that libglvnd uses is just a list of vendor library names 
that libglvnd should try before it falls back to its normal vendor 
selection. Along with each vendor name, the profile can also optionally 
have some vendor-specific configuration data. That extra data can be 
used to select a specific device. For Mesa, for example, you could use 
the same string that you'd otherwise specify by setting the DRI_PRIME 
environment variable.

The config file format I put together is JSON-based. It contains a list 
of profiles (selected based on the executable name of the process), and 
each profile contains a list of vendors. In addition to naming a vendor 
directly, a profile can list a generic descriptor, which acts like a 
macro that expands out to a list of vendors. Drivers can install config 
files to provide profiles and to provide definitions for those 
descriptors. Libglvnd will merge the vendor lists in profiles and 
descriptors from different files so that multiple drivers don't clobber 
each other. As a result, it should be possible for vendors (or distros) 
to provide reasonable default behavior, but still allow a user to 
override any profile or descriptor if they want to.
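
To illustrate the merging and descriptor expansion, here is a minimal sketch in Python. The schema is made up for the example and is not the actual file format in the branch: it assumes top-level "profiles" and "descriptors" objects mapping a name to a list of vendor names, and a leading "@" to mark a descriptor reference. User overrides are not modeled here.

```python
import json


def merge_configs(config_texts):
    """Merge several JSON config documents (hypothetical schema).

    Vendor lists for the same profile or descriptor are concatenated in
    file order instead of replaced, so one driver's file does not
    clobber another's.
    """
    merged = {"profiles": {}, "descriptors": {}}
    for text in config_texts:
        config = json.loads(text)
        for section in merged:
            for key, vendors in config.get(section, {}).items():
                target = merged[section].setdefault(key, [])
                target.extend(v for v in vendors if v not in target)
    return merged


def vendors_for(merged, executable):
    """Expand the profile for an executable into a flat vendor list."""
    result = []
    for entry in merged["profiles"].get(executable, []):
        if entry.startswith("@"):
            # Descriptor reference: acts like a macro for a vendor list.
            expanded = merged["descriptors"].get(entry[1:], [])
        else:
            expanded = [entry]
        result.extend(v for v in expanded if v not in result)
    return result


nvidia_file = '{"descriptors": {"performance": ["nvidia"]}}'
mesa_file = ('{"descriptors": {"performance": ["mesa"]},'
             ' "profiles": {"glxgears": ["@performance"]}}')
merged = merge_configs([nvidia_file, mesa_file])
print(vendors_for(merged, "glxgears"))
```

Here both drivers contribute to the "performance" descriptor, and the glxgears profile ends up trying both vendors in file order before libglvnd falls back to its normal selection.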

I think a client-based interface like this is a strict functional 
superset of anything that requires server-side device enumeration. 
GLXVND would have to rely on the server-side vendor libraries to do that 
enumeration, and that same logic could just as easily be an 
implementation detail between a client and server library.

The one exception is that this interface doesn't allow offloading to
different vendors on different screens if no single vendor can handle
all of them, but in order to run into that case, you'd need at least two
X screens and at least four different GPU vendors. That's still not a
client versus server limitation, though; it's just a consequence of
libglvnd selecting a single offload vendor and letting it initialize the
whole display all at once.

Since this seems to be a sticking point, there's also an option to avoid 
unnecessarily loading extra client-side vendor libraries. If a client 
vendor needs a server-side counterpart, then libglvnd can filter it out 
based on a really simple server query. Right now, it just checks the 
GLX_VENDOR_NAMES_EXT string (since that was easy to test), but we may 
want to define some new string for this. This is the closest that 
libglvnd gets to the server at any point in this process, and even this 
part is optional and should be a pretty trivial extension to the GLXVND 
interface in the server.
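
A rough sketch of that filter, in Python for brevity. The `needs_server` flag and the function name are hypothetical, and the sketch assumes the server string is a whitespace-separated list of vendor names:

```python
def filter_by_server(profile, server_vendor_names):
    """Return the vendor names worth loading, in profile order.

    profile: list of (vendor_name, needs_server) pairs, where
        needs_server (hypothetical flag) says the client library cannot
        work without a server-side counterpart.
    server_vendor_names: string reported by the server, e.g. the
        GLX_VENDOR_NAMES_EXT string, assumed whitespace-separated.
    """
    available = set(server_vendor_names.split())
    return [
        name
        for name, needs_server in profile
        # Client-only vendors are always kept; the rest must be listed.
        if not needs_server or name in available
    ]
```

Vendors that are filtered out are never loaded into the client, which is the point of the check.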

-Kyle