Diagnosing first vs subsequent performance
lloyd_brown at byu.edu
Tue Jan 19 08:03:49 PST 2016
I hope this isn't too dumb of a question, but I'm having trouble finding
anything on it so far. Not sure if my google-fu is just not up to the
task today, or if it's genuinely an obscure problem.
I'm in the middle of setting up an HPC node with 2 NVIDIA Tesla K80s (4
total GPUs), for some remote rendering tasks via VirtualGL. But I've
got some strange behavior I can't yet account for, and I'm hoping
someone can point me in the right direction for diagnosing it.
In short, for accounting reasons, we'd prefer to have each GPU be
attached to a separate Xorg PID. So I've built some very simple
xorg.conf files (example attached), and I can launch Xorg instances with
a simple syntax like this:
> Xorg :0 -config /etc/X11/xorg.conf.gpu0
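For illustration, a minimal sketch of how the four per-GPU servers could be started with the same syntax. It assumes display :N pairs with a config file named xorg.conf.gpuN; only the gpu0 file name appears above, so the other three names are hypothetical. The loop is a dry run that only prints the commands.

```shell
#!/bin/sh
# Build the launch command for one GPU's X server.
# Assumption: display :N uses /etc/X11/xorg.conf.gpuN. Only the gpu0
# file name is given in the message; the others are hypothetical.
xorg_cmd() {
    printf 'Xorg :%s -config /etc/X11/xorg.conf.gpu%s\n' "$1" "$1"
}

# Dry run: print the four commands instead of starting the servers.
for gpu in 0 1 2 3; do
    xorg_cmd "$gpu"
done
```

Printing rather than executing keeps the sketch safe to run on a machine without the GPUs; each printed line can be run by hand once the config files exist.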
When I run my tests, I'm also watching the output of "nvidia-smi" so I
can see which Xorg and application PIDs are using which GPUs.
The first time I do something like "DISPLAY=:0.0 glxgears", I do *not*
see that process (e.g. glxgears) show up in the output of "nvidia-smi",
and I see performance numbers consistent with CPU-based rendering. If I
cancel (Ctrl-C), and run the exact same command again, I *do* see the
process in the output of "nvidia-smi", on the correct GPU, and I see
faster performance numbers consistent with GPU rendering.
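The manual check described above could be scripted roughly as follows. This is only a sketch: the helper names pid_listed and check_run are made up for illustration, the 5-second wait is arbitrary, and the PID match is a crude whole-word search of the plain "nvidia-smi" text rather than a parse of its process table.

```shell
#!/bin/sh
# Succeeds if PID $1 appears as a whole word in the nvidia-smi text on stdin.
# Crude: any standalone number equal to the PID matches, not only
# entries in the process table.
pid_listed() {
    grep -qw -- "$1"
}

# Run a GL client on display $1 for a few seconds and report whether
# nvidia-smi lists its PID.
# Example: check_run :0.0 glxgears
check_run() {
    disp=$1; shift
    DISPLAY=$disp "$@" >/dev/null 2>&1 &
    pid=$!
    sleep 5   # arbitrary delay so the client has time to start rendering
    if nvidia-smi | pid_listed "$pid"; then
        echo "$disp pid $pid: listed by nvidia-smi"
    else
        echo "$disp pid $pid: NOT listed by nvidia-smi"
    fi
    kill "$pid" 2>/dev/null
}
```

Calling check_run twice in a row with the same display would reproduce the first-run versus second-run comparison without watching "nvidia-smi" by hand.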
If I switch to a different display (e.g. "DISPLAY=:3.0"), I see the same
behavior: slow the first time, fast on the second and subsequent runs.
The same behavior even repeats when I switch back to a previously-used,
but not most-recently-used, DISPLAY.
I see similar behavior with other benchmarks (e.g. glxspheres64,
glmark2): slow the first time on a display, faster after that.
I have a sneaking suspicion that I'm just doing something really stupid
with my configs, but right now I can't find it. I don't see anything
relevant in the Xorg.log files, or stdout/stderr from the servers, but I
can post those too, if needed.
Any pointers on where to go from here would be appreciated.
Other (possibly relevant) Info:
OS Release: RHEL 6.6
Xorg server 1.10.4 (from RHEL RPM)
NVIDIA Driver 352.55
Note: The attached example is for only one GPU. The other configs are
exactly the same, with the exception of the PCI BusID inside the GPU
device section. I can verify via nvidia-smi that the separate Xorg
PIDs are attached to the correct GPUs.
Fulton Supercomputing Lab
Brigham Young University
-------------- next part --------------
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"

# generated from default
Option "Protocol" "auto"
Option "Device" "/dev/input/mice"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"

# generated from data in "/etc/sysconfig/keyboard"
Option "XkbLayout" "us"
Option "XkbModel" "pc105"

HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0

VendorName "NVIDIA Corporation"

Option "UseDisplayDevice" "none"