Diagnosing first vs subsequent performance

Aaron Plattner aplattner at nvidia.com
Wed Jan 20 14:09:11 PST 2016

My guess is that each X server you start is switching to its own VT.
Since you're running Xorg by itself, there are initially no clients
connected.  When the last client disconnects (for example, when glxinfo
exits immediately, or when you kill your copy of glxgears), the server
resets, which makes it initiate a VT switch to itself.  Only the X server
on the active VT is allowed to touch any of the hardware, so the other X
servers revoke GPU access whenever the one you touched last grabs the VT.

You can work around this problem somewhat by using the -sharevts and
-novtswitch options to keep the X servers active simultaneously, but
please be aware that this configuration is not officially supported, so
you might run into strange and unexpected behavior.
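As a sketch, the workaround would look something like this (the display
numbers and config paths are assumptions based on the setup described
below, not tested commands):

```shell
# Start one X server per GPU, all sharing the active VT.
# NOTE: -sharevts/-novtswitch is not an officially supported configuration.
Xorg :0 -config /etc/X11/xorg.conf.gpu0 -sharevts -novtswitch &
Xorg :1 -config /etc/X11/xorg.conf.gpu1 -sharevts -novtswitch &
Xorg :2 -config /etc/X11/xorg.conf.gpu2 -sharevts -novtswitch &
Xorg :3 -config /etc/X11/xorg.conf.gpu3 -sharevts -novtswitch &
```

With all four servers sharing the VT, none of them should lose GPU access
when another server resets.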

On 01/19/2016 08:03 AM, Lloyd Brown wrote:
> Hi, all. 
> I hope this isn't too dumb of a question, but I'm having trouble finding
> anything on it so far.  Not sure if my google-fu is just not up to the
> task today, or if it's genuinely an obscure problem.
> I'm in the middle of setting up an HPC node with 2 NVIDIA Tesla K80s (4
> total GPUs), for some remote rendering tasks via VirtualGL.  But I've
> got some strange behavior I can't yet account for, and I'm hoping
> someone can point me in the right direction for diagnosing it.
> In short, for accounting reasons, we'd prefer to have each GPU be
> attached to a separate Xorg PID.  So I've built some very simple
> xorg.conf files (example attached), and I can launch Xorg instances with
> a simple syntax like this:
>> Xorg :0 -config /etc/X11/xorg.conf.gpu0
> When I run my tests, I'm also watching the output of "nvidia-smi" so I
> can see which Xorg and application PIDs are using which GPUs.
> The first time I do something like "DISPLAY=:0.0 glxgears", I do *not*
> see that process (e.g. glxgears) show up in the output of "nvidia-smi",
> and I see performance numbers consistent with CPU-based rendering.  If I
> cancel (Ctrl-C), and run the exact same command again, I *do* see the
> process in the output of "nvidia-smi", on the correct GPU, and I see
> faster performance numbers consistent with GPU rendering.
> If I switch to a different display (e.g. "DISPLAY=:3.0"), I see the same
> behavior: slow the first time, fast on 2nd and subsequent instances. 
> The same behavior even repeats when I switch back to a previously-used,
> but not most-recently-used, DISPLAY.
> I see similar behavior with other benchmarks (e.g. glxspheres64,
> glmark2): slow first time on a display, faster after that.
> I have a sneaking suspicion that I'm just doing something really stupid
> with my configs, but right now I can't find it.  I don't see anything
> relevant in the Xorg.log files, or stdout/stderr from the servers, but I
> can post those too, if needed.
> Any pointers on where to go from here would be appreciated.
> Thanks,
> Lloyd
> Other (possibly relevant) Info:
> OS Release: RHEL 6.6
> Kernel: 2.6.32-504.16.2.el6.x86_64
> Xorg server 1.10.4 (from RHEL RPM)
> NVIDIA Driver 352.55
> Note: The attached example is for only one GPU.  The other configs are
> exactly the same, with the exception of the PCI BusID inside the GPU
> device section.  I can verify via nvidia-smi that the separate Xorg
> PIDs are attached to the correct GPUs.
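For reference, a minimal per-GPU config along the lines described above
might look like the following.  This is a hypothetical sketch, not Lloyd's
actual attachment: the Identifier names, the BusID value, and the
UseDisplayDevice option are assumptions, and only the BusID would change
between the per-GPU files.

```
Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    # Per-GPU PCI address; this value is an assumption.  Each of the
    # four config files would name a different GPU here.
    BusID          "PCI:4:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    # Tesla boards have no display outputs, so render headless.
    Option         "UseDisplayDevice" "none"
EndSection
```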

