Diagnosing first vs subsequent performance
christopher.barry at earborg.com
Tue Jan 19 10:19:15 PST 2016
On Tue, 19 Jan 2016 09:03:49 -0700
Lloyd Brown <lloyd_brown at byu.edu> wrote:
>I hope this isn't too dumb of a question, but I'm having trouble
>finding anything on it so far. Not sure if my google-fu is just not
>up to the task today, or if it's genuinely an obscure problem.
>I'm in the middle of setting up an HPC node with 2 NVIDIA Tesla K80s (4
>total GPUs), for some remote rendering tasks via VirtualGL. But I've
>got some strange behavior I can't yet account for, and I'm hoping
>someone can point me in the right direction for diagnosing it.
>In short, for accounting reasons, we'd prefer to have each GPU be
>attached to a separate Xorg PID. So I've built some very simple
>xorg.conf files (example attached), and I can launch Xorg instances
>with a simple syntax like this:
>> Xorg :0 -config /etc/X11/xorg.conf.gpu0
>When I run my tests, I'm also watching the output of "nvidia-smi" so I
>can see which Xorg and application PIDs are using which GPUs.
>The first time I do something like "DISPLAY=:0.0 glxgears", I do *not*
>see that process (e.g. glxgears) show up in the output of "nvidia-smi",
>and I see performance numbers consistent with CPU-based rendering. If
>I cancel (Ctrl-C), and run the exact same command again, I *do* see the
>process in the output of "nvidia-smi", on the correct GPU, and I see
>faster performance numbers consistent with GPU rendering.
>If I switch to a different display (e.g. "DISPLAY=:3.0"), I see the same
>behavior: slow the first time, fast on 2nd and subsequent instances.
>The same behavior even repeats when I switch back to a previously-used,
>but not most-recently-used, DISPLAY.
>I see similar behavior with other benchmarks (e.g. glxspheres64,
>glmark2): slow the first time on a display, faster after that.
>I have a sneaking suspicion that I'm just doing something really stupid
>with my configs, but right now I can't find it. I don't see anything
>relevant in the Xorg.log files, or stdout/stderr from the servers, but
>I can post those too, if needed.
>Any pointers on where to go from here would be appreciated.
>Other (possibly relevant) Info:
>OS Release: RHEL 6.6
>Xorg server 1.10.4 (from RHEL RPM)
>NVIDIA Driver 352.55
>Note: The attached example is for only one GPU. The other configs are
>exactly the same, except for the PCI BusID in the GPU device section.
>I can verify via nvidia-smi that the separate Xorg PIDs are attached
>to the correct GPUs.
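(The attachment did not survive the archive, but a per-GPU xorg.conf along the lines described might look like the sketch below. The BusID value is hypothetical; real bus IDs come from `nvidia-smi` or `lspci`. The `UseDisplayDevice` option is an assumption commonly seen in headless Tesla setups, not something stated in the original mail.)

```
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0"
EndSection

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "PCI:4:0:0"               # hypothetical; substitute the real bus ID
    Option     "UseDisplayDevice" "None" # assumed; common for headless Tesla boards
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Device0"
EndSection
```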
This may or may not be useful, so sorry if I'm speaking out of turn, but
is it not possible to supply only the snippet of config that is unique
(e.g. the Device section) and let X default the rest? If so, that would
strip out any other config data that could be part of the issue. Again,
just guessing, but it might be a quick and easy test.
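The minimal-config test suggested here might look like the fragment below, with every section except Device omitted so the server falls back to its defaults. The bus ID and file name are illustrative:

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "PCI:4:0:0"   # hypothetical bus ID; use the value from lspci
EndSection
```

Started as, say, `Xorg :0 -config /etc/X11/xorg.conf.min`, this binds the server to one GPU while letting Xorg auto-generate the rest of its configuration.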