[Mesa-dev] Introducing OpenSWR: High performance software rasterizer
Rowley, Timothy O
timothy.o.rowley at intel.com
Tue Oct 20 15:16:14 PDT 2015
> On Oct 20, 2015, at 4:23 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
> I tried it on my i7-5500U, but I run into two issues:
> - OpenSWR seems to only use 2 threads (even though my system support 4 threads)
> - and even when I compensate llvmpipe to only use 2 rasterizer threads, I still only get half the framerate of llvmpipe with the "gloss" Mesa demo (a very simple texturing demo):
> $ ./gloss
> SWR create screen!
> This processor supports AVX2.
> 720 frames in 5.004 seconds = 143.885 FPS
> 737 frames in 5.005 seconds = 147.253 FPS
> 729 frames in 5.004 seconds = 145.683 FPS
> 732 frames in 5.002 seconds = 146.341 FPS
> 735 frames in 5.001 seconds = 146.971 FPS
> $ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
> 1539 frames in 5.002 seconds = 307.677 FPS
> 1719 frames in 5 seconds = 343.8 FPS
> 1780 frames in 5.002 seconds = 355.858 FPS
> 1497 frames in 5.002 seconds = 299.28 FPS
> 1548 frames in 5.001 seconds = 309.538 FPS
> I see similar ratio with more complex workload with the trace from:
> (you'll need to download https://github.com/apitrace/apitrace and build)
> My questions are:
> - Is this the expected performance when texturing is used? Or is there something wrong with my setup?
Two things are causing the behavior you’re seeing. First, OpenSWR only creates as many threads as there are physical cores. On our workloads, going beyond that and using hyperthreads gave little benefit or even reduced performance. Second, one thread is reserved for the API thread, which does not participate in either frontend (geometry) or backend (fragment) work. Thus on your two-core 5500U OpenSWR had only one raster thread versus llvmpipe’s two, giving half the performance. If you want OpenSWR to use hyperthreads, set the environment variable KNOB_MAX_THREADS_PER_CORE=0.
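To make the arithmetic concrete, here is a rough Python sketch of the thread accounting described above, checked against the gloss numbers quoted earlier. It is not OpenSWR code: the function name and its parameters are made up for illustration, and the hyperthreaded case assumes that lifting the per-core limit yields one thread per logical CPU, minus the API thread.

```python
from statistics import mean


def swr_raster_threads(physical_cores, hw_threads_per_core=1):
    """Hypothetical model: one thread per usable hardware thread,
    with one of them reserved for the API thread."""
    total_threads = physical_cores * hw_threads_per_core
    # The API thread does no frontend or backend work, so subtract it.
    return max(total_threads - 1, 1)


# i7-5500U: 2 physical cores, 2 hardware threads per core.
swr_default = swr_raster_threads(2)       # default: physical cores only
swr_hyper = swr_raster_threads(2, 2)      # assumed result with hyperthreads
llvmpipe_threads = 2                      # LP_NUM_THREADS=2 in the quoted run

# Ratio predicted from raster thread counts alone.
expected_ratio = swr_default / llvmpipe_threads

# FPS figures copied from the two quoted gloss runs.
swr_fps = [143.885, 147.253, 145.683, 146.341, 146.971]
llvmpipe_fps = [307.677, 343.8, 355.858, 299.28, 309.538]
measured_ratio = mean(swr_fps) / mean(llvmpipe_fps)

print(f"raster threads: swr={swr_default}, llvmpipe={llvmpipe_threads}")
print(f"expected ratio {expected_ratio:.2f}, measured ratio {measured_ratio:.2f}")
```

The measured ratio comes out at roughly 0.45, close to the 0.5 that the thread counts alone would predict. The same model gives three worker threads on a 4-core machine, which matches the N-1 default.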
> I understand that OpenSWR actually leverages llvmpipe (well gallivm's) code for texture sampling, so I was expecting a smaller gap.
Yes, we use gallivm’s texture sampler so our performance should be similar on texture-limited workloads. I tried a quick test of openarena on a 4-core machine and the performance delta was about 6% (default N-1 OpenSWR worker threads).
> - What exactly was the benchmark used for SWR_Sept15.pdf's figures ? Was there any texture sampling used on it, or was it just simple lighting?
I don’t have the apitrace in front of me, but I believe the turbulence data was two-sided lit, with a textured plane.