[Mesa-dev] Introducing OpenSWR: High performance software rasterizer
jfonseca at vmware.com
Tue Oct 20 14:23:03 PDT 2015
On 20/10/15 18:11, Rowley, Timothy O wrote:
> Hi. I'd like to introduce the Mesa3D community to a software project
> that we hope to upstream. We're a small team at Intel working on
> software defined visualization (http://sdvis.org/), and have
> opensource projects in both the raytracing (Embree, OSPRay) and
> rasterization (OpenSWR) realms.
> We're a different Intel team from that of i965 fame, with a different
> type of customer and workloads. Our customers have large clusters of
> compute nodes that for various reasons do not have GPUs, and are
> working with extremely large geometry models.
> We've been working on a high performance, highly scalable rasterizer
> and driver to interface with Mesa3D. Our rasterizer functions as a
> "software gpu", relying on the mature well-supported Mesa3D to provide
> API and state tracking layers.
> We would like to contribute this code to Mesa3D and continue doing
> active development in your source repository. We welcome discussion
> about how this will happen and questions about the project itself.
> Below are some answers to what we think might be frequently asked
> questions.
> Bruce and I will be the public contacts for this project, but this
> project isn't solely our work - there's a dedicated group of people
> working on the core SWR code.
> Tim Rowley
> Bruce Cherniak
> Intel Corporation
> Why another software rasterizer?
> Good question, given there are already three (swrast, softpipe,
> llvmpipe) in the Mesa3D tree. Two important reasons for this:
> * Architecture - given our focus on scientific visualization, our
> workloads are much different than the typical game; we have heavy
> vertex load and relatively simple shaders. In addition, the core
> counts of machines we run on are much higher. These parameters led
> to design decisions much different than llvmpipe.
> * Historical - Intel had developed a high performance software
> graphics stack for internal purposes. Later we adapted this
> graphics stack for use in visualization and decided to move forward
> with Mesa3D to provide a high quality API layer while at the same
> time benefiting from the excellent performance the software
> rasterizer gives us.
It wouldn't be too difficult to make llvmpipe's vertex shading
distributed across threads.
> What's the architecture?
> SWR is a tile based immediate mode renderer with a sort-free threading
> model which is arranged as a ring of queues. Each entry in the ring
> represents a draw context that contains all of the draw state and work
> queues. An API thread sets up each draw context and worker threads
> will execute both the frontend (vertex/geometry processing) and
> backend (fragment) work as required. The ring allows for backend
> threads to pull work in order. Large draws are split into chunks to
> allow vertex processing to happen in parallel, with the backend work
> pickup preserving draw ordering.
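
To make the ordering property above concrete, here is a minimal Python sketch. It is not SWR code: the names (split_draw, frontend, render) and the chunk size are hypothetical, and a plain thread pool stands in for the ring of draw contexts. It only shows the idea that chunks of a large draw can be processed in parallel while the results are still picked up in draw order.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical chunk size, in vertices


def split_draw(vertices, chunk_size=CHUNK_SIZE):
    """Split one large draw into fixed-size chunks of vertices."""
    return [vertices[i:i + chunk_size]
            for i in range(0, len(vertices), chunk_size)]


def frontend(chunk):
    """Stand-in for vertex processing: scale every vertex by 2."""
    return [v * 2 for v in chunk]


def render(draws, workers=4):
    """Run frontend work in parallel, then collect it in draw order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = []
        for draw_id, vertices in enumerate(draws):
            for chunk_id, chunk in enumerate(split_draw(vertices)):
                # Chunks may finish in any order on the worker threads.
                future = pool.submit(frontend, chunk)
                pending.append((draw_id, chunk_id, future))
        # Walking the list in submission order preserves draw ordering,
        # which is the role the ring plays in the description above.
        return [(d, c, f.result()) for d, c, f in pending]
```

Calling render([[1, 2, 3, 4, 5], [6]]) yields the chunks of draw 0 before the chunk of draw 1, regardless of which worker finished first.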
> Our pipeline uses just-in-time compiled code for the fetch shader that
> does vertex attribute gathering and AOS to SOA conversions, the vertex
> shader and fragment shaders, streamout, and fragment blending. SWR
> core also supports geometry and compute shaders but we haven't exposed
> them through our driver yet. The fetch shader, streamout, and blend are
> built internally to swr core using LLVM directly, while for the vertex
> and pixel shaders we reuse bits of llvmpipe from
> gallium/auxiliary/gallivm to build the kernels, which we wrap
> differently than llvmpipe's auxiliary/draw code.
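
For readers unfamiliar with the AOS to SOA conversion mentioned above, a small illustration in Python. This is only a sketch of the data layout change, not the JIT-compiled fetch shader; the function name and the attribute names x, y, z are assumptions made for the example.

```python
def aos_to_soa(vertices, attrs=("x", "y", "z")):
    """Convert an array of structures into a structure of arrays.

    `vertices` is a list of per-vertex tuples (AOS); the result maps
    each attribute name to a list with one entry per vertex (SOA).
    """
    return {name: [v[i] for v in vertices] for i, name in enumerate(attrs)}
```

With the SOA layout, all x values sit next to each other, which is the shape SIMD code wants when it processes several vertices per instruction.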
> What's the performance?
> For the types of high-geometry workloads we're interested in, we are
> significantly faster than llvmpipe. This is to be expected, as
> llvmpipe only threads the fragment processing and not the geometry
> processing.
> The linked slide below shows some performance numbers from a benchmark
> dataset and application. On a 36 total core dual E5-2699v3 we see
> performance 29x to 51x that of llvmpipe.
> While our current performance is quite good, we know there is more
> potential in this architecture. When we switched from a prototype
> OpenGL driver to Mesa we regressed performance severely, some due to
> interface issues that need tuning, some differences in shader code
> generation, and some due to conformance and feature additions to the
> core swr. We are looking to recover most of this performance.
I tried it on my i7-5500U, but I ran into two issues:
- OpenSWR seems to use only 2 threads (even though my system supports 4
threads)
- and even when I compensate llvmpipe to only use 2 rasterizer threads,
I still only get half the framerate of llvmpipe with the "gloss" Mesa
demo (a very simple texturing demo):
SWR create screen!
This processor supports AVX2.
720 frames in 5.004 seconds = 143.885 FPS
737 frames in 5.005 seconds = 147.253 FPS
729 frames in 5.004 seconds = 145.683 FPS
732 frames in 5.002 seconds = 146.341 FPS
735 frames in 5.001 seconds = 146.971 FPS
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1539 frames in 5.002 seconds = 307.677 FPS
1719 frames in 5 seconds = 343.8 FPS
1780 frames in 5.002 seconds = 355.858 FPS
1497 frames in 5.002 seconds = 299.28 FPS
1548 frames in 5.001 seconds = 309.538 FPS
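
As a quick check of the "half the framerate" figure, the two runs above can be averaged with a few lines of Python. The FPS values are copied from the output above; the variable and function names are just for illustration.

```python
from statistics import mean

# FPS samples copied from the two gloss runs above.
swr_fps = [143.885, 147.253, 145.683, 146.341, 146.971]
llvmpipe_fps = [307.677, 343.8, 355.858, 299.28, 309.538]


def fps_ratio(a, b):
    """Ratio of the mean frame rate of run `a` to that of run `b`."""
    return mean(a) / mean(b)


if __name__ == "__main__":
    print(f"swr mean:      {mean(swr_fps):.1f} FPS")
    print(f"llvmpipe mean: {mean(llvmpipe_fps):.1f} FPS")
    print(f"swr/llvmpipe:  {fps_ratio(swr_fps, llvmpipe_fps):.2f}")
```

The means come out at about 146 FPS and 323 FPS, a ratio of roughly 0.45.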
I see a similar ratio with a more complex workload, using the trace from:
(you'll need to download https://github.com/apitrace/apitrace and build)
My questions are:
- Is this the expected performance when texturing is used? Or is there
something wrong with my setup?
I understand that OpenSWR actually leverages llvmpipe's (well,
gallivm's) code for texture sampling, so I was expecting a smaller gap.
- What exactly was the benchmark used for SWR_Sept15.pdf's figures? Was
there any texture sampling used in it, or was it just simple lighting?