[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

Tue Oct 20 14:23:03 PDT 2015

On 20/10/15 18:11, Rowley, Timothy O wrote:
> Hi.  I'd like to introduce the Mesa3D community to a software project
> that we hope to upstream.  We're a small team at Intel working on
> software defined visualization (http://sdvis.org/), and have
> opensource projects in both the raytracing (Embree, OSPRay) and
> rasterization (OpenSWR) realms.
>
> We're a different Intel team from that of i965 fame, with a different
> type of customer and workloads.  Our customers have large clusters of
> compute nodes that for various reasons do not have GPUs, and are
> working with extremely large geometry models.
>
> We've been working on a high performance, highly scalable rasterizer
> and driver to interface with Mesa3D.  Our rasterizer functions as a
> "software gpu", relying on the mature well-supported Mesa3D to provide
> API and state tracking layers.
>
> We would like to contribute this code to Mesa3D and continue doing
> active development in your source repository.  We welcome discussion
> about how this will happen and questions about the project itself.
> Below are some answers to what we think might be frequently asked
> questions.
>
> Bruce and I will be the public contacts for this project, but this
> project isn't solely our work - there's a dedicated group of people
> working on the core SWR code.
>
>    Tim Rowley
>    Bruce Cherniak
>
>    Intel Corporation
>
> Why another software rasterizer?
> --------------------------------
>
> Good question, given there are already three (swrast, softpipe,
> llvmpipe) in the Mesa3D tree. Two important reasons for this:
>
>   * Architecture - given our focus on scientific visualization, our
>     workloads are very different from a typical game's; we have heavy
>     vertex load and relatively simple shaders.  In addition, the core
>     counts of machines we run on are much higher.  These parameters led
>     to design decisions very different from llvmpipe's.
>
>   * Historical - Intel had developed a high performance software
>     graphics stack for internal purposes.  Later we adapted this
>     graphics stack for use in visualization and decided to move forward
>     with Mesa3D to provide a high quality API layer while at the same
>     time benefiting from the excellent performance the software
>     rasterizer gives us.

It wouldn't be too difficult to distribute llvmpipe's vertex shading
across threads.
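A minimal C++ sketch of what distributing vertex shading across threads could look like. It assumes a hypothetical `shade_vertex` standing in for a JIT-compiled vertex shader and one `std::thread` per chunk; neither is llvmpipe's actual code. The draw's vertex range is split into contiguous chunks, and each thread writes a disjoint slice of the output, so no locking is needed and output order matches input order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Hypothetical stand-in for a JIT-compiled vertex shader: scales xyz by 2.
inline Vec4 shade_vertex(const Vec4& in) {
    return Vec4{in.x * 2.0f, in.y * 2.0f, in.z * 2.0f, in.w};
}

// Shade every vertex of a draw using up to num_threads threads.
// The index range is split into contiguous chunks; each thread writes only
// its own slice of `out`, so no locking is needed and output order matches
// input order.
std::vector<Vec4> shade_parallel(const std::vector<Vec4>& in, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<Vec4> out(in.size());
    const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;  // fewer vertices than threads
        workers.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = shade_vertex(in[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

A real driver would more likely reuse a persistent thread pool than spawn threads per draw, and would still have to hand the shaded vertices to primitive assembly in order.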

> What's the architecture?
> ------------------------
>
> SWR is a tile based immediate mode renderer with a sort-free threading
> model which is arranged as a ring of queues.  Each entry in the ring
> represents a draw context that contains all of the draw state and work
> queues.  An API thread sets up each draw context and worker threads
> will execute both the frontend (vertex/geometry processing) and
> backend (fragment) work as required.  The ring allows for backend
> threads to pull work in order.  Large draws are split into chunks to
> allow vertex processing to happen in parallel, with the backend work
> pickup preserving draw ordering.
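The ordering rule in this paragraph can be modelled with a small single-threaded C++ sketch. It is an illustration under stated assumptions, not SWR's code: `DrawRing` and `DrawContext` are hypothetical names, a real implementation would use atomics and many worker threads, and frontend/backend work is reduced to a flag. Frontend work may finish in any order, but backend pickup only advances from the oldest unretired draw, which keeps draws ordered without any sorting step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-draw state; a real draw context would also hold the
// pipeline state and the frontend/backend work queues.
struct DrawContext {
    std::uint64_t seq;    // order in which the API thread issued the draw
    bool frontend_done;   // vertex/geometry work finished?
};

// Single-threaded model of a ring of N draw contexts.
template <std::size_t N>
class DrawRing {
public:
    // API thread: set up the next draw context. Returns its sequence number,
    // or -1 if all N slots are still in flight.
    long long enqueue() {
        if (tail_ - head_ == N) return -1;
        slots_[tail_ % N] = DrawContext{tail_, false};
        return static_cast<long long>(tail_++);
    }

    // Worker: frontend work for draw `seq` is complete (any order allowed).
    void finish_frontend(std::uint64_t seq) {
        assert(seq >= head_ && seq < tail_);
        slots_[seq % N].frontend_done = true;
    }

    // Worker: pick up backend work strictly in draw order, starting from the
    // oldest unretired draw; stops at the first draw whose frontend is not done.
    std::vector<std::uint64_t> pickup_backend() {
        std::vector<std::uint64_t> picked;
        while (head_ < tail_ && slots_[head_ % N].frontend_done) {
            picked.push_back(slots_[head_ % N].seq);
            ++head_;  // slot becomes free for the API thread
        }
        return picked;
    }

private:
    std::array<DrawContext, N> slots_{};
    std::uint64_t head_ = 0;  // oldest draw not yet retired
    std::uint64_t tail_ = 0;  // next draw to be issued
};
```

In this model, `enqueue()` returning -1 corresponds to the API thread having to wait until workers retire a draw and free a slot.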
>
> Our pipeline uses just-in-time compiled code for the fetch shader
> (which does vertex attribute gathering and AOS to SOA conversion), the
> vertex and fragment shaders, streamout, and fragment blending.  The SWR
> core also supports geometry and compute shaders, but we haven't exposed
> them through our driver yet.  The fetch shader, streamout, and blend
> code are built internally in the SWR core using LLVM directly, while
> for the vertex and pixel shaders we reuse bits of llvmpipe from
> gallium/auxiliary/gallivm to build the kernels, which we wrap
> differently than llvmpipe's auxiliary/draw code does.
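To make the AOS to SOA step concrete, here is a scalar C++ sketch of the data movement a fetch shader performs for one 4-component attribute. It assumes a SIMD width of 8 (the number of 32-bit float lanes in an AVX/AVX2 register); `AttribAOS`, `AttribSOA`, and `fetch_soa` are hypothetical names, and the real fetch shader is JIT-compiled with LLVM and would use vector loads/gathers rather than scalar loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t SIMD_WIDTH = 8;  // assumed: 8 x 32-bit float lanes (AVX/AVX2)

// One vertex's 4-component attribute as stored in the vertex buffer (AOS).
struct AttribAOS { float c[4]; };

// The same attribute for SIMD_WIDTH vertices in SOA form:
// c[k][lane] is component k (x, y, z, w) of the vertex in that lane.
struct AttribSOA { std::array<std::array<float, SIMD_WIDTH>, 4> c; };

// Gather the attribute for SIMD_WIDTH vertices selected by `indices`
// (as an index buffer would) and transpose it from AOS to SOA.
AttribSOA fetch_soa(const std::vector<AttribAOS>& vb,
                    const std::array<std::uint32_t, SIMD_WIDTH>& indices) {
    AttribSOA out{};
    for (std::size_t lane = 0; lane < SIMD_WIDTH; ++lane) {
        const AttribAOS& v = vb.at(indices[lane]);  // bounds-checked gather
        for (std::size_t k = 0; k < 4; ++k) {
            out.c[k][lane] = v.c[k];
        }
    }
    return out;
}
```

After the transpose, a shader can process one component of all eight vertices with a single SIMD instruction, which is the point of the SOA layout.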
>
> What's the performance?
> -----------------------
>
> For the types of high-geometry workloads we're interested in, we are
> significantly faster than llvmpipe.  This is to be expected, as
> llvmpipe only threads the fragment processing and not the geometry
> frontend.
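The reasoning here is essentially Amdahl's law. As a worked illustration with a hypothetical split (not a measured one): if a fraction f of the frame time is frontend work running on a single thread, and the remaining 1 - f is spread over N threads, the speedup S is bounded by

```latex
S(N) = \frac{1}{f + \frac{1 - f}{N}}, \qquad
S(36)\Big|_{f = 0.5} = \frac{1}{0.5 + 0.5/36} \approx 1.95
```

So with half the time spent in serial geometry processing, threading only the fragment work could not reach even 2x on 36 cores (the bound approaches 1/f = 2 as N grows), whereas a design that also threads the frontend is not subject to that particular bound.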
>
> The slides linked below show some performance numbers from a benchmark
> dataset and application.  On a dual-socket E5-2699 v3 system (36 cores
> total) we see performance 29x to 51x that of llvmpipe.
>
> 	http://openswr.org/slides/SWR_Sept15.pdf
>
> While our current performance is quite good, we know there is more
> potential in this architecture.  When we switched from a prototype
> OpenGL driver to Mesa, performance regressed severely: partly due to
> interface issues that need tuning, partly due to differences in shader
> code generation, and partly due to conformance and feature additions
> to the SWR core.  We are looking to recover most of this performance.

I tried it on my i7-5500U, but I ran into two issues:

- OpenSWR seems to use only 2 threads (even though my system supports 4
threads)

- and even when I compensate by limiting llvmpipe to 2 rasterizer
threads, I still get only half of llvmpipe's framerate with the "gloss"
Mesa demo (a very simple texturing demo):

$ ./gloss
SWR create screen!
This processor supports AVX2.
720 frames in 5.004 seconds = 143.885 FPS
737 frames in 5.005 seconds = 147.253 FPS
729 frames in 5.004 seconds = 145.683 FPS
732 frames in 5.002 seconds = 146.341 FPS
735 frames in 5.001 seconds = 146.971 FPS
[...]
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1539 frames in 5.002 seconds = 307.677 FPS
1719 frames in 5 seconds = 343.8 FPS
1780 frames in 5.002 seconds = 355.858 FPS
1497 frames in 5.002 seconds = 299.28 FPS
1548 frames in 5.001 seconds = 309.538 FPS
[...]
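To put a number on "half", the frame counts above can be aggregated as total frames divided by total seconds. A small C++ helper (hypothetical, not part of any Mesa tool) that does this and skips non-matching lines such as "SWR create screen!":

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Aggregate frame rate over lines of the form
//   "N frames in T seconds = F FPS"
// by dividing total frames by total seconds, so intervals of slightly
// different length are weighted correctly. Lines that do not match
// (e.g. "SWR create screen!" or "[...]") are ignored.
double aggregate_fps(const std::vector<std::string>& lines) {
    long long frames = 0;
    double seconds = 0.0;
    for (const std::string& line : lines) {
        int n = 0;
        double t = 0.0;
        if (std::sscanf(line.c_str(), "%d frames in %lf seconds", &n, &t) == 2 && t > 0.0) {
            frames += n;
            seconds += t;
        }
    }
    assert(seconds > 0.0 && "no frame-rate lines found");
    return static_cast<double>(frames) / seconds;
}
```

Applied to the two runs above, this gives about 146.0 FPS for SWR and about 323.2 FPS for llvmpipe, a ratio of roughly 0.45.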

I see a similar ratio with a more complex workload, using the trace from:

   http://people.freedesktop.org/~jrfonseca/traces/furmark-1.8.2-svga.trace

(you'll need to download and build https://github.com/apitrace/apitrace)

My questions are:

- Is this the expected performance when texturing is used? Or is there 
something wrong with my setup?

   I understand that OpenSWR actually leverages llvmpipe's (well,
gallivm's) code for texture sampling, so I was expecting a smaller gap.

- What exactly was the benchmark used for the SWR_Sept15.pdf figures?
Was any texture sampling used in it, or was it just simple lighting?

Jose