[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

Rowley, Timothy O timothy.o.rowley at intel.com
Tue Oct 20 10:11:26 PDT 2015

Hi.  I'd like to introduce the Mesa3D community to a software project
that we hope to upstream.  We're a small team at Intel working on
software defined visualization (http://sdvis.org/), and have
opensource projects in both the raytracing (Embree, OSPRay) and
rasterization (OpenSWR) realms.

We're a different Intel team from that of i965 fame, with a different
type of customer and workloads.  Our customers have large clusters of
compute nodes that for various reasons do not have GPUs, and are
working with extremely large geometry models.

We've been working on a high performance, highly scalable rasterizer
and driver to interface with Mesa3D.  Our rasterizer functions as a
"software gpu", relying on the mature well-supported Mesa3D to provide
API and state tracking layers.

We would like to contribute this code to Mesa3D and continue doing
active development in your source repository.  We welcome discussion
about how this will happen and questions about the project itself.
Below are some answers to what we think might be frequently asked

Bruce and I will be the public contacts for this project, but this
project isn't solely our work - there's a dedicated group of people
working on the core SWR code.

  Tim Rowley
  Bruce Cherniak

  Intel Corporation

Why another software rasterizer?

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa3D tree. Two important reasons for this:

 * Architecture - given our focus on scientific visualization, our
   workloads are much different than the typical game; we have heavy
   vertex load and relatively simple shaders.  In addition, the core
   counts of machines we run on are much higher.  These parameters led
   to design decisions much different than llvmpipe.

 * Historical - Intel had developed a high performance software
   graphics stack for internal purposes.  Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa3D to provide a high quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizerizer gives us.

What's the architecture?

SWR is a tile based immediate mode renderer with a sort-free threading
model which is arranged as a ring of queues.  Each entry in the ring
represents a draw context that contains all of the draw state and work
queues.  An API thread sets up each draw context and worker threads
will execute both the frontend (vertex/geometry processing) and
backend (fragment) work as required.  The ring allows for backend
threads to pull work in order.  Large draws are split into chunks to
allow vertex processing to happen in parallel, with the backend work
pickup preserving draw ordering.

Our pipeline uses just-in-time compiled code for the fetch shader that
does vertex attribute gathering and AOS to SOA conversions, the vertex
shader and fragment shaders, streamout, and fragment blending. SWR
core also supports geometry and compute shaders but we haven't exposed
them through our driver yet. The fetch shader, streamout, and blend is
built internally to swr core using LLVM directly, while for the vertex
and pixel shaders we reuse bits of llvmpipe from
gallium/auxiliary/gallivm to build the kernels, which we wrap
differently than llvmpipe's auxiliary/draw code.

What's the performance?

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe.  This is to be expected, as
llvmpipe only threads the fragment processing and not the geometry

The linked slide below shows some performance numbers from a benchmark
dataset and application.  On a 36 total core dual E5-2699v3 we see
performance 29x to 51x that of llvmpipe.  


While our current performance is quite good, we know there is more
potential in this architecture.  When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some due to
interface issues that need tuning, some differences in shader code
generation, and some due to conformance and feature additions to the
core swr.  We are looking to recovering most of this performance back.

What's the conformance?

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as best as possible.  Our
current code passes vtk's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing.  Core SWR undergoes rigorous unit testing and we are quite
confident in the rasterizer, and understand the areas where it
currently has issues (example: line rendering is done with triangles,
so doesn't match the strict line rendering rules).  The majority of
the piglit failures are errors in our driver layer interfacing Mesa
and SWR.  Fixing these issues is one of our major future development

Why are you open sourcing this?

 * Our customers prefer open source, and allowing them to simply
   download the Mesa source and enable our driver makes life much
   easier for them.

 * The internal gallium APIs are not stable, so we'd like our driver
   to be visible for changes.

 * It's easier to work with the Mesa community when the source we're
   working with can be used as reference.

What are your development plans?

 * Performance - see the performance section earlier for details.

 * Conformance - see the conformance section earlier for details.

 * Features - core SWR has a lot of functionality we have yet to
   expose through our driver, such as MSAA, geometry shaders, compute
   shaders, and tesselation.

 * AVX512 support

What is the licensing of the code?

 * All code is under the normal Mesa MIT license.

How will contributions be handled?

This is our current thinking about how this will be handled.  We
welcome input from those who might have more experience with handling
this type of contribution.

 * The OpenSWR project consists of two codebases, the swr core and the
   driver.  The swr core is a copy of an internal code repository, so
   we (the Intel team) will deal with the headache of moving changes
   back and forth.  While not ideal, it's the cost of sharing this

 * Non-intel changes to swr core (src/gallium/drivers/swr/rasterizer)
   will go through mesa-dev review.  Intel team will merge changes
   from our internal repository without the review process.

 * The swr driver master repository will be the one in Mesa3D.  All
   changes will go through mesa-dev review, save for those needed for
   swr core updates.

Will this work on AMD?

 * If using an AMD processor with AVX or AVX2, it should work though
   we don't have that hardware around to test.  Patches if needed
   would be welcome.

Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?

 * Not without a lot of work.  We make extensive use of AVX and AVX2
   intrinsics in our code and the in-tree JIT creation.  It is not the
   intention for this codebase to support non-x86 architectures.

What hardware do I need?

 * Any x86 processor with at least AVX (introduced in the Intel
   SandyBridge and AMD Bulldozer microarchitectures in 2011) will

 * You don't need a fire-breathing Xeon machine to work on SWR - we do
   day-to-day development with laptops and desktop CPUs.

Does one build work on both AVX and AVX2?

 * Unfortunately, no.  The architecture support is fixed at compile
   time.  While the AVX version of course will run on AVX2 machines
   and the jitted code will use AVX2, the overall performance will
   suffer relative to a full AVX2 build.

 * There is some idea that if we move some code from the driver back
   to SWR core, we could build two versions of libSWR and dynamically
   load the correct version at runtime.  Unfortunately this mechanism
   would not work with AVX512, as some of the SWR state structures
   would change size.

Is the code ready to go?

 * Yes, a version for review feedback is located at:


 * This requires llvm-3.6.  To build with configure, enable the swr
   driver and also pass in --enable-swr-native.  SCons will
   automatically build swr.

 * This version is based on Mesa 11.0, and the thought is to keep to
   that version while we incorporate community feedback prior up

More information about the mesa-dev mailing list