[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

Tue Oct 20 12:03:43 PDT 2015

On 20.10.2015 at 19:11, Rowley, Timothy O wrote:
> Hi.  I'd like to introduce the Mesa3D community to a software project
> that we hope to upstream.  We're a small team at Intel working on
> software defined visualization (http://sdvis.org/), and have
> opensource projects in both the raytracing (Embree, OSPRay) and
> rasterization (OpenSWR) realms.
> 
> We're a different Intel team from that of i965 fame, with a different
> type of customer and workloads.  Our customers have large clusters of
> compute nodes that for various reasons do not have GPUs, and are
> working with extremely large geometry models.
> 
> We've been working on a high performance, highly scalable rasterizer
> and driver to interface with Mesa3D.  Our rasterizer functions as a
> "software gpu", relying on the mature well-supported Mesa3D to provide
> API and state tracking layers.
> 
> We would like to contribute this code to Mesa3D and continue doing
> active development in your source repository.  We welcome discussion
> about how this will happen and questions about the project itself.
> Below are some answers to what we think might be frequently asked
> questions.
> 
> Bruce and I will be the public contacts for this project, but this
> project isn't solely our work - there's a dedicated group of people
> working on the core SWR code.
> 
>   Tim Rowley
>   Bruce Cherniak
> 
>   Intel Corporation
> 
> Why another software rasterizer?
> --------------------------------
> 
> Good question, given there are already three (swrast, softpipe,
> llvmpipe) in the Mesa3D tree. Two important reasons for this:
> 
>  * Architecture - given our focus on scientific visualization, our
>    workloads are very different from a typical game's; we have heavy
>    vertex load and relatively simple shaders.  In addition, the core
>    counts of machines we run on are much higher.  These parameters led
>    to design decisions very different from llvmpipe's.
> 
>  * Historical - Intel had developed a high performance software
>    graphics stack for internal purposes.  Later we adapted this
>    graphics stack for use in visualization and decided to move forward
>    with Mesa3D to provide a high quality API layer while at the same
>    time benefiting from the excellent performance the software
>    rasterizer gives us.
> 
> What's the architecture?
> ------------------------
> 
> SWR is a tile based immediate mode renderer with a sort-free threading
> model which is arranged as a ring of queues.  Each entry in the ring
> represents a draw context that contains all of the draw state and work
> queues.  An API thread sets up each draw context and worker threads
> will execute both the frontend (vertex/geometry processing) and
> backend (fragment) work as required.  The ring allows for backend
> threads to pull work in order.  Large draws are split into chunks to
> allow vertex processing to happen in parallel, with the backend work
> pickup preserving draw ordering.
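A minimal sketch of the in-order ring described above, as a single-threaded toy model. The names (DrawRing, DrawContext, finish_chunk, retire_ready) are hypothetical and there is no locking; it only illustrates the ordering rule: frontend chunks of a draw may complete in any order, but the backend picks up draws strictly in submission order. It is not SWR's actual data structure.

```python
from dataclasses import dataclass, field


@dataclass
class DrawContext:
    """One draw's slot in the ring plus bookkeeping for its frontend chunks."""
    draw_id: int
    num_chunks: int
    done_chunks: set = field(default_factory=set)

    def frontend_complete(self) -> bool:
        return len(self.done_chunks) == self.num_chunks


class DrawRing:
    """Toy ring of draw contexts (hypothetical; no thread synchronization)."""

    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size  # fixed ring storage
        self.head = 0               # oldest draw not yet picked up by the backend
        self.tail = 0               # id of the next draw the API thread enqueues

    def enqueue(self, num_chunks: int) -> int:
        """API thread: set up a draw context, split into num_chunks chunks."""
        if num_chunks < 1:
            raise ValueError("a draw needs at least one chunk")
        if self.tail - self.head == self.size:
            raise RuntimeError("ring full: the API thread must wait")
        dc = DrawContext(self.tail, num_chunks)
        self.slots[self.tail % self.size] = dc
        self.tail += 1
        return dc.draw_id

    def finish_chunk(self, draw_id: int, chunk: int) -> None:
        """Worker: mark one frontend chunk done; chunks may finish in any order."""
        dc = self.slots[draw_id % self.size]
        if not 0 <= chunk < dc.num_chunks:
            raise ValueError("chunk index out of range")
        dc.done_chunks.add(chunk)

    def retire_ready(self) -> list:
        """Backend: pick up draws in submission order, stopping at the first
        draw whose frontend work is still incomplete."""
        retired = []
        while self.head < self.tail:
            dc = self.slots[self.head % self.size]
            if not dc.frontend_complete():
                break  # later draws may be ready, but cannot overtake this one
            retired.append(dc.draw_id)
            self.slots[self.head % self.size] = None
            self.head += 1
        return retired
```

For example, if draw 1 finishes its frontend before draw 0, retire_ready() returns nothing until draw 0 is complete, and then returns [0, 1], which is the draw-ordering guarantee the paragraph describes.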
> 
> Our pipeline uses just-in-time compiled code for the fetch shader that
> does vertex attribute gathering and AOS to SOA conversions, the vertex
> shader and fragment shaders, streamout, and fragment blending. SWR
> core also supports geometry and compute shaders but we haven't exposed
> them through our driver yet. The fetch shader, streamout, and blend are
> built internally in swr core using LLVM directly, while for the vertex
> and pixel shaders we reuse bits of llvmpipe from
> gallium/auxiliary/gallivm to build the kernels, which we wrap
> differently than llvmpipe's auxiliary/draw code.
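The AOS to SOA step performed by the fetch shader can be pictured as a transpose from per-vertex records into per-component SIMD lanes. The sketch below assumes an 8-wide batch (eight 32-bit floats, one AVX register) and pads a partial final batch by repeating the last vertex; both are illustrative assumptions, and plain lists stand in for registers. It is not SWR's JIT-generated code.

```python
SIMD_WIDTH = 8  # assumption: 8 x float32 lanes, i.e. one 256-bit AVX register


def aos_to_soa(vertices, num_components):
    """Transpose flat AOS data [x0, y0, x1, y1, ...] into SOA batches.

    Each returned batch holds num_components lists of SIMD_WIDTH lanes,
    e.g. [[x0..x7], [y0..y7]]. A partial final batch is padded by repeating
    the last vertex so every lane holds valid data.
    """
    if len(vertices) % num_components:
        raise ValueError("vertex data is not a whole number of vertices")
    count = len(vertices) // num_components
    batches = []
    for base in range(0, count, SIMD_WIDTH):
        batch = [[] for _ in range(num_components)]
        for lane in range(SIMD_WIDTH):
            v = min(base + lane, count - 1)  # clamp lanes past the end
            for c in range(num_components):
                batch[c].append(vertices[v * num_components + c])
        batches.append(batch)
    return batches
```

Once data is in this layout, one SIMD instruction can process the same component of eight vertices at once, which is why the conversion happens up front.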
> 
> What's the performance?
> -----------------------
> 
> For the types of high-geometry workloads we're interested in, we are
> significantly faster than llvmpipe.  This is to be expected, as
> llvmpipe only threads the fragment processing and not the geometry
> frontend.
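As a rough way to see why threading only the fragment stage limits scaling on geometry-heavy work, Amdahl's law can be applied with an assumed serial fraction for the frontend. This is a simplified model, not a measurement of llvmpipe or SWR: it assumes the frontend is entirely serial and the fragment work scales perfectly across cores.

```python
def amdahl_speedup(serial: float, cores: int) -> float:
    """Amdahl's-law bound on speedup over one thread when a fraction `serial`
    of the work cannot be parallelized and the rest splits evenly across
    `cores` threads."""
    if not 0.0 <= serial <= 1.0 or cores < 1:
        raise ValueError("serial must be in [0, 1] and cores must be >= 1")
    return 1.0 / (serial + (1.0 - serial) / cores)
```

For example, with a hypothetical 50% serial frontend, amdahl_speedup(0.5, 36) is about 1.95, and no core count can push the bound past 1 / serial, here 2.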
> 
> The slides linked below show some performance numbers from a benchmark
> dataset and application.  On a dual-socket E5-2699 v3 system with 36
> cores in total, we see performance 29x to 51x that of llvmpipe.
> 
> 	http://openswr.org/slides/SWR_Sept15.pdf
> 
> While our current performance is quite good, we know there is more
> potential in this architecture.  When we switched from a prototype
> OpenGL driver to Mesa, performance regressed severely: partly due to
> interface issues that need tuning, partly due to differences in shader
> code generation, and partly due to conformance and feature additions to
> core swr.  We are looking to recover most of this performance.
> 
> What's the conformance?
> -----------------------
> 
> The major applications we are targeting are all based on the
> Visualization Toolkit (VTK), and as such our development efforts have
> been focused on making sure these work as well as possible.  Our
> current code passes 99% of VTK's rendering tests with its new "OpenGL2"
> (really OpenGL 3.2) backend.
> 
> piglit testing shows a much lower pass rate, roughly 80% at the time
> of writing.  Core SWR undergoes rigorous unit testing and we are quite
> confident in the rasterizer, and understand the areas where it
> currently has issues (for example, lines are rendered as triangles, so
> they don't match the strict line rasterization rules).  The majority of
> the piglit failures are errors in our driver layer interfacing Mesa
> and SWR.  Fixing these issues is one of our major future development
> goals.
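For readers unfamiliar with the line-rendering caveat: one common way to draw lines with a triangle rasterizer is to expand each segment into a screen-space quad made of two triangles. The sketch below shows that expansion; it is illustrative and not taken from SWR. Pixels covered by such a quad follow the triangle fill rules rather than OpenGL's diamond-exit rule for lines, which is why the results can differ from strict line rasterization.

```python
import math


def line_to_triangles(p0, p1, width=1.0):
    """Expand segment p0->p1 into two triangles covering a width-wide quad.

    Returns [(a, b, c), (a, c, d)] where a..d are the quad corners as (x, y).
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("degenerate line: endpoints coincide")
    # Unit normal to the segment, scaled to half the line width.
    nx = -dy / length * width / 2.0
    ny = dx / length * width / 2.0
    a = (p0[0] + nx, p0[1] + ny)
    b = (p0[0] - nx, p0[1] - ny)
    c = (p1[0] - nx, p1[1] - ny)
    d = (p1[0] + nx, p1[1] + ny)
    return [(a, b, c), (a, c, d)]
```

A horizontal segment from (0, 0) to (4, 0) with width 2 becomes the rectangle spanning y = -1 to y = 1, split along one diagonal.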
> 
> Why are you open sourcing this?
> -------------------------------
> 
>  * Our customers prefer open source, and allowing them to simply
>    download the Mesa source and enable our driver makes life much
>    easier for them.
> 
>  * The internal gallium APIs are not stable, so we'd like our driver
>    to be in-tree, where it is visible when those APIs change.
> 
>  * It's easier to work with the Mesa community when the source we're
>    working with can be used as a reference.
> 
> What are your development plans?
> --------------------------------
> 
>  * Performance - see the performance section earlier for details.
> 
>  * Conformance - see the conformance section earlier for details.
> 
>  * Features - core SWR has a lot of functionality we have yet to
>    expose through our driver, such as MSAA, geometry shaders, compute
>    shaders, and tessellation.
> 
>  * AVX512 support
> 
> What is the licensing of the code?
> ----------------------------------
> 
>  * All code is under the normal Mesa MIT license.
> 
> How will contributions be handled?
> ----------------------------------
> 
> This is our current thinking about how this will be handled.  We
> welcome input from those who might have more experience with handling
> this type of contribution.
> 
>  * The OpenSWR project consists of two codebases, the swr core and the
>    driver.  The swr core is a copy of an internal code repository, so
>    we (the Intel team) will deal with the headache of moving changes
>    back and forth.  While not ideal, it's the cost of sharing this
>    code.
> 
>  * Non-Intel changes to swr core (src/gallium/drivers/swr/rasterizer)
>    will go through mesa-dev review.  The Intel team will merge changes
>    from our internal repository without the review process.
> 
>  * The swr driver master repository will be the one in Mesa3D.  All
>    changes will go through mesa-dev review, save for those needed for
>    swr core updates.
> 
> Will this work on AMD?
> ----------------------
> 
>  * If using an AMD processor with AVX or AVX2, it should work, though
>    we don't have that hardware around to test.  Patches, if needed,
>    would be welcome.
> 
> Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?
> -----------------------------------------------------------------
> 
>  * Not without a lot of work.  We make extensive use of AVX and AVX2
>    intrinsics in our code and the in-tree JIT creation.  It is not the
>    intention for this codebase to support non-x86 architectures.
> 
> What hardware do I need?
> ------------------------
> 
>  * Any x86 processor with at least AVX (introduced in the Intel
>    Sandy Bridge and AMD Bulldozer microarchitectures in 2011) will
>    work.
> 
>  * You don't need a fire-breathing Xeon machine to work on SWR - we do
>    day-to-day development with laptops and desktop CPUs.
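To check whether a Linux machine meets the AVX requirement, one option is to read the "flags" line of /proc/cpuinfo. The helpers below are hypothetical, not part of SWR or Mesa, and assume the Linux cpuinfo format in which feature names such as avx and avx2 appear on that line.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_level(flags: set) -> str:
    """Highest of the ISA levels discussed here that the flags report."""
    if "avx2" in flags:
        return "AVX2"
    if "avx" in flags:
        return "AVX"
    return "unsupported"


def host_simd_level(path: str = "/proc/cpuinfo") -> str:
    """Convenience wrapper for the local machine (Linux only)."""
    with open(path) as f:
        return simd_level(cpu_flags(f.read()))
```

On a non-Linux system, or an architecture whose cpuinfo uses a different key, cpu_flags returns an empty set and simd_level reports "unsupported".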
> 
> Does one build work on both AVX and AVX2?
> -----------------------------------------
> 
>  * Unfortunately, no.  The architecture support is fixed at compile
>    time.  While the AVX version of course will run on AVX2 machines
>    and the jitted code will use AVX2, the overall performance will
>    suffer relative to a full AVX2 build.
> 
>  * One idea is that if we move some code from the driver back to SWR
>    core, we could build two versions of libSWR and dynamically load
>    the correct one at runtime.  Unfortunately this mechanism
>    would not work with AVX512, as some of the SWR state structures
>    would change size.
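The runtime-dispatch idea, and why AVX512 breaks it, can be sketched with ctypes. Everything here is hypothetical: the library names libswrAVX.so and libswrAVX2.so are invented, nothing is actually loaded, and the toy State struct stands in for real SWR state structures. AVX and AVX2 are both 256 bits wide, so a struct with SIMD-width arrays has the same layout in either build; at 512 bits the same struct grows, so a driver and core built for different widths would disagree on its size.

```python
import ctypes

# Hypothetical per-ISA builds of the core library (names are invented).
LIBS = {"AVX2": "libswrAVX2.so", "AVX": "libswrAVX.so"}


def pick_library(flags: set) -> str:
    """Choose the widest hypothetical build that the CPU flags support."""
    for isa in ("AVX2", "AVX"):
        if isa.lower() in flags:
            return LIBS[isa]
    raise RuntimeError("SWR needs at least AVX")


def state_struct(simd_floats: int):
    """Toy state struct embedding one SIMD register's worth of floats."""
    class State(ctypes.Structure):
        _fields_ = [("lanes", ctypes.c_float * simd_floats),
                    ("mask", ctypes.c_uint32)]
    return State


AVX_STATE = state_struct(8)      # 256-bit: 8 x float32, same for AVX and AVX2
AVX512_STATE = state_struct(16)  # 512-bit: 16 x float32, a different layout
```

Here ctypes.sizeof(AVX_STATE) is 36 bytes and ctypes.sizeof(AVX512_STATE) is 68, so a single shared definition of the struct cannot serve both widths.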
> 
> Is the code ready to go?
> ------------------------
> 
>  * Yes, a version for review feedback is located at:
> 
>      https://github.com/OpenSWR/openswr-mesa
> 
>  * This requires llvm-3.6.  To build with configure, enable the swr
>    driver and also pass in --enable-swr-native.  SCons will
>    automatically build swr.
> 
>  * This version is based on Mesa 11.0, and the thought is to keep to
>    that version while we incorporate community feedback prior to
>    upstreaming.
> 

Certainly looks interesting...
From a high level point of view, it seems quite similar to llvmpipe (both
tile based, using llvm for jitting shaders, ...). Of course llvmpipe
isn't well suited for these kinds of workloads (the most important use
case is desktop compositing, so a couple dozen vertices per frame but
millions of pixels...). Making vertex loads scale just hasn't been worth
the effort so far (there aren't actually that many people working on
llvmpipe), although we realize that its completely non-parallel vertex
processing can hinder scaling quite a bit even for "typical" workloads
(not desktop compositing, but "simple" 3d apps) once you've got enough
cores/threads (8 or so). That's not something we've worried about too
much, though.
I think requiring exactly llvm 3.6 probably isn't going to work if you
want to upstream this. A minimum version of 3.6 is fine, but the general
rule is that things should still work with newer versions, including the
current development version. Since you seem to use the C++ interface of
llvm quite a bit, that's probably going to require some #ifdef mess.
That said, if you just don't try to build the driver with unreleased llvm
versions, that's probably ok (but it will limit the ability of some
people to try out your driver).
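The "minimum, not exact" rule can be expressed as a build-time check. The sketch below is hypothetical and written in Python because Mesa's SCons build files are Python, but it is not Mesa's actual build logic: it parses llvm-config --version output and accepts any version at or above 3.6. The C++ side would still need version-conditional code wherever the LLVM API changed.

```python
import re
import subprocess

MIN_LLVM = (3, 6)  # minimum supported (major, minor); newer versions allowed


def parse_llvm_version(text: str) -> tuple:
    """Extract (major, minor) from strings like '3.6.2' or '3.8.0svn'."""
    m = re.match(r"\s*(\d+)\.(\d+)", text)
    if not m:
        raise ValueError(f"unrecognized LLVM version string: {text!r}")
    return int(m.group(1)), int(m.group(2))


def llvm_ok(version: tuple, minimum: tuple = MIN_LLVM) -> bool:
    """Accept any version at or above the minimum (tuple comparison)."""
    return version >= minimum


def check_installed_llvm(llvm_config: str = "llvm-config") -> bool:
    """Run llvm-config (must be on PATH) and apply the minimum-version gate."""
    out = subprocess.run([llvm_config, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return llvm_ok(parse_llvm_version(out))
```

With this gate, 3.6, 3.7 and a development snapshot such as 3.8.0svn are all accepted, while 3.5 is rejected.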

Roland