[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

Wed Oct 21 15:41:06 PDT 2015

> On Oct 20, 2015, at 2:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> 
> Certainly looks interesting...
> From a high level point of view, seems quite similar to llvmpipe (both
> tile based, using llvm for jitting shaders, ...). Of course llvmpipe
> isn't well suited for these kind of workloads (the most important use
> case is desktop compositing, so a couple dozen vertices per frame but
> millions of pixels...). Making vertex loads scale is something which
> just wasn't worth the effort so far (there's not actually that many
> people working on llvmpipe), albeit we realize that the completely
> non-parallel nature of it currently actually can hinder scaling quite a
> bit even for "typical" workloads (not desktop compositing, but "simple"
> 3d apps) once you've got enough cores/threads (8 or so), but that's
> something we're not worried too much about.
> I think requiring llvm 3.6 probably isn't going to work if you want to
> upstream this, a minimum version of 3.6 is fine but the general rule is
> things should still work with newer versions (including current
> development version, seems like you're using c++ interface of llvm quite
> a bit so that's probably going to require some #ifdef mess). Albeit I
> guess if you just don't try to build the driver with non-released
> versions that's probably ok (but will limit the ability for some people
> to try out your driver).

Some differences between llvmpipe and swr based on my understanding of llvmpipe’s architecture:

threading model
	llvmpipe: single threaded vertex processing, up to 16 rasterization threads
	swr: common thread pool that picks up frontend or backend work as available
vertex processing
	llvmpipe: entire draw call processed in a single pass
	swr: large draws chopped into chunks that can be processed in parallel
frontend/backend coupling
	llvmpipe: separate binning pass in single threaded frontend
	swr: frontend vertex processing and binning combined in a single pass
primitive assembly and binning
	llvmpipe: scalar c code
	swr: x86 avx/avx2 working on vector of primitives
fragment processing
	llvmpipe: single jitted shader combining depth/fragment/stencil/blend on a 16x16 block
	swr: separate jitted fragment and blend shaders, plus templated depth test
in-memory representation
	llvmpipe: direct access to render targets
	swr: hot-tile working representation with load and/or store at required times
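
To make the "vertex processing" entry above concrete, here is a minimal sketch of chopping one large draw into independent chunks. It is illustrative only and is not swr's actual code: the names DrawChunk and splitDraw, and the choice of a fixed maximum chunk size, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type: a half-open range [first, first + count) of primitives
// taken from one draw call.
struct DrawChunk {
    uint32_t first;
    uint32_t count;
};

// Split a draw of `numPrims` primitives into chunks of at most `maxPerChunk`
// primitives each. Every chunk is an independent unit of frontend work.
// Returns an empty list if either argument is zero.
std::vector<DrawChunk> splitDraw(uint32_t numPrims, uint32_t maxPerChunk) {
    std::vector<DrawChunk> chunks;
    if (maxPerChunk == 0) {
        return chunks;
    }
    uint32_t first = 0;
    while (first < numPrims) {
        // The last chunk may be smaller than maxPerChunk.
        const uint32_t count = std::min(maxPerChunk, numPrims - first);
        chunks.push_back({first, count});
        first += count;  // never exceeds numPrims, so no overflow
    }
    return chunks;
}
```

Under these assumptions, a draw of 10 primitives with a chunk size of 4 yields three chunks covering primitives 0-3, 4-7 and 8-9. Each chunk could then be handed to whichever worker in a shared pool is free, which is the contrast with processing the entire draw in a single pass.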

As you say, we do use LLVM’s C++ API.  While that has some advantages, the API is not guaranteed to be stable, and it can and does change in nontrivial ways between releases.  Going from 3.6 to 3.7 changed at least the GEP instruction, which we could work around if necessary for upstreaming.
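
A workaround of that kind would most likely be a compile-time version check. The sketch below shows only the shape of such a guard and is not code from swr. LLVM_VERSION_MAJOR and LLVM_VERSION_MINOR are the macros LLVM provides in llvm/Config/llvm-config.h; the SWR_LLVM_AT_LEAST macro and the usesNewGepPath function are hypothetical names, and the fallback version numbers exist only so the example compiles without LLVM installed.

```cpp
// In a real build these come from <llvm/Config/llvm-config.h>. The fallback
// values are placeholders (assumption) so this sketch compiles on its own.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#endif
#ifndef LLVM_VERSION_MINOR
#define LLVM_VERSION_MINOR 6
#endif

// Hypothetical helper, not part of LLVM or swr: true when the LLVM being
// built against is at least version major.minor.
#define SWR_LLVM_AT_LEAST(major, minor)          \
    (LLVM_VERSION_MAJOR > (major) ||             \
     (LLVM_VERSION_MAJOR == (major) &&           \
      LLVM_VERSION_MINOR >= (minor)))

// Reports which code path a build would select for GEP creation.
// The version-specific API calls themselves would go in the two branches.
inline bool usesNewGepPath() {
#if SWR_LLVM_AT_LEAST(3, 7)
    return true;   // code written against the 3.7+ interface
#else
    return false;  // code written against the 3.6 interface
#endif
}
```

Keeping the check in one macro would at least confine the #ifdef mess to the few call sites where the interface differs.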

-Tim