[Mesa-dev] Introducing OpenSWR: High performance software rasterizer
Rowley, Timothy O
timothy.o.rowley at intel.com
Wed Oct 21 15:41:06 PDT 2015
> On Oct 20, 2015, at 2:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Certainly looks interesting...
> From a high level point of view, seems quite similar to llvmpipe (both
> tile based, using llvm for jitting shaders, ...). Of course llvmpipe
> isn't well suited for these kind of workloads (the most important use
> case is desktop compositing, so a couple dozen vertices per frame but
> millions of pixels...). Making vertex loads scale is something which
> just wasn't worth the effort so far (there's not actually that many
> people working on llvmpipe), albeit we realize that the completely
> non-parallel nature of it currently actually can hinder scaling quite a
> bit even for "typical" workloads (not desktop compositing, but "simple"
> 3d apps) once you've got enough cores/threads (8 or so), but that's
> something we're not worried too much about.
> I think requiring llvm 3.6 probably isn't going to work if you want to
> upstream this, a minimum version of 3.6 is fine but the general rule is
> things should still work with newer versions (including current
> development version, seems like you're using c++ interface of llvm quite
> a bit so that's probably going to require some #ifdef mess). Albeit I
> guess if you just don't try to build the driver with non-released
> versions that's probably ok (but will limit the ability for some people
> to try out your driver).
Some differences between llvmpipe and swr based on my understanding of llvmpipe’s architecture:
llvmpipe: single threaded vertex processing, up to 16 rasterization threads
swr: common thread pool that picks up frontend or backend work as available
llvmpipe: entire draw call processed in a single pass
swr: large draws chopped into chunks that can be processed in parallel
llvmpipe: separate binning pass in single threaded frontend
swr: frontend vertex processing and binning combined in a single pass
llvmpipe: primitive assembly and binning in scalar C code
swr: primitive assembly and binning in x86 AVX/AVX2 code working on a vector of primitives
llvmpipe: single jitted shader combining depth/fragment/stencil/blend on a 16x16 block
swr: separate jitted fragment and blend shaders, plus templated depth test
llvmpipe: direct access to render targets
swr: hot-tile working representation with load and/or store at required times
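
To make the "large draws chopped into chunks" and "common thread pool" points above more concrete, here is a minimal C++ sketch. It is not taken from the swr source: the names DrawChunk, split_draw and process_draw are hypothetical, the fixed chunk size is an assumption, and the "frontend work" is reduced to counting vertices. It only shows the idea of workers picking up the next available chunk from a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical type: one independently processable slice of a draw call.
struct DrawChunk {
    std::size_t first_vertex;
    std::size_t vertex_count;
};

// Chop one large draw into chunks of at most chunk_size vertices.
// (For a triangle list, chunk_size would need to be a multiple of 3 so
// that no primitive straddles two chunks; that is left to the caller.)
std::vector<DrawChunk> split_draw(std::size_t vertex_count, std::size_t chunk_size) {
    std::vector<DrawChunk> chunks;
    if (chunk_size == 0) return chunks;
    for (std::size_t first = 0; first < vertex_count; first += chunk_size) {
        chunks.push_back({first, std::min(chunk_size, vertex_count - first)});
    }
    return chunks;
}

// Process every chunk of a draw on a pool of worker threads and return
// the number of vertices handled. Each worker claims the next unclaimed
// chunk through a shared atomic index, so faster workers take more chunks.
std::size_t process_draw(std::size_t vertex_count, std::size_t chunk_size,
                         unsigned num_workers) {
    const std::vector<DrawChunk> chunks = split_draw(vertex_count, chunk_size);
    std::atomic<std::size_t> next_chunk{0};
    std::atomic<std::size_t> vertices_done{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next_chunk.fetch_add(1);
            if (i >= chunks.size()) break;  // no work left
            // Stand-in for vertex shading, primitive assembly and binning.
            vertices_done.fetch_add(chunks[i].vertex_count);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < std::max(1u, num_workers); ++t) {
        pool.emplace_back(worker);
    }
    for (auto& th : pool) th.join();

    // Every chunk index must have been claimed by some worker.
    assert(next_chunk.load() >= chunks.size());
    return vertices_done.load();
}
```

Calling process_draw(1000, 96, 4) would split a 1000-vertex draw into 11 chunks and spread them over four workers. A real rasterizer would also have to feed backend (per-tile) work through the same pool and keep results in submission order, which this sketch leaves out.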
As you say, we do use LLVM’s C++ API. While that has some advantages, it’s not guaranteed to be stable and does change in nontrivial ways between releases. The move from 3.6 to 3.7 changed at least the GEP instruction, which we could work around if necessary for upstreaming.
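
One common shape for such a workaround is a single version-gated wrapper, so the #ifdef lives in one place instead of at every call site. The C++ sketch below only illustrates that pattern. The message does not say what exactly changed in GEP, so old_api::make_gep and new_api::make_gep are made-up stand-ins rather than LLVM signatures, and SKETCH_LLVM_VERSION is a hypothetical macro that a real build system would supply.

```cpp
#include <cassert>
#include <string>

// Hypothetical version macro in major/minor hex form (0x0306 == 3.6).
// Defaulted here only so the sketch compiles on its own.
#ifndef SKETCH_LLVM_VERSION
#define SKETCH_LLVM_VERSION 0x0307
#endif

// Stand-ins for two generations of an API. These are NOT LLVM's real
// functions; they just differ in their parameter lists.
namespace old_api {
inline std::string make_gep(const std::string& ptr, int index) {
    return "gep " + ptr + ", " + std::to_string(index);
}
}  // namespace old_api

namespace new_api {
inline std::string make_gep(const std::string& type, const std::string& ptr,
                            int index) {
    return "gep " + type + ", " + ptr + ", " + std::to_string(index);
}
}  // namespace new_api

// The one wrapper the rest of the code calls. Callers always pass the
// full argument set; the wrapper drops what the older API cannot take.
inline std::string build_gep(const std::string& type, const std::string& ptr,
                             int index) {
#if SKETCH_LLVM_VERSION >= 0x0307
    return new_api::make_gep(type, ptr, index);
#else
    (void)type;  // unused with the older stand-in API
    return old_api::make_gep(ptr, index);
#endif
}
```

With the default version value, build_gep("i32", "%p", 2) goes through the newer stand-in. Building with -DSKETCH_LLVM_VERSION=0x0306 selects the older path without touching any caller.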