[Mesa-dev] software implementation of vulkan for gsoc/evoc

Mon Feb 13 21:37:18 UTC 2017

On 13.02.2017 at 03:17, Jacob Lifshay wrote:
> forgot to add mesa-dev when I sent.
> ---------- Forwarded message ----------
> From: "Jacob Lifshay" <programmerjake at gmail.com>
> Date: Feb 12, 2017 6:16 PM
> Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
> To: "Dave Airlie" <airlied at gmail.com>
> Cc:
> 
> 
> 
> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
> 
>     > I'm assuming that control barriers in Vulkan are identical to barriers
>     > across a work-group in opencl. I was going to have a work-group be a
>     > single OS thread, with the different work-items mapped to SIMD lanes.
>     > If we need to have additional scheduling, I have written a javascript
>     > compiler that supports generator functions, so I mostly know how to
>     > write a llvm pass to implement that. I was planning on writing the
>     > shader compiler using llvm, using the whole-function-vectorization
>     > pass I will write, and using the pre-existing spir-v to llvm
>     > translation layer. I would also write some llvm passes to translate
>     > from texture reads and stuff to basic vector ops.
> 
>     Well the problem is number of work-groups that gets launched could be
>     quite high, and this can cause a large overhead in number of host
>     threads that have to be launched. There was some discussion on this in
>     mesa-dev archives back when I added softpipe compute shaders.
> 
> 
> I would start a thread for each cpu, then have each thread run the
> compute shader a number of times instead of having a thread per shader
> invocation.
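
A minimal sketch of the per-CPU scheme described above, assuming a fixed,
even split of work-groups across workers. All names here (workgroup_fn,
worker_range, run_worker, count_invocation) are hypothetical and not taken
from Mesa or from the proposed driver; thread creation is left out, so
run_worker only stands in for what would be the entry point of one OS
thread per CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for one compiled compute-shader work-group. */
typedef void (*workgroup_fn)(size_t group_id, void *user_data);

/* Half-open range [begin, end) of work-groups owned by one worker. */
struct group_range {
    size_t begin;
    size_t end;
};

/* Split num_groups as evenly as possible across num_workers; the first
 * (num_groups % num_workers) workers each get one extra group. */
static struct group_range worker_range(size_t worker, size_t num_workers,
                                       size_t num_groups)
{
    assert(num_workers > 0 && worker < num_workers);
    size_t base = num_groups / num_workers;
    size_t extra = num_groups % num_workers;
    struct group_range r;
    r.begin = worker * base + (worker < extra ? worker : extra);
    r.end = r.begin + base + (worker < extra ? 1 : 0);
    return r;
}

/* Body of one worker. Assumption: a real driver would run this as the
 * entry point of one OS thread per CPU instead of calling it in a loop. */
static void run_worker(size_t worker, size_t num_workers, size_t num_groups,
                       workgroup_fn shader, void *user_data)
{
    struct group_range r = worker_range(worker, num_workers, num_groups);
    for (size_t g = r.begin; g < r.end; g++)
        shader(g, user_data);
}

/* Stand-in "shader" for demonstration: counts how often it is invoked. */
static void count_invocation(size_t group_id, void *user_data)
{
    (void)group_id;
    ++*(size_t *)user_data;
}
```

Calling run_worker once for each worker index covers every work-group
exactly once, so the number of host threads depends on the CPU count and
not on the dispatch size. A shared atomic counter would balance uneven
work-groups better than this static split.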
> 
> 
>     > I have a prototype rasterizer, however I haven't implemented binning
>     > for triangles yet or implemented interpolation. currently, it can
>     > handle triangles in 3D homogeneous and calculate edge equations.
>     > https://github.com/programmerjake/tiled-renderer
>     > A previous 3d renderer that doesn't implement any vectorization and
>     > has opengl 1.x level functionality:
>     > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
> 
>     Well I think we already have a completely fine rasterizer and binning
>     and whatever else in the llvmpipe code base. I'd much rather any Mesa
>     based project doesn't throw all of that away, there is no reason the
>     same swrast backend couldn't be abstracted to be used for both GL and
>     Vulkan and introducing another just because it's interesting isn't a
>     great fit for long term project maintenance..
> 
>     If there are improvements to llvmpipe that need to be made, then that
>     is something to possibly consider, but I'm not sure why a swrast
>     vulkan needs a from scratch raster implemented. For a project that is
>     so large in scope, I'd think reusing that code would be of some use.
>     Since most of the fun stuff is all the texture sampling etc.
> 
> 
> I actually think implementing the rasterization algorithm is the best
> part. I wanted the rasterization algorithm to be included in the
> shaders, eg. triangle setup and binning would be tacked on to the end of
> the vertex shader and parameter interpolation and early z tests would be
> tacked on to the beginning of the fragment shader and blending on to the
> end. That way, llvm could do more specialization and instruction
> scheduling than is possible in llvmpipe now.
> 
> so the tile rendering function would essentially be:
> 
> for(i = 0; i < triangle_count; i+= vector_width)
>     jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
> 
> as opposed to the current llvmpipe code where there is a large amount of
> fixed code that isn't optimized with the shaders.

That isn't true for llvmpipe, at least on the fragment side.
Parameter interpolation, early z (if possible, otherwise late z), blend
etc. are all part of the fragment jit function in the end. The actual
edge function evaluation is not, although it uses optimized assembly as
well (this isn't quite as universal though, only specifically for x86
sse2 and powerpc altivec; on other archs rasterization might take quite
noticeable cpu time with the scalar edge function evaluation).
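
For readers unfamiliar with the term, here is a minimal sketch of what a
scalar edge function evaluation looks like in general. This is the
textbook floating-point form, not llvmpipe's actual code; the names
(vec2, edge_function, point_in_triangle) are made up for illustration,
and fill rules for pixels lying exactly on a shared edge are ignored.

```c
#include <stdbool.h>

struct vec2 {
    float x, y;
};

/* Signed area test for point p against the directed edge a -> b:
 * the sign tells which side of the edge p is on, zero means on the line. */
static float edge_function(struct vec2 a, struct vec2 b, struct vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* A point is covered if it lies on the same side of all three edges.
 * Both signs are accepted so that either winding order works. */
static bool point_in_triangle(struct vec2 v0, struct vec2 v1, struct vec2 v2,
                              struct vec2 p)
{
    float e0 = edge_function(v0, v1, p);
    float e1 = edge_function(v1, v2, p);
    float e2 = edge_function(v2, v0, p);
    bool all_non_negative = e0 >= 0.0f && e1 >= 0.0f && e2 >= 0.0f;
    bool all_non_positive = e0 <= 0.0f && e1 <= 0.0f && e2 <= 0.0f;
    return all_non_negative || all_non_positive;
}
```

Doing this one pixel at a time is the scalar path; a vectorized path
evaluates the same three expressions for several pixels per instruction,
which is where the per-architecture code comes in.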

On the vertex side though, llvmpipe can't do threaded setup or binning
(nor vertex shader execution itself for that matter). Clearly, this is
suboptimal, as is the inability to do vertex and fragment side in
parallel. (The latter is sort of a bug, as it was designed to be able to
do it; the former it wasn't even designed for.)
The reason for that is basically that workloads interesting for llvmpipe
were thought to be pretty light on the vertex side - this is especially
true if you use it as a fallback for real hw to render your desktop
environment or so. (And that's the area where openSWR has big advantages
over llvmpipe, since its threading model allows that, so if you have
lots of vertices to process but not much work on the fragment side it
will easily win - plus llvmpipe multi-thread scaling is usually a bit
limited due to that.)

So, imho there's not much point in adding even a third (aside from
softpipe, which doesn't really count) full tile-based optimized sw
rasterizer - in fact I'd rather see llvmpipe and openSWR "merged" for
that matter. (Not that it's really a realistic short-term goal, as
they are very different indeed despite both being tile-based renderers,
although they do at least share actual jit vertex/fragment shaders.)

Roland