[Mesa-dev] software implementation of vulkan for gsoc/evoc

Sun Jun 11 06:59:57 UTC 2017

On Sat, Jun 10, 2017 at 3:25 PM Jose Fonseca <jfonseca at vmware.com> wrote:

> I don't see how to effectively tack triangle setup into the vertex
> shader: the vertex shader applies to vertices, whereas triangle setup and
> binning apply to primitives.  Usually, each vertex gets transformed
> only once with llvmpipe, no matter how many triangles refer to that vertex.
> The only way to tack triangle setup into vertex shading would be if
> you processed vertices a primitive at a time.  Of course one could put
> an if-statement to skip reprocessing a vertex that had already been
> processed, but then you have race conditions, and no benefit from inlining.
>
I was mostly thinking of non-indexed vertices.
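To make the vertex-versus-primitive distinction concrete, here is a minimal sketch in C++. It is not llvmpipe code: `Vec4`, `shadeVertex`, `setupTriangle`, `drawIndexed` and `drawNonIndexed` are hypothetical names, the "vertex shader" is a trivial stand-in, and "setup" only computes twice the signed area. In an indexed draw each unique vertex is shaded once and setup reads the cached results; in a non-indexed triangle list every vertex belongs to exactly one triangle, so setup can be fused into a per-primitive loop without redundant work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Hypothetical stand-in for a compiled vertex shader.
inline Vec4 shadeVertex(const Vec4& in) {
    return {in.x * 2.0f, in.y * 2.0f, in.z, 1.0f};
}

// Hypothetical triangle setup: only computes twice the signed area.
inline float setupTriangle(const Vec4& a, const Vec4& b, const Vec4& c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Indexed draw: shade each unique vertex once, then run setup per primitive.
// Returns the number of vertex-shader invocations.
std::size_t drawIndexed(const std::vector<Vec4>& verts,
                        const std::vector<std::uint32_t>& indices,
                        std::vector<float>& areas) {
    assert(indices.size() % 3 == 0);
    std::vector<Vec4> shaded(verts.size());
    for (std::size_t i = 0; i < verts.size(); ++i)
        shaded[i] = shadeVertex(verts[i]);  // once per vertex, shared by triangles
    for (std::size_t t = 0; t < indices.size(); t += 3)
        areas.push_back(setupTriangle(shaded[indices[t]],
                                      shaded[indices[t + 1]],
                                      shaded[indices[t + 2]]));
    return verts.size();
}

// Non-indexed triangle list: no vertex is shared, so shading and setup
// can be fused into a single per-primitive loop.
std::size_t drawNonIndexed(const std::vector<Vec4>& verts,
                           std::vector<float>& areas) {
    assert(verts.size() % 3 == 0);
    std::size_t invocations = 0;
    for (std::size_t t = 0; t < verts.size(); t += 3) {
        Vec4 a = shadeVertex(verts[t]);
        Vec4 b = shadeVertex(verts[t + 1]);
        Vec4 c = shadeVertex(verts[t + 2]);
        invocations += 3;
        areas.push_back(setupTriangle(a, b, c));
    }
    return invocations;
}
```

For a quad drawn as two triangles, the indexed path shades 4 vertices while the equivalent non-indexed list shades 6; that duplicated work is why fusing setup into vertex shading mainly makes sense when vertices are not shared.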

> And I'm afraid that tacking on rasterization too is one of those things that
> sound great on paper but are quite bad in practice.  And I speak from
> experience: in fact llvmpipe had the last step of rasterization bolted
> onto the fragment shaders for some time.  But we took it out because it
> was _slower_.
>
> The issue is that if you bolt it onto the shader body, you either:
>
> - inline in the shader body the code for the maximum number of planes
> (which is 7: 3 sides of the triangle, plus 4 sides of a scissor rect), and
> waste CPU cycles going through all of those tests, even though most of the
> time many of those tests aren't needed
>
> - or you generate if/for blocks for each plane, so you only do the
> needed tests, but then you have branch prediction issues...
>
> Whereas if you keep rasterization _outside_ the shader, you can have
> specialized functions to do the rasterization based on the primitive
> itself (if the triangle is fully inside the scissor, you need 3 planes; if
> the stamp is fully inside the triangle, you need zero).  Essentially you
> can "compose" by coupling two function calls: you call a rasterization
> function that's specialized for the primitive, then a shading function
> that's specialized for the state (but does not depend on the primitive).
>
> It makes sense: rasterization needs to be specialized for the primitive,
> not the graphics state; whereas the shader needs to be specialized for
> the state.
>
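The "compose two calls" idea above can be sketched in C++ as follows, assuming half-plane (edge function) tests, a 4x4 stamp and pixel-centre sampling; none of this is llvmpipe's actual code. `Plane`, `coverage4x4`, `kRasterizers`, `shadeCountPixels` and `drawStamp` are hypothetical names, and the "shader" merely counts covered pixels. The rasterizer is picked per stamp by how many planes still need testing (0 when the stamp is trivially inside), while the shader is picked once per draw from the state and never sees the planes.

```cpp
#include <cassert>
#include <cstdint>

// Half-plane test: pixel centre (x, y) is inside when a*x + b*y + c >= 0.
struct Plane { float a, b, c; };

// Rasterizer specialized on the number of planes that still need testing.
// NumPlanes == 0 means the 4x4 stamp is already known to be fully covered.
template <int NumPlanes>
std::uint16_t coverage4x4(const Plane* planes, int x0, int y0) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        float px = float(x0 + (i & 3)) + 0.5f;   // pixel centre
        float py = float(y0 + (i >> 2)) + 0.5f;
        bool inside = true;
        for (int p = 0; p < NumPlanes; ++p)      // trip count known at compile time
            inside = inside &&
                     (planes[p].a * px + planes[p].b * py + planes[p].c >= 0.0f);
        if (inside) mask |= std::uint16_t(1u << i);
    }
    return mask;
}

using RasterFn = std::uint16_t (*)(const Plane*, int, int);

// One specialization per plane count: 0..7 (3 triangle edges + 4 scissor sides).
constexpr RasterFn kRasterizers[8] = {
    &coverage4x4<0>, &coverage4x4<1>, &coverage4x4<2>, &coverage4x4<3>,
    &coverage4x4<4>, &coverage4x4<5>, &coverage4x4<6>, &coverage4x4<7>};

// Stand-in for a state-specialized shading function: it only receives the
// coverage mask, never the primitive. Here it just counts covered pixels.
int shadeCountPixels(std::uint16_t mask) {
    int n = 0;
    for (; mask; mask &= std::uint16_t(mask - 1)) ++n;
    return n;
}

using ShadeFn = int (*)(std::uint16_t);

// Composition: primitive-specialized rasterizer, then state-specialized shader.
int drawStamp(const Plane* planes, int numPlanes, int x0, int y0, ShadeFn shade) {
    assert(numPlanes >= 0 && numPlanes <= 7);
    std::uint16_t mask = kRasterizers[numPlanes](planes, x0, y0);
    return mask ? shade(mask) : 0;
}
```

Deciding which planes still cut a given stamp (and compacting them into `planes`) is left to the caller here; that classification is where the per-primitive specialization pays off.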
I am planning on generating a function for each primitive type and state
combination, or I can convert all primitives into triangles and just have a
function for each state. The state includes stuff like whether a particular
clipping/scissor equation needs to be checked. I did it that way in my
proof-of-concept code by using C++ templates to do the code duplication:
https://github.com/programmerjake/tiled-renderer/blob/47e09f5d711803b8e899c3669fbeae3e62c9e32c/main.cpp#L366
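A minimal sketch of template-driven code duplication per state combination follows; it is not taken from the linked main.cpp. `StateBits`, `Fragment`, `Context`, `fragmentPasses`, `makeTable` and `selectVariant` are hypothetical names, only two state bits are modelled, and `Context` holds a single stored depth value for simplicity. It assumes C++17 for `if constexpr`. One function body is written once; the compiler emits a separate copy for each combination with the disabled tests compiled out, and a table indexed by the state bits picks the variant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical pipeline-state bits; a real renderer would have many more.
enum StateBits : unsigned {
    kScissor   = 1u << 0,
    kDepthTest = 1u << 1,
};

struct Fragment { int x, y; float depth; };
struct Context  { int sx0, sy0, sx1, sy1; float storedDepth; };

// Written once; each instantiation contains only the enabled tests.
template <bool Scissor, bool DepthTest>
bool fragmentPasses(const Fragment& f, const Context& ctx) {
    if constexpr (Scissor) {
        if (f.x < ctx.sx0 || f.x >= ctx.sx1 || f.y < ctx.sy0 || f.y >= ctx.sy1)
            return false;
    }
    if constexpr (DepthTest) {
        if (!(f.depth < ctx.storedDepth)) return false;  // "less" compare op
    }
    return true;
}

using PassFn = bool (*)(const Fragment&, const Context&);

// Build a table of all 2^2 instantiations, indexed by the state bits.
template <std::size_t... I>
constexpr std::array<PassFn, sizeof...(I)> makeTable(std::index_sequence<I...>) {
    return {{&fragmentPasses<(I & kScissor) != 0, (I & kDepthTest) != 0>...}};
}

constexpr auto kPassTable = makeTable(std::make_index_sequence<4>{});

// Chosen once per draw from the current state, then called per fragment.
PassFn selectVariant(unsigned state) {
    assert(state < kPassTable.size());
    return kPassTable[state];
}
```

The table grows as 2^N with the number of state bits, so a real implementation would probably generate variants lazily (for example, by JIT) rather than instantiating every combination up front.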

> And this is just one of those non-intuitive things that's not obvious
> until one actually does a lot of profiling and a lot of experimentation.
> And trust me, a lot of time was spent fine-tuning this for llvmpipe (not
> by me -- most of the rasterization was done by Keith Whitwell.)  And by
> throwing llvmpipe out of the window and starting a new software
> renderer from scratch, you'd just be signing up to do it all over again.
>
> Whereas if, instead of starting from scratch, you take llvmpipe and
> rewrite/replace one component at a time, you can reach exactly the same
> destination you want to reach, but you'll have something working
> every step of the way, so when you take a bad step, you can measure the
> performance impact and readjust.  Plus, if you run out of time, you have
> something useful -- not yet another half-finished project, which will
> quickly rot away.
>
If the project is not finished this summer, I'm still
planning on working on it, just at a reduced rate. If all else fails, we
will at least have an up-to-date SPIR-V to LLVM converter that handles the
GLSL SPIR-V extensions.

> Regarding generating scalar LLVM IR from SPIR-V and then doing whole-function
> vectorization, I don't think it's a bad idea per se.  If I were writing
> llvmpipe from scratch today I'd do something like that, especially
> because (scalar) LLVM IR is so pervasive in the graphics ecosystem anyway.
>
> It was only after I had TGSI -> LLVM IR all done that I stumbled upon
> http://compilers.cs.uni-saarland.de/projects/wfv/ .
>
> I think the important thing here is that, once you've vectorized the
> shader and converted your "texture_sample" to "texture_sample.vector8",
> your "output_merger" intrinsics to "output_merger.vector8", and likewise
> your log2/exp2, you then slot in the fine-tuned llvmpipe code for texture
> sampling, blending and math, as that's where your bottlenecks tend to be.
> Because if you plan to write all texture sampling from scratch, then you
> need a time/clone machine to complete this in a summer; and if you just
> use LLVM's / the standard C runtime's sqrt/log2/exp2/sin/cos, then it
> would be dead slow.
>
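What whole-function vectorization does to control flow can be illustrated with a hand-written C++ sketch; this is not output of the WFV tool or of llvmpipe. `shadeScalar` and `shadeVector8` are hypothetical names, the 8 lanes are plain `std::array` elements rather than SIMD registers, and the shader body is a toy. The scalar branch becomes a per-lane mask: the "then" side is skipped only when no lane needs it, both sides are otherwise computed, and a per-lane select merges them. In a real pipeline, calls such as a texture sample would be rewritten to their 8-wide counterparts at this point, which is where tuned llvmpipe code could be slotted in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;
using Vec8  = std::array<float, kLanes>;
using Mask8 = std::array<bool, kLanes>;

// Scalar shader as a front end might emit it: one fragment, a real branch.
float shadeScalar(float x) {
    if (x > 0.5f)
        return x * x;
    return 1.0f - x;
}

// Hand-written equivalent of a whole-function-vectorized version:
// the branch condition becomes a lane mask and the two paths are merged.
Vec8 shadeVector8(const Vec8& x) {
    Mask8 cond{};
    Vec8 thenVal{}, elseVal{}, out{};

    bool anyThen = false;
    for (std::size_t l = 0; l < kLanes; ++l) {
        cond[l] = x[l] > 0.5f;
        anyThen = anyThen || cond[l];
    }

    // Uniform-branch shortcut: skip the "then" side if no lane takes it.
    if (anyThen)
        for (std::size_t l = 0; l < kLanes; ++l) thenVal[l] = x[l] * x[l];

    for (std::size_t l = 0; l < kLanes; ++l) elseVal[l] = 1.0f - x[l];

    // Per-lane select (a blend instruction on SSE/AVX).
    for (std::size_t l = 0; l < kLanes; ++l)
        out[l] = cond[l] ? thenVal[l] : elseVal[l];
    return out;
}
```

Every lane of `shadeVector8` matches `shadeScalar` applied to that lane, which is the invariant a vectorizer has to preserve.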
I am planning on using C++ templates to help with a lot of the texture
sampler code generation -- clang can convert it to LLVM IR and then I can
inline it into the appropriate places. I think that all of the
non-compressed image formats should be pretty easy to handle that way, as
they are all pretty similar (bits packed into a long word or members of a
struct). I can implement interpolation on top of the functions to load and
unpack the image elements from memory. I'd estimate that, excluding the
compressed texture formats, I'd need fewer than 10k lines and maybe a week
or two to implement it all. (Glad I don't have to implement that in C.) I
am planning on compiling fdlibm with clang into LLVM IR, then running my
vectorization algorithm on all the functions. LLVM has a spot where you can
tell it that you have optimized vectorized math intrinsics; I could add
them there, or implement another lowering pass to convert the intrinsics to
function calls, which can then be inlined. Hopefully, that will save most
of the work needed to implement vectorized math functions. Also, LLVM is
already pretty good at converting vectorized sqrt intrinsics to vector sqrt
instructions, which x86 SSE/AVX and (I think) ARM NEON already have.
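The template approach to uncompressed formats described above might look roughly like the following C++ sketch. It is an illustration under stated assumptions, not the planned implementation: `Channel`, `PackedFormat`, `One`, `R5G6B5`, `R8G8B8A8` and `sampleBilinear` are hypothetical names, only UNORM channels and clamp-to-edge addressing are handled, and the bit positions follow Vulkan's documented layouts for `VK_FORMAT_R5G6B5_UNORM_PACK16` and `VK_FORMAT_R8G8B8A8_UNORM` read on a little-endian host. Each format is a type describing where its channels sit in a packed word; bilinear filtering is written once on top of the format's `load`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// A packed UNORM channel: Bits wide, starting at bit Shift.
template <unsigned Shift, unsigned Bits>
struct Channel {
    static_assert(Bits > 0 && Bits < 32, "channel width out of range");
    static float unpack(std::uint32_t word) {
        constexpr std::uint32_t maxVal = (1u << Bits) - 1u;
        return float((word >> Shift) & maxVal) / float(maxVal);
    }
};

// Constant channel for formats without alpha.
struct One { static float unpack(std::uint32_t) { return 1.0f; } };

// A packed format: storage word type plus R, G, B, A channel descriptions.
template <typename Word, typename R, typename G, typename B, typename A>
struct PackedFormat {
    using WordType = Word;
    static std::array<float, 4> load(const unsigned char* texel) {
        Word w;
        std::memcpy(&w, texel, sizeof(Word));  // avoids alignment/aliasing UB
        std::uint32_t v = w;
        return {R::unpack(v), G::unpack(v), B::unpack(v), A::unpack(v)};
    }
};

// Assumes a little-endian host for the byte-ordered RGBA8 layout.
using R5G6B5   = PackedFormat<std::uint16_t, Channel<11, 5>, Channel<5, 6>, Channel<0, 5>, One>;
using R8G8B8A8 = PackedFormat<std::uint32_t, Channel<0, 8>, Channel<8, 8>,
                              Channel<16, 8>, Channel<24, 8>>;

// Bilinear filtering written once, on top of any format's load().
template <typename Format>
std::array<float, 4> sampleBilinear(const unsigned char* base, int width, int height,
                                    float u, float v) {
    float x = u * float(width) - 0.5f;   // texel centres sit at +0.5
    float y = v * float(height) - 0.5f;
    int x0 = int(std::floor(x)), y0 = int(std::floor(y));
    float fx = x - float(x0), fy = y - float(y0);
    auto clampi = [](int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); };
    auto fetch = [&](int tx, int ty) {   // clamp-to-edge addressing
        tx = clampi(tx, width);
        ty = clampi(ty, height);
        std::size_t idx = std::size_t(ty) * std::size_t(width) + std::size_t(tx);
        return Format::load(base + idx * sizeof(typename Format::WordType));
    };
    auto t00 = fetch(x0, y0), t10 = fetch(x0 + 1, y0);
    auto t01 = fetch(x0, y0 + 1), t11 = fetch(x0 + 1, y0 + 1);
    std::array<float, 4> out{};
    for (int c = 0; c < 4; ++c) {
        float top = t00[c] + (t10[c] - t00[c]) * fx;
        float bot = t01[c] + (t11[c] - t01[c]) * fx;
        out[c] = top + (bot - top) * fy;
    }
    return out;
}
```

Adding another uncompressed format is then a one-line type alias, which is the kind of leverage the estimate above relies on; sRGB decoding, signed/integer formats and wrap modes would still need their own code.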

>
>
> Anyway, I hope this helps.  Best of luck.
>
Thanks,
Jacob Lifshay