[Mesa-dev] software implementation of vulkan for gsoc/evoc
Jose Fonseca
jfonseca at vmware.com
Sat Jun 10 22:24:55 UTC 2017
I know this is an old thread. I completely missed it the first time,
but recently rediscovered it after reading
http://www.phoronix.com/scan.php?page=news_item&px=Vulkan-CPU-Repository
and perhaps it's not too late for a couple of comments, FWIW.
On 13/02/17 02:17, Jacob Lifshay wrote:
> forgot to add mesa-dev when I sent.
> ---------- Forwarded message ----------
> From: "Jacob Lifshay" <programmerjake at gmail.com
> <mailto:programmerjake at gmail.com>>
> Date: Feb 12, 2017 6:16 PM
> Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
> To: "Dave Airlie" <airlied at gmail.com <mailto:airlied at gmail.com>>
> Cc:
>
>
>
> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
>
> > I'm assuming that control barriers in Vulkan are identical to
> > barriers across a work-group in OpenCL. I was going to have a
> > work-group be a single OS thread, with the different work-items
> > mapped to SIMD lanes. If we need to have additional scheduling, I
> > have written a JavaScript compiler that supports generator
> > functions, so I mostly know how to write an LLVM pass to implement
> > that. I was planning on writing the shader compiler using LLVM,
> > using the whole-function-vectorization pass I will write, and using
> > the pre-existing SPIR-V to LLVM translation layer. I would also
> > write some LLVM passes to translate from texture reads and stuff to
> > basic vector ops.
>
> Well, the problem is that the number of work-groups that get
> launched could be quite high, and this can cause a large overhead in
> the number of host threads that have to be launched. There was some
> discussion of this in the mesa-dev archives back when I added
> softpipe compute shaders.
>
>
> I would start a thread for each CPU, then have each thread run the
> compute shader a number of times instead of having a thread per
> shader invocation.
At least for llvmpipe, the last time I looked into this, green
threads seemed like a simple, non-intrusive method of dealing with
this --
https://lists.freedesktop.org/archives/mesa-dev/2016-April/114790.html
-- but it sounds like LLVM coroutines could handle it more effectively.
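
For what it's worth, the thread-per-CPU scheme Jacob describes boils
down to something like the following (a rough C sketch with made-up
names, not actual Mesa code; real control barriers would still need
the coroutine/green-thread machinery above):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_CPUS 4

static atomic_uint next_group;
static unsigned total_groups;

/* Stand-in for the compiled compute shader: one call runs a whole
 * work-group, with the work-items mapped to SIMD lanes inside. */
static void run_work_group(unsigned group_id)
{
    printf("group %u\n", group_id);
}

/* Each host thread drains work-groups from a shared atomic counter,
 * instead of spawning one OS thread per work-group. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        unsigned g = atomic_fetch_add(&next_group, 1);
        if (g >= total_groups)
            break;
        run_work_group(g);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_CPUS];
    total_groups = 64;

    for (int i = 0; i < NUM_CPUS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_CPUS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}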
>
>
> > I have a prototype rasterizer, however I haven't implemented
> > binning for triangles yet or implemented interpolation. Currently,
> > it can handle triangles in 3D homogeneous coordinates and calculate
> > edge equations.
> > https://github.com/programmerjake/tiled-renderer
> > A previous 3D renderer that doesn't implement any vectorization and
> > has OpenGL 1.x level functionality:
> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>
> Well, I think we already have a completely fine rasterizer and
> binning and whatever else in the llvmpipe code base. I'd much rather
> any Mesa-based project didn't throw all of that away; there is no
> reason the same swrast backend couldn't be abstracted to be used for
> both GL and Vulkan, and introducing another just because it's
> interesting isn't a great fit for long-term project maintenance.
>
> If there are improvements to llvmpipe that need to be made, then that
> is something to possibly consider, but I'm not sure why a swrast
> Vulkan needs a from-scratch rasterizer implemented. For a project
> that is so large in scope, I'd think reusing that code would be of
> some use, since most of the fun stuff is all the texture sampling etc.
>
>
> I actually think implementing the rasterization algorithm is the
> best part. I wanted the rasterization algorithm to be included in
> the shaders, e.g. triangle setup and binning would be tacked onto
> the end of the vertex shader, parameter interpolation and early Z
> tests would be tacked onto the beginning of the fragment shader, and
> blending onto the end. That way, LLVM could do more specialization
> and instruction scheduling than is possible in llvmpipe now.
Parameter interpolation, early Z testing, and blending *are* already
tacked onto llvmpipe's fragment shaders.
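
For illustration, the fused per-stamp code has roughly this shape
(illustrative C with hypothetical names; the real thing is generated
LLVM IR working on SIMD vectors):

#include <stdint.h>

#define STAMP 16 /* a 4x4 stamp of fragments */

struct varying { float a0, dadx, dady; }; /* plane equation per input */

void shade_stamp(const struct varying *v, float x, float y,
                 float *zbuf, uint32_t *cbuf, uint32_t mask)
{
    for (int i = 0; i < STAMP; i++) {
        if (!(mask & (1u << i)))      /* coverage from the rasterizer */
            continue;

        float fx = x + (float)(i & 3), fy = y + (float)(i >> 2);

        /* parameter interpolation, inlined */
        float z = v[0].a0 + v[0].dadx * fx + v[0].dady * fy;

        /* early Z test, inlined */
        if (z >= zbuf[i])
            continue;
        zbuf[i] = z;

        /* ...the fragment shader body proper goes here... */
        uint32_t src = 0xff00ff00u;

        /* blending, inlined (trivial replace here) */
        cbuf[i] = src;
    }
}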
I don't see how to effectively tack triangle setup onto the vertex
shader: the vertex shader applies to vertices, whereas triangle setup
and binning apply to primitives. Usually, each vertex gets transformed
only once with llvmpipe, no matter how many triangles refer to that
vertex. The only way to fold triangle setup into vertex shading would
be to process vertices a primitive at a time. Of course, one could add
an if-statement to skip reprocessing a vertex that was already
processed, but then you have race conditions and none of the benefit
of inlining.
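
The mismatch, as a hypothetical C sketch (made-up names, not Mesa
code):

#include <stddef.h>

struct vec4 { float x, y, z, w; };

/* stand-in for the compiled vertex shader */
static struct vec4 run_vertex_shader(unsigned vid)
{
    struct vec4 v = { (float)vid, 0.0f, 0.0f, 1.0f };
    return v;
}

void draw_indexed(const unsigned *indices, size_t num_indices,
                  struct vec4 *post_xform, size_t num_vertices)
{
    /* each vertex is shaded exactly once... */
    for (size_t v = 0; v < num_vertices; v++)
        post_xform[v] = run_vertex_shader((unsigned)v);

    /* ...but triangle setup needs all three vertices of a primitive,
     * which only meet here, after vertex shading */
    for (size_t i = 0; i + 2 < num_indices; i += 3) {
        const struct vec4 *v0 = &post_xform[indices[i + 0]];
        const struct vec4 *v1 = &post_xform[indices[i + 1]];
        const struct vec4 *v2 = &post_xform[indices[i + 2]];
        /* edge equations, binning, etc. would go here */
        (void)v0; (void)v1; (void)v2;
    }
}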
And I'm afraid that tacking on rasterization too is one of those
things that sounds great on paper but turns out quite bad in practice.
I speak from experience: llvmpipe in fact had the last step of
rasterization bolted onto the fragment shaders for some time, but we
took it out because it was _slower_.
The issue is that if you bolt rasterization onto the shader body, you
either:
- inline in the shader body the code for the maximum number of planes
(which is 7: the 3 triangle edges plus the 4 sides of a scissor rect),
and waste CPU cycles going through all of those tests even when, most
of the time, many of them aren't needed (see the sketch below);
- or you generate if/for blocks for each plane, so you only do the
needed tests, but then you have branch prediction issues...
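
A rough sketch of that first option (hypothetical names; llvmpipe's
real edge tests are vectorized and incremental):

#include <stdint.h>

#define MAX_PLANES 7  /* 3 triangle edges + 4 scissor sides */

struct plane { int a, b, c; };  /* half-space test: a*x + b*y + c >= 0 */

/* the worst-case plane count baked into every stamp */
uint32_t stamp_coverage(const struct plane p[MAX_PLANES], int x, int y)
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {             /* 4x4 stamp */
        int fx = x + (i & 3), fy = y + (i >> 2);
        int inside = 1;
        for (int j = 0; j < MAX_PLANES; j++)   /* all 7 tests, always */
            inside &= (p[j].a * fx + p[j].b * fy + p[j].c) >= 0;
        mask |= (uint32_t)inside << i;
    }
    return mask;
}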
Whereas if you keep rasterization _outside_ the shader, you can have
specialized functions that do the rasterization based on the primitive
itself (if the triangle is fully inside the scissor you need only 3
planes; if the stamp is fully inside the triangle you need zero).
Essentially you can "compose" by coupling two function calls: you call
a rasterization function that's specialized for the primitive, then a
shading function that's specialized for the state (but does not depend
on the primitive).
It makes sense: rasterization needs to be specialized for the
primitive, not the graphics state; whereas the shader needs to be
specialized for the state.
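
In sketch form (hypothetical names and dummy bodies, just to show the
composition):

#include <stdint.h>

typedef uint32_t (*raster_fn)(int x, int y);  /* -> coverage mask */
typedef void (*shade_fn)(int x, int y, uint32_t mask);

/* rasterizers specialized per primitive (stub bodies) */
uint32_t raster_0_planes(int x, int y) { (void)x; (void)y; return 0xffff; }
uint32_t raster_3_planes(int x, int y) { (void)x; (void)y; return 0x00ff; }
uint32_t raster_7_planes(int x, int y) { (void)x; (void)y; return 0x000f; }

/* the pair is chosen once per primitive / per state change */
void rasterize_stamp(raster_fn raster, shade_fn shade, int x, int y)
{
    uint32_t mask = raster(x, y); /* specialized for the primitive */
    if (mask)
        shade(x, y, mask);        /* specialized for the state */
}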
And this is just one of those non-intuitive things that isn't obvious
until one actually does a lot of profiling and a lot of
experimentation. And trust me, a lot of time was spent fine-tuning
this for llvmpipe (not by me -- most of the rasterization was done by
Keith Whitwell). By throwing llvmpipe out of the window and starting a
new software renderer from scratch, you'd just be signing up to do it
all over again.
Whereas if, instead of starting from scratch, you take llvmpipe and
rewrite/replace one component at a time, you can reach exactly the
same destination you want to reach, but you'll have something working
every step of the way; when you take a bad step, you can measure the
performance impact and readjust. Plus, if you run out of time, you
still have something useful -- not yet another half-finished project
that will quickly rot away.
Regarding generating SPIR-V -> scalar LLVM IR and then doing
whole-function vectorization, I don't think it's a bad idea per se. If
I were writing llvmpipe from scratch today, I'd do something like
that, especially because (scalar) LLVM IR is so pervasive in the
graphics ecosystem anyway. It was only after I had TGSI -> LLVM IR all
done that I stumbled onto
http://compilers.cs.uni-saarland.de/projects/wfv/ .
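
Conceptually, whole-function vectorization keeps the shader body and
widens every operation across SIMD lanes. In C terms, using GCC/Clang
vector extensions purely for illustration (the real transform works on
LLVM IR):

typedef float v8f __attribute__((vector_size(32))); /* 8 x float */

/* scalar shader body: one fragment per call */
float shade_scalar(float a, float b)
{
    return a * b + 1.0f;
}

/* what whole-function vectorization would emit: the same body,
 * eight fragments per call, one per SIMD lane */
v8f shade_vector8(v8f a, v8f b)
{
    v8f one = { 1, 1, 1, 1, 1, 1, 1, 1 };
    return a * b + one;
}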
I think the important thing here is that, once you've vectorized the
shader and converted your "texture_sample" to "texture_sample.vector8",
your "output_merger" intrinsics to "output_merger.vector8", or your
log2/exp2, you can then slot in the fine-tuned llvmpipe code for
texture sampling and blending and math, as that's where your
bottlenecks tend to be. Because if you plan to write all the texture
sampling from scratch, you'll need a time/clone machine to complete
this in a summer; and if you just use LLVM's / the standard C
runtime's sqrt/log2/exp2/sin/cos, it will be dead slow.
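
To give an idea of the gap: a shader-grade exp2 is typically a short
polynomial plus an exponent-field bit trick, which vectorizes
trivially, unlike the precise and branchy libm version. A rough scalar
sketch (the coefficients are illustrative, not llvmpipe's actual
ones):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

static float fast_exp2(float x)
{
    /* split x into integer and fractional parts */
    int   ipart = (int)floorf(x);
    float fpart = x - (float)ipart;

    /* build 2^ipart by writing the float exponent field directly */
    union { uint32_t u; float f; } pow2i;
    pow2i.u = (uint32_t)(ipart + 127) << 23;

    /* low-order polynomial approximating 2^fpart on [0,1) */
    float poly = 1.0f + fpart * (0.6951786f + fpart *
                        (0.2246617f + fpart * 0.0790209f));

    return pow2i.f * poly;
}

int main(void)
{
    for (float x = -4.0f; x <= 4.0f; x += 1.5f)
        printf("x=%5.2f fast=%10.6f libm=%10.6f\n",
               x, fast_exp2(x), exp2f(x));
    return 0;
}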
Anyway, I hope this helps. Best of luck.
Jose