[Mesa-dev] software implementation of vulkan for gsoc/evoc

Mon Feb 13 15:54:49 UTC 2017

On 13.02.2017 03:17, Jacob Lifshay wrote:
> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
>
>     > I'm assuming that control barriers in Vulkan are identical to barriers
>     > across a work-group in OpenCL. I was going to have a work-group be a
>     > single OS thread, with the different work-items mapped to SIMD lanes.
>     > If we need additional scheduling, I have written a JavaScript compiler
>     > that supports generator functions, so I mostly know how to write an
>     > LLVM pass to implement that. I was planning on writing the shader
>     > compiler using LLVM, using the whole-function-vectorization pass I will
>     > write and the pre-existing SPIR-V to LLVM translation layer. I would
>     > also write some LLVM passes to translate texture reads and similar
>     > operations to basic vector ops.
>
>     Well, the problem is that the number of work-groups that get launched
>     could be quite high, and this can cause a large overhead in the number
>     of host threads that have to be launched. There was some discussion of
>     this in the mesa-dev archives back when I added softpipe compute shaders.
>
>
> I would start a thread for each cpu, then have each thread run the
> compute shader a number of times instead of having a thread per shader
> invocation.

This will not work.

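Please read again what the barrier() instruction does: when a thread
reaches the barrier() call, it cannot continue until _all_ threads within
the workgroup have been run up to that same barrier() call. If an OS
thread runs each shader invocation to completion before starting the next
one, the first invocation to reach barrier() would wait forever.

So you need a way of suspending and resuming shader threads when they
reach the barrier() call.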

The brute-force way of doing this would be to have one OS thread per 
shader thread (or per N shader threads, where N is a fixed number 
corresponding to SIMD lanes), but that gives you a giant number of OS 
threads to contend with.
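
To make the brute-force variant concrete, here is a minimal sketch in Python (not Mesa code). The function name, the `shared` and `result` arrays and the two-phase "shader" body are all made up for illustration; `threading.Barrier` stands in for barrier().

```python
import threading


def run_workgroup_with_os_threads(local_size):
    """Run one work-group using one OS thread per shader invocation."""
    shared = [0] * local_size   # hypothetical stand-in for work-group shared memory
    result = [0] * local_size   # hypothetical per-invocation output
    barrier = threading.Barrier(local_size)

    def invocation(local_id):
        # Phase 1: every invocation writes its own slot.
        shared[local_id] = local_id * 10
        # barrier(): block until all invocations in the work-group arrive.
        barrier.wait()
        # Phase 2: read a slot written by a different invocation.
        result[local_id] = shared[(local_id + 1) % local_size]

    threads = [threading.Thread(target=invocation, args=(i,))
               for i in range(local_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

This is correct but needs `local_size` OS threads per work-group that is in flight, which is exactly the overhead problem.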

The alternative is to do "threads" in user space, and there are a bunch 
of options for that. LLVM coroutines are worth checking out, since I 
think they're more or less designed for that kind of thing. Another 
option is user space stack switching, or perhaps something entirely 
different.
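
As a rough illustration of the user-space approach, the sketch below uses Python generators in place of coroutines: each shader invocation is a generator that yields where barrier() would be, and a single OS thread resumes them round-robin. All names here are hypothetical, and the sketch assumes every invocation reaches the same number of barriers.

```python
def shader_invocation(local_id, local_size, shared, result):
    """One shader invocation; each `yield` marks a barrier() call."""
    # Phase 1: write own slot of the (hypothetical) shared memory.
    shared[local_id] = local_id * 10
    yield  # barrier(): suspend here until the scheduler resumes us
    # Phase 2: read a slot written by a different invocation.
    result[local_id] = shared[(local_id + 1) % local_size]


def run_workgroup_with_generators(local_size):
    """Run a whole work-group on the calling OS thread."""
    shared = [0] * local_size
    result = [0] * local_size
    live = [shader_invocation(i, local_size, shared, result)
            for i in range(local_size)]
    while live:
        still_running = []
        for inv in live:
            try:
                next(inv)              # run until the next barrier() or the end
                still_running.append(inv)
            except StopIteration:
                pass                   # this invocation has finished
        # Every live invocation has now reached the same barrier,
        # so it is safe to resume all of them.
        live = still_running
    return result
```

No invocation enters phase 2 before all of them have finished phase 1, and only one OS thread is used for the whole work-group. A real implementation would have the compiler split the shader at each barrier() rather than rely on a language-level generator.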

Nicolai

>
>
>     > I have a prototype rasterizer; however, I haven't implemented binning
>     > for triangles or interpolation yet. Currently, it can handle triangles
>     > in 3D homogeneous coordinates and calculate edge equations.
>     > https://github.com/programmerjake/tiled-renderer
>     > A previous 3D renderer that doesn't implement any vectorization and
>     > has OpenGL 1.x level functionality:
>     > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>
>     Well, I think we already have a completely fine rasterizer and binning
>     and whatever else in the llvmpipe code base. I'd much rather any
>     Mesa-based project didn't throw all of that away. There is no reason
>     the same swrast backend couldn't be abstracted to be used for both GL
>     and Vulkan, and introducing another just because it's interesting isn't
>     a great fit for long-term project maintenance.
>
>     If there are improvements to llvmpipe that need to be made, then that
>     is something to possibly consider, but I'm not sure why a swrast Vulkan
>     needs a from-scratch rasterizer implemented. For a project that is so
>     large in scope, I'd think reusing that code would be of some use, since
>     most of the fun stuff is all the texture sampling etc.
>
>
> I actually think implementing the rasterization algorithm is the best
> part. I wanted the rasterization algorithm to be included in the
> shaders, eg. triangle setup and binning would be tacked on to the end of
> the vertex shader and parameter interpolation and early z tests would be
> tacked on to the beginning of the fragment shader and blending on to the
> end. That way, llvm could do more specialization and instruction
> scheduling than is possible in llvmpipe now.
>
> So the tile rendering function would essentially be:
>
> for (i = 0; i < triangle_count; i += vector_width)
>     jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>
> as opposed to the current llvmpipe code where there is a large amount of
> fixed code that isn't optimized with the shaders.
>
>
>     > The scope that I intended to complete is the bare minimum to be vulkan
>     > conformant (i.e. no tessellation and no geometry shaders), so
>     > implementing a loadable ICD for linux and windows that implements a
>     > single queue, vertex, fragment, and compute shaders, implementing
>     > events, semaphores, and fences, implementing images with the minimum
>     > requirements, supporting a f32 depth buffer or a f24 with 8bit stencil,
>     > and supporting a yet-to-be-determined compressed format. For the image
>     > optimal layouts, I will probably use the same chunked layout I use in
>     > https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59
>     > , where I have a linear array of chunks where each chunk has a linear
>     > array of texels. If you think that's too big, we could leave out all of
>     > the image formats except the two depth-stencil formats, the 8-bit and
>     > 32-bit integer and 32-bit float formats.
>     >
>
>     Seems like quite a large scope, possibly a bit big for a GSoC though,
>     especially one that intends to not use any existing Mesa code.
>
>
> most of the vulkan functions have a simple implementation when we don't
> need to worry about building stuff for a gpu and synchronization
> (because we have only one queue), and llvm implements most of the rest
> of the needed functionality. If we leave out most of the image formats,
> that would probably cut the amount of code by a third.
>
>
>     Dave.