[Mesa-dev] software implementation of vulkan for gsoc/evoc
Nicolai Hähnle
nhaehnle at gmail.com
Mon Feb 13 15:54:49 UTC 2017
On 13.02.2017 03:17, Jacob Lifshay wrote:
> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
>
> > I'm assuming that control barriers in Vulkan are identical to barriers
> > across a work-group in opencl. I was going to have a work-group be a
> > single OS thread, with the different work-items mapped to SIMD lanes. If
> > we need to have additional scheduling, I have written a javascript
> > compiler that supports generator functions, so I mostly know how to
> > write a llvm pass to implement that. I was planning on writing the
> > shader compiler using llvm, using the whole-function-vectorization pass
> > I will write, and using the pre-existing spir-v to llvm translation
> > layer. I would also write some llvm passes to translate from texture
> > reads and stuff to basic vector ops.
>
> Well the problem is number of work-groups that gets launched could be
> quite high, and this can cause a large overhead in number of host
> threads that have to be launched. There was some discussion on this in
> mesa-dev archives back when I added softpipe compute shaders.
>
>
> I would start a thread for each cpu, then have each thread run the
> compute shader a number of times instead of having a thread per shader
> invocation.

This will not work.

Please read again what the barrier() instruction does: when the
barrier() call is reached, _all_ threads within the workgroup are
supposed to be run until they reach that barrier() call, and none of
them may continue past it before then.

So you need a way of suspending and resuming shader threads when they
reach the barrier() call.
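
To make the suspend/resume requirement concrete, here is a minimal sketch in C. It is a hand-written illustration under stated assumptions, not Mesa or llvmpipe code: the kernel (each work-item writes a value to shared memory, then reads its neighbour's value after the barrier), `WORKGROUP_SIZE`, `struct work_item`, `resume()` and `run_workgroup()` are all made up for the example. Each work-item records the phase it should resume at, and a single OS thread runs every work-item up to the barrier before resuming any of them.

```c
#include <assert.h>

#define WORKGROUP_SIZE 8

/* Hypothetical per-work-item state that has to survive a suspension. */
struct work_item {
    int id;     /* local invocation index */
    int phase;  /* 0 = not started, 1 = suspended at barrier(), 2 = finished */
    int result;
};

/* Stands in for workgroup-shared memory. */
static int shared_mem[WORKGROUP_SIZE];

/* Run one work-item until it reaches the barrier or finishes. */
static void resume(struct work_item *w)
{
    assert(w->phase >= 0 && w->phase <= 2);
    switch (w->phase) {
    case 0:
        /* Code before barrier(): publish a value for the neighbours. */
        shared_mem[w->id] = w->id * w->id;
        w->phase = 1;
        break;
    case 1:
        /* Code after barrier(): read what another work-item wrote. */
        w->result = shared_mem[(w->id + 1) % WORKGROUP_SIZE];
        w->phase = 2;
        break;
    default:
        break; /* already finished */
    }
}

/* One OS thread executes the whole workgroup. */
static void run_workgroup(struct work_item items[WORKGROUP_SIZE])
{
    for (int i = 0; i < WORKGROUP_SIZE; i++) {
        items[i].id = i;
        items[i].phase = 0;
        items[i].result = 0;
    }
    /* First pass: every work-item runs up to the barrier and suspends. */
    for (int i = 0; i < WORKGROUP_SIZE; i++)
        resume(&items[i]);
    /* Second pass: only now may any work-item continue past the barrier. */
    for (int i = 0; i < WORKGROUP_SIZE; i++)
        resume(&items[i]);
}
```

Running each work-item to completion in turn would instead let work-item 0 read `shared_mem[1]` before work-item 1 had written it, which is the failure the barrier exists to prevent.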

The brute-force way of doing this would be to have one OS thread per
shader thread (or per N shader threads, where N is a fixed number
corresponding to SIMD lanes), but that gives you a giant number of OS
threads to contend with.

The alternative is to do "threads" in user space, and there are a bunch
of options for that. LLVM coroutines are worth checking out, since I
think they're more or less designed for that kind of thing. Another
option is user space stack switching, or perhaps something entirely
different.

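
A rough sketch of the user space stack switching option, assuming a Linux/glibc system where the `<ucontext.h>` functions `getcontext`, `makecontext` and `swapcontext` are available. Everything else here (`WORKGROUP_SIZE`, `STACK_SIZE`, `shader_main`, the `barrier()` helper and the round-robin scheduler) is hypothetical and only illustrates the idea: each shader thread gets its own small stack, and `barrier()` switches back to the scheduler instead of blocking an OS thread. It also assumes every shader thread hits the same barriers in the same order.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define WORKGROUP_SIZE 4
#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;
static ucontext_t item_ctx[WORKGROUP_SIZE];
static int finished[WORKGROUP_SIZE];
static int current_item;

static int shared_mem[WORKGROUP_SIZE]; /* stands in for shared memory */
static int results[WORKGROUP_SIZE];

/* Hypothetical barrier(): suspend this shader thread and return to the
 * scheduler; execution continues here when the thread is resumed. */
static void barrier(void)
{
    swapcontext(&item_ctx[current_item], &scheduler_ctx);
}

/* Made-up shader body: write, synchronize, read a neighbour's value. */
static void shader_main(int id)
{
    shared_mem[id] = 10 * id;
    barrier();
    results[id] = shared_mem[(id + 1) % WORKGROUP_SIZE];
}

/* Entry point of each context; marks the thread finished on return. */
static void entry(int id)
{
    shader_main(id);
    finished[id] = 1;
}

/* One OS thread runs all shader threads of the workgroup. */
static void run_workgroup(void)
{
    char *stacks = malloc((size_t)WORKGROUP_SIZE * STACK_SIZE);
    assert(stacks != NULL);

    for (int i = 0; i < WORKGROUP_SIZE; i++) {
        finished[i] = 0;
        getcontext(&item_ctx[i]);
        item_ctx[i].uc_stack.ss_sp = stacks + (size_t)i * STACK_SIZE;
        item_ctx[i].uc_stack.ss_size = STACK_SIZE;
        item_ctx[i].uc_link = &scheduler_ctx; /* resumed when entry() returns */
        makecontext(&item_ctx[i], (void (*)(void))entry, 1, i);
    }

    /* Each sweep resumes every unfinished thread once, so all threads
     * reach a given barrier before any of them is resumed past it. */
    int remaining = WORKGROUP_SIZE;
    while (remaining > 0) {
        for (int i = 0; i < WORKGROUP_SIZE; i++) {
            if (finished[i])
                continue;
            current_item = i;
            swapcontext(&scheduler_ctx, &item_ctx[i]);
            if (finished[i])
                remaining--;
        }
    }
    free(stacks);
}
```

The cost of this approach is one stack per shader thread plus a full register save and restore at every barrier; a compiler-based transformation such as coroutines only has to preserve the values that are live across the barrier.
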
Nicolai
>
>
> > I have a prototype rasterizer, however I haven't implemented binning
> > for triangles yet or implemented interpolation. currently, it can
> > handle triangles in 3D homogeneous and calculate edge equations.
> > https://github.com/programmerjake/tiled-renderer
> > A previous 3d renderer that doesn't implement any vectorization and
> > has opengl 1.x level functionality:
> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>
> Well I think we already have a completely fine rasterizer and binning
> and whatever else in the llvmpipe code base. I'd much rather any Mesa
> based project doesn't throw all of that away, there is no reason the
> same swrast backend couldn't be abstracted to be used for both GL and
> Vulkan and introducing another just because it's interesting isn't a
> great fit for long term project maintenance..
>
> If there are improvements to llvmpipe that need to be made, then that
> is something to possibly consider, but I'm not sure why a swrast vulkan
> needs a from scratch raster implemented. For a project that is so large
> in scope, I'd think reusing that code would be of some use. Since most
> of the fun stuff is all the texture sampling etc.
>
>
> I actually think implementing the rasterization algorithm is the best
> part. I wanted the rasterization algorithm to be included in the
> shaders, eg. triangle setup and binning would be tacked on to the end of
> the vertex shader and parameter interpolation and early z tests would be
> tacked on to the beginning of the fragment shader and blending on to the
> end. That way, llvm could do more specialization and instruction
> scheduling than is possible in llvmpipe now.
>
> so the tile rendering function would essentially be:
>
> for (i = 0; i < triangle_count; i += vector_width)
>     jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>
> as opposed to the current llvmpipe code where there is a large amount of
> fixed code that isn't optimized with the shaders.
>
>
> > The scope that I intended to complete is the bare minimum to be vulkan
> > conformant (i.e. no tessellation and no geometry shaders), so
> > implementing a loadable ICD for linux and windows that implements a
> > single queue, vertex, fragment, and compute shaders, implementing
> > events, semaphores, and fences, implementing images with the minimum
> > requirements, supporting a f32 depth buffer or a f24 with 8bit stencil,
> > and supporting a yet-to-be-determined compressed format. For the image
> > optimal layouts, I will probably use the same chunked layout I use in
> > https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59,
> > where I have a linear array of chunks where each chunk has a linear
> > array of texels. If you think that's too big, we could leave out all of
> > the image formats except the two depth-stencil formats, the 8-bit and
> > 32-bit integer and 32-bit float formats.
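
For readers unfamiliar with chunked (tiled) image layouts, the addressing described above can be sketched as follows. This is an illustration only and does not reproduce the code in the linked image.h: the 8x8 chunk size, the row-major ordering of chunks and of texels within a chunk, and the name `chunked_texel_index` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk dimensions in texels; the real layout may differ. */
#define CHUNK_W 8
#define CHUNK_H 8

/* Map texel (x, y) to its index in a linear array of chunks, where each
 * chunk is itself a linear (row-major) array of CHUNK_W * CHUNK_H texels. */
static size_t chunked_texel_index(size_t x, size_t y, size_t image_width)
{
    assert(x < image_width);

    /* Partial chunks at the right edge are padded to a full chunk. */
    size_t chunks_per_row = (image_width + CHUNK_W - 1) / CHUNK_W;

    size_t chunk = (y / CHUNK_H) * chunks_per_row + (x / CHUNK_W);
    size_t within = (y % CHUNK_H) * CHUNK_W + (x % CHUNK_W);

    return chunk * (CHUNK_W * CHUNK_H) + within;
}
```

With this layout, all texels of one 8x8 block are contiguous in memory, so a tile-based rasterizer or a 2D texture filter touches fewer cache lines than it would with a plain row-major image.
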
> >
>
> Seems like a quite large scope, possibly a bit big for a GSoC though,
> esp one that intends to not use any existing Mesa code.
>
>
> most of the vulkan functions have a simple implementation when we don't
> need to worry about building stuff for a gpu and synchronization
> (because we have only one queue), and llvm implements most of the rest
> of the needed functionality. If we leave out most of the image formats,
> that would probably cut the amount of code by a third.
>
>
> Dave.