[Mesa-dev] software implementation of vulkan for gsoc/evoc
Jacob Lifshay
programmerjake at gmail.com
Mon Feb 13 16:28:26 UTC 2017
forgot to add mesa-dev when I sent (again).
---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerjake at gmail.com>
Date: Feb 13, 2017 8:27 AM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Nicolai Hähnle" <nhaehnle at gmail.com>
Cc:
>
> On Feb 13, 2017 7:54 AM, "Nicolai Hähnle" <nhaehnle at gmail.com> wrote:
>
> On 13.02.2017 03:17, Jacob Lifshay wrote:
>
>> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
>>
>> > I'm assuming that control barriers in Vulkan are identical to
>> > barriers across a work-group in opencl. I was going to have a
>> > work-group be a single OS thread, with the different work-items
>> > mapped to SIMD lanes. If we need to have additional scheduling, I
>> > have written a javascript compiler that supports generator
>> > functions, so I mostly know how to write a llvm pass to implement
>> > that. I was planning on writing the shader compiler using llvm,
>> > using the whole-function-vectorization pass I will write, and using
>> > the pre-existing spir-v to llvm translation layer. I would also
>> > write some llvm passes to translate from texture reads and stuff to
>> > basic vector ops.
>>
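[To illustrate the whole-function-vectorization idea above: the scalar per-work-item shader body becomes one body operating on a 4-wide vector, one lane per work-item. This is a hand-written sketch; the `shader_vec4` name and the plain loop standing in for real SIMD codegen are illustrative, not anything LLVM or Mesa actually emits.]

```cpp
#include <array>

// 4-wide integer vector, one lane per work-item.
using ivec4 = std::array<int, 4>;

// Scalar original: int shader(int tid, int b) { return b + tid; }
// Vectorized form: the same body applied across all 4 lanes at once.
ivec4 shader_vec4(const ivec4 &tid, const ivec4 &b)
{
    ivec4 out{};
    for (int lane = 0; lane < 4; ++lane) // one SIMD lane per work-item
        out[lane] = b[lane] + tid[lane];
    return out;
}
```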
>> Well the problem is number of work-groups that gets launched could be
>> quite high, and this can cause a large overhead in number of host
>> threads that have to be launched. There was some discussion on this
>> in mesa-dev archives back when I added softpipe compute shaders.
>>
>>
>> I would start a thread for each cpu, then have each thread run the
>> compute shader a number of times instead of having a thread per shader
>> invocation.
>>
>
> This will not work.
>
> Please, read again what the barrier() instruction does: When the barrier()
> call is reached, _all_ threads within the workgroup are supposed to be run
> until they reach that barrier() call.
>
>
> To clarify, I had meant that each OS thread would run the sections of
> the shader between the barriers for all the invocations in a
> work-group; then, when it finished the work-group, it would go to the
> next work-group assigned to that OS thread.
>
> so, if our shader is:
>
>     a = b + tid;
>     barrier();
>     d = e + f;
>
> and our simd width is 4, our work-group size is 128, and we have 16 os
> threads, then each os thread will run:
>
>     for(workgroup = os_thread_index; workgroup < workgroup_count;
>         workgroup += os_thread_count)
>     {
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128;
>             tid_in_workgroup += 4)
>         {
>             ivec4 tid = ivec4(0, 1, 2, 3)
>                         + ivec4(tid_in_workgroup + workgroup * 128);
>             a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
>         }
>         memory_fence(); // if needed
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128;
>             tid_in_workgroup += 4)
>         {
>             d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4],
>                                               f[tid_in_workgroup / 4]);
>         }
>     }
>     // after this, we run the next rendering or compute job
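[A runnable scalar sketch of the scheme above, with SIMD lanes omitted for clarity: one OS thread runs every invocation of a work-group up to the barrier, then every invocation past it, before moving to its next work-group. All names (`run_workgroup`, `os_thread_main`, `WORKGROUP_SIZE`) are illustrative, not from Mesa or the thread.]

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t WORKGROUP_SIZE = 128;

// Run one whole work-group on the calling OS thread: phase 1 for all
// invocations, then (conceptually) barrier(), then phase 2 for all.
void run_workgroup(std::size_t workgroup,
                   std::vector<int> &a, const std::vector<int> &b,
                   std::vector<float> &d, const std::vector<float> &e,
                   const std::vector<float> &f)
{
    std::size_t base = workgroup * WORKGROUP_SIZE;
    // phase 1: everything before barrier(), for every invocation
    for (std::size_t tid = 0; tid < WORKGROUP_SIZE; ++tid)
        a[base + tid] = b[base + tid] + static_cast<int>(base + tid);
    // barrier(): every invocation has reached this point; a memory
    // fence would go here if another thread consumed `a`.
    // phase 2: everything after barrier()
    for (std::size_t tid = 0; tid < WORKGROUP_SIZE; ++tid)
        d[base + tid] = e[base + tid] + f[base + tid];
}

// Each OS thread handles every os_thread_count-th work-group.
void os_thread_main(std::size_t os_thread_index, std::size_t os_thread_count,
                    std::size_t workgroup_count,
                    std::vector<int> &a, const std::vector<int> &b,
                    std::vector<float> &d, const std::vector<float> &e,
                    const std::vector<float> &f)
{
    for (std::size_t wg = os_thread_index; wg < workgroup_count;
         wg += os_thread_count)
        run_workgroup(wg, a, b, d, e, f);
}
```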
>
>
>> > I have a prototype rasterizer, however I haven't implemented
>> > binning for triangles yet or implemented interpolation. currently,
>> > it can handle triangles in 3D homogeneous coordinates and calculate
>> > edge equations: https://github.com/programmerjake/tiled-renderer
>> > A previous 3d renderer that doesn't implement any vectorization and
>> > has opengl 1.x level functionality:
>> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>>
>> Well I think we already have a completely fine rasterizer and binning
>> and whatever else in the llvmpipe code base. I'd much rather any Mesa
>> based project didn't throw all of that away; there is no reason the
>> same swrast backend couldn't be abstracted to be used for both GL and
>> Vulkan, and introducing another just because it's interesting isn't a
>> great fit for long term project maintenance.
>>
>> If there are improvements to llvmpipe that need to be made, then that
>> is something to possibly consider, but I'm not sure why a swrast
>> vulkan needs a from-scratch raster implemented. For a project that is
>> so large in scope, I'd think reusing that code would be of some use,
>> since most of the fun stuff is all the texture sampling etc.
>>
>>
>> I actually think implementing the rasterization algorithm is the best
>> part. I wanted the rasterization algorithm to be included in the
>> shaders, e.g. triangle setup and binning would be tacked on to the
>> end of the vertex shader, parameter interpolation and early z tests
>> would be tacked on to the beginning of the fragment shader, and
>> blending on to the end. That way, llvm could do more specialization
>> and instruction scheduling than is possible in llvmpipe now.
>>
>> so the tile rendering function would essentially be:
>>
>>     for(i = 0; i < triangle_count; i += vector_width)
>>         jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>>
>> as opposed to the current llvmpipe code where there is a large amount
>> of fixed code that isn't optimized with the shaders.
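[A self-contained sketch of that dispatch loop, with a plain function pointer standing in for the JIT-compiled, vectorized shader; the LLVM JIT machinery is omitted and `TriangleSetupResult`, `TileShadeFn`, and `render_tile` are made-up names for illustration. It dispatches one specialized function per batch of `vector_width` triangles and returns the batch count so the sketch is easy to check.]

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for whatever triangle setup emits per triangle
// (edge equations, interpolants, ...).
struct TriangleSetupResult {};

// Type of a JIT-compiled function that shades one vector_width-sized
// batch of triangles within a tile.
using TileShadeFn = void (*)(int tile_x, int tile_y,
                             const TriangleSetupResult *batch);

constexpr std::size_t VECTOR_WIDTH = 4;

// The tile rendering loop from the mail: one specialized function per
// batch of VECTOR_WIDTH triangles. Returns the number of batches run.
std::size_t render_tile(int tile_x, int tile_y,
                        const std::vector<TileShadeFn> &jit_functions,
                        const std::vector<TriangleSetupResult> &setup,
                        std::size_t triangle_count)
{
    std::size_t batches = 0;
    for (std::size_t i = 0; i < triangle_count; i += VECTOR_WIDTH, ++batches)
        jit_functions[batches](tile_x, tile_y, &setup[i]);
    return batches;
}
```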
>>
>>
>> > The scope that I intended to complete is the bare minimum to be
>> > vulkan conformant (i.e. no tessellation and no geometry shaders):
>> > implementing a loadable ICD for linux and windows that implements a
>> > single queue, vertex, fragment, and compute shaders, implementing
>> > events, semaphores, and fences, implementing images with the
>> > minimum requirements, supporting a f32 depth buffer or a f24 with
>> > 8-bit stencil, and supporting a yet-to-be-determined compressed
>> > format. For the image optimal layouts, I will probably use the same
>> > chunked layout I use in
>> > https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59,
>> > where I have a linear array of chunks where each chunk has a linear
>> > array of texels. If you think that's too big, we could leave out
>> > all of the image formats except the two depth-stencil formats, the
>> > 8-bit and 32-bit integer and 32-bit float formats.
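[For readers unfamiliar with chunked ("tiled") image layouts, here is a minimal sketch of texel addressing under the layout described above: a linear array of chunks, each chunk holding a linear array of texels. The chunk dimensions and the `texel_index` name are assumptions for illustration, not taken from the linked header.]

```cpp
#include <cstddef>

// Assumed chunk dimensions; the real layout may differ.
constexpr std::size_t CHUNK_W = 4, CHUNK_H = 4;

// Index of texel (x, y) in the chunked storage: find which chunk the
// texel falls in, then its offset within that chunk's linear array.
// image_width is in texels and assumed to be a multiple of CHUNK_W.
std::size_t texel_index(std::size_t x, std::size_t y,
                        std::size_t image_width)
{
    std::size_t chunks_per_row = image_width / CHUNK_W;
    std::size_t chunk_x = x / CHUNK_W, chunk_y = y / CHUNK_H;
    std::size_t in_x = x % CHUNK_W, in_y = y % CHUNK_H;
    std::size_t chunk_index = chunk_y * chunks_per_row + chunk_x;
    return chunk_index * (CHUNK_W * CHUNK_H) + in_y * CHUNK_W + in_x;
}
```

[The point of such a layout is that texels that are close in 2D stay close in memory, which helps cache behavior for texture sampling and tile-based rasterization.]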
>> >
>>
>> Seems like a quite large scope, possibly a bit big for a GSoC though,
>> especially one that intends to not use any existing Mesa code.
>>
>>
>> most of the vulkan functions have a simple implementation when we don't
>> need to worry about building stuff for a gpu and synchronization
>> (because we have only one queue), and llvm implements most of the rest
>> of the needed functionality. If we leave out most of the image formats,
>> that would probably cut the amount of code by a third.
>>
>>
>> Dave.
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
>
>