[Mesa-dev] software implementation of vulkan for gsoc/evoc
Jacob Lifshay
programmerjake at gmail.com
Mon Feb 13 16:28:26 UTC 2017
forgot to add mesa-dev when I sent (again).
---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerjake at gmail.com>
Date: Feb 13, 2017 8:27 AM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Nicolai Hähnle" <nhaehnle at gmail.com>
Cc:
>
> On Feb 13, 2017 7:54 AM, "Nicolai Hähnle" <nhaehnle at gmail.com> wrote:
>
> On 13.02.2017 03:17, Jacob Lifshay wrote:
>
>> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airlied at gmail.com> wrote:
>>
>> > I'm assuming that control barriers in Vulkan are identical to
>> > barriers across a work-group in opencl. I was going to have a
>> > work-group be a single OS thread, with the different work-items
>> > mapped to SIMD lanes. If we need to have additional scheduling, I
>> > have written a javascript compiler that supports generator
>> > functions, so I mostly know how to write a llvm pass to implement
>> > that. I was planning on writing the shader compiler using llvm,
>> > using the whole-function-vectorization pass I will write, and using
>> > the pre-existing spir-v to llvm translation layer. I would also
>> > write some llvm passes to translate from texture reads and stuff to
>> > basic vector ops.
>>
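[To illustrate the whole-function-vectorization idea above: the scalar per-work-item shader body becomes one body operating on a 4-wide vector, one lane per work-item. This is a hand-written sketch; the `shader_vec4` name and the plain loop standing in for real SIMD codegen are illustrative, not anything LLVM or Mesa actually emits.]

```cpp
#include <array>

// 4-wide integer vector, one lane per work-item.
using ivec4 = std::array<int, 4>;

// Scalar original: int shader(int tid, int b) { return b + tid; }
// Vectorized form: the same body applied across all 4 lanes at once.
ivec4 shader_vec4(const ivec4 &tid, const ivec4 &b)
{
    ivec4 out{};
    for (int lane = 0; lane < 4; ++lane) // one SIMD lane per work-item
        out[lane] = b[lane] + tid[lane];
    return out;
}
```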
>> Well the problem is number of work-groups that gets launched could be
>> quite high, and this can cause a large overhead in number of host
>> threads that have to be launched. There was some discussion on this
>> in mesa-dev archives back when I added softpipe compute shaders.
>>
>>
>> I would start a thread for each cpu, then have each thread run the
>> compute shader a number of times instead of having a thread per shader
>> invocation.
>>
>
> This will not work.
>
> Please, read again what the barrier() instruction does: When the barrier()
> call is reached, _all_ threads within the workgroup are supposed to be run
> until they reach that barrier() call.
>
>
> To clarify, I had meant that each OS thread would run the sections of
> the shader between the barriers for all the invocations in a
> work-group; then, when it finished the work-group, it would go to the
> next work-group assigned to that OS thread.
>
> so, if our shader is:
>
>     a = b + tid;
>     barrier();
>     d = e + f;
>
> and our simd width is 4, our work-group size is 128, and we have 16 os
> threads, then each os thread will run:
>
>     for(workgroup = os_thread_index; workgroup < workgroup_count;
>         workgroup += os_thread_count)
>     {
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128;
>             tid_in_workgroup += 4)
>         {
>             ivec4 tid = ivec4(0, 1, 2, 3)
>                         + ivec4(tid_in_workgroup + workgroup * 128);
>             a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
>         }
>         memory_fence(); // if needed
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128;
>             tid_in_workgroup += 4)
>         {
>             d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4],
>                                               f[tid_in_workgroup / 4]);
>         }
>     }
>     // after this, we run the next rendering or compute job
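[A runnable scalar sketch of the scheme above, with SIMD lanes omitted for clarity: one OS thread runs every invocation of a work-group up to the barrier, then every invocation past it, before moving to its next work-group. All names (`run_workgroup`, `os_thread_main`, `WORKGROUP_SIZE`) are illustrative, not from Mesa or the thread.]

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t WORKGROUP_SIZE = 128;

// Run one whole work-group on the calling OS thread: phase 1 for all
// invocations, then (conceptually) barrier(), then phase 2 for all.
void run_workgroup(std::size_t workgroup,
                   std::vector<int> &a, const std::vector<int> &b,
                   std::vector<float> &d, const std::vector<float> &e,
                   const std::vector<float> &f)
{
    std::size_t base = workgroup * WORKGROUP_SIZE;
    // phase 1: everything before barrier(), for every invocation
    for (std::size_t tid = 0; tid < WORKGROUP_SIZE; ++tid)
        a[base + tid] = b[base + tid] + static_cast<int>(base + tid);
    // barrier(): every invocation has reached this point; a memory
    // fence would go here if another thread consumed `a`.
    // phase 2: everything after barrier()
    for (std::size_t tid = 0; tid < WORKGROUP_SIZE; ++tid)
        d[base + tid] = e[base + tid] + f[base + tid];
}

// Each OS thread handles every os_thread_count-th work-group.
void os_thread_main(std::size_t os_thread_index, std::size_t os_thread_count,
                    std::size_t workgroup_count,
                    std::vector<int> &a, const std::vector<int> &b,
                    std::vector<float> &d, const std::vector<float> &e,
                    const std::vector<float> &f)
{
    for (std::size_t wg = os_thread_index; wg < workgroup_count;
         wg += os_thread_count)
        run_workgroup(wg, a, b, d, e, f);
}
```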
>
>
>> > I have a prototype rasterizer, however I haven't implemented
>> > binning for triangles yet or implemented interpolation. currently,
>> > it can handle triangles in 3D homogeneous coordinates and calculate
>> > edge equations: https://github.com/programmerjake/tiled-renderer
>> > A previous 3d renderer that doesn't implement any vectorization and
>> > has opengl 1.x level functionality:
>> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>>
>> Well I think we already have a completely fine rasterizer and binning
>> and whatever else in the llvmpipe code base. I'd much rather any Mesa
>> based project didn't throw all of that away; there is no reason the
>> same swrast backend couldn't be abstracted to be used for both GL and
>> Vulkan, and introducing another just because it's interesting isn't a
>> great fit for long term project maintenance.
>>
>> If there are improvements to llvmpipe that need to be made, then that
>> is something to possibly consider, but I'm not sure why a swrast
>> vulkan needs a from-scratch raster implemented. For a project that is
>> so large in scope, I'd think reusing that code would be of some use,
>> since most of the fun stuff is all the texture sampling etc.
>>
>>
>> I actually think implementing the rasterization algorithm is the best
>> part. I wanted the rasterization algorithm to be included in the
>> shaders, e.g. triangle setup and binning would be tacked on to the
>> end of the vertex shader, parameter interpolation and early z tests
>> would be tacked on to the beginning of the fragment shader, and
>> blending on to the end. That way, llvm could do more specialization
>> and instruction scheduling than is possible in llvmpipe now.
>>
>> so the tile rendering function would essentially be:
>>
>>     for(i = 0; i < triangle_count; i += vector_width)
>>         jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>>
>> as opposed to the current llvmpipe code where there is a large amount
>> of fixed code that isn't optimized with the shaders.
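[A self-contained sketch of that dispatch loop, with a plain function pointer standing in for the JIT-compiled, vectorized shader; the LLVM JIT machinery is omitted and `TriangleSetupResult`, `TileShadeFn`, and `render_tile` are made-up names for illustration. It dispatches one specialized function per batch of `vector_width` triangles and returns the batch count so the sketch is easy to check.]

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for whatever triangle setup emits per triangle
// (edge equations, interpolants, ...).
struct TriangleSetupResult {};

// Type of a JIT-compiled function that shades one vector_width-sized
// batch of triangles within a tile.
using TileShadeFn = void (*)(int tile_x, int tile_y,
                             const TriangleSetupResult *batch);

constexpr std::size_t VECTOR_WIDTH = 4;

// The tile rendering loop from the mail: one specialized function per
// batch of VECTOR_WIDTH triangles. Returns the number of batches run.
std::size_t render_tile(int tile_x, int tile_y,
                        const std::vector<TileShadeFn> &jit_functions,
                        const std::vector<TriangleSetupResult> &setup,
                        std::size_t triangle_count)
{
    std::size_t batches = 0;
    for (std::size_t i = 0; i < triangle_count; i += VECTOR_WIDTH, ++batches)
        jit_functions[batches](tile_x, tile_y, &setup[i]);
    return batches;
}
```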
>>
>>
>> > The scope that I intended to complete is the bare minimum to be
>> > vulkan conformant (i.e. no tessellation and no geometry shaders):
>> > implementing a loadable ICD for linux and windows that implements a
>> > single queue, vertex, fragment, and compute shaders, implementing
>> > events, semaphores, and fences, implementing images with the
>> > minimum requirements, supporting a f32 depth buffer or a f24 with
>> > 8-bit stencil, and supporting a yet-to-be-determined compressed
>> > format. For the image optimal layouts, I will probably use the same
>> > chunked layout I use in
>> > https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59,
>> > where I have a linear array of chunks where each chunk has a linear
>> > array of texels. If you think that's too big, we could leave out
>> > all of the image formats except the two depth-stencil formats, the
>> > 8-bit and 32-bit integer and 32-bit float formats.
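[For readers unfamiliar with chunked ("tiled") image layouts, here is a minimal sketch of texel addressing under the layout described above: a linear array of chunks, each chunk holding a linear array of texels. The chunk dimensions and the `texel_index` name are assumptions for illustration, not taken from the linked header.]

```cpp
#include <cstddef>

// Assumed chunk dimensions; the real layout may differ.
constexpr std::size_t CHUNK_W = 4, CHUNK_H = 4;

// Index of texel (x, y) in the chunked storage: find which chunk the
// texel falls in, then its offset within that chunk's linear array.
// image_width is in texels and assumed to be a multiple of CHUNK_W.
std::size_t texel_index(std::size_t x, std::size_t y,
                        std::size_t image_width)
{
    std::size_t chunks_per_row = image_width / CHUNK_W;
    std::size_t chunk_x = x / CHUNK_W, chunk_y = y / CHUNK_H;
    std::size_t in_x = x % CHUNK_W, in_y = y % CHUNK_H;
    std::size_t chunk_index = chunk_y * chunks_per_row + chunk_x;
    return chunk_index * (CHUNK_W * CHUNK_H) + in_y * CHUNK_W + in_x;
}
```

[The point of such a layout is that texels that are close in 2D stay close in memory, which helps cache behavior for texture sampling and tile-based rasterization.]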
>> >
>>
>> Seems like a quite large scope, possibly a bit big for a GSoC though,
>> especially one that intends to not use any existing Mesa code.
>>
>>
>> most of the vulkan functions have a simple implementation when we don't
>> need to worry about building stuff for a gpu and synchronization
>> (because we have only one queue), and llvm implements most of the rest
>> of the needed functionality. If we leave out most of the image formats,
>> that would probably cut the amount of code by a third.
>>
>>
>> Dave.
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
>
>