[Mesa-dev] software implementation of vulkan for gsoc/evoc
Nicolai Hähnle
nhaehnle at gmail.com
Tue Feb 14 08:18:29 UTC 2017
On 13.02.2017 17:54, Jacob Lifshay wrote:
> The algorithm I was going to use would take the union of the sets of
> live variables at the barriers (union over all barriers) and create an
> array of structs that holds them all. Then, for each barrier, it would
> insert the code to store all live variables, end the for loop over
> tid_in_workgroup, run the memory barrier, start another for loop over
> tid_in_workgroup, and then load all live variables back.
Okay, sounds reasonable in theory.
There are some issues, though. For example, how do you actually
determine the live variables? If you're working off TGSI, as llvmpipe
does today, you'd need to write your own analysis for that; but with a
structured control flow graph like TGSI's, that shouldn't be too
difficult.
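To make the store/reload idea concrete, here is a minimal scalar sketch (hypothetical names, one live variable, no SIMD; not llvmpipe code): the union of the variables live across the barrier becomes a per-invocation struct, and the shader is split at the barrier into two loops over the workgroup.

```cpp
constexpr int WG_SIZE = 8;

struct SpillSlots {   // union of live variables at the barrier
    int x;
};

// Per-invocation shader, conceptually:
//   x = input[tid] * 2;
//   barrier();
//   output[tid] = x + input[(tid + 1) % WG_SIZE];
void run_workgroup(const int (&input)[WG_SIZE], int (&output)[WG_SIZE]) {
    SpillSlots spill[WG_SIZE];

    // First half: run every invocation up to the barrier, storing its
    // live variables instead of keeping them in registers.
    for (int tid = 0; tid < WG_SIZE; ++tid) {
        int x = input[tid] * 2;
        spill[tid].x = x;              // store live variable
    }

    // barrier(): every invocation has arrived; a real multithreaded
    // implementation would also issue a memory fence here.

    // Second half: reload live variables and finish each invocation.
    for (int tid = 0; tid < WG_SIZE; ++tid) {
        int x = spill[tid].x;          // reload live variable
        output[tid] = x + input[(tid + 1) % WG_SIZE];
    }
}
```

The second half reads input written "across" the workgroup, which is exactly the access pattern the barrier exists to make safe.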
I'd still recommend that you at least seriously read through the LLVM
coroutine support.
Cheers,
Nicolai
> Jacob Lifshay
>
> On Feb 13, 2017 08:45, "Nicolai Hähnle" <nhaehnle at gmail.com> wrote:
>
> [ re-adding mesa-dev on the assumption that it got dropped by accident ]
>
> On 13.02.2017 17:27, Jacob Lifshay wrote:
>
> I would start a thread for each CPU, then have each thread run the
> compute shader a number of times instead of having a thread per
> shader invocation.
>
>
> This will not work.
>
> Please read again what the barrier() instruction does: when the
> barrier() call is reached, _all_ threads within the workgroup are
> supposed to be run until they reach that barrier() call.
>
>
> To clarify, I had meant that each OS thread would run the sections of
> the shader between the barriers for all the invocations in a workgroup;
> then, when it finished the workgroup, it would move on to the next
> workgroup assigned to that OS thread.
>
> So, if our shader is:
>
>     a = b + tid;
>     barrier();
>     d = e + f;
>
> and our SIMD width is 4, our workgroup size is 128, and we have 16 OS
> threads, then each OS thread will run:
>
>     for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup++)
>     {
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>         {
>             ivec4 tid = ivec4(0, 1, 2, 3)
>                         + ivec4(tid_in_workgroup + workgroup * 128);
>             a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
>         }
>         memory_fence(); // if needed
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>         {
>             d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4],
>                                               f[tid_in_workgroup / 4]);
>         }
>     }
>     // after this, we run the next rendering or compute job
>
>
> Okay good, that's the right concept.
>
> Actually doing that is not at all straightforward though: consider
> that the barrier() might occur inside a loop in the shader.
>
> So if you implemented that within the framework of llvmpipe, you'd
> make a lot of people very happy: it would finally allow adding
> compute shader support to llvmpipe. Mind you, that in itself would
> already be a pretty decent-sized project for GSoC!
>
> Cheers,
> Nicolai
>
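On the barrier-inside-a-loop point above: since barrier() has to sit in uniform control flow (every invocation executes the same iterations), one option is to hoist the shader's loop outside the per-invocation loops and keep per-invocation state in arrays. A minimal sketch with made-up names (not llvmpipe code):

```cpp
#include <vector>

constexpr int WG_SIZE = 4;
constexpr int ITERATIONS = 3;

// Per-invocation shader, conceptually:
//   acc = 0;
//   for (i = 0; i < ITERATIONS; ++i) {
//       shared[tid] = acc + tid;           // part A
//       barrier();
//       acc = shared[(tid + 1) % WG_SIZE]; // part B
//   }
//   out[tid] = acc;
void run_workgroup(std::vector<int>& out) {
    std::vector<int> shared(WG_SIZE);
    std::vector<int> acc(WG_SIZE, 0);  // per-invocation state, spilled to an array

    for (int i = 0; i < ITERATIONS; ++i) {          // hoisted shader loop
        for (int tid = 0; tid < WG_SIZE; ++tid)     // part A for every invocation
            shared[tid] = acc[tid] + tid;
        // barrier(): a multithreaded build would put a memory fence here
        for (int tid = 0; tid < WG_SIZE; ++tid)     // part B for every invocation
            acc[tid] = shared[(tid + 1) % WG_SIZE];
    }
    for (int tid = 0; tid < WG_SIZE; ++tid)
        out[tid] = acc[tid];
}
```

A non-uniform trip count would break barrier semantics anyway, which is what makes this hoisting legal.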