[Mesa-dev] software implementation of vulkan for gsoc/evoc
Nicolai Hähnle
nhaehnle at gmail.com
Tue Feb 14 08:18:29 UTC 2017
On 13.02.2017 17:54, Jacob Lifshay wrote:
> The algorithm I was going to use would take the union of the sets of
> live variables at the barriers (union over all barriers) and create an
> array of structs that holds them all. Then, for each barrier, it would
> insert the code to store all live variables, end the for loop over
> tid_in_workgroup, run the memory barrier, start another for loop over
> tid_in_workgroup, and then load all live variables back.
Okay, sounds reasonable in theory.
There are some issues, though. For example, how do you actually
determine the live variables? If you're working off TGSI, as llvmpipe
does today, you'd need to write your own analysis for that; but with a
structured control flow graph like TGSI's, that shouldn't be too
difficult.
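To make the store/reload idea concrete, here is a minimal scalar sketch (hypothetical names, one live variable, no SIMD; not llvmpipe code): the union of the variables live across the barrier becomes a per-invocation struct, and the shader is split at the barrier into two loops over the workgroup.

```cpp
constexpr int WG_SIZE = 8;

struct SpillSlots {   // union of live variables at the barrier
    int x;
};

// Per-invocation shader, conceptually:
//   x = input[tid] * 2;
//   barrier();
//   output[tid] = x + input[(tid + 1) % WG_SIZE];
void run_workgroup(const int (&input)[WG_SIZE], int (&output)[WG_SIZE]) {
    SpillSlots spill[WG_SIZE];

    // First half: run every invocation up to the barrier, storing its
    // live variables instead of keeping them in registers.
    for (int tid = 0; tid < WG_SIZE; ++tid) {
        int x = input[tid] * 2;
        spill[tid].x = x;              // store live variable
    }

    // barrier(): every invocation has arrived; a real multithreaded
    // implementation would also issue a memory fence here.

    // Second half: reload live variables and finish each invocation.
    for (int tid = 0; tid < WG_SIZE; ++tid) {
        int x = spill[tid].x;          // reload live variable
        output[tid] = x + input[(tid + 1) % WG_SIZE];
    }
}
```

The second half reads input written "across" the workgroup, which is exactly the access pattern the barrier exists to make safe.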
I'd still recommend that you at least seriously read through the LLVM
coroutine support.
Cheers,
Nicolai
> Jacob Lifshay
>
> On Feb 13, 2017 08:45, "Nicolai Hähnle" <nhaehnle at gmail.com> wrote:
>
> [ re-adding mesa-dev on the assumption that it got dropped by accident ]
>
> On 13.02.2017 17:27, Jacob Lifshay wrote:
>
> I would start a thread for each CPU, then have each thread run the
> compute shader a number of times instead of having a thread per
> shader invocation.
>
>
> This will not work.
>
> Please read again what the barrier() instruction does: when the
> barrier() call is reached, _all_ threads within the workgroup are
> supposed to be run until they reach that barrier() call.
>
>
> To clarify, I had meant that each OS thread would run the sections of
> the shader between the barriers for all the invocations in a workgroup;
> then, when it finished the workgroup, it would move on to the next
> workgroup assigned to that OS thread.
>
> So, if our shader is:
>
>     a = b + tid;
>     barrier();
>     d = e + f;
>
> and our SIMD width is 4, our workgroup size is 128, and we have 16 OS
> threads, then each OS thread will run:
>
>     for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup++)
>     {
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>         {
>             ivec4 tid = ivec4(0, 1, 2, 3)
>                         + ivec4(tid_in_workgroup + workgroup * 128);
>             a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
>         }
>         memory_fence(); // if needed
>         for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
>         {
>             d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4],
>                                               f[tid_in_workgroup / 4]);
>         }
>     }
>     // after this, we run the next rendering or compute job
>
>
> Okay good, that's the right concept.
>
> Actually doing that is not at all straightforward though: consider
> that the barrier() might occur inside a loop in the shader.
>
> So if you implemented that within the framework of llvmpipe, you'd
> make a lot of people very happy: it would finally allow adding
> compute shader support to llvmpipe. Mind you, that in itself would
> already be a pretty decent-sized project for GSoC!
>
> Cheers,
> Nicolai
>
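On the barrier-inside-a-loop point above: since barrier() has to sit in uniform control flow (every invocation executes the same iterations), one option is to hoist the shader's loop outside the per-invocation loops and keep per-invocation state in arrays. A minimal sketch with made-up names (not llvmpipe code):

```cpp
#include <vector>

constexpr int WG_SIZE = 4;
constexpr int ITERATIONS = 3;

// Per-invocation shader, conceptually:
//   acc = 0;
//   for (i = 0; i < ITERATIONS; ++i) {
//       shared[tid] = acc + tid;           // part A
//       barrier();
//       acc = shared[(tid + 1) % WG_SIZE]; // part B
//   }
//   out[tid] = acc;
void run_workgroup(std::vector<int>& out) {
    std::vector<int> shared(WG_SIZE);
    std::vector<int> acc(WG_SIZE, 0);  // per-invocation state, spilled to an array

    for (int i = 0; i < ITERATIONS; ++i) {          // hoisted shader loop
        for (int tid = 0; tid < WG_SIZE; ++tid)     // part A for every invocation
            shared[tid] = acc[tid] + tid;
        // barrier(): a multithreaded build would put a memory fence here
        for (int tid = 0; tid < WG_SIZE; ++tid)     // part B for every invocation
            acc[tid] = shared[(tid + 1) % WG_SIZE];
    }
    for (int tid = 0; tid < WG_SIZE; ++tid)
        out[tid] = acc[tid];
}
```

A non-uniform trip count would break barrier semantics anyway, which is what makes this hoisting legal.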