[Mesa-dev] [PATCH 7/7] softpipe: add support for compute shaders.
jfonseca at vmware.com
Wed Apr 27 16:13:59 UTC 2016
On 27/04/16 02:46, Roland Scheidegger wrote:
> Am 27.04.2016 um 03:05 schrieb Dave Airlie:
>> On 27 April 2016 at 11:00, Dave Airlie <airlied at gmail.com> wrote:
>>>>> So far I've set the execmask to 1 active channel, I'm contemplating
>>>>> changing that though and using less machines.
>>>> Ah yes, I think that would indeed be desirable.
>>> I'll look into it, though it's not that trivial, since you might have a 1x20x1
>>> layout, also having to make sure each thread gets the correct system values.
> Looks doable though. I'm mostly asking because the whole point of
> compute shaders is things running in parallel, and while that wouldn't
> really run in parallel it would at least slightly look like it...
>>>>> Any ideas how to implement this in llvm? :-) 1024 CPU threads?
>>>> I suppose 1024 is really the minimum work size you have to support?
>>>> But since things are always run 4-wide (or 8-wide) that would "only" be
>>>> 256 (or 128) threads. That many threads sound a bit suboptimal to me
>>>> (unless you really have a boatload of cpu cores), but why not - I
>>>> suppose you can always pause some of the threads, not all need to be
>>>> active at the same time.
>>>> Though I wonder what the opencl-on-cpu guys do...
>>> pocl appears to spawn a number of threads and split the work out amongst
>>> them in the X direction.
>>> However I'm not seeing how they handle barriers, or if they handle
>>> them correctly at all.
>> Okay newer versions of pocl seem to have some sort of thread scheduler,
>> that schedule workgroups across up to 8 threads, however I can't see how
>> they deal with barriers still.
> Yes the problem with barriers is what I had in mind too. Otherwise could
> just create worker threads, which pick up whatever work items are left.
Regarding llvmpipe, the simple solution does indeed seem to be to use one
OS thread per register's worth of invocations.
The second, intermediate, solution is to use as many threads as there are
CPUs, each using very large vectors (i.e., 1024/num-cpus elements), and
let LLVM deal with breaking those vectors into smaller units.
Emitting LLVM IR in such a way that it's able to stop/resume execution in
the middle of a thread seems hard (though not impossible, since we already
deal with execution masks, so it would be mostly a matter of spilling
all input/temp registers and the execution mask to/from malloc'ed memory).
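To make the spilling idea a bit more concrete, here is a rough C sketch of
the control flow (plain C, not LLVM IR, and not actual llvmpipe code). It
assumes a shader with a single barrier, split by hand into a "before" and
an "after" phase. The names spilled_state, phase_before_barrier,
phase_after_barrier and run_group are made up for illustration, as is the
4-wide VEC_WIDTH.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_WIDTH 4   /* assumed number of lanes per vector */

/* Hypothetical per-vector state that has to survive the barrier. */
struct spilled_state {
   int32_t temp[VEC_WIDTH];   /* one live temporary register */
   uint32_t exec_mask;        /* bit i set => lane i is active */
};

/* Shader code up to the barrier: compute a temp and publish it. */
static void phase_before_barrier(unsigned vec, unsigned group_size,
                                 int32_t *shared, struct spilled_state *st)
{
   st->exec_mask = 0;
   for (unsigned lane = 0; lane < VEC_WIDTH; lane++) {
      unsigned id = vec * VEC_WIDTH + lane;
      if (id >= group_size)
         continue;            /* lane past the end of the group: inactive */
      st->exec_mask |= 1u << lane;
      st->temp[lane] = (int32_t)id * 10;
      shared[id] = st->temp[lane];
   }
}

/* Shader code after the barrier: reload the spilled state and continue. */
static void phase_after_barrier(unsigned vec, unsigned group_size,
                                const int32_t *shared,
                                const struct spilled_state *st, int32_t *out)
{
   for (unsigned lane = 0; lane < VEC_WIDTH; lane++) {
      if (!(st->exec_mask & (1u << lane)))
         continue;
      unsigned id = vec * VEC_WIDTH + lane;
      /* Read a value written by a different invocation. */
      out[id] = st->temp[lane] + shared[group_size - 1 - id];
   }
}

/* Runs one workgroup on a single OS thread. Returns 0 on success. */
static int run_group(unsigned group_size, int32_t *out)
{
   assert(group_size > 0);
   unsigned nvec = (group_size + VEC_WIDTH - 1) / VEC_WIDTH;
   struct spilled_state *st = calloc(nvec, sizeof *st);
   int32_t *shared = calloc(group_size, sizeof *shared);
   if (!st || !shared) {
      free(st);
      free(shared);
      return -1;
   }
   /* Every vector runs to the barrier and spills into st[v]... */
   for (unsigned v = 0; v < nvec; v++)
      phase_before_barrier(v, group_size, shared, &st[v]);
   /* ...and only once all have arrived does any of them resume. */
   for (unsigned v = 0; v < nvec; v++)
      phase_after_barrier(v, group_size, shared, &st[v], out);
   free(shared);
   free(st);
   return 0;
}
```

With group_size = 10 the last vector has only two active lanes, and every
out[id] ends up as 90. The hard part in practice is that the split has to
be done by the IR generator for arbitrary control flow around the barrier,
which this sketch sidesteps entirely.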
Another solution might be to integrate some third-party library that
implements so-called green/user-space threads (e.g., via setjmp/longjmp,
or something else). I don't know any such library off-hand, and getting
it to work on all OSes might be far from trivial. My gut feeling is that
this would be the most promising option long term: no need to have
thousands of OS threads, and no need to increase the complexity of LLVM
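As a rough illustration of the green-thread idea, below is a minimal
sketch using the POSIX <ucontext.h> calls (getcontext, makecontext,
swapcontext) rather than setjmp/longjmp or a third-party library. That
choice, and every name in it (shader_main, barrier, run_group, NUM_FIBERS,
STACK_SIZE), is an assumption made for the example. It is not portable to
all OSes, which is exactly the concern above, and it only handles a shader
where every invocation hits the same single barrier.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define NUM_FIBERS 8               /* invocations in the toy workgroup */
#define STACK_SIZE (64 * 1024)     /* per-fiber stack, arbitrary size */

static ucontext_t sched_ctx;               /* the scheduler's context */
static ucontext_t fiber_ctx[NUM_FIBERS];   /* one context per invocation */
static int fiber_done[NUM_FIBERS];
static int shared_mem[NUM_FIBERS];         /* stands in for shared memory */
static int result[NUM_FIBERS];
static int current;                        /* fiber being run right now */

/* Barrier: yield to the scheduler. The scheduler runs every fiber once
 * per round, so a fiber is only resumed after all of them have arrived.
 * A general barrier would have to count arrivals instead. */
static void barrier(void)
{
   swapcontext(&fiber_ctx[current], &sched_ctx);
}

/* Toy "shader": write, barrier, then read a neighbour's value. */
static void shader_main(void)
{
   int id = current;   /* lives on the fiber's own stack across the yield */
   shared_mem[id] = id + 1;
   barrier();
   result[id] = shared_mem[(id + 1) % NUM_FIBERS];
   fiber_done[id] = 1;
   /* Returning switches to uc_link, i.e. back to the scheduler. */
}

/* Runs the whole workgroup on the calling OS thread. Returns 0 on success. */
static int run_group(void)
{
   char *stacks = malloc((size_t)NUM_FIBERS * STACK_SIZE);
   if (!stacks)
      return -1;
   for (int i = 0; i < NUM_FIBERS; i++) {
      fiber_done[i] = 0;
      if (getcontext(&fiber_ctx[i]) != 0) {
         free(stacks);
         return -1;
      }
      fiber_ctx[i].uc_stack.ss_sp = stacks + (size_t)i * STACK_SIZE;
      fiber_ctx[i].uc_stack.ss_size = STACK_SIZE;
      fiber_ctx[i].uc_link = &sched_ctx;
      makecontext(&fiber_ctx[i], shader_main, 0);
   }
   int remaining = NUM_FIBERS;
   while (remaining > 0) {
      /* One round: resume each unfinished fiber until it yields or ends. */
      for (int i = 0; i < NUM_FIBERS; i++) {
         if (fiber_done[i])
            continue;
         current = i;
         swapcontext(&sched_ctx, &fiber_ctx[i]);
         if (fiber_done[i])
            remaining--;
      }
   }
   free(stacks);
   return 0;
}
```

After run_group() returns, result[i] holds the value written by
invocation (i + 1) % NUM_FIBERS, which shows that no fiber got past the
barrier before all of them had written. Everything runs on one OS thread
here; a real implementation would presumably run one such scheduler per
CPU.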