[Mesa-dev] [PATCH 0/4] gallium: add new opcodes needed for ARB_gs5

Fri Apr 25 15:08:58 PDT 2014

On 25.04.2014 23:11, Ilia Mirkin wrote:
> On Fri, Apr 25, 2014 at 4:43 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 25.04.2014 19:41, Ilia Mirkin wrote:
>>> This is enough to catch up to core mesa, with the exception of
>>> uaddCarry/usubBorrow -- those will require some thought. I don't like the way
>>> they were done in core mesa, so I may redo it differently. (Will start a
>>> discussion on that topic after I've given it more thought.)
>>>
>>> I ran the various piglit tests with
>>>
>>> DRAW_USE_LLVM=0
>>> GALLIUM_DRIVER=softpipe
>>> MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5
>>>
>>> after modifying them to require GLSL 1.40. (The extension requires GLSL 1.50
>>> since it deals with GS stuff as well, but it doesn't matter for any of these
>>> bits.)
>>>
>>> It felt a bit weird to have to add the 4-source logic, but I'm not sure of a
>>> better way of doing it. NVC0 has a BFI instruction that takes 3 arguments,
>>> where the last 2 args are just mushed together into 1.
>> My guess is all hw will have to make do with 3 src args. Seems like radeonsi
>> would do something similar, though I don't know if all hw does it
>> exactly the same (meaning you could instead expose the 2 instructions
>> everybody actually uses). But there's nothing really wrong with
>> instructions using 4 source regs in gallium.
> 
> Oh, well on nvc0 it's just 1 instruction. Just that the offset/width
> arguments are passed in a single register (low byte for one, second
> lowest for the other). That's why I didn't want to enable the lowering
> pass in glsl that translates BFI to BFM + BFI-that-takes-mask
> instructions. Then I'd have to reassemble them into 1 on nvc0 :) Based
> on a peek into the radeon ISAs, it seems like they'll have to do the
> BFM + BFI thing "manually".
Ah yes, in that case it makes sense to use a 4-src-arg BFI in gallium.
I guess nvidia just did things differently; that solution sounds OK as
well. The real limitation is of course just the number of source
arguments: whether you pack two args into one register, or split the
logic slightly so that those 2 sources are first combined in a
meaningful way, doesn't really make much of a difference.
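
To make the three shapes discussed above concrete, here is a rough C
sketch (not code from the patch series). All function names are made up
for illustration. The packed layout (offset in the low byte, width in
the second-lowest byte) is an assumption: the thread only says one
value goes in each byte, not which. The handling of out-of-range
offset/width is likewise just a choice made to keep the C well-defined.

```c
#include <stdint.h>

/* Hypothetical helper: mask with `width` bits set, starting at bit `offset`.
 * Out-of-range inputs are clamped here only to avoid undefined shifts in C. */
static uint32_t bfm(uint32_t width, uint32_t offset)
{
    uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (offset >= 32) ? 0u : (mask << offset);
}

/* Hypothetical mask-taking insert: expects `insert` to be shifted already. */
static uint32_t bfi_masked(uint32_t mask, uint32_t insert_shifted, uint32_t base)
{
    return (insert_shifted & mask) | (base & ~mask);
}

/* Four-source form: insert the low `width` bits of `insert` into `base`
 * at bit position `offset`. */
static uint32_t bfi4(uint32_t base, uint32_t insert,
                     uint32_t offset, uint32_t width)
{
    uint32_t shifted = (offset >= 32) ? 0u : (insert << offset);
    return bfi_masked(bfm(width, offset), shifted, base);
}

/* ASSUMED layout: offset in bits 0-7, width in bits 8-15. */
static uint32_t pack_offset_width(uint32_t offset, uint32_t width)
{
    return (offset & 0xffu) | ((width & 0xffu) << 8);
}

/* Three-source form: offset and width arrive packed in one register. */
static uint32_t bfi3_packed(uint32_t base, uint32_t insert, uint32_t packed)
{
    return bfi4(base, insert, packed & 0xffu, (packed >> 8) & 0xffu);
}
```

With these definitions, bfi4(base, insert, offset, width) and
bfi3_packed(base, insert, pack_offset_width(offset, width)) give the
same result for in-range arguments, which is the sense in which the
choice between the two forms is mostly a matter of operand count.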

> 
>>
>>>
>>> Also, ARB_gs5 only lets you have one offset/width setting for the entire
>>> vector (for both BFI and BFE), but I didn't enforce that in the TGSI
>>> version. (SM5 doesn't seem to make that restriction either.)
>>>
>>> I'm working on a nvc0 impl for all this too, but wanted to send this out first
>>> in case I messed something up and will have to change a bunch of things
>>> around.
>>>
>>> I figure there will later be a PIPE_CAP_SM5 that will be set if all of these
>>> opcodes are supported (and it could subsume PIPE_CAP_TEXTURE_GATHER_SM5). For
>>> now, there's too much of ARB_gs5 still unsupported to really worry about it.
>> PIPE_CAP_SM5 also implies things like tessellation.
>> I guess though if there's no extensions which would expose some of these
>> things separately it will eventually have to go all together...
>> A pity since some of the stuff would be easy to implement for instance
>> in llvmpipe (like the new instructions) whereas others (non-constant
>> indexing into sampler arrays???) are definitely not trivial.
> 
> ARB_gs5 should definitely have been like 5+ separate extensions, judging
> by how much the average-sized extension defines. But it is what
> it is. If PIPE_CAP_SM5 is too broad, we could come up with something
> that more narrowly specifies that ARB_gs5 behaviour is available. But
> then, I suspect at least for nvc0, tessellation will be pretty easy --
> the driver already supports it, it just needs to be hooked up to whatever
> the eventual interface will be. In any case, since there are no real
> users of these yet, no real reason to figure out the PIPE_CAP right
> now either.
Right. But if no other extension exposes just pieces of it, there's
really no point in having separate bits: even to test only parts of it
you'd have to force-enable the whole thing anyway.

> 
>>
>>>
>>> Ilia Mirkin (4):
>>>   gallium: add new opcodes for ARB_gs5 bit manipulation support
>>>   glsl: fix bitfield_insert use when doing ldexp lowering
>>>   mesa/st: implement new bit manipulation opcodes
>>>   softpipe: add tgsi_exec support for new bit manipulation opcodes
>>>
>>>  src/gallium/auxiliary/tgsi/tgsi_exec.c     | 188 +++++++++++++++++++++++++++++
>>>  src/gallium/auxiliary/tgsi/tgsi_info.c     |   8 ++
>>>  src/gallium/auxiliary/util/u_math.h        |  11 ++
>>>  src/gallium/docs/source/tgsi.rst           |  51 ++++++++
>>>  src/gallium/include/pipe/p_shader_tokens.h |  11 +-
>>>  src/glsl/lower_instructions.cpp            |   6 +-
>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  70 ++++++++---
>>>  7 files changed, 324 insertions(+), 21 deletions(-)
>>>
>>
>> Roland