[Mesa-dev] [PATCH v2 00/13] nvc0: ARB_compute_shader for Kepler/Maxwell

Thu Mar 31 19:11:28 UTC 2016

On 03/31/2016 06:27 PM, Ilia Mirkin wrote:
> On Thu, Mar 31, 2016 at 12:08 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>> Hi,
>>
>> This series adds support for ARB_compute_shader on GK104 and GM107+, except on
>> GK110 where one test miserably fails (related to texelFetch) for really weird
>> reasons. Anyway, this is not going to break anything because NVF0_COMPUTE is
>> still required for using compute on GK110. I will have a deeper look at this
>> failure later.
>>
>> Almost all dEQP compute tests pass. As usual, the list of failures is given
>> below. As for piglit, only two tests fail, but this is related to image
>> support.
>>
>> I don't update GL3.txt in this series because compute shaders are not really
>> useful without image support.
>>
>> ARB_shader_image_load_store and ARB_shader_image_size are a work in progress
>> and should be ready in a couple of weeks.
>>
>> Please review,
>> Thanks!
>>
>> Samuel Pitoiset (13):
>>    nvc0: bind driver cb for compute on c7[] for Kepler
>>    nvc0: bind shader buffers for compute on Kepler
>>    nvc0: bind user uniforms for compute on Kepler
>>    nvc0: reserve an area for ubos info in the driver constbuf
>>    nvc0: store ubo info to the driver constbuf on Kepler
>>    nvc0: reduce likelihood of collision for real buffers on Kepler
>>    nvc0: add indirect compute support on Kepler
>>    nvc0/ir: add support for compute UBOs on Kepler
>>    nvc0/ir: fix wrong pred emission for ld lock on GK104
>>    nvc0/ir: add atomics support on shared memory for Kepler
>>    nvc0/ir: do not lower shared+atomics on GM107+
>>    nvc0: bump the maximum number of UBOs for compute on Kepler
>>    nvc0: enable compute shaders on GK104 and GM107+
>>
>>   .../drivers/nouveau/codegen/nv50_ir_driver.h       |   1 +
>>   .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |   5 +-
>>   .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      | 181 +++++++++++++-
>>   .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   4 +
>>   src/gallium/drivers/nouveau/nvc0/nvc0_compute.c    |   4 +-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_context.h    |  15 +-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_program.c    |  16 +-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   6 +-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.h     |   1 -
>>   .../drivers/nouveau/nvc0/nvc0_state_validate.c     |   8 +-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_tex.c        |   2 +-
>>   src/gallium/drivers/nouveau/nvc0/nve4_compute.c    | 260 ++++++++++++++++-----
>>   src/gallium/drivers/nouveau/nvc0/nve4_compute.h    |  44 +---
>>   13 files changed, 421 insertions(+), 126 deletions(-)
>>
>> --
>> 2.7.4
>>
>> ** dEQP **
>>
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec4: fail
>
> These are all expected(ish). IIRC I looked into atan2 and it was
> returning numbers outside of the expected range, so we could use a
> clamp on that maybe. I suspect the issue with tanh is similar. These
> are all done in the GLSL IR anyways, and, as it happens, also fail on
> i965. So I wouldn't worry about those.

Yeah, these failures are unrelated to my work.

>
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/samplercubeshadow: fail
>
> These, OTOH, are not. This leads me to believe that I've missed out on
> some bit of subtlety wrt ordering or placement of the shadow argument
> on Kepler. Please trace the simplest one of these (I'm thinking
> gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow)
> on the blob, and see if it orders some arguments differently.

Sure, but my plan is to fix them later. :-)

Check your mailbox for the trace.

>
> Note that the current state of the art wrt tex argument ordering
> knowledge is at:
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n636
>
> Note that I may have made wild assumptions about the similarity of
> SM30 and SM35 argument ordering that may have been unwarranted. Were
> you seeing this on SM30 or SM35? Since they reordered *something* on
> every other ISA change, it seems a little odd that they would have
> kept things put for SM30 -> SM35. Perhaps the hw guys had a moment of
> weakness :)
>
> Not included in that description is the splitting up of the (up to) 8
> potential arguments between 2 (implicitly) quad register arguments.
> This logic is available here, in code form only:
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2126
>
> For nve0 (and, implicitly, nvf0), it just cuts them up at the 4th
> argument. However, the cutting for nvc0 and gm107 is a little more
> sophisticated than that, and it's entirely possible some bit of
> subtlety was missed there. [And also, entirely possible that some
> wrong way works sometimes even though it's wrong.]

I'll have a look at the MMT trace to see if something is wrong.
Thanks for your explanation.
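
To make sure I read the nve0 rule correctly, here is a minimal sketch of
"cut at the 4th argument" in plain C. It is not the code from nv50_ir_ra.cpp.
The struct and function names are hypothetical, and it assumes at most 8
arguments, as described above.

```c
#include <stddef.h>

/* Hypothetical type: how many tex arguments land in each of the two
 * (implicitly) quad register arguments. */
struct tex_arg_split {
    size_t first;   /* arguments in the first quad register, at most 4 */
    size_t second;  /* arguments in the second quad register */
};

/* Split n arguments (assumed n <= 8) by cutting after the 4th one. */
static struct tex_arg_split split_tex_args_at_four(size_t n)
{
    struct tex_arg_split s;

    s.first = n < 4 ? n : 4;
    s.second = n - s.first;
    return s;
}
```

With 6 arguments this gives 4 in the first group and 2 in the second. The nvc0
and gm107 splits would need different logic.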

>
>    -ilia
>