[Mesa-dev] [PATCH v2 00/13] nvc0: ARB_compute_shader for Kepler/Maxwell

Thu Mar 31 16:27:14 UTC 2016

On Thu, Mar 31, 2016 at 12:08 PM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
> Hi,
>
> This series adds support for ARB_compute_shader on GK104 and GM107+, except on
> GK110 where one test (related to texelFetch) fails miserably for really weird
> reasons. Anyway, this is not going to break anything because NVF0_COMPUTE is
> still required for using compute on GK110. I will take a deeper look at this
> failure later.
>
> Almost all dEQP compute tests pass. As usual, the failures are listed
> below. As for piglit, only two tests fail, but this is related to image
> support.
>
> I don't update GL3.txt in this series because compute shaders are not really
> useful without image support.
>
> ARB_shader_image_load_store and ARB_shader_image_size are work in progress
> and should be ready in a couple of weeks.
>
> Please review,
> Thanks!
>
> Samuel Pitoiset (13):
>   nvc0: bind driver cb for compute on c7[] for Kepler
>   nvc0: bind shader buffers for compute on Kepler
>   nvc0: bind user uniforms for compute on Kepler
>   nvc0: reserve an area for ubos info in the driver constbuf
>   nvc0: store ubo info to the driver constbuf on Kepler
>   nvc0: reduce likelihood of collision for real buffers on Kepler
>   nvc0: add indirect compute support on Kepler
>   nvc0/ir: add support for compute UBOs on Kepler
>   nvc0/ir: fix wrong pred emission for ld lock on GK104
>   nvc0/ir: add atomics support on shared memory for Kepler
>   nvc0/ir: do not lower shared+atomics on GM107+
>   nvc0: bump the maximum number of UBOs for compute on Kepler
>   nvc0: enable compute shaders on GK104 and GM107+
>
>  .../drivers/nouveau/codegen/nv50_ir_driver.h       |   1 +
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |   5 +-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      | 181 +++++++++++++-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   4 +
>  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c    |   4 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_context.h    |  15 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_program.c    |  16 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   6 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h     |   1 -
>  .../drivers/nouveau/nvc0/nvc0_state_validate.c     |   8 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c        |   2 +-
>  src/gallium/drivers/nouveau/nvc0/nve4_compute.c    | 260 ++++++++++++++++-----
>  src/gallium/drivers/nouveau/nvc0/nve4_compute.h    |  44 +---
>  13 files changed, 421 insertions(+), 126 deletions(-)
>
> --
> 2.7.4
>
> ** dEQP **
>
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/scalar: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec2: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec3: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec4: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec2: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec4: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/scalar: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec2: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec3: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec4: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/scalar: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec2: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec3: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec4: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/scalar: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec2: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec3: fail
> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec4: fail

These are all expected(ish). IIRC I looked into atan2 and it was
returning numbers outside of the expected range, so we could use a
clamp on that maybe. I suspect the issue with tanh is similar. These
are all done in the GLSL IR anyways, and, as it happens, also fail on
i965. So I wouldn't worry about those.
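
To make the clamp idea concrete, here is a minimal sketch in plain C++
rather than GLSL IR. The helper name clamp_atan2_result and the constant
kPi are made up for illustration and are not anything in Mesa. It also
assumes the range the test accepts is [-pi, pi], which I have not verified.

```cpp
#include <algorithm>
#include <cassert>

// Nearest float to pi; the bound we assume the test accepts (unverified).
static const float kPi = 3.14159265358979323846f;

// Hypothetical helper: force an approximate atan2 result back into
// [-pi, pi]. Values already inside the range are returned unchanged.
inline float clamp_atan2_result(float approx)
{
    return std::min(std::max(approx, -kPi), kPi);
}
```

In the real lowering this would be a min/max pair appended to the existing
atan2 expression, at the cost of two extra ops per call.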

> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2darrayshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2dshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/samplercubeshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2darrayshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/samplercubeshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2darrayshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2dshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/samplercubeshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2darrayshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2dshadow: fail
> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/samplercubeshadow: fail

These, OTOH, are not. This leads me to believe that I've missed out on
some bit of subtlety wrt ordering or placement of the shadow argument
on Kepler. Please trace the simplest one of these (I'm thinking
gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow)
on the blob, and see if it orders some arguments differently.

Note that the current state of the art wrt tex argument ordering
knowledge is at:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n636

I may also have made unwarranted assumptions about the similarity of
SM30 and SM35 argument ordering. Were you seeing this on SM30 or SM35?
Since they reordered *something* on every other ISA change, it seems a
little odd that they would have kept things the same for SM30 -> SM35.
Perhaps the hw guys had a moment of weakness :)

Not included in that description is the splitting up of the (up to) 8
potential arguments between 2 (implicitly) quad register arguments.
This logic is available here, in code form only:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2126

For nve0 (and, implicitly, nvf0), it just cuts them up at the 4th
argument. However, the cutting for nvc0 and gm107 is a little more
sophisticated than that, and it's entirely possible some bit of
subtlety was missed there. [And also, entirely possible that some
wrong way works sometimes even though it's wrong.]
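
As a rough illustration of the nve0 rule only, the split at the 4th
argument can be sketched as below. The function name split_tex_args_nve0
is hypothetical, and this is not the code in nv50_ir_ra.cpp. It only
counts how many of the (up to) 8 arguments land in each of the two quad
registers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch: given n texture arguments (0..8), return how many
// go into the first and second quad register when cutting at the 4th.
inline std::pair<int, int> split_tex_args_nve0(int n)
{
    assert(n >= 0 && n <= 8);
    const int first = n < 4 ? n : 4;   // first quad takes up to 4 args
    return {first, n - first};         // the remainder goes to the second
}
```

So 3 arguments give a (3, 0) split and 6 arguments give (4, 2). The nvc0
and gm107 rules are deliberately not modelled here.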

  -ilia