[Mesa-dev] [PATCH v2 00/13] nvc0: ARB_compute_shader for Kepler/Maxwell
Samuel Pitoiset
samuel.pitoiset at gmail.com
Thu Mar 31 19:11:28 UTC 2016
On 03/31/2016 06:27 PM, Ilia Mirkin wrote:
> On Thu, Mar 31, 2016 at 12:08 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>> Hi,
>>
>> This series adds support for ARB_compute_shader on GK104 and GM107+, except on
>> GK110, where one test (related to texelFetch) fails miserably for really weird
>> reasons. Anyway, this is not going to break anything because NVF0_COMPUTE is
>> still required for using compute on GK110. I will have a deeper look at this
>> failure later.
>>
>> Almost all dEQP compute tests pass. As usual, the list of failures is
>> given below. As for piglit, only two tests fail, and both failures are
>> related to image support.
>>
>> I don't update GL3.txt in this series because compute shaders are not really
>> useful without image support.
>>
>> ARB_shader_image_load_store and ARB_shader_image_size are work in progress
>> and should be ready in a couple of weeks.
>>
>> Please review,
>> Thanks!
>>
>> Samuel Pitoiset (13):
>> nvc0: bind driver cb for compute on c7[] for Kepler
>> nvc0: bind shader buffers for compute on Kepler
>> nvc0: bind user uniforms for compute on Kepler
>> nvc0: reserve an area for ubos info in the driver constbuf
>> nvc0: store ubo info to the driver constbuf on Kepler
>> nvc0: reduce likelihood of collision for real buffers on Kepler
>> nvc0: add indirect compute support on Kepler
>> nvc0/ir: add support for compute UBOs on Kepler
>> nvc0/ir: fix wrong pred emission for ld lock on GK104
>> nvc0/ir: add atomics support on shared memory for Kepler
>> nvc0/ir: do not lower shared+atomics on GM107+
>> nvc0: bump the maximum number of UBOs for compute on Kepler
>> nvc0: enable compute shaders on GK104 and GM107+
>>
>> .../drivers/nouveau/codegen/nv50_ir_driver.h | 1 +
>> .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 5 +-
>> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 181 +++++++++++++-
>> .../nouveau/codegen/nv50_ir_lowering_nvc0.h | 4 +
>> src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 4 +-
>> src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 15 +-
>> src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 16 +-
>> src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 6 +-
>> src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 -
>> .../drivers/nouveau/nvc0/nvc0_state_validate.c | 8 +-
>> src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 2 +-
>> src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 260 ++++++++++++++++-----
>> src/gallium/drivers/nouveau/nvc0/nve4_compute.h | 44 +---
>> 13 files changed, 421 insertions(+), 126 deletions(-)
>>
>> --
>> 2.7.4
>>
>> ** dEQP **
>>
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/highp_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/atan2/mediump_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/highp_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/ldexp/mediump_compute/vec4: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/scalar: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec2: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec3: fail
>> deqp-gles31/functional/shaders/builtin_functions/precision/tanh/highp_compute/vec4: fail
>
> These are all expected(ish). IIRC I looked into atan2 and it was
> returning numbers outside of the expected range, so we could use a
> clamp on that maybe. I suspect the issue with tanh is similar. These
> are all done in the GLSL IR anyways, and, as it happens, also fail on
> i965. So I wouldn't worry about those.
Yeah, these failures are unrelated to my work.
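For reference, the clamp idea mentioned above could look roughly like the
sketch below. This is only an illustration in plain C: the real lowering is
done in the GLSL IR, and clamp_atan2_result and ATAN2_MAX are made-up names,
not Mesa functions. It assumes the expected range is the usual [-pi, pi] for
atan2.

```c
#include <assert.h>

/* Assumption: the expected atan2 range is [-pi, pi]. */
static const float ATAN2_MAX = 3.14159265358979323846f;

/* Hypothetical helper: force an approximate atan2 result back into
 * [-ATAN2_MAX, ATAN2_MAX] when the approximation overshoots slightly. */
static float clamp_atan2_result(float approx)
{
    if (approx > ATAN2_MAX)
        return ATAN2_MAX;
    if (approx < -ATAN2_MAX)
        return -ATAN2_MAX;
    return approx; /* already in range, leave untouched */
}
```

In-range values pass through unchanged, so a clamp like this would only
affect results that are already outside the range.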
>
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_expression/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/dynamically_uniform/compute/samplercubeshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2darrayshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/sampler2dshadow: fail
>> deqp-gles31/functional/shaders/opaque_type_indexing/sampler/uniform/compute/samplercubeshadow: fail
>
> These, OTOH, are not. This leads me to believe that I've missed out on
> some bit of subtlety wrt ordering or placement of the shadow argument
> on Kepler. Please trace the simplest one of these (I'm thinking
> gles31/functional/shaders/opaque_type_indexing/sampler/const_literal/compute/sampler2dshadow)
> on the blob, and see if it orders some arguments differently.
Sure, but my plan is to fix them later. :-)
Check your mailbox for the trace.
>
> Note that the current state of the art wrt tex argument ordering
> knowledge is at:
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n636
>
> Note that I may have made wild assumptions about the similarity of
> SM30 and SM35 argument ordering that may have been unwarranted. Were
> you seeing this on SM30 or SM35? Since they reordered *something* on
> every other ISA change, it seems a little odd that they would have
> kept things put for SM30 -> SM35. Perhaps the hw guys had a moment of
> weakness :)
>
> Not included in that description is the splitting up of the (up to) 8
> potential arguments between 2 (implicitly) quad register arguments.
> This logic is available here, in code form only:
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2126
>
> For nve0 (and, implicitly, nvf0), it just cuts them up at the 4th
> argument. However, the cutting for nvc0 and gm107 is a little more
> sophisticated than that, and it's entirely possible some bit of
> subtlety was missed there. [And also, entirely possible that some
> wrong way works sometimes even though it's wrong.]
I'll have a look at the MMT trace to see if something is wrong.
Thanks for your explanation.
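To check my understanding of the nve0 case, here is a minimal sketch of
"cut at the 4th argument" in plain C. It is not the code from
nv50_ir_ra.cpp; struct tex_arg_split and split_at_fourth are hypothetical
names, and it assumes the first quad simply takes up to four arguments and
the second quad takes whatever is left (up to 8 in total).

```c
#include <assert.h>

#define QUAD_SIZE 4 /* each (implicitly) quad register argument holds 4 values */

/* Hypothetical result type: how many arguments land in each quad. */
struct tex_arg_split {
    int first;  /* arguments placed in the first quad  */
    int second; /* arguments placed in the second quad */
};

/* Sketch of the simple nve0-style split: fill the first quad with up to
 * four arguments, then put the remainder in the second quad.
 * Assumes 0 <= num_args <= 8. */
static struct tex_arg_split split_at_fourth(int num_args)
{
    struct tex_arg_split s;
    s.first = num_args < QUAD_SIZE ? num_args : QUAD_SIZE;
    s.second = num_args - s.first;
    return s;
}
```

With 6 arguments this gives 4 in the first quad and 2 in the second. The
nvc0 and gm107 splits are more involved and are not covered by this sketch.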
>
> -ilia
>