[Mesa-dev] [PATCH v4 00/44] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

Mon Dec 11 10:07:00 UTC 2017

El 09/12/17 a las 00:52, Jason Ekstrand escribió:
> While reviewing some of the UBO pushing comments from Topi, I
> discovered a fairly disturbing assert in brw_fs_nir.cpp in our
> implementation of nir_intrinsic_load_uniform:
>
>          /* Offsets are in bytes but they should always be multiples
> of 4 */
>          assert(const_offset->u32[0] % 4 == 0);
>
> This assertion isn't triggering with 16bit storage enabled for push
> constants.  Looking at the CTS tests in a bit more detail, they're
> very poor.  They only test basic types (scalars, vectors, and
> matrices) and only in arrays with a dynamic index.  This means that
> the constant optimization paths for UBO pulls aren't getting triggered
> at all.  Also, we're not using push constants with any offsets not
> aligned to 4 (as per the above assert) so there's no real assurance
> that that works.  Given that constant offsets are a very common case
> for apps, this is very disappointing.  For the moment, I'm going to
> push a patch to master to disable 16bit storage.  I'm really sorry
> about that.  I think your code is great and, based on my review, I'm
> pretty sure it should work but I don't think we can really ship this
> extension in good faith when we know that there is a massive gaping
> hole in test coverage like this.  (The coverage hole is not your
> fault!)  I've also filed a bug (893) against the CTS.

I agree that is better to increase testing coverage before enabling the
feature after your findings. At the same time for having the complete
support for VK_KHR_16bit_storage it is still pending the review of part
of input/output support of the feature, so we are not in a hurry to have
it enabled.

Chema

>
> On Wed, Dec 6, 2017 at 12:09 AM, Alejandro Piñeiro
> <apinheiro at igalia.com <mailto:apinheiro at igalia.com>> wrote:
>
>     On 06/12/17 01:19, Chema Casanova wrote:
>     > On 05/12/17 18:31, Chema Casanova wrote:
>     >> El 05/12/17 a las 06:16, Jason Ekstrand escribió:
>     >>> A couple of notes:
>     >>>
>     >>>  1) I *think* I gave you enough reviews to land the UBO/SSBO
>     part and
>     >>> the optimizations in 26-28.  If reviews are still missing
>     anywhere,
>     >>> please let me know.  If not, let's try and get that part landed.
>     >> The series is almost ready to land, I have only pending to
>     address your
>     >> feedback about use untyped_read for reading vec3 ssbos.
>     >>
>     >> The only missing explicit R-b is that " [PATCH v4 28/44]
>     i965/fs: Use
>     >> untyped_surface_read for 16-bit load_ssbo" and "[PATCH v4 23/44]
>     >> i965/fs: Enables 16-bit load_ubo with sampler" i've just
>     answered your
>     >> review to confirm the R-b.
>     >>
>     >> I expect to finish today vec3 ssbo and send the series to
>     Jenkins before
>     >> landing, confirm your "pending" R-b, do a last rebase to master
>     and ask
>     >> for a push.
>     > I've just prepared a rebased branch with the reviewed commits
>     ready to
>     > land to enable VK_KHR_16bit_storage support for SSBO/UBO.
>     >
>     >
>     https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-v4-ubo-ssbo-to-land
>     <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-v4-ubo-ssbo-to-land>
>     >
>     > As I don't have still commit access to mesa, maybe Eduardo or
>     Alejandro
>     > can land it for me tomorrow. But, Jason feel free to push it if
>     you want.
>
>     I have just pushed it to master.
>
>     >
>     > Chema
>     >
>     >>>  2) I send out a patch to rewrite assign_constant_locations
>     which I
>     >>> think should make it automatically handle 8 and 16-bit values as
>     >>> well.  I'd rather do that than more special casing if
>     everything works
>     >>> out ok.
>     >> I'm testing this patch with 16-bits and make sure whatever is
>     needed to
>     >> have 16-bit working.
>     >>
>     >>>  3) I sent out a series of patches to enable pushing of UBOs in
>     >>> Vulkan.  If we're not careful, these will clash with 16bit
>     storage as
>     >>> UBO support suddenly has to imply push constant support.  That
>     said,
>     >>> I"m willing to wait at least a little while before landing
>     them to let
>     >>> us get 16bit push constant support sorted out.  The UBO pushing
>     >>> patches give us a nice little performance boost but we're
>     nowhere near
>     >>> a release and I don't want it blocking you.
>     >> That would be my next priority, so we would only have pending
>     to land
>     >> the 16-bit input/output support to finish this extension.
>     >>
>     >> Chema
>     >>
>     >>> On Wed, Nov 29, 2017 at 6:07 PM, Jose Maria Casanova Crespo
>     >>> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>
>     <mailto:jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>>> wrote:
>     >>>
>     >>>     Hello,
>     >>>
>     >>>     this is the V4 series for the implementation of the
>     >>>     SPV_KHR_16bit_storage
>     >>>     and VK_KHR_16bit_storage extensions on the anv vulkan
>     driver, in
>     >>>     addition
>     >>>     to the GLSL and NIR support needed.
>     >>>
>     >>>     The original series can be found here [1], the following
>     v2 [2]
>     >>>     and v3 [3].
>     >>>
>     >>>     In short V4 includes the following:
>     >>>
>     >>>      * Reorder the series to enable features as they are
>     implemented,
>     >>>     the series
>     >>>        now enables first UBO and SSBO support, and then
>     inputs/outputs and
>     >>>        finally push constants.
>     >>>      * Support the byte scattered read/write messages with
>     different
>     >>>     bit sizes
>     >>>        byte/word/dword.
>     >>>      * Refactor of the store_ssbo code and also fix stores when
>     >>>     writemask was .yz
>     >>>      * Uses the sampler for load_ubo avoiding the initial
>     >>>     implementation of
>     >>>        the series using byte_scattered_read.
>     >>>      * Addressed all the feedback provided by Jason and Topi
>     on v3 review.
>     >>>
>     >>>     This series is also available at:
>     >>>
>     >>>   
>      https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
>     <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4>
>     >>>   
>      <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
>     <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4>>
>     >>>
>     >>>     The objective is to start landing part of this series, all
>     >>>     feedback has been
>     >>>     addressed for SSBO and UBO. But for input/outputs features
>     it will
>     >>>     probably
>     >>>     need another iteration as was not completely reviewed. It
>     is also
>     >>>     needed
>     >>>     to define the approach for push constants issues before of
>     after
>     >>>     landing
>     >>>     the support with this implementation.
>     >>>
>     >>>     Patches 1-5 and 8-17 have already been reviewed. Patch 7
>     was already
>     >>>     reviewed but as it has changed too much i would appreciate
>     another
>     >>>     review. When patches until 25 or 28 are reviewed we could land
>     >>>     UBOs and
>     >>>     SSBOs support.
>     >>>
>     >>>     Finally an updated overview of the patches:
>     >>>
>     >>>     Patches 1-2 add 16-bit float, int and uint types to GLSL.
>     This is
>     >>>     needed because NIR uses GLSL types internally. We use the
>     enums
>     >>>     already defined at AMD_gpu_shader_half_float and NV_gpu_shader
>     >>>     extensions. Patch 2 updates mesa/st, in order to avoid
>     warnings for
>     >>>     types not handled on a switch.
>     >>>
>     >>>     Patches 3-6 add NIR support for those new GLSL 16-bit types,
>     >>>     conversion opcodes, and rounding modes for float to half-float
>     >>>     conversions.
>     >>>
>     >>>     Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR
>     support.
>     >>>
>     >>>     Patches 10-12 add general 16-bit support for i965. This
>     includes
>     >>>     handling of new types on several general purpose methods,
>     >>>     update/remove some asserts.
>     >>>
>     >>>     Patches 14-17 add support for 32 to 16-bit conversions for
>     i965,
>     >>>     including rounding mode opcodes (needed for float to
>     half-float
>     >>>     conversions), and an optimization that removes superfluous
>     rounding
>     >>>     mode sets.
>     >>>
>     >>>     Patches 18-21 add and use two new messages: byte scattered
>     read and
>     >>>     write. Those were needed because untyped surface message
>     has a fixed
>     >>>     32-bit write size. Those messages are used on the 16-bit
>     support of
>     >>>     store SSBO, load SSBO and load shared.
>     >>>
>     >>>     Patch 22 adds helpers to allow un/shuffle 16-bit
>     components in 32-bit
>     >>>     ones. This would be needed for following optimizations on
>     load/store
>     >>>     ssbo. This huck was originally in the input/outputs
>     support, but
>     >>>     needed
>     >>>     a relocation because of the new order of the series.
>     >>>
>     >>>     Patch 23 Enables load_ubo support for 16-bit using the sampler
>     >>>     un-shuffling
>     >>>     pairs 16-bit components from 32-bit.
>     >>>
>     >>>     Patches 24-25 enable SPV_KHR_16bit_storage and
>     >>>     VK_KHR_16bit_storage but only
>     >>>     the support for SSBO and UBO on anv vulkan driver.
>     >>>
>     >>>     Patches 26-28 were new patches included in V2 to improve
>     performance
>     >>>     reducing the use of multiple scattered messages for
>     untyped read/write
>     >>>     opreations. 16bit CTS tests passes without them. The other
>     one would
>     >>>     fix a real problem using predication (patch 27), but
>     >>>     unfourtunately no CTS
>     >>>     test yet catching it.
>     >>>
>     >>>     Patches 29-33 implement 16-bit vertex attribute inputs
>     support on
>     >>>     i965. These include changes on anv. This was needed
>     because 16-bit
>     >>>     surface formats do implicit conversion to 32-bit. To
>     workaround this,
>     >>>     we override the 16-bit surface format, and use 32-bit
>     ones. Issues
>     >>>     related
>     >>>     to robust buffer access have been addressed.
>     >>>
>     >>>     Patch 34 implements load input and load store for all
>     intra stage.
>     >>>     This
>     >>>     patch could have problems pointed by Jason related to how TCS
>     >>>     outputs are
>     >>>     implmemented that need more work.
>     >>>
>     >>>     Patch 35-42 implements 16-bit store output support for
>     fragment
>     >>>     shaders on i965. Last patch enables VK_KHR_16bit for
>     input/outputs.
>     >>>
>     >>>     Patch 43-44 adds 16-bit support for push constant and
>     enables the
>     >>>     feature.
>     >>>     There is still pending to work on a general solution for push
>     >>>     constants and
>     >>>     the mixture of different bit_sizes.
>     >>>
>     >>>     [1]
>     >>>   
>      https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html>
>     >>>   
>      <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html>>
>     >>>     [2]
>     >>>   
>      https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html>
>     >>>   
>      <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html>>
>     >>>     [3]
>     >>>   
>      https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html>
>     >>>   
>      <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html
>     <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html>>
>     >>>
>     >>>     CC: Jason Ekstrand <jason at jlekstrand.net
>     <mailto:jason at jlekstrand.net>
>     >>>     <mailto:jason at jlekstrand.net <mailto:jason at jlekstrand.net>>>
>     >>>     CC: Topi Pohjolainen <topi.pohjolainen at gmail.com
>     <mailto:topi.pohjolainen at gmail.com>
>     >>>     <mailto:topi.pohjolainen at gmail.com
>     <mailto:topi.pohjolainen at gmail.com>>>
>     >>>     CC: Matt Turner <mattst88 at gmail.com
>     <mailto:mattst88 at gmail.com> <mailto:mattst88 at gmail.com
>     <mailto:mattst88 at gmail.com>>>
>     >>>
>     >>>     Alejandro Piñeiro (12):
>     >>>       i965/vec4: Handle 16-bit types at type_size_xvec4
>     >>>       i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
>     >>>       i965/fs: Handle 32-bit to 16-bit conversions
>     >>>       i965/fs: Define new shader opcode to set rounding modes
>     >>>       i965/fs: Enable rounding mode on f2f16 ops
>     >>>       i965/fs: Add remove_extra_rounding_modes optimization
>     >>>       i965/fs: Use byte_scattered_write on 16-bit store_ssbo
>     >>>       anv: Enable VK_KHR_16bit_storage for SSBO and UBO
>     >>>       i965/fs: Predicate byte scattered writes if needed
>     >>>       anv/pipeline: Use 32-bit surface formats for 16-bit formats
>     >>>       anv/cmd_buffer: Add a padding to the vertex buffer
>     >>>       i965/fs: Use half_precision data_format on 16-bit fb writes
>     >>>
>     >>>     Eduardo Lima Mitev (8):
>     >>>       glsl: Add 16-bit types
>     >>>       mesa/st: Handle 16-bit types at st_glsl_storage_type_size()
>     >>>       nir: Add support for 16-bit types (half float, int16 and
>     uint16)
>     >>>       nir: Populate conversion opcodes to 16-bit types
>     >>>       spirv/nir: Handle 16-bit types
>     >>>       spirv/nir: Add support for SPV_KHR_16bit_storage
>     >>>       anv: Enable SPV_KHR_16bit_storage on gen 8+
>     >>>       i965/fs: Optimize 16-bit SSBO stores by packing two into a
>     >>>     32-bit reg
>     >>>
>     >>>     Jose Maria Casanova Crespo (24):
>     >>>       nir: Add rounding modes enum
>     >>>       nir: Handle fp16 rounding modes at nir_type_conversion_op
>     >>>       spirv: Enable FPRoundingMode decorator to nir operations
>     >>>       i965: Support for 16-bit base types in helper functions
>     >>>       i965: Add support for control register
>     >>>       i965/fs: Add byte scattered write message and fs support
>     >>>       i965/fs: Add byte scattered read message and fs support
>     >>>       i965/fs: Use byte scattered read for 16-bit load_ssbo
>     >>>       i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit
>     components
>     >>>       i965/fs: Enables 16-bit load_ubo with sampler
>     >>>       i965/fs: Use untyped_surface_read for 16-bit load_ssbo
>     >>>       compiler: Mark when input/ouput attribute at VS uses 16-bit
>     >>>       i965/compiler: includes 16-bit vertex input
>     >>>       i965/fs: Unpack 16-bit from 32-bit components in VS
>     load_input
>     >>>       i965/fs: Support 16-bit types at load_input and store_output
>     >>>       i965/fs: Enable Render Target Write for 16-bit outputs
>     >>>       anv: Enable VK_KHR_16bit_storage for input/output
>     >>>       i965/fs: Include support for SEND data_format bit for
>     Render Targets
>     >>>       i965/disasm: Show half-precision data_format on rt_writes
>     >>>       i965/fs: Mark 16-bit outputs on FS store_output
>     >>>       i965/fs: 16-bit source payloads always use 1 register
>     >>>       i965/fs: Enable 16-bit render target write on SKL and CHV
>     >>>       i965/fs: Support push constants of 16-bit types
>     >>>       anv: Enable VK_KHR_16bit_storage for push_constant
>     >>>
>     >>>      src/compiler/builtin_type_macros.h              |  26 ++
>     >>>      src/compiler/glsl/ast_to_hir.cpp                |   3 +
>     >>>      src/compiler/glsl/glsl_to_nir.cpp               |   9 +-
>     >>>      src/compiler/glsl/ir_clone.cpp                  |   3 +
>     >>>      src/compiler/glsl/link_uniform_initializers.cpp |   3 +
>     >>>      src/compiler/glsl/lower_buffer_access.cpp       |   3 +-
>     >>>      src/compiler/glsl_types.cpp                     |  93 +++++-
>     >>>      src/compiler/glsl_types.h                       |  25 +-
>     >>>      src/compiler/nir/nir.c                          |   6 +
>     >>>      src/compiler/nir/nir.h                          |  22 +-
>     >>>      src/compiler/nir/nir_gather_info.c              |  17 +-
>     >>>      src/compiler/nir/nir_opcodes.py                 |  11 +-
>     >>>      src/compiler/nir/nir_opcodes_c.py               |  17 +-
>     >>>      src/compiler/nir/nir_split_var_copies.c         |   6 +
>     >>>      src/compiler/nir_types.cpp                      |  18 ++
>     >>>      src/compiler/nir_types.h                        |   8 +
>     >>>      src/compiler/shader_info.h                      |   2 +
>     >>>      src/compiler/spirv/nir_spirv.h                  |   1 +
>     >>>      src/compiler/spirv/spirv_to_nir.c               | 118 ++++++-
>     >>>      src/compiler/spirv/vtn_alu.c                    |  35 +-
>     >>>      src/compiler/spirv/vtn_variables.c              |  21 ++
>     >>>      src/intel/compiler/brw_compiler.h               |   1 +
>     >>>      src/intel/compiler/brw_disasm.c                 |   4 +
>     >>>      src/intel/compiler/brw_eu.h                     |  25 +-
>     >>>      src/intel/compiler/brw_eu_defines.h             |  36 +++
>     >>>      src/intel/compiler/brw_eu_emit.c                | 130
>     +++++++-
>     >>>      src/intel/compiler/brw_fs.cpp                   | 142
>     +++++++-
>     >>>      src/intel/compiler/brw_fs.h                     |  12 +
>     >>>      src/intel/compiler/brw_fs_copy_propagation.cpp  |   8 +-
>     >>>      src/intel/compiler/brw_fs_generator.cpp         |  30 +-
>     >>>      src/intel/compiler/brw_fs_nir.cpp               | 409
>     >>>     ++++++++++++++++++++++--
>     >>>      src/intel/compiler/brw_fs_surface_builder.cpp   |  23 +-
>     >>>      src/intel/compiler/brw_fs_surface_builder.h     |  14 +
>     >>>      src/intel/compiler/brw_fs_visitor.cpp           |   6 +
>     >>>      src/intel/compiler/brw_inst.h                   |   1 +
>     >>>      src/intel/compiler/brw_ir_fs.h                  |   3 -
>     >>>      src/intel/compiler/brw_nir.c                    |  16 +
>     >>>      src/intel/compiler/brw_reg.h                    |   6 +
>     >>>      src/intel/compiler/brw_shader.cpp               |  23 ++
>     >>>      src/intel/compiler/brw_shader.h                 |   7 +
>     >>>      src/intel/compiler/brw_vec4.cpp                 |   1 +
>     >>>      src/intel/compiler/brw_vec4_generator.cpp       |   3 +-
>     >>>      src/intel/compiler/brw_vec4_visitor.cpp         |   3 +
>     >>>      src/intel/vulkan/anv_device.c                   |  23 ++
>     >>>      src/intel/vulkan/anv_extensions.py              |   1 +
>     >>>      src/intel/vulkan/anv_pipeline.c                 |   1 +
>     >>>      src/intel/vulkan/genX_cmd_buffer.c              |  20 +-
>     >>>      src/intel/vulkan/genX_pipeline.c                |  34 ++
>     >>>      src/mesa/program/ir_to_mesa.cpp                 |   6 +
>     >>>      src/mesa/state_tracker/st_glsl_types.cpp        |   3 +
>     >>>      50 files changed, 1337 insertions(+), 101 deletions(-)
>     >>>
>     >>>     --
>     >>>     2.14.3
>     >>>
>     >>>
>     >>>
>     >>>
>     >>> _______________________________________________
>     >>> mesa-dev mailing list
>     >>> mesa-dev at lists.freedesktop.org
>     <mailto:mesa-dev at lists.freedesktop.org>
>     >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>     >> _______________________________________________
>     >> mesa-dev mailing list
>     >> mesa-dev at lists.freedesktop.org
>     <mailto:mesa-dev at lists.freedesktop.org>
>     >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>     >>
>
>
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev