[Mesa-dev] [PATCH v4 00/44] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+
Chema Casanova
jmcasanova at igalia.com
Mon Dec 11 10:07:00 UTC 2017
El 09/12/17 a las 00:52, Jason Ekstrand escribió:
> While reviewing some of the UBO pushing comments from Topi, I
> discovered a fairly disturbing assert in brw_fs_nir.cpp in our
> implementation of nir_intrinsic_load_uniform:
>
> /* Offsets are in bytes but they should always be multiples
> of 4 */
> assert(const_offset->u32[0] % 4 == 0);
>
> This assertion isn't triggering with 16bit storage enabled for push
> constants. Looking at the CTS tests in a bit more detail, they're
> very poor. They only test basic types (scalars, vectors, and
> matrices) and only in arrays with a dynamic index. This means that
> the constant optimization paths for UBO pulls aren't getting triggered
> at all. Also, we're not using push constants with any offsets not
> aligned to 4 (as per the above assert) so there's no real assurance
> that that works. Given that constant offsets are a very common case
> for apps, this is very disappointing. For the moment, I'm going to
> push a patch to master to disable 16bit storage. I'm really sorry
> about that. I think your code is great and, based on my review, I'm
> pretty sure it should work but I don't think we can really ship this
> extension in good faith when we know that there is a massive gaping
> hole in test coverage like this. (The coverage hole is not your
> fault!) I've also filed a bug (893) against the CTS.
I agree that is better to increase testing coverage before enabling the
feature after your findings. At the same time for having the complete
support for VK_KHR_16bit_storage it is still pending the review of part
of input/output support of the feature, so we are not in a hurry to have
it enabled.
Chema
>
> On Wed, Dec 6, 2017 at 12:09 AM, Alejandro Piñeiro
> <apinheiro at igalia.com <mailto:apinheiro at igalia.com>> wrote:
>
> On 06/12/17 01:19, Chema Casanova wrote:
> > On 05/12/17 18:31, Chema Casanova wrote:
> >> El 05/12/17 a las 06:16, Jason Ekstrand escribió:
> >>> A couple of notes:
> >>>
> >>> 1) I *think* I gave you enough reviews to land the UBO/SSBO
> part and
> >>> the optimizations in 26-28. If reviews are still missing
> anywhere,
> >>> please let me know. If not, let's try and get that part landed.
> >> The series is almost ready to land, I have only pending to
> address your
> >> feedback about use untyped_read for reading vec3 ssbos.
> >>
> >> The only missing explicit R-b is that " [PATCH v4 28/44]
> i965/fs: Use
> >> untyped_surface_read for 16-bit load_ssbo" and "[PATCH v4 23/44]
> >> i965/fs: Enables 16-bit load_ubo with sampler" i've just
> answered your
> >> review to confirm the R-b.
> >>
> >> I expect to finish today vec3 ssbo and send the series to
> Jenkins before
> >> landing, confirm your "pending" R-b, do a last rebase to master
> and ask
> >> for a push.
> > I've just prepared a rebased branch with the reviewed commits
> ready to
> > land to enable VK_KHR_16bit_storage support for SSBO/UBO.
> >
> >
> https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-v4-ubo-ssbo-to-land
> <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-v4-ubo-ssbo-to-land>
> >
> > As I don't have still commit access to mesa, maybe Eduardo or
> Alejandro
> > can land it for me tomorrow. But, Jason feel free to push it if
> you want.
>
> I have just pushed it to master.
>
> >
> > Chema
> >
> >>> 2) I send out a patch to rewrite assign_constant_locations
> which I
> >>> think should make it automatically handle 8 and 16-bit values as
> >>> well. I'd rather do that than more special casing if
> everything works
> >>> out ok.
> >> I'm testing this patch with 16-bits and make sure whatever is
> needed to
> >> have 16-bit working.
> >>
> >>> 3) I sent out a series of patches to enable pushing of UBOs in
> >>> Vulkan. If we're not careful, these will clash with 16bit
> storage as
> >>> UBO support suddenly has to imply push constant support. That
> said,
> >>> I"m willing to wait at least a little while before landing
> them to let
> >>> us get 16bit push constant support sorted out. The UBO pushing
> >>> patches give us a nice little performance boost but we're
> nowhere near
> >>> a release and I don't want it blocking you.
> >> That would be my next priority, so we would only have pending
> to land
> >> the 16-bit input/output support to finish this extension.
> >>
> >> Chema
> >>
> >>> On Wed, Nov 29, 2017 at 6:07 PM, Jose Maria Casanova Crespo
> >>> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>
> <mailto:jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>>> wrote:
> >>>
> >>> Hello,
> >>>
> >>> this is the V4 series for the implementation of the
> >>> SPV_KHR_16bit_storage
> >>> and VK_KHR_16bit_storage extensions on the anv vulkan
> driver, in
> >>> addition
> >>> to the GLSL and NIR support needed.
> >>>
> >>> The original series can be found here [1], the following
> v2 [2]
> >>> and v3 [3].
> >>>
> >>> In short V4 includes the following:
> >>>
> >>> * Reorder the series to enable features as they are
> implemented,
> >>> the series
> >>> now enables first UBO and SSBO support, and then
> inputs/outputs and
> >>> finally push constants.
> >>> * Support the byte scattered read/write messages with
> different
> >>> bit sizes
> >>> byte/word/dword.
> >>> * Refactor of the store_ssbo code and also fix stores when
> >>> writemask was .yz
> >>> * Uses the sampler for load_ubo avoiding the initial
> >>> implementation of
> >>> the series using byte_scattered_read.
> >>> * Addressed all the feedback provided by Jason and Topi
> on v3 review.
> >>>
> >>> This series is also available at:
> >>>
> >>>
> https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
> <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4>
> >>>
> <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4
> <https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4>>
> >>>
> >>> The objective is to start landing part of this series, all
> >>> feedback has been
> >>> addressed for SSBO and UBO. But for input/outputs features
> it will
> >>> probably
> >>> need another iteration as was not completely reviewed. It
> is also
> >>> needed
> >>> to define the approach for push constants issues before of
> after
> >>> landing
> >>> the support with this implementation.
> >>>
> >>> Patches 1-5 and 8-17 have already been reviewed. Patch 7
> was already
> >>> reviewed but as it has changed too much i would appreciate
> another
> >>> review. When patches until 25 or 28 are reviewed we could land
> >>> UBOs and
> >>> SSBOs support.
> >>>
> >>> Finally an updated overview of the patches:
> >>>
> >>> Patches 1-2 add 16-bit float, int and uint types to GLSL.
> This is
> >>> needed because NIR uses GLSL types internally. We use the
> enums
> >>> already defined at AMD_gpu_shader_half_float and NV_gpu_shader
> >>> extensions. Patch 2 updates mesa/st, in order to avoid
> warnings for
> >>> types not handled on a switch.
> >>>
> >>> Patches 3-6 add NIR support for those new GLSL 16-bit types,
> >>> conversion opcodes, and rounding modes for float to half-float
> >>> conversions.
> >>>
> >>> Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR
> support.
> >>>
> >>> Patches 10-12 add general 16-bit support for i965. This
> includes
> >>> handling of new types on several general purpose methods,
> >>> update/remove some asserts.
> >>>
> >>> Patches 14-17 add support for 32 to 16-bit conversions for
> i965,
> >>> including rounding mode opcodes (needed for float to
> half-float
> >>> conversions), and an optimization that removes superfluous
> rounding
> >>> mode sets.
> >>>
> >>> Patches 18-21 add and use two new messages: byte scattered
> read and
> >>> write. Those were needed because untyped surface message
> has a fixed
> >>> 32-bit write size. Those messages are used on the 16-bit
> support of
> >>> store SSBO, load SSBO and load shared.
> >>>
> >>> Patch 22 adds helpers to allow un/shuffle 16-bit
> components in 32-bit
> >>> ones. This would be needed for following optimizations on
> load/store
> >>> ssbo. This huck was originally in the input/outputs
> support, but
> >>> needed
> >>> a relocation because of the new order of the series.
> >>>
> >>> Patch 23 Enables load_ubo support for 16-bit using the sampler
> >>> un-shuffling
> >>> pairs 16-bit components from 32-bit.
> >>>
> >>> Patches 24-25 enable SPV_KHR_16bit_storage and
> >>> VK_KHR_16bit_storage but only
> >>> the support for SSBO and UBO on anv vulkan driver.
> >>>
> >>> Patches 26-28 were new patches included in V2 to improve
> performance
> >>> reducing the use of multiple scattered messages for
> untyped read/write
> >>> opreations. 16bit CTS tests passes without them. The other
> one would
> >>> fix a real problem using predication (patch 27), but
> >>> unfourtunately no CTS
> >>> test yet catching it.
> >>>
> >>> Patches 29-33 implement 16-bit vertex attribute inputs
> support on
> >>> i965. These include changes on anv. This was needed
> because 16-bit
> >>> surface formats do implicit conversion to 32-bit. To
> workaround this,
> >>> we override the 16-bit surface format, and use 32-bit
> ones. Issues
> >>> related
> >>> to robust buffer access have been addressed.
> >>>
> >>> Patch 34 implements load input and load store for all
> intra stage.
> >>> This
> >>> patch could have problems pointed by Jason related to how TCS
> >>> outputs are
> >>> implmemented that need more work.
> >>>
> >>> Patch 35-42 implements 16-bit store output support for
> fragment
> >>> shaders on i965. Last patch enables VK_KHR_16bit for
> input/outputs.
> >>>
> >>> Patch 43-44 adds 16-bit support for push constant and
> enables the
> >>> feature.
> >>> There is still pending to work on a general solution for push
> >>> constants and
> >>> the mixture of different bit_sizes.
> >>>
> >>> [1]
> >>>
> https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html>
> >>>
> <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html>>
> >>> [2]
> >>>
> https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html>
> >>>
> <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html>>
> >>> [3]
> >>>
> https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html>
> >>>
> <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html
> <https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html>>
> >>>
> >>> CC: Jason Ekstrand <jason at jlekstrand.net
> <mailto:jason at jlekstrand.net>
> >>> <mailto:jason at jlekstrand.net <mailto:jason at jlekstrand.net>>>
> >>> CC: Topi Pohjolainen <topi.pohjolainen at gmail.com
> <mailto:topi.pohjolainen at gmail.com>
> >>> <mailto:topi.pohjolainen at gmail.com
> <mailto:topi.pohjolainen at gmail.com>>>
> >>> CC: Matt Turner <mattst88 at gmail.com
> <mailto:mattst88 at gmail.com> <mailto:mattst88 at gmail.com
> <mailto:mattst88 at gmail.com>>>
> >>>
> >>> Alejandro Piñeiro (12):
> >>> i965/vec4: Handle 16-bit types at type_size_xvec4
> >>> i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
> >>> i965/fs: Handle 32-bit to 16-bit conversions
> >>> i965/fs: Define new shader opcode to set rounding modes
> >>> i965/fs: Enable rounding mode on f2f16 ops
> >>> i965/fs: Add remove_extra_rounding_modes optimization
> >>> i965/fs: Use byte_scattered_write on 16-bit store_ssbo
> >>> anv: Enable VK_KHR_16bit_storage for SSBO and UBO
> >>> i965/fs: Predicate byte scattered writes if needed
> >>> anv/pipeline: Use 32-bit surface formats for 16-bit formats
> >>> anv/cmd_buffer: Add a padding to the vertex buffer
> >>> i965/fs: Use half_precision data_format on 16-bit fb writes
> >>>
> >>> Eduardo Lima Mitev (8):
> >>> glsl: Add 16-bit types
> >>> mesa/st: Handle 16-bit types at st_glsl_storage_type_size()
> >>> nir: Add support for 16-bit types (half float, int16 and
> uint16)
> >>> nir: Populate conversion opcodes to 16-bit types
> >>> spirv/nir: Handle 16-bit types
> >>> spirv/nir: Add support for SPV_KHR_16bit_storage
> >>> anv: Enable SPV_KHR_16bit_storage on gen 8+
> >>> i965/fs: Optimize 16-bit SSBO stores by packing two into a
> >>> 32-bit reg
> >>>
> >>> Jose Maria Casanova Crespo (24):
> >>> nir: Add rounding modes enum
> >>> nir: Handle fp16 rounding modes at nir_type_conversion_op
> >>> spirv: Enable FPRoundingMode decorator to nir operations
> >>> i965: Support for 16-bit base types in helper functions
> >>> i965: Add support for control register
> >>> i965/fs: Add byte scattered write message and fs support
> >>> i965/fs: Add byte scattered read message and fs support
> >>> i965/fs: Use byte scattered read for 16-bit load_ssbo
> >>> i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit
> components
> >>> i965/fs: Enables 16-bit load_ubo with sampler
> >>> i965/fs: Use untyped_surface_read for 16-bit load_ssbo
> >>> compiler: Mark when input/ouput attribute at VS uses 16-bit
> >>> i965/compiler: includes 16-bit vertex input
> >>> i965/fs: Unpack 16-bit from 32-bit components in VS
> load_input
> >>> i965/fs: Support 16-bit types at load_input and store_output
> >>> i965/fs: Enable Render Target Write for 16-bit outputs
> >>> anv: Enable VK_KHR_16bit_storage for input/output
> >>> i965/fs: Include support for SEND data_format bit for
> Render Targets
> >>> i965/disasm: Show half-precision data_format on rt_writes
> >>> i965/fs: Mark 16-bit outputs on FS store_output
> >>> i965/fs: 16-bit source payloads always use 1 register
> >>> i965/fs: Enable 16-bit render target write on SKL and CHV
> >>> i965/fs: Support push constants of 16-bit types
> >>> anv: Enable VK_KHR_16bit_storage for push_constant
> >>>
> >>> src/compiler/builtin_type_macros.h | 26 ++
> >>> src/compiler/glsl/ast_to_hir.cpp | 3 +
> >>> src/compiler/glsl/glsl_to_nir.cpp | 9 +-
> >>> src/compiler/glsl/ir_clone.cpp | 3 +
> >>> src/compiler/glsl/link_uniform_initializers.cpp | 3 +
> >>> src/compiler/glsl/lower_buffer_access.cpp | 3 +-
> >>> src/compiler/glsl_types.cpp | 93 +++++-
> >>> src/compiler/glsl_types.h | 25 +-
> >>> src/compiler/nir/nir.c | 6 +
> >>> src/compiler/nir/nir.h | 22 +-
> >>> src/compiler/nir/nir_gather_info.c | 17 +-
> >>> src/compiler/nir/nir_opcodes.py | 11 +-
> >>> src/compiler/nir/nir_opcodes_c.py | 17 +-
> >>> src/compiler/nir/nir_split_var_copies.c | 6 +
> >>> src/compiler/nir_types.cpp | 18 ++
> >>> src/compiler/nir_types.h | 8 +
> >>> src/compiler/shader_info.h | 2 +
> >>> src/compiler/spirv/nir_spirv.h | 1 +
> >>> src/compiler/spirv/spirv_to_nir.c | 118 ++++++-
> >>> src/compiler/spirv/vtn_alu.c | 35 +-
> >>> src/compiler/spirv/vtn_variables.c | 21 ++
> >>> src/intel/compiler/brw_compiler.h | 1 +
> >>> src/intel/compiler/brw_disasm.c | 4 +
> >>> src/intel/compiler/brw_eu.h | 25 +-
> >>> src/intel/compiler/brw_eu_defines.h | 36 +++
> >>> src/intel/compiler/brw_eu_emit.c | 130
> +++++++-
> >>> src/intel/compiler/brw_fs.cpp | 142
> +++++++-
> >>> src/intel/compiler/brw_fs.h | 12 +
> >>> src/intel/compiler/brw_fs_copy_propagation.cpp | 8 +-
> >>> src/intel/compiler/brw_fs_generator.cpp | 30 +-
> >>> src/intel/compiler/brw_fs_nir.cpp | 409
> >>> ++++++++++++++++++++++--
> >>> src/intel/compiler/brw_fs_surface_builder.cpp | 23 +-
> >>> src/intel/compiler/brw_fs_surface_builder.h | 14 +
> >>> src/intel/compiler/brw_fs_visitor.cpp | 6 +
> >>> src/intel/compiler/brw_inst.h | 1 +
> >>> src/intel/compiler/brw_ir_fs.h | 3 -
> >>> src/intel/compiler/brw_nir.c | 16 +
> >>> src/intel/compiler/brw_reg.h | 6 +
> >>> src/intel/compiler/brw_shader.cpp | 23 ++
> >>> src/intel/compiler/brw_shader.h | 7 +
> >>> src/intel/compiler/brw_vec4.cpp | 1 +
> >>> src/intel/compiler/brw_vec4_generator.cpp | 3 +-
> >>> src/intel/compiler/brw_vec4_visitor.cpp | 3 +
> >>> src/intel/vulkan/anv_device.c | 23 ++
> >>> src/intel/vulkan/anv_extensions.py | 1 +
> >>> src/intel/vulkan/anv_pipeline.c | 1 +
> >>> src/intel/vulkan/genX_cmd_buffer.c | 20 +-
> >>> src/intel/vulkan/genX_pipeline.c | 34 ++
> >>> src/mesa/program/ir_to_mesa.cpp | 6 +
> >>> src/mesa/state_tracker/st_glsl_types.cpp | 3 +
> >>> 50 files changed, 1337 insertions(+), 101 deletions(-)
> >>>
> >>> --
> >>> 2.14.3
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> mesa-dev mailing list
> >>> mesa-dev at lists.freedesktop.org
> <mailto:mesa-dev at lists.freedesktop.org>
> >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> <mailto:mesa-dev at lists.freedesktop.org>
> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> >>
>
>
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list