[Mesa-dev] [PATCH] i965/fs: force pull model for 64-bit GS inputs
Kenneth Graunke
kenneth at whitecape.org
Wed Sep 27 19:26:38 UTC 2017
On Wednesday, September 27, 2017 5:21:49 AM PDT Iago Toral Quiroga wrote:
> Triggering the push model when 64-bit inputs are involved is not easy due to
> the constrains on the maximum number of registers that we allow for this mode,
> however, for GS with 'points' primitive type and just a couple of double
> varyings we can trigger this and it just doesn't work because the
> implementation is not 64-bit aware at all. For now, let's make sure that we
> don't attempt this model whith 64-bit inputs and we always fall back to pull
> model for them.
>
> Also, don't enable the VUE handles in the thread payload on the fly when we
> find an input for which we need the pull model, this is not safe: if we need
> to resort to the pull model we need to account for that when we setup the
> thread payload so we compute the first non-payload register properly. If we
> didn't do that correctly and we enable it on-the-fly here then we will end up
> VUE handles on the first non-payload register which will probably lead to
> GPU hangs. Instead, assert that we have identified that we needed the VUE
> handles during the payload setup.
> ---
> src/intel/compiler/brw_fs.cpp | 12 +++++++++++-
> src/intel/compiler/brw_fs_nir.cpp | 4 +++-
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index eb9b4c3890..69b993bb55 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -5605,6 +5605,15 @@ fs_visitor::setup_gs_payload()
> /* Use a maximum of 24 registers for push-model inputs. */
> const unsigned max_push_components = 24;
>
> + /* We don't support the push model for 64-bit inputs */
> + bool has_64bit_inputs = false;
> + nir_foreach_variable(var, &nir->inputs) {
> + if (glsl_get_bit_size(var->type->without_array()) == 64) {
> + has_64bit_inputs = true;
> + break;
> + }
> + }
> +
> /* If pushing our inputs would take too many registers, reduce the URB read
> * length (which is in HWords, or 8 registers), and resort to pulling.
> *
> @@ -5612,7 +5621,8 @@ fs_visitor::setup_gs_payload()
> * have to multiply by VerticesIn to obtain the total storage requirement.
> */
> if (8 * vue_prog_data->urb_read_length * nir->info.gs.vertices_in >
> - max_push_components || gs_prog_data->invocations > 1) {
> + max_push_components || gs_prog_data->invocations > 1 ||
> + has_64bit_inputs) {
> gs_prog_data->base.include_vue_handles = true;
>
> /* R3..RN: ICP Handles for each incoming vertex (when using pull model) */
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index d760946e62..aa57bb9d78 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -1915,7 +1915,9 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst,
> const unsigned push_reg_count = gs_prog_data->base.urb_read_length * 8;
>
> /* TODO: figure out push input layout for invocations == 1 */
> + /* TODO: make this work with 64-bit inputs */
> if (gs_prog_data->invocations == 1 &&
> + type_sz(dst.type) <= 4 &&
> offset_const != NULL && vertex_const != NULL &&
> 4 * (base_offset + offset_const->u32[0]) < push_reg_count) {
> int imm_offset = (base_offset + offset_const->u32[0]) * 4 +
> @@ -1928,7 +1930,7 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst,
> }
>
> /* Resort to the pull model. Ensure the VUE handles are provided. */
> - gs_prog_data->base.include_vue_handles = true;
> + assert(gs_prog_data->base.include_vue_handles);
I believe a variable-indexed input would hit this path and assert fail.
The push support here is pretty sketchy. Push inputs take up a *ton*
of register space, so even basic piglit tests seem to hit the limit
and resort to pulling at least some of them.
It might make sense to simply always include the VUE handles - that's
certainly easiest, and less buggy.
It might also make sense to stop bothering with the push model entirely.
I know it was very useful for TES inputs, but I don't recall whether I
bothered to take any performance data about push inputs for the GS. :(
> unsigned first_icp_handle = gs_prog_data->include_primitive_id ? 3 : 2;
> fs_reg icp_handle = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170927/0e550b4c/attachment.sig>
More information about the mesa-dev
mailing list