Mesa (staging/20.1): intel/nir: Don't try to emit vector load_scratch instructions

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Sat Oct 10 08:41:41 UTC 2020


Module: Mesa
Branch: staging/20.1
Commit: 3689c2909249ae84c1abdaf50c310475dd8d4d7b
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3689c2909249ae84c1abdaf50c310475dd8d4d7b

Author: Jason Ekstrand <jason at jlekstrand.net>
Date:   Thu Sep 24 16:28:56 2020 -0500

intel/nir: Don't try to emit vector load_scratch instructions

In 53bfcdeecf4c9, we added load/store_scratch instructions which deviate
a little bit from most memory load/store instructions in that we can't
use the normal untyped read/write instructions which can read and write
up to a vec4 at a time.  Instead, we have to use the DWORD scattered
read/write instructions which are scalar.  To handle this, we added code
to brw_nir_lower_mem_access_bit_sizes to cause them to be scalarized.
However, one case was missing: the load-as-larger-vector case.  In this
case, we take a small bit-sized constant-offset load and replace it with
a 32-bit load, shuffling the result around as needed.

For scratch, this case is much trickier to get right because it often
emits vec2 or wider which we would then have to lower again.  We did
this for other load and store ops because, for lower bit-sizes, we have
to scalarize thanks to the byte scattered read/write instructions being
scalar.  However, for scratch we're not losing as much because we can't
vectorize 32-bit loads and stores either.  It's easier to just disallow
it whenever we have to scalarize.

Fixes: 53bfcdeecf4c9 "intel/fs: Implement the new load/store_scratch..."
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
(cherry picked from commit fd04f858b0aa9f688f5dfb041ccb706da96f862a)

---

 .pick_status.json                                       | 2 +-
 src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/.pick_status.json b/.pick_status.json
index 9125f14372a..ebe989c5714 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -850,7 +850,7 @@
         "description": "intel/nir: Don't try to emit vector load_scratch instructions",
         "nominated": true,
         "nomination_type": 1,
-        "resolution": 0,
+        "resolution": 1,
         "master_sha": null,
         "because_sha": "53bfcdeecf4c9632e09ee641d2ca02dd9ec25e34"
     },
diff --git a/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c b/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
index 19abc16a9c5..ea982b0a091 100644
--- a/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
+++ b/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
@@ -53,6 +53,9 @@ dup_mem_intrinsic(nir_builder *b, nir_intrinsic_instr *intrin,
    }
 
    dup->num_components = num_components;
+   if (intrin->intrinsic == nir_intrinsic_load_scratch ||
+       intrin->intrinsic == nir_intrinsic_store_scratch)
+      assert(num_components == 1);
 
    for (unsigned i = 0; i < info->num_indices; i++)
       dup->const_index[i] = intrin->const_index[i];
@@ -92,7 +95,7 @@ lower_mem_load_bit_size(nir_builder *b, nir_intrinsic_instr *intrin,
 
    nir_ssa_def *result;
    nir_src *offset_src = nir_get_io_offset_src(intrin);
-   if (bit_size < 32 && nir_src_is_const(*offset_src)) {
+   if (bit_size < 32 && !needs_scalar && nir_src_is_const(*offset_src)) {
       /* The offset is constant so we can use a 32-bit load and just shift it
        * around as needed.
        */



More information about the mesa-commit mailing list