[Mesa-dev] [PATCH 3/7] i965/fs: Use the sampler for FS pull constant loading on Ivybridge.

Kenneth Graunke kenneth at whitecape.org
Wed Sep 19 13:27:56 PDT 2012


Data port reads are absurdly slow on Ivybridge due to cache issues.

The LD message ignores the sampler unit index and SAMPLER_STATE pointer,
instead relying on hard-wired default state.  Thus, there's no need to
worry about running out of sampler units or providing SAMPLER_STATE;
this small patch should be all that's required.
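To illustrate why no sampler unit or SAMPLER_STATE is needed, here is a rough CPU-side model of an LD fetch. It is not Mesa code and not a description of the hardware. It assumes the constant buffer is bound as a buffer surface of 16-byte vec4 texels (for example RGBA32F), which is what the patch's `read_offset / 16` implies. The names model_ld, byte_offset_to_texel, and struct model_sampler_state are all hypothetical. LD takes an integer texel coordinate and does no filtering, so the sampler arguments are accepted and then ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not Mesa or hardware code.
 * Assumption: each texel of the constant surface is one 16-byte vec4. */
#define TEXEL_BYTES 16u

/* Opaque stand-in for SAMPLER_STATE; the model never dereferences it. */
struct model_sampler_state;

/* Mirrors the patch's `read_offset / 16`: byte offset -> vec4 texel index. */
static uint32_t byte_offset_to_texel(uint32_t read_offset)
{
   return read_offset / TEXEL_BYTES;
}

/* Models an LD fetch: an integer texel coordinate in, one vec4 out.
 * sampler_unit and state are ignored, mirroring the note that LD relies
 * on hard-wired default state. Out-of-range fetches are simply reported
 * as failures here; real hardware behavior is not modeled. */
static bool model_ld(const float *surface, size_t surface_texels,
                     uint32_t texel_index, unsigned sampler_unit,
                     const struct model_sampler_state *state, float out[4])
{
   (void)sampler_unit;
   (void)state;
   assert(surface != NULL && out != NULL);

   if (texel_index >= surface_texels)
      return false;
   for (int c = 0; c < 4; c++)
      out[c] = surface[(size_t)texel_index * 4 + c];
   return true;
}
```

For example, with three vec4 constants, a byte offset of 32 maps to texel 2, so model_ld returns the third vec4 whatever sampler unit or state pointer is passed in.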

NOTE: This is a candidate for all release branches.

Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_fs.h        |  3 +++
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 36 ++++++++++++++++++++++++++++++-
 2 files changed, 38 insertions(+), 1 deletion(-)

I did this a long time ago for VS pull constant loading, which resulted in
a 2-5x speedup for certain benchmarks.  Apparently at the time I never got
FS pull constant loading working, and didn't have a benchmark that needed
it, so I never finished and pushed it.

Now I have a game that needs it.  I don't have concrete data yet, since I
haven't figured out how to get consistent FPS numbers out of it.

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index e69de31..b5f2152 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -295,6 +295,9 @@ public:
    void generate_pull_constant_load(fs_inst *inst, struct brw_reg dst,
 				    struct brw_reg index,
 				    struct brw_reg offset);
+   void gen7_generate_pull_constant_load(fs_inst *inst, struct brw_reg dst,
+                                         struct brw_reg index,
+                                         struct brw_reg offset);
    void generate_mov_dispatch_to_flags();
 
    void emit_dummy_fs();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 5900c0e..4059660 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -585,6 +585,37 @@ fs_visitor::generate_unspill(fs_inst *inst, struct brw_reg dst)
 }
 
 void
+fs_visitor::gen7_generate_pull_constant_load(fs_inst *inst, struct brw_reg dst,
+                                             struct brw_reg index,
+                                             struct brw_reg offset)
+{
+   assert(intel->gen == 7);
+   assert(index.file == BRW_IMMEDIATE_VALUE &&
+	  index.type == BRW_REGISTER_TYPE_UD);
+   assert(offset.file == BRW_IMMEDIATE_VALUE &&
+	  offset.type == BRW_REGISTER_TYPE_UD);
+   uint32_t surf_index = index.dw1.ud;
+   uint32_t read_offset = offset.dw1.ud;
+
+   /* offset is an IMM; SEND needs to be from a GRF. */
+   offset = retype(brw_vec8_grf(127, 0), BRW_REGISTER_TYPE_UD);
+   brw_MOV(p, offset, brw_imm_ud(read_offset / 16));
+
+   brw_instruction *insn = brw_next_insn(p, BRW_OPCODE_SEND);
+   brw_set_dest(p, insn, dst);
+   brw_set_src0(p, insn, offset);
+   brw_set_sampler_message(p, insn,
+                           surf_index,
+                           0, /* LD message ignores sampler unit */
+                           GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                           1, /* rlen */
+                           1, /* mlen */
+                           false, /* no header */
+                           BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+                           0);
+}
+
+void
 fs_visitor::generate_pull_constant_load(fs_inst *inst, struct brw_reg dst,
 					struct brw_reg index,
 					struct brw_reg offset)
@@ -980,7 +1011,10 @@ fs_visitor::generate_code()
 	 break;
 
       case FS_OPCODE_PULL_CONSTANT_LOAD:
-	 generate_pull_constant_load(inst, dst, src[0], src[1]);
+	 if (intel->gen == 7)
+	    gen7_generate_pull_constant_load(inst, dst, src[0], src[1]);
+	 else
+	    generate_pull_constant_load(inst, dst, src[0], src[1]);
 	 break;
 
       case FS_OPCODE_FB_WRITE:
-- 
1.7.11.4
