[Mesa-dev] [PATCH] i965/hsw: approximate DDX with a uniform value across a subspan

Chia-I Wu olvaffe at gmail.com
Wed Sep 11 22:00:52 PDT 2013


From: Chia-I Wu <olv at lunarg.com>

Replicate the gradient of the top-left pixel to the other three pixels in the
subspan, the same way DDY is implemented.  Before, different gradients were
used for pixels in the top row and pixels in the bottom row.

This change results in a less accurate approximation.  However, it improves
the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
95.0% confidence) on Haswell.  No noticeable image quality difference was
observed.

No piglit gpu.tests regressions.

I have not been able to come up with an explanation for the performance
difference, and the same change makes no difference on Ivy Bridge.  If anyone
has any insight, please enlighten me.  Performance differences may also be
observed in other games that call textureGrad and dFdx.

Signed-off-by: Chia-I Wu <olv at lunarg.com>
---
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
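
For reviewers who want to see which pixels each region picks, here is a small
standalone model of the two source regions used by generate_ddx().  It is only
a sketch: Reg, read_region() and model_ddx() are hypothetical names invented
for illustration and are not part of Mesa.  It assumes a SIMD8 register that
holds two subspans, each laid out as (tl, tr, bl, br), and it assumes the
usual <vstride; width, hstride> region addressing.

```cpp
#include <array>
#include <cstddef>

// One SIMD8 register of floats (illustrative assumption): channels 0-3 hold
// subspan 0 and channels 4-7 hold subspan 1, each ordered (tl, tr, bl, br).
using Reg = std::array<float, 8>;

// Hypothetical model of a <vstride; width, hstride> region read that starts
// at element `start`.  Channels are grouped into rows of `width` elements,
// and each new row begins `vstride` elements after the previous one.
Reg read_region(const Reg &reg, std::size_t start, std::size_t vstride,
                std::size_t width, std::size_t hstride)
{
   Reg out{};
   for (std::size_t ch = 0; ch < out.size(); ch++) {
      const std::size_t row = ch / width;
      const std::size_t col = ch % width;
      out[ch] = reg[start + row * vstride + col * hstride];
   }
   return out;
}

// Model of the ADD emitted by generate_ddx(): dst = src0 + (-src1).
// uniform == false mimics the old <2;2,0> regions, uniform == true mimics
// the <4;4,0> regions that this patch selects on Haswell.
Reg model_ddx(const Reg &src, bool uniform)
{
   const std::size_t vstride = uniform ? 4 : 2;
   const std::size_t width = uniform ? 4 : 2;

   const Reg src0 = read_region(src, 1, vstride, width, 0); // starts at tr
   const Reg src1 = read_region(src, 0, vstride, width, 0); // starts at tl

   Reg dst{};
   for (std::size_t ch = 0; ch < dst.size(); ch++)
      dst[ch] = src0[ch] - src1[ch];
   return dst;
}
```

With a source register of {1, 3, 10, 20, 100, 107, 50, 51}, the old regions
give {2, 2, 10, 10, 7, 7, 1, 1} (one gradient per row of each subspan), while
the new regions give {2, 2, 2, 2, 7, 7, 7, 7} (the top-row gradient replicated
across each subspan).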

diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index bfb3d33..c0d24a0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
 void
 fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
 {
+   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
+    * which gives much better performance when the result is used with
+    * sample_d
+    */
+   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
+                                          BRW_VERTICAL_STRIDE_2;
+   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
+                                        BRW_WIDTH_2;
+
    struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
 				 BRW_REGISTER_TYPE_F,
-				 BRW_VERTICAL_STRIDE_2,
-				 BRW_WIDTH_2,
+				 vstride,
+				 width,
 				 BRW_HORIZONTAL_STRIDE_0,
 				 BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
    struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
 				 BRW_REGISTER_TYPE_F,
-				 BRW_VERTICAL_STRIDE_2,
-				 BRW_WIDTH_2,
+				 vstride,
+				 width,
 				 BRW_HORIZONTAL_STRIDE_0,
 				 BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
    brw_ADD(p, dst, src0, negate(src1));
-- 
1.8.3.1


