[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

Roland Scheidegger sroland at vmware.com
Thu Sep 5 08:18:05 PDT 2013


Hmm, I don't think the math actually works out here, which may explain
why it's faster.
I believe the derivatives need to be transformed into the cube coordinate
system, and I don't see that being done here (I haven't yet figured out
the math for doing this with reasonable effort in llvmpipe either).
OTOH, you could simplify the rho calculation a bit: if you take the sqrt
after the max, you need only one sqrt instead of two (though if your hw
has a blazing fast sqrt it won't matter...).
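
To illustrate the sqrt point: since sqrt is monotonically increasing,
max(sqrt(a), sqrt(b)) equals sqrt(max(a, b)) for non-negative a and b.
Below is a minimal standalone sketch in plain C++, not the GLSL IR code
from the patch. Vec3, dot3 and the rho_*/lod_* names are hypothetical
helpers made up for this example. The last function goes one step
further than the suggestion above and relies on
log2(sqrt(x)) == 0.5 * log2(x), which would drop the sqrt altogether.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3-component vector, for illustration only.
struct Vec3 { float x, y, z; };

static float dot3(const Vec3 &a, const Vec3 &b)
{
   return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Form currently used by the lowering: two square roots, then max.
static float rho_two_sqrt(const Vec3 &dPdx, const Vec3 &dPdy)
{
   return std::max(std::sqrt(dot3(dPdx, dPdx)),
                   std::sqrt(dot3(dPdy, dPdy)));
}

// Suggested form: max of the squared lengths, then a single sqrt.
static float rho_one_sqrt(const Vec3 &dPdx, const Vec3 &dPdy)
{
   return std::sqrt(std::max(dot3(dPdx, dPdx), dot3(dPdy, dPdy)));
}

// Further step (not from this thread): log2(sqrt(x)) == 0.5 * log2(x),
// so lambda_base can be computed without any sqrt.
static float lod_no_sqrt(const Vec3 &dPdx, const Vec3 &dPdy)
{
   return 0.5f * std::log2(std::max(dot3(dPdx, dPdx), dot3(dPdy, dPdy)));
}
```

All three agree up to floating-point rounding. Whether the cheaper forms
are actually faster depends on how the hardware implements sqrt and log2.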

Roland


On 05.09.2013 10:35, Chia-I Wu wrote:
> sample_d is slower than the lowered version on gen7.  For gen7, this improves
> Xonotic benchmark with Ultimate effects by as much as 25%:
> 
>  before the change:                      40.06 fps
>  after the change:                       51.10 fps
>  after the change with INTEL_DEBUG=no16: 44.46 fps
> 
> As sample_d is not allowed in SIMD16 mode, I first thought the difference
> was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
> 
> But, as the numbers show, there is still a 10% improvement when SIMD16 is
> forced off after the change.  Thus textureGrad() is lowered unconditionally
> for now.  Because of this, and because I haven't tried it on Haswell, this
> is still an RFC.
> 
> No piglit regressions.
> 
> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
> ---
>  .../dri/i965/brw_lower_texture_gradients.cpp       | 54 ++++++++++++++--------
>  1 file changed, 36 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> index 1589a20..f3fcb56 100644
> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> @@ -34,8 +34,8 @@ using namespace ir_builder;
>  
>  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
>  public:
> -   lower_texture_grad_visitor(bool has_sample_d_c)
> -      : has_sample_d_c(has_sample_d_c)
> +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
> +      : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
>     {
>        progress = false;
>     }
> @@ -44,6 +44,7 @@ public:
>  
>  
>     bool progress;
> +   bool has_sample_d;
>     bool has_sample_d_c;
>  
>  private:
> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
>  ir_visitor_status
>  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>  {
> -   /* Only lower textureGrad with shadow samplers */
> -   if (ir->op != ir_txd || !ir->shadow_comparitor)
> +   if (ir->op != ir_txd)
>        return visit_continue;
>  
> -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
> -    * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
> -    *
> -    * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
> -    * "The r coordinate contains the faceid, and the r gradients are ignored
> -    *  by hardware."
> -    *
> -    * We likely need to do a similar treatment for samplerCube and
> -    * samplerCubeArray, but we have insufficient testing for that at the moment.
> -    */
> -   bool need_lowering = !has_sample_d_c ||
> -      ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
> +   bool need_lowering = false;
> +
> +   if (ir->shadow_comparitor) {
> +      /* Lower textureGrad() with samplerCubeShadow even if we have the
> +       * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
> +       * Unfortunately:
> +       *
> +       * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
> +       * description: "The r coordinate contains the faceid, and the r
> +       * gradients are ignored by hardware."
> +       */
> +      if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
> +         need_lowering = true;
> +      else if (!has_sample_d_c)
> +         need_lowering = true;
> +   }
> +   else {
> +      /* We likely need to do a similar treatment for samplerCube and
> +       * samplerCubeArray, but we have insufficient testing for that at the
> +       * moment.
> +       */
> +      if (!has_sample_d)
> +         need_lowering = true;
> +   }
>  
>     if (!need_lowering)
>        return visit_continue;
> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>  			       expr(ir_unop_sqrt, dot(dPdy, dPdy)));
>     }
>  
> -   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
> +   /* lambda_base = log2(rho).  It will be biased and clamped by values
> +    * defined in SAMPLER_STATE to get the final lambda.
> +    */
>     ir->op = ir_txl;
>     ir->lod_info.lod = expr(ir_unop_log2, rho);
>  
> @@ -168,8 +182,12 @@ bool
>  brw_lower_texture_gradients(struct brw_context *brw,
>                              struct exec_list *instructions)
>  {
> +   /* sample_d is slower than the lowered version on gen7, and is not allowed
> +    * in SIMD16 mode.  Treating it as unsupported improves the performance.
> +    */
> +   bool has_sample_d = brw->gen != 7;
>     bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
> -   lower_texture_grad_visitor v(has_sample_d_c);
> +   lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>  
>     visit_list_elements(&v, instructions);
>  
> 

