[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

Chia-I Wu olvaffe at gmail.com
Wed Sep 11 22:06:40 PDT 2013


On Tue, Sep 10, 2013 at 2:01 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
> On Tue, Sep 10, 2013 at 4:05 AM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 09/05/2013 03:35 AM, Chia-I Wu wrote:
>>> sample_d is slower than the lowered version on gen7.  For gen7, this improves
>>> Xonotic benchmark with Ultimate effects by as much as 25%:
>>>
>>>  before the change:                      40.06 fps
>>>  after the change:                       51.10 fps
>>>  after the change with INTEL_DEBUG=no16: 44.46 fps
>>>
>>> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference
>>> was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
>>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
>>>
>>> But, as the numbers show, there is still 10% improvement when SIMD16 is forced
>>> off after the change.  Thus textureGrad() is lowered unconditionally for now.
>>> Due to this and that I haven't tried it on Haswell, this is still RFC.
>>
>> A lot of this code depends on the texture targets being used.  What
>> texture targets is Xonotic using with textureGrad?
> Only sampler2D.
>>
>>> No piglit regressions.
>>>
>>> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
>>> ---
>>>  .../dri/i965/brw_lower_texture_gradients.cpp       | 54 ++++++++++++++--------
>>>  1 file changed, 36 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>> index 1589a20..f3fcb56 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>> @@ -34,8 +34,8 @@ using namespace ir_builder;
>>>
>>>  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
>>>  public:
>>> -   lower_texture_grad_visitor(bool has_sample_d_c)
>>> -      : has_sample_d_c(has_sample_d_c)
>>> +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
>>> +      : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
>>>     {
>>>        progress = false;
>>>     }
>>> @@ -44,6 +44,7 @@ public:
>>>
>>>
>>>     bool progress;
>>> +   bool has_sample_d;
>>>     bool has_sample_d_c;
>>>
>>>  private:
>>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
>>>  ir_visitor_status
>>>  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>  {
>>> -   /* Only lower textureGrad with shadow samplers */
>>> -   if (ir->op != ir_txd || !ir->shadow_comparitor)
>>> +   if (ir->op != ir_txd)
>>>        return visit_continue;
>>>
>>> -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
>>> -    * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
>>> -    *
>>> -    * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
>>> -    * "The r coordinate contains the faceid, and the r gradients are ignored
>>> -    *  by hardware."
>>> -    *
>>> -    * We likely need to do a similar treatment for samplerCube and
>>> -    * samplerCubeArray, but we have insufficient testing for that at the moment.
>>> -    */
>>> -   bool need_lowering = !has_sample_d_c ||
>>> -      ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
>>> +   bool need_lowering = false;
>>> +
>>> +   if (ir->shadow_comparitor) {
>>> +      /* Lower textureGrad() with samplerCubeShadow even if we have the
>>> +       * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
>>> +       * Unfortunately:
>>> +       *
>>> +       * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
>>> +       * description: "The r coordinate contains the faceid, and the r
>>> +       * gradients are ignored by hardware."
>>> +       */
>>> +      if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
>>> +         need_lowering = true;
>>> +      else if (!has_sample_d_c)
>>> +         need_lowering = true;
>>
>> This should look like the old code:
>>
>>     need_lowering = !has_sample_d_c ||
>>        ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
> Sure.  I moved it so that it is clear the comments are for the first if-block.
>>> +   }
>>> +   else {
>>> +      /* We likely need to do a similar treatment for samplerCube and
>>> +       * samplerCubeArray, but we have insufficient testing for that at the
>>> +       * moment.
>>> +       */
>>> +      if (!has_sample_d)
>>> +         need_lowering = true;
>>
>>     need_lowering = !has_sample_d;
> Will do.
>>
>>> +   }
>>>
>>>     if (!need_lowering)
>>>        return visit_continue;
>>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>                              expr(ir_unop_sqrt, dot(dPdy, dPdy)));
>>>     }
>>>
>>> -   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
>>> +   /* lambda_base = log2(rho).  It will be biased and clamped by values
>>> +    * defined in SAMPLER_STATE to get the final lambda.
>>> +    */
>>>     ir->op = ir_txl;
>>>     ir->lod_info.lod = expr(ir_unop_log2, rho);
>>>
>>> @@ -168,8 +182,12 @@ bool
>>>  brw_lower_texture_gradients(struct brw_context *brw,
>>>                              struct exec_list *instructions)
>>>  {
>>> +   /* sample_d is slower than the lowered version on gen7, and is not allowed
>>> +    * in SIMD16 mode.  Treating it as unsupported improves the performance.
>>> +    */
>>> +   bool has_sample_d = brw->gen != 7;
>>
>> Based on the data in one of the other messages, it sounds like this
>> selector needs to be slightly more complex.
> To extend it for Haswell and maybe Sandy Bridge?  It is also possible
> to apply this trick only on fragment shaders.  Should I do that?
>
> Now I have my Haswell set up, I have a few more observations.  This
> patch helps the performance for two reasons
>
>  - it allows the shaders to run in SIMD16 mode
>  - it replaces sample_d by sample_l (plus some math ops)
>
> On both Ivy Bridge and Haswell, SIMD16 helps the performance by 10-15%
> for this game.  Avoiding sample_d helps the performance by another
> 10-15% on Ivy Bridge.  However, it does not seem to help Haswell.  I
> am still trying to figure out why (or why it helps Ivy Bridge).
I sent another patch that makes sample_d execute faster on Haswell.
Should the patch be integrated, we do not want to this lowering pass
to kick in for Haswell and will need another way to enable SIMD16.
>
>
>>
>>>     bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
>>> -   lower_texture_grad_visitor v(has_sample_d_c);
>>> +   lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>>>
>>>     visit_list_elements(&v, instructions);
>>
>
>
>
> --
> olv at LunarG.com



-- 
olv at LunarG.com


More information about the mesa-dev mailing list