[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

Wed Sep 11 22:15:05 PDT 2013

On Tue, Sep 10, 2013 at 1:37 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
> On Tue, Sep 10, 2013 at 4:01 AM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 09/06/2013 05:05 AM, Chia-I Wu wrote:
>>> On Thu, Sep 5, 2013 at 9:57 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>>> On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes <chrisf at ijw.co.nz> wrote:
>>>>> A possible explanation for the perf change is that Xonotic uses
>>>>> anisotropic filtering at this quality level. Lowering to txl defeats
>>>>> it.
>>>> I had a look at that.  gl_sampler->MaxAnisotropy is never greater than
>>>> 1.0 in gen7_update_sampler_state() so there is no anisotropic
>>>> filtering in this case.
>>>>
>>>> It makes sense to me that avoiding punting to SIMD8 helps the
>>>> performance.  But it is not clear to me why >10% performance change
>>>> can still be observed when INTEL_DEBUG=no16 is specified.  A
>>>> reasonable explanation is that the image quality is degraded in some
>>>> way, which is why I am still nervous about the change.
>>> With INTEL_DEBUG=no16 set, the same trick hurts the performance on
>>> Haswell by about 5%.  That is, sample_d on Haswell is faster than the
>>> one emulated with sample_l.
>>
>> What is the delta if sample_d is used for just SIMD8 shaders on HSW?
>> Even when the shader can go SIMD16, some fragments will use the SIMD8 path.
> brw_lower_texture_gradients applies on the IR so it is hard to
> selectively apply it only for SIMD16 fs.  I will see if I can work
> something out here to get the numbers you need.
I could clone the original IR list, run all but
brw_lower_texture_gradients passes on it, and use the cloned list to
generate SIMD8 code.  This is to get the numbers, not for the final
code.

But I sent another patch that should speed up sample_d.  With it, we
do not want to lower sample_d to sample_l at all.  I will see how the
patch goes first.

>
>
>>> But since the trick makes SIMD16 possible, it gains 5% more fps when
>>> INTEL_DEBUG=no16 is not set.
>>>
>>>> An alternative approach to avoid punting seems to emulate SIMD16
>>>> sample_d with two SIMD8 sample_d.  It will take longer to implement
>>>> given my familiarity with the code, and may be less performant.  BUt
>>>> that would allow things like anisotropic filtering to be honored.
And we will need to do this to enable SIMD16.
>>>>
>>>>
>>>>> It would be worth doing an image quality comparison before and after the change.
>>>> Yeah, that is worth doing.  I will do that.
>>>>
>>>>>
>>>>> -- Chris
>>>>>
>>>>> On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>>>>> sample_d is slower than the lowered version on gen7.  For gen7, this improves
>>>>>> Xonotic benchmark with Ultimate effects by as much as 25%:
>>>>>>
>>>>>>  before the change:                      40.06 fps
>>>>>>  after the change:                       51.10 fps
>>>>>>  after the change with INTEL_DEBUG=no16: 44.46 fps
>>>>>>
>>>>>> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference
>>>>>> was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
>>>>>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
>>>>>>
>>>>>> But, as the numbers show, there is still 10% improvement when SIMD16 is forced
>>>>>> off after the change.  Thus textureGrad() is lowered unconditionally for now.
>>>>>> Due to this and that I haven't tried it on Haswell, this is still RFC.
>>>>>>
>>>>>> No piglit regressions.
>>>>>>
>>>>>> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
>>>>>> ---
>>>>>>  .../dri/i965/brw_lower_texture_gradients.cpp       | 54 ++++++++++++++--------
>>>>>>  1 file changed, 36 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>>> index 1589a20..f3fcb56 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>>>>>> @@ -34,8 +34,8 @@ using namespace ir_builder;
>>>>>>
>>>>>>  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
>>>>>>  public:
>>>>>> -   lower_texture_grad_visitor(bool has_sample_d_c)
>>>>>> -      : has_sample_d_c(has_sample_d_c)
>>>>>> +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
>>>>>> +      : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
>>>>>>     {
>>>>>>        progress = false;
>>>>>>     }
>>>>>> @@ -44,6 +44,7 @@ public:
>>>>>>
>>>>>>
>>>>>>     bool progress;
>>>>>> +   bool has_sample_d;
>>>>>>     bool has_sample_d_c;
>>>>>>
>>>>>>  private:
>>>>>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
>>>>>>  ir_visitor_status
>>>>>>  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>>>>  {
>>>>>> -   /* Only lower textureGrad with shadow samplers */
>>>>>> -   if (ir->op != ir_txd || !ir->shadow_comparitor)
>>>>>> +   if (ir->op != ir_txd)
>>>>>>        return visit_continue;
>>>>>>
>>>>>> -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
>>>>>> -    * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
>>>>>> -    *
>>>>>> -    * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
>>>>>> -    * "The r coordinate contains the faceid, and the r gradients are ignored
>>>>>> -    *  by hardware."
>>>>>> -    *
>>>>>> -    * We likely need to do a similar treatment for samplerCube and
>>>>>> -    * samplerCubeArray, but we have insufficient testing for that at the moment.
>>>>>> -    */
>>>>>> -   bool need_lowering = !has_sample_d_c ||
>>>>>> -      ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
>>>>>> +   bool need_lowering = false;
>>>>>> +
>>>>>> +   if (ir->shadow_comparitor) {
>>>>>> +      /* Lower textureGrad() with samplerCubeShadow even if we have the
>>>>>> +       * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
>>>>>> +       * Unfortunately:
>>>>>> +       *
>>>>>> +       * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
>>>>>> +       * description: "The r coordinate contains the faceid, and the r
>>>>>> +       * gradients are ignored by hardware."
>>>>>> +       */
>>>>>> +      if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
>>>>>> +         need_lowering = true;
>>>>>> +      else if (!has_sample_d_c)
>>>>>> +         need_lowering = true;
>>>>>> +   }
>>>>>> +   else {
>>>>>> +      /* We likely need to do a similar treatment for samplerCube and
>>>>>> +       * samplerCubeArray, but we have insufficient testing for that at the
>>>>>> +       * moment.
>>>>>> +       */
>>>>>> +      if (!has_sample_d)
>>>>>> +         need_lowering = true;
>>>>>> +   }
>>>>>>
>>>>>>     if (!need_lowering)
>>>>>>        return visit_continue;
>>>>>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>>>>>>                                expr(ir_unop_sqrt, dot(dPdy, dPdy)));
>>>>>>     }
>>>>>>
>>>>>> -   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
>>>>>> +   /* lambda_base = log2(rho).  It will be biased and clamped by values
>>>>>> +    * defined in SAMPLER_STATE to get the final lambda.
>>>>>> +    */
>>>>>>     ir->op = ir_txl;
>>>>>>     ir->lod_info.lod = expr(ir_unop_log2, rho);
>>>>>>
>>>>>> @@ -168,8 +182,12 @@ bool
>>>>>>  brw_lower_texture_gradients(struct brw_context *brw,
>>>>>>                              struct exec_list *instructions)
>>>>>>  {
>>>>>> +   /* sample_d is slower than the lowered version on gen7, and is not allowed
>>>>>> +    * in SIMD16 mode.  Treating it as unsupported improves the performance.
>>>>>> +    */
>>>>>> +   bool has_sample_d = brw->gen != 7;
>>>>>>     bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
>>>>>> -   lower_texture_grad_visitor v(has_sample_d_c);
>>>>>> +   lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>>>>>>
>>>>>>     visit_list_elements(&v, instructions);
>>>>>>
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> mesa-dev mailing list
>>>>>> mesa-dev at lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>
>>>> --
>>>> olv at LunarG.com
>>
>
>
>
> --
> olv at LunarG.com

-- 
olv at LunarG.com