[Mesa-dev] [PATCH] i965/hsw: approximate DDX with a uniform value across a subspan

Chia-I Wu olvaffe at gmail.com
Thu Sep 12 04:50:48 PDT 2013


On Thu, Sep 12, 2013 at 5:27 PM, Chris Forbes <chrisf at ijw.co.nz> wrote:
> I guess fast-by-default. I imagine that more apps care about
> performance than care about the granularity of their derivatives.
That is my preference too.  My concern is that the performance gain has
only been observed on Haswell so far.  Why is that, and is there a way to
speed up sample_d on Ivy Bridge and Sandy Bridge?
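
For reference, the two region descriptions in the quoted patch can be
modeled on the CPU.  The following is only a sketch, not Mesa code:
read_region and sketch_ddx are hypothetical names, and it assumes that one
SIMD8 register holds two subspans laid out as tl, tr, bl, br, which is what
the "(ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4" comment in the patch implies.
It shows how <2;2,0> yields one gradient per row of a subspan, while
<4;4,0> yields one gradient per subspan.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed layout of one SIMD8 register holding two 2x2 subspans:
// [ss0.tl, ss0.tr, ss0.bl, ss0.br, ss1.tl, ss1.tr, ss1.bl, ss1.br]
using Reg8 = std::array<float, 8>;

// Hypothetical helper: read 8 channels through a <vstride; width, hstride>
// region that starts at element `subnr` of the register.
static Reg8 read_region(const Reg8 &r, std::size_t subnr, std::size_t vstride,
                        std::size_t width, std::size_t hstride)
{
    Reg8 out{};
    for (std::size_t ch = 0; ch < out.size(); ch++) {
        std::size_t row = ch / width;   // which row of the region
        std::size_t col = ch % width;   // position within that row
        std::size_t idx = subnr + row * vstride + col * hstride;
        assert(idx < r.size());
        out[ch] = r[idx];
    }
    return out;
}

// Hypothetical model of the ADD emitted by generate_ddx():
// dst = region(src, subnr 1) - region(src, subnr 0).
// coarse == false models <2;2,0> (per-row gradient),
// coarse == true  models <4;4,0> (per-subspan gradient).
static Reg8 sketch_ddx(const Reg8 &src, bool coarse)
{
    std::size_t vstride = coarse ? 4 : 2;
    std::size_t width = coarse ? 4 : 2;
    Reg8 right = read_region(src, 1, vstride, width, 0);
    Reg8 left = read_region(src, 0, vstride, width, 0);
    Reg8 dst{};
    for (std::size_t ch = 0; ch < dst.size(); ch++)
        dst[ch] = right[ch] - left[ch];
    return dst;
}
```

With <2;2,0> the eight channels read (tr - tl)x2 then (br - bl)x2 for each
subspan; with <4;4,0> the bottom-row difference is never read, and all four
pixels of a subspan receive (tr - tl).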

> After a bit more thought -- In HLSL shader model 5 there's both
> ddx_coarse() and ddx_fine() which gives the shader author the choice
> between roughly these options. In a *very* quick look I haven't found
> anything equivalent -- but I might just be being blind.
>
> CC'ing Ian -- any opinion? Is there any conformance issue here?
>
> -- Chris
>
> On Thu, Sep 12, 2013 at 8:41 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>> On Thu, Sep 12, 2013 at 2:06 PM, Chris Forbes <chrisf at ijw.co.nz> wrote:
>>> Can we make this approximation conditional on an image-quality control
>>> in driconf [or somewhere else]?
>> Sure.  What would be the default behavior?
>>
>>> On Thu, Sep 12, 2013 at 5:00 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>>> From: Chia-I Wu <olv at lunarg.com>
>>>>
>>>> Replicate the gradient of the top-left pixel to the other three pixels in the
>>>> subspan, as is already done for DDY.  Before, different gradients were used
>>>> for pixels in the top row and pixels in the bottom row.
>>>>
>>>> This change results in a less accurate approximation.  However, it improves
>>>> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
>>>> 95.0% confidence) on Haswell.  No noticeable image quality difference was
>>>> observed.
>>>>
>>>> No piglit gpu.tests regressions.
>>>>
>>>> I failed to come up with an explanation for the performance difference, or
>>>> for why the change makes no difference on Ivy Bridge.  If anyone has any
>>>> insight, please kindly enlighten me.  Performance differences may also be
>>>> observed in other games that call textureGrad and dFdx.
>>>>
>>>> Signed-off-by: Chia-I Wu <olv at lunarg.com>
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>>> index bfb3d33..c0d24a0 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>>> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>>>>  void
>>>>  fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>>>>  {
>>>> +   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
>>>> +    * which gives much better performance when the result is used with
>>>> +    * sample_d
>>>> +    */
>>>> +   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
>>>> +                                          BRW_VERTICAL_STRIDE_2;
>>>> +   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
>>>> +                                        BRW_WIDTH_2;
>>>> +
>>>>     struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>>>>                                  BRW_REGISTER_TYPE_F,
>>>> -                                BRW_VERTICAL_STRIDE_2,
>>>> -                                BRW_WIDTH_2,
>>>> +                                vstride,
>>>> +                                width,
>>>>                                  BRW_HORIZONTAL_STRIDE_0,
>>>>                                  BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>>     struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>>>>                                  BRW_REGISTER_TYPE_F,
>>>> -                                BRW_VERTICAL_STRIDE_2,
>>>> -                                BRW_WIDTH_2,
>>>> +                                vstride,
>>>> +                                width,
>>>>                                  BRW_HORIZONTAL_STRIDE_0,
>>>>                                  BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>>     brw_ADD(p, dst, src0, negate(src1));
>>>> --
>>>> 1.8.3.1
>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
>>
>> --
>> olv at LunarG.com



-- 
olv at LunarG.com

