[Mesa-dev] [PATCH] i965/hsw: approximate DDX with a uniform value across a subspan
Chia-I Wu
olvaffe at gmail.com
Thu Sep 12 20:11:43 PDT 2013
On Thu, Sep 12, 2013 at 10:48 PM, Ian Romanick <idr at freedesktop.org> wrote:
> On 09/12/2013 01:06 AM, Chris Forbes wrote:
>> Can we make this approximation conditional on an image-quality control
>> in driconf [or somewhere else]?
>
> There's already a control that applications can use:
> GL_FRAGMENT_SHADER_DERIVATIVE_HINT. I don't know whether or not /any/
> app has ever used it. The default setting is GL_DONT_CARE, so,
> technically speaking, we could do this optimization whenever the hint
> isn't GL_NICEST. Though, we may want a driconf override anyway. Hmm...
How about, in generate_ddx():
   if (brw->ctx.Hint.FragmentShaderDerivative == GL_NICEST ||
       brw->accurate_ddx) {
      // current code
   } else {
      // new code
   }
That is, when the app doesn't care (GL_DONT_CARE), we treat the hint as
GL_FASTEST. If the user does care, they can set the new drirc option,
accurate_ddx, to true to override this. accurate_ddx is false by default.
>> On Thu, Sep 12, 2013 at 5:00 PM, Chia-I Wu <olvaffe at gmail.com> wrote:
>>> From: Chia-I Wu <olv at lunarg.com>
>>>
>>> Replicate the gradient of the top-left pixel to the other three pixels in the
>>> subspan, as is already done for DDY. Before, different gradients were used for
>>> pixels in the top row and pixels in the bottom row.
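To make the difference concrete, here is a CPU-side sketch of both behaviours on a single 2x2 subspan. The function names are hypothetical. The pixel order (top-left, top-right, bottom-left, bottom-right) follows the ss0.tl/ss0.tr naming used in the patch comment:

```cpp
#include <array>
#include <cassert>

// One 2x2 subspan: [0] top-left, [1] top-right, [2] bottom-left,
// [3] bottom-right.
using Subspan = std::array<float, 4>;

// Before the patch: the top row gets tr - tl, the bottom row gets br - bl.
Subspan ddx_per_row(const Subspan &v)
{
   const float top = v[1] - v[0];
   const float bottom = v[3] - v[2];
   return Subspan{{top, top, bottom, bottom}};
}

// With the patch (Haswell): the top-left pixel's gradient, tr - tl, is
// replicated to all four pixels.
Subspan ddx_uniform(const Subspan &v)
{
   const float d = v[1] - v[0];
   return Subspan{{d, d, d, d}};
}
```

The two results match whenever the horizontal gradient is the same in both rows. They diverge only when it changes within the subspan, which is where the approximation loses accuracy.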
>>>
>>> This change results in a less accurate approximation. However, it improves
>>> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
>>> 95.0% confidence) on Haswell. No noticeable image quality difference was
>>> observed.
>>>
>>> No piglit gpu.tests regressions.
>>>
>>> I failed to come up with an explanation for the performance difference. Nor
>>> does the change make any difference on Ivy Bridge. If anyone has insight,
>>> please kindly enlighten me. Performance differences may also be observed in
>>> other games that call textureGrad and dFdx.
>>>
>>> Signed-off-by: Chia-I Wu <olv at lunarg.com>
>>> ---
>>> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>>> 1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> index bfb3d33..c0d24a0 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>>> void
>>> fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>>> {
>>> + /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
>>> + * which gives much better performance when the result is used with
>>> + * sample_d
>>> + */
>>> + unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
>>> + BRW_VERTICAL_STRIDE_2;
>>> + unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
>>> + BRW_WIDTH_2;
>>> +
>>> struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>>> BRW_REGISTER_TYPE_F,
>>> - BRW_VERTICAL_STRIDE_2,
>>> - BRW_WIDTH_2,
>>> + vstride,
>>> + width,
>>> BRW_HORIZONTAL_STRIDE_0,
>>> BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>> struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>>> BRW_REGISTER_TYPE_F,
>>> - BRW_VERTICAL_STRIDE_2,
>>> - BRW_WIDTH_2,
>>> + vstride,
>>> + width,
>>> BRW_HORIZONTAL_STRIDE_0,
>>> BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>> brw_ADD(p, dst, src0, negate(src1));
>>> --
>>> 1.8.3.1
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
--
olv at LunarG.com