[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7
Chia-I Wu
olvaffe at gmail.com
Mon Sep 9 23:01:29 PDT 2013
On Tue, Sep 10, 2013 at 4:05 AM, Ian Romanick <idr at freedesktop.org> wrote:
> On 09/05/2013 03:35 AM, Chia-I Wu wrote:
>> sample_d is slower than the lowered version on gen7. For gen7, this improves
>> Xonotic benchmark with Ultimate effects by as much as 25%:
>>
>> before the change: 40.06 fps
>> after the change: 51.10 fps
>> after the change with INTEL_DEBUG=no16: 44.46 fps
>>
>> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference
>> was from SIMD8 versus SIMD16. If that was the case, we would want to apply
>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
>>
>> But, as the numbers show, there is still 10% improvement when SIMD16 is forced
>> off after the change. Thus textureGrad() is lowered unconditionally for now.
>> Due to this and that I haven't tried it on Haswell, this is still RFC.
>
> A lot of this code depends on the texture targets being used. What
> texture targets is Xonotic using with textureGrad?
Only sampler2D.
>
>> No piglit regressions.
>>
>> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
>> ---
>> .../dri/i965/brw_lower_texture_gradients.cpp | 54 ++++++++++++++--------
>> 1 file changed, 36 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>> index 1589a20..f3fcb56 100644
>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
>> @@ -34,8 +34,8 @@ using namespace ir_builder;
>>
>> class lower_texture_grad_visitor : public ir_hierarchical_visitor {
>> public:
>> - lower_texture_grad_visitor(bool has_sample_d_c)
>> - : has_sample_d_c(has_sample_d_c)
>> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
>> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
>> {
>> progress = false;
>> }
>> @@ -44,6 +44,7 @@ public:
>>
>>
>> bool progress;
>> + bool has_sample_d;
>> bool has_sample_d_c;
>>
>> private:
>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
>> ir_visitor_status
>> lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>> {
>> - /* Only lower textureGrad with shadow samplers */
>> - if (ir->op != ir_txd || !ir->shadow_comparitor)
>> + if (ir->op != ir_txd)
>> return visit_continue;
>>
>> - /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
>> - * message. GLSL provides gradients for the 'r' coordinate. Unfortunately:
>> - *
>> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
>> - * "The r coordinate contains the faceid, and the r gradients are ignored
>> - * by hardware."
>> - *
>> - * We likely need to do a similar treatment for samplerCube and
>> - * samplerCubeArray, but we have insufficient testing for that at the moment.
>> - */
>> - bool need_lowering = !has_sample_d_c ||
>> - ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
>> + bool need_lowering = false;
>> +
>> + if (ir->shadow_comparitor) {
>> + /* Lower textureGrad() with samplerCubeShadow even if we have the
>> + * sample_d_c message. GLSL provides gradients for the 'r' coordinate.
>> + * Unfortunately:
>> + *
>> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
>> + * description: "The r coordinate contains the faceid, and the r
>> + * gradients are ignored by hardware."
>> + */
>> + if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
>> + need_lowering = true;
>> + else if (!has_sample_d_c)
>> + need_lowering = true;
>
> This should look like the old code:
>
> need_lowering = !has_sample_d_c ||
> ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
Sure. I moved it so that it is clear the comments are for the first if-block.
>> + }
>> + else {
>> + /* We likely need to do a similar treatment for samplerCube and
>> + * samplerCubeArray, but we have insufficient testing for that at the
>> + * moment.
>> + */
>> + if (!has_sample_d)
>> + need_lowering = true;
>
> need_lowering = !has_sample_d;
Will do.
>
>> + }
>>
>> if (!need_lowering)
>> return visit_continue;
>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
>> expr(ir_unop_sqrt, dot(dPdy, dPdy)));
>> }
>>
>> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. */
>> + /* lambda_base = log2(rho). It will be biased and clamped by values
>> + * defined in SAMPLER_STATE to get the final lambda.
>> + */
>> ir->op = ir_txl;
>> ir->lod_info.lod = expr(ir_unop_log2, rho);
>>
>> @@ -168,8 +182,12 @@ bool
>> brw_lower_texture_gradients(struct brw_context *brw,
>> struct exec_list *instructions)
>> {
>> + /* sample_d is slower than the lowered version on gen7, and is not allowed
>> + * in SIMD16 mode. Treating it as unsupported improves the performance.
>> + */
>> + bool has_sample_d = brw->gen != 7;
>
> Based on the data in one of the other messages, it sounds like this
> selector needs to be slightly more complex.
To extend it for Haswell and maybe Sandy Bridge? It is also possible
to apply this trick only on fragment shaders. Should I do that?
Now I have my Haswell set up, I have a few more observations. This
patch helps the performance for two reasons
- it allows the shaders to run in SIMD16 mode
- it replaces sample_d by sample_l (plus some math ops)
On both Ivy Bridge and Haswell, SIMD16 helps the performance by 10-15%
for this game. Avoiding sample_d helps the performance by another
10-15% on Ivy Bridge. However, it does not seem to help Haswell. I
am still trying to figure out why (or why it helps Ivy Bridge).
>
>> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
>> - lower_texture_grad_visitor v(has_sample_d_c);
>> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>>
>> visit_list_elements(&v, instructions);
>
--
olv at LunarG.com
More information about the mesa-dev
mailing list