[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7
Ian Romanick
idr at freedesktop.org
Mon Sep 9 13:05:22 PDT 2013
On 09/05/2013 03:35 AM, Chia-I Wu wrote:
> sample_d is slower than the lowered version on gen7. For gen7, this improves
> Xonotic benchmark with Ultimate effects by as much as 25%:
>
> before the change: 40.06 fps
> after the change: 51.10 fps
> after the change with INTEL_DEBUG=no16: 44.46 fps
>
> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference
> was from SIMD8 versus SIMD16. If that was the case, we would want to apply
> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
>
> But, as the numbers show, there is still 10% improvement when SIMD16 is forced
> off after the change. Thus textureGrad() is lowered unconditionally for now.
> Due to this and that I haven't tried it on Haswell, this is still RFC.
A lot of this code depends on the texture targets being used. What
texture targets is Xonotic using with textureGrad?
> No piglit regressions.
>
> Signed-off-by: Chia-I Wu <olvaffe at gmail.com>
> ---
> .../dri/i965/brw_lower_texture_gradients.cpp | 54 ++++++++++++++--------
> 1 file changed, 36 insertions(+), 18 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> index 1589a20..f3fcb56 100644
> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> @@ -34,8 +34,8 @@ using namespace ir_builder;
>
> class lower_texture_grad_visitor : public ir_hierarchical_visitor {
> public:
> - lower_texture_grad_visitor(bool has_sample_d_c)
> - : has_sample_d_c(has_sample_d_c)
> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
> {
> progress = false;
> }
> @@ -44,6 +44,7 @@ public:
>
>
> bool progress;
> + bool has_sample_d;
> bool has_sample_d_c;
>
> private:
> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
> ir_visitor_status
> lower_texture_grad_visitor::visit_leave(ir_texture *ir)
> {
> - /* Only lower textureGrad with shadow samplers */
> - if (ir->op != ir_txd || !ir->shadow_comparitor)
> + if (ir->op != ir_txd)
> return visit_continue;
>
> - /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
> - * message. GLSL provides gradients for the 'r' coordinate. Unfortunately:
> - *
> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
> - * "The r coordinate contains the faceid, and the r gradients are ignored
> - * by hardware."
> - *
> - * We likely need to do a similar treatment for samplerCube and
> - * samplerCubeArray, but we have insufficient testing for that at the moment.
> - */
> - bool need_lowering = !has_sample_d_c ||
> - ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
> + bool need_lowering = false;
> +
> + if (ir->shadow_comparitor) {
> + /* Lower textureGrad() with samplerCubeShadow even if we have the
> + * sample_d_c message. GLSL provides gradients for the 'r' coordinate.
> + * Unfortunately:
> + *
> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
> + * description: "The r coordinate contains the faceid, and the r
> + * gradients are ignored by hardware."
> + */
> + if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
> + need_lowering = true;
> + else if (!has_sample_d_c)
> + need_lowering = true;
This should look like the old code:
need_lowering = !has_sample_d_c ||
ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
> + }
> + else {
> + /* We likely need to do a similar treatment for samplerCube and
> + * samplerCubeArray, but we have insufficient testing for that at the
> + * moment.
> + */
> + if (!has_sample_d)
> + need_lowering = true;
need_lowering = !has_sample_d;
> + }
>
> if (!need_lowering)
> return visit_continue;
> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
> expr(ir_unop_sqrt, dot(dPdy, dPdy)));
> }
>
> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. */
> + /* lambda_base = log2(rho). It will be biased and clamped by values
> + * defined in SAMPLER_STATE to get the final lambda.
> + */
> ir->op = ir_txl;
> ir->lod_info.lod = expr(ir_unop_log2, rho);
>
> @@ -168,8 +182,12 @@ bool
> brw_lower_texture_gradients(struct brw_context *brw,
> struct exec_list *instructions)
> {
> + /* sample_d is slower than the lowered version on gen7, and is not allowed
> + * in SIMD16 mode. Treating it as unsupported improves the performance.
> + */
> + bool has_sample_d = brw->gen != 7;
Based on the data in one of the other messages, it sounds like this
selector needs to be slightly more complex.
> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
> - lower_texture_grad_visitor v(has_sample_d_c);
> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
>
> visit_list_elements(&v, instructions);
More information about the mesa-dev
mailing list