[Mesa-dev] [PATCH 10/11] glsl: Improve precision of mod(x,y)
Ian Romanick
idr at freedesktop.org
Mon Jan 19 19:10:20 PST 2015
On 01/19/2015 03:32 AM, Eduardo Lima Mitev wrote:
> From: Iago Toral Quiroga <itoral at igalia.com>
>
> Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
> mod(x,y) as y * fract(x/y). This implementation has a down side though:
> it introduces precision errors due to the fract() operation. Even worse,
> since the result of fract() is multiplied by y, the larger y gets the
> larger the precision error we produce, so for large enough numbers the
> precision loss is significant. Some examples on i965:
>
> Operation Precision error
> -----------------------------------------------------
> mod(-1.951171875, 1.9980468750) 0.0000000447
> mod(121.57, 13.29) 0.0000023842
> mod(3769.12, 321.99) 0.0000762939
> mod(3769.12, 1321.99) 0.0001220703
> mod(-987654.125, 123456.984375) 0.0160663128
> mod( 987654.125, 123456.984375) 0.0312500000
>
> This patch replaces the current lowering pass with a different one
> (MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
> man pages:
>
> mod(x,y) = x - y * floor(x/y)
>
> This implementation eliminates the precision errors at the expense of
> an additional add instruction on some systems. On systems that can do
> negate with multiply-add in a single operation this new implementation
> would come at no additional cost.
>
> Fixes the following 16 dEQP tests:
> dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
> dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
> ---
> src/glsl/README | 2 +-
> src/glsl/ir_optimization.h | 2 +-
> src/glsl/lower_instructions.cpp | 49 ++++++++++++--------------
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
> src/mesa/drivers/dri/i965/brw_shader.cpp | 2 +-
> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +-
> src/mesa/program/ir_to_mesa.cpp | 4 +--
> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
> 8 files changed, 31 insertions(+), 34 deletions(-)
>
> diff --git a/src/glsl/README b/src/glsl/README
> index 2f93f12..bfcf69f 100644
> --- a/src/glsl/README
> +++ b/src/glsl/README
> @@ -187,7 +187,7 @@ You may also need to update the backends if they will see the new expr type:
>
> You can then use the new expression from builtins (if all backends
> would rather see it), or scan the IR and convert to use your new
> -expression type (see ir_mod_to_fract, for example).
> +expression type (see ir_mod_to_floor, for example).
>
> Q: How is memory management handled in the compiler?
>
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index 34e0b4b..912d910 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -34,7 +34,7 @@
> #define EXP_TO_EXP2 0x04
> #define POW_TO_EXP2 0x08
> #define LOG_TO_LOG2 0x10
> -#define MOD_TO_FRACT 0x20
> +#define MOD_TO_FLOOR 0x20
> #define INT_DIV_TO_MUL_RCP 0x40
> #define BITFIELD_INSERT_TO_BFM_BFI 0x80
> #define LDEXP_TO_ARITH 0x100
> diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
> index 6842853..b23c24d 100644
> --- a/src/glsl/lower_instructions.cpp
> +++ b/src/glsl/lower_instructions.cpp
> @@ -36,7 +36,7 @@
> * - EXP_TO_EXP2
> * - POW_TO_EXP2
> * - LOG_TO_LOG2
> - * - MOD_TO_FRACT
> + * - MOD_TO_FLOOR
> * - LDEXP_TO_ARITH
> * - BITFIELD_INSERT_TO_BFM_BFI
> * - CARRY_TO_ARITH
> @@ -77,14 +77,17 @@
> * Many older GPUs don't have an x**y instruction. For these GPUs, convert
> * x**y to 2**(y * log2(x)).
> *
> - * MOD_TO_FRACT:
> + * MOD_TO_FLOOR:
> * -------------
> - * Breaks an ir_binop_mod expression down to (op1 * fract(op0 / op1))
> + * Breaks an ir_binop_mod expression down to (op0 - op1 * floor(op0 / op1))
> *
> * Many GPUs don't have a MOD instruction (945 and 965 included), and
> * if we have to break it down like this anyway, it gives an
> * opportunity to do things like constant fold the (1.0 / op1) easily.
> *
> + * Note: before we used to implement this as op1 * fract(op / op1) but this
> + * implementation had significant precission errors.
> + *
> * LDEXP_TO_ARITH:
> * -------------
> * Converts ir_binop_ldexp to arithmetic and bit operations.
> @@ -136,7 +139,7 @@ private:
> void sub_to_add_neg(ir_expression *);
> void div_to_mul_rcp(ir_expression *);
> void int_div_to_mul_rcp(ir_expression *);
> - void mod_to_fract(ir_expression *);
> + void mod_to_floor(ir_expression *);
> void exp_to_exp2(ir_expression *);
> void pow_to_exp2(ir_expression *);
> void log_to_log2(ir_expression *);
> @@ -276,22 +279,15 @@ lower_instructions_visitor::log_to_log2(ir_expression *ir)
> }
>
> void
> -lower_instructions_visitor::mod_to_fract(ir_expression *ir)
> +lower_instructions_visitor::mod_to_floor(ir_expression *ir)
> {
> - ir_variable *temp = new(ir) ir_variable(ir->operands[1]->type, "mod_b",
> - ir_var_temporary);
> - this->base_ir->insert_before(temp);
> -
> - ir_assignment *const assign =
> - new(ir) ir_assignment(new(ir) ir_dereference_variable(temp),
> - ir->operands[1], NULL);
> -
> - this->base_ir->insert_before(assign);
> + ir_rvalue *x = ir->operands[0];
> + ir_rvalue *y = ir->operands[1];
> + ir_rvalue *x_clone = x->clone(ir, NULL);
> + ir_rvalue *y_clone = y->clone(ir, NULL);
My gut tells me this is a bad idea. What happens if you do the following?
z = mod(a * b * c * d, e + f + g + h);
It seems like it will produce duplicate expression trees. Although, we
might be "saved" by the way function call parameters are handled.
>
> ir_expression *const div_expr =
> - new(ir) ir_expression(ir_binop_div, ir->operands[0]->type,
> - ir->operands[0],
> - new(ir) ir_dereference_variable(temp));
> + new(ir) ir_expression(ir_binop_div, x->type, x, y);
>
> /* Don't generate new IR that would need to be lowered in an additional
> * pass.
> @@ -299,14 +295,15 @@ lower_instructions_visitor::mod_to_fract(ir_expression *ir)
> if (lowering(DIV_TO_MUL_RCP))
> div_to_mul_rcp(div_expr);
>
> - ir_rvalue *expr = new(ir) ir_expression(ir_unop_fract,
> - ir->operands[0]->type,
> - div_expr,
> - NULL);
> + ir_expression *const floor_expr =
> + new(ir) ir_expression(ir_unop_floor, x->type, div_expr);
>
> - ir->operation = ir_binop_mul;
> - ir->operands[0] = new(ir) ir_dereference_variable(temp);
> - ir->operands[1] = expr;
> + ir_expression *const mul_expr =
> + new(ir) ir_expression(ir_binop_mul, y_clone, floor_expr);
> +
> + ir->operation = ir_binop_sub;
> + ir->operands[0] = x_clone;
> + ir->operands[1] = mul_expr;
> this->progress = true;
> }
>
> @@ -535,8 +532,8 @@ lower_instructions_visitor::visit_leave(ir_expression *ir)
> break;
>
> case ir_binop_mod:
> - if (lowering(MOD_TO_FRACT) && ir->type->is_float())
> - mod_to_fract(ir);
> + if (lowering(MOD_TO_FLOOR) && ir->type->is_float())
> + mod_to_floor(ir);
> break;
>
> case ir_binop_pow:
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 38cf29f..4a7e045 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -877,7 +877,7 @@ fs_visitor::visit(ir_expression *ir)
> break;
> }
> case ir_binop_mod:
> - /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
> + /* Floating point should be lowered by MOD_TO_FLOOR in the compiler. */
> assert(ir->type->is_integer());
> emit_math(SHADER_OPCODE_INT_REMAINDER, this->result, op[0], op[1]);
> break;
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index d76134b..eac9982 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -157,7 +157,7 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
> ? BITFIELD_INSERT_TO_BFM_BFI
> : 0;
> lower_instructions(shader->base.ir,
> - MOD_TO_FRACT |
> + MOD_TO_FLOOR |
> DIV_TO_MUL_RCP |
> SUB_TO_ADD_NEG |
> EXP_TO_EXP2 |
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 8b8b27f..8129118 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1522,7 +1522,7 @@ vec4_visitor::visit(ir_expression *ir)
> break;
> }
> case ir_binop_mod:
> - /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
> + /* Floating point should be lowered by MOD_TO_FLOOR in the compiler. */
> assert(ir->type->is_integer());
> emit_math(SHADER_OPCODE_INT_REMAINDER, result_dst, op[0], op[1]);
> break;
> diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
> index ce3af31..fc3dad7 100644
> --- a/src/mesa/program/ir_to_mesa.cpp
> +++ b/src/mesa/program/ir_to_mesa.cpp
> @@ -1152,7 +1152,7 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
> assert(!"not reached: should be handled by ir_div_to_mul_rcp");
> break;
> case ir_binop_mod:
> - /* Floating point should be lowered by MOD_TO_FRACT in the compiler. */
> + /* Floating point should be lowered by MOD_TO_FLOOR in the compiler. */
> assert(ir->type->is_integer());
> emit(ir, OPCODE_MUL, result_dst, op[0], op[1]);
> break;
> @@ -2942,7 +2942,7 @@ _mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
>
> /* Lowering */
> do_mat_op_to_vec(ir);
> - lower_instructions(ir, (MOD_TO_FRACT | DIV_TO_MUL_RCP | EXP_TO_EXP2
> + lower_instructions(ir, (MOD_TO_FLOOR | DIV_TO_MUL_RCP | EXP_TO_EXP2
> | LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP
> | ((options->EmitNoPow) ? POW_TO_EXP2 : 0)));
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index c3d7793..c9903cd 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5418,7 +5418,7 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
> lower_offset_arrays(ir);
> do_mat_op_to_vec(ir);
> lower_instructions(ir,
> - MOD_TO_FRACT |
> + MOD_TO_FLOOR |
> DIV_TO_MUL_RCP |
> EXP_TO_EXP2 |
> LOG_TO_LOG2 |
>
More information about the mesa-dev
mailing list