[Mesa-dev] [PATCH] glsl: Add a CSE pass.

Mon Oct 28 22:36:47 CET 2013

Paul Berry <stereotype441 at gmail.com> writes:

> On 18 October 2013 15:44, Eric Anholt <eric at anholt.net> wrote:
>
>> This only operates on constant/uniform values for now, because otherwise
>> I'd
>> have to deal with killing my available CSE entries when assignments happen,
>> and getting even this working in the tree ir was painful enough.
>>
>> As is, it has the following effect in shader-db:
>>
>> total instructions in shared programs: 1524077 -> 1521964 (-0.14%)
>> instructions in affected programs:     50629 -> 48516 (-4.17%)
>> GAINED:                                0
>> LOST:                                  0
>>
>> And, for tropics, that accounts for most of the effect, the FPS
>> improvement is 11.67% +/- 0.72% (n=3).

>> +   ir_rvalue *try_cse(ir_rvalue *rvalue);
>> +   void try_add_to_ae(ir_rvalue **rvalue);
>> +
>> +   virtual ir_visitor_status visit_enter(ir_function_signature *ir);
>> +   virtual ir_visitor_status visit_enter(ir_loop *ir);
>> +   virtual ir_visitor_status visit_enter(ir_if *ir);
>> +   virtual ir_visitor_status visit_enter(ir_call *ir);
>> +   virtual void handle_rvalue(ir_rvalue **rvalue);
>> +
>> +   /** List of ae_entry: The available expressions to reuse */
>> +   exec_list *ae;
>>
>
> Calling this "ae" makes the code below hard to read because when I run
> across it later, I assume it is referring to a single ae, not a list of ae
> entries.  Can we call it "ae_entries" or something?

opt_copy_propagation, opt_constant_propagation, and brw_fs_cse.cpp call
the list "acp", "acp", and "aeb" respectively -- calling the list
variable "available entries entries" seems redundant, but then the
structure name of "available entries entry" feels redundant too.

I've left this one as is for consistency with the other
similarly-structured passes.  Let me know if you care strongly about it.

>> +
>> +   bool progress;
>> +
>> +   void *mem_ctx;
>> +
>> +   /**
>> +    * The whole shader, so that we can validate_ir_tree in debug mode.
>> +    *
>> +    * This proved quite useful when trying to get the tree manipulation
>> +    * right.
>> +    */
>> +   exec_list *validate_instructions;
>>
>
> try_cse(), try_add_to_ae(), ae, mem_ctx, and validate_instructions are only
> used by members of the class.  For safety we should make them private.

I disagree with using "private" at all (similar to most use of "const"),
but I'm tired of fighting it and I've made the change.

>> +};
>> +
>> +class cse_operands_visitor : public ir_hierarchical_visitor
>> +{
>> +public:
>> +
>> +   cse_operands_visitor()
>> +      : ok(true)
>> +   {
>> +   }
>> +
>> +   virtual ir_visitor_status visit(ir_dereference_variable *ir);
>> +
>> +   bool ok;
>> +};
>>
>
> Can we have a sentence or two comment above this class explaining what it
> does?  It's not obvious from the class name.  Maybe something like "visitor
> to determine whether an expression is CSE-able based on the
> ir_variable_mode of its operands.

Added:

/**
 * Visitor to walk an expression tree to check that all variables referenced
 * are constants.
 */

> You might also consider renaming this class to a name that reflects its
> purpose rather than its implementation (e.g. "is_cse_candidate_visitor").

Done.

>> +class contains_rvalue_visitor : public ir_rvalue_visitor
>> +{
>> +public:
>> +
>> +   contains_rvalue_visitor(ir_rvalue *val)
>> +      : val(val)
>> +   {
>> +      found = false;
>> +   }
>> +
>> +   virtual void handle_rvalue(ir_rvalue **rvalue);
>> +
>> +   bool found;
>> +   ir_rvalue *val;
>>
>
> "val" should be private.

Done.

>> +ir_visitor_status
>> +cse_operands_visitor::visit(ir_dereference_variable *ir)
>> +{
>> +   /* Currently, since we don't handle kills of the ae based on variables
>> +    * getting assigned, we can only handle constant variables.
>> +    */
>> +   switch (ir->var->mode) {
>> +   case ir_var_uniform:
>> +   case ir_var_shader_in:
>> +   case ir_var_system_value:
>> +      return visit_continue;
>> +   default:
>> +      ok = false;
>> +      return visit_stop;
>> +   }
>>
>
> Would this work just as well?
>
>    if (ir->var->read_only) {
>       return visit_continue;
>    } else {
>       ok = false;
>       return visit_stop;
>    }

I'd forgotten about that field.  It'll catch CSE with things like
gl_MaxVertexAttribs, too.  Done.

>> +}
>> +
>> +void
>> +contains_rvalue_visitor::handle_rvalue(ir_rvalue **rvalue)
>> +{
>> +   if (*rvalue == val)
>> +      found = true;
>> +}
>> +
>> +static bool
>> +contains_rvalue(ir_rvalue *haystack, ir_rvalue *needle)
>> +{
>> +   contains_rvalue_visitor v(needle);
>> +   haystack->accept(&v);
>> +   return v.found;
>> +}
>> +
>> +static bool
>> +valid_cse_operands(ir_rvalue *ir)
>> +{
>> +   /* We don't want to "CSE" for bare derefs.  We might want to do so for
>> +    * variable index derefs though.
>> +    */
>> +   if (ir->as_dereference_variable())
>> +      return false;
>>
>
> The caller (cse_visitor::try_add_to_ae()) only tries to CSE rvalues of type
> ir_type_expression or ir_type_texture, so this "return false" will never
> get executed.  Also, we'll never CSE variable index derefs.  Is this how
> you meant things to work?

Doing CSE for array indexing was something I was thinking about, but I
was concerned about whether it would be a win in general: Do we do it
for small constant indices?  Do we need to do variable-index only?  I
eventually decided get the simplest thing finished, first, and this
detritus was left over.

>> +static bool
>> +equals(ir_texture *a, ir_texture *b)
>> +{
>> +   if (!a || !b)
>> +      return false;
>> +
>> +   if (a->type != b->type)
>> +      return false;
>> +
>> +   if (a->op != b->op)
>> +      return false;
>> +
>> +   if (!equals(a->coordinate, b->coordinate))
>> +      return false;
>> +
>> +   if (!equals(a->projector, b->projector))
>> +      return false;
>> +
>> +   if (!equals(a->shadow_comparitor, b->shadow_comparitor))
>> +      return false;
>> +
>> +   if (!equals(a->offset, b->offset))
>> +      return false;
>> +
>> +   if (!equals(a->sampler, b->sampler))
>> +      return false;
>> +
>> +   if (!equals(a->lod_info.grad.dPdx, b->lod_info.grad.dPdx) ||
>> +       !equals(a->lod_info.grad.dPdy, b->lod_info.grad.dPdy))
>> +      return false;
>>
>
> This seems fragile, since its correctness relies on the fact that:
> - Everything in the lod_info union consists of ir_rvalue *'s
> - lod_info.grad is the largest constituent of the union
> - ir_textures are rzalloc'ed, so if the texturing operation isn't txd,
> lod_info.grad.dPdy will be NULL.
>
> How about this instead?
>
>    switch (a->op) {
>    case ir_tex:
>    case ir_lod:
>    case ir_query_levels:
>       break;
>    case ir_txb:
>       if (!equals(a->lod_info.bias, b->lod_info.bias))
>          return false;
>       break;
>    case ir_txl:
>    case ir_txf:
>    case ir_txs:
>       if (!equals(a->lod_info.lod, b->lod_info.lod))
>          return false;
>       break;
>    case ir_txd:
>       if (!equals(a->lod_info.grad.dPdx, b->lod_info.grad.dPdx) ||
>           !equals(a->lod_info.grad.dPdy, b->lod_info.grad.dPdy))
>          return false;
>    case ir_txf_ms:
>       if (!equals(a->lod_info.sample_index, b->lod_info.sample_index))
>          return false;
>       break;
>    case ir_tg4:
>       if (!equals(a->lod_info.component, b->lod_info.component))
>          return false;
>    default:
>       assert(!"Unrecognized texture op");
>    }
>
> I realize it's more code, but it's safer, and I'm guessing it will be
> faster too, since it avoids calling equals() on irrelevant parts of the
> union.

Totally reasonable.  I've put your block in instead.

>> +
>> +static bool
>> +equals(ir_expression *a, ir_expression *b)
>> +{
>> +   if (!a || !b)
>> +      return false;
>> +
>> +   if (a->type != b->type)
>> +      return false;
>> +
>> +   if (a->operation != b->operation)
>> +      return false;
>> +
>> +   for (unsigned i = 0; i < Elements(a->operands); i++) {
>>
>
> Using a->get_num_operands() as the loop bound would probably be faster.

Done.

>> +/**
>> + * If the rvalue is an appropriate one, add it to the list of available
>> + * expressions for later CSEing if it gets seen again.
>> + */
>> +void
>> +cse_visitor::try_add_to_ae(ir_rvalue **inout_rvalue)
>> +{
>> +   ir_rvalue *rvalue = *inout_rvalue;
>> +
>> +   if (!rvalue->type->is_vector() &&
>> +       !rvalue->type->is_scalar()) {
>> +      /* Our temporary variable assignment generation isn't ready to
>> handle
>> +       * anything bigger than a vector.
>> +       */
>> +      return;
>> +   }
>> +
>> +   switch (rvalue->ir_type) {
>> +   case ir_type_expression:
>> +   case ir_type_texture:
>> +      break;
>> +   default:
>> +      return;
>> +   }
>>
>
> It seems strange to me that two of our criteria for determining whether
> something is CSE-able are in valid_cse_operands() (don't CSE bare variable
> derefs and don't CSE expressions containing non-const variables), and the
> other two of our criteria are here (don't CSE things larger than vectors,
> and only CSE expressions and textures).
>
> Can we rename valid_cse_operands() to something like valid_for_cse() and
> put all four criteria together into it?  (Well, actually, all three
> criteria, since "don't CSE bare variable derefs" is redundant with "only
> CSE expressions and textures".)

Yeah, that's a silly split I had.  I've moved the checks into
valid_cse_operands() and renamed it to is_cse_candidate().

> Also, can we move all of this logic to the top of
> cse_visitor::handle_rvalue() (before both the call to try_cse() and
> try_add_to_ae())?  That way we won't waste time searching the AE list for
> expressions that it could never possibly contain.

Done.

>> +ir_visitor_status
>> +cse_visitor::visit_enter(ir_function_signature *ir)
>> +{
>> +   ae->make_empty();
>> +   visit_list_elements(this, &ir->body);
>> +
>> +   ae->make_empty();
>> +   return visit_continue_with_parent;
>> +}
>> +
>> +ir_visitor_status
>> +cse_visitor::visit_enter(ir_loop *ir)
>> +{
>> +   ae->make_empty();
>> +   visit_list_elements(this, &ir->body_instructions);
>> +
>> +   ae->make_empty();
>> +   return visit_continue_with_parent;
>> +}
>>
>
> I think we could avoid having to write these three visitors if we wrote
> this optimization pass using call_for_basic_blocks().

I originally wrote it that way, but the problem was that running the
rvalue visitor on the if statement at the end of a basic block (since we
want to look at the condition field as part of that basic block) would
process the contents of then_instructions and else_instructions.

>> +ir_visitor_status
>> +cse_visitor::visit_enter(ir_call *ir)
>> +{
>> +   /* Because call is an exec_list of ir_rvalues, handle_rvalue gets
>> passed a
>> +    * pointer to the (ir_rvalue *) on the stack.  Since we save those
>> pointers
>> +    * in the AE list, we can't let handle_rvalue get called.
>> +    */
>> +   return visit_continue_with_parent;
>>
>
> Argh, this is really unfortunate.  Not only does this mean that we can't
> CSE expressions appearing as function arguments, we can't even CSE
> subexpressions of those expressions.
>
> For a long time I've been wanting to change ir_call::actual_parameters from
> this:
>
>    /* List of ir_rvalue of paramaters passed in this call. */
>    exec_list actual_parameters;
>
> to this:
>
>    /* Number of parameters passed in this call. */
>    unsigned num_parameters;
>
>    /* Parameters passed in this call. */
>    ir_rvalue **actual_parameters;
>
> Now I'm even more motivated to try that.  But that seems like something
> that should wait for after Mesa 10.0.
>
> For the time being what you're doing seems reasonable, though.

You might want to chat with Ken about his thoughts about this plan.  I
hear he's attempted it a few times and given up and gone home, and had a
new idea for the next try.

I would *love* to see this happen.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131028/6b877c08/attachment.pgp>