[Mesa-dev] [PATCH] glsl: optimize copy_propagation_elements pass

Ian Romanick idr at freedesktop.org
Fri Sep 30 00:18:13 UTC 2016


On 09/29/2016 12:17 AM, Tapani Pälli wrote:
> 
> On 09/28/2016 06:14 PM, Ian Romanick wrote:
>> On 09/16/2016 06:21 PM, Tapani Pälli wrote:
>>> Changes make copy_propagation_elements pass faster, reducing link
>>> time spent in test case of bug 94477. Does not fix the actual issue
>>> but brings down the total time. No regressions seen in CI.
>>
>> How does this affect the time of a full shader-db run?
> 
> Almost none at all, this is the open-source shaders (100 runs):
> 
> Difference at 95.0% confidence
>     0.0312 +/- 0.00502746
>     1.72566% +/- 0.278068%
>     (Student's t, pooled s = 0.0181375)
> 
> (testing with DOTA-2 shaders gave very similar result)
> 
> My assumption is that this really helps only the most pathological cases
> like in the bug where list size becomes enormous (thousands of entries).
> With just few entries, list is 'fast enough' to walk through anyway (?)
> 
> BTW Eric was proposing to just remove this pass. However when testing
> what happens on removal I noticed there's functional failures
> (arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform starts to
> fail), so it seems we are currently dependent on this pass.
> 
>> There are a bunch of bits of this that are confusing to me.  I think
>> some high-level explanation about which hash tables the acp_ref can be
>> in, which lists it can be in, and how they relate would help.  I've
>> pointed out a couple of the confusing bits below.
>>
>>> Signed-off-by: Tapani Pälli <tapani.palli at intel.com>
>>> ---
>>>
>>> For performance measurements, Martina reported in the bug 8x speedup
>>> to the test case shader link time when using this patch together with
>>> commit 2cd02e30d2e1677762d34f1831b8e609970ef0f3
>>>
>>>  .../glsl/opt_copy_propagation_elements.cpp         | 187
>>> ++++++++++++++++-----
>>>  1 file changed, 145 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> b/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> index e4237cc..1c5060a 100644
>>> --- a/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> +++ b/src/compiler/glsl/opt_copy_propagation_elements.cpp
>>> @@ -46,6 +46,7 @@
>>>  #include "ir_basic_block.h"
>>>  #include "ir_optimization.h"
>>>  #include "compiler/glsl_types.h"
>>> +#include "util/hash_table.h"
>>>
>>>  static bool debug = false;
>>>
>>> @@ -76,6 +77,18 @@ public:
>>>     int swizzle[4];
>>>  };
>>>
>>> +/* Class that refers to acp_entry in another exec_list. Used
>>> + * when making removals based on rhs.
>>> + */
>>> +class acp_ref : public exec_node
>>
>> This pattern is called a box, so maybe acp_box would be a better name.
>> I'm not too hung up on it.
>>
>> With this change, can the acp_entry itself still be in a list?
> 
> The idea here is a class that only refers to a acp_entry but does not
> take any ownership .. so it's really just a list of pointers. I'm OK
> with renaming it.

If only a boxed acp_entry can be in a list, then acp_entry doesn't need
to derive from exec_node.  That's why I was asking.  I looked at the
rest of the code again, and I now see that the acp_entry is in the lhs list.

Correct me if I'm wrong, but an entry will effectively be in two lists
at all times: the lhs list and the rhs list.

Assuming that previous assumption is correct, I might suggest a
different structure that makes it all less confusing.  Don't make
acp_entry drive from exec_node.  Instead, embed two acp_ref (or whatever
it ends up being called) nodes in the acp_entry:

    acp_ref lhs_node;
    acp_ref rhs_node;

When adding an entry to the lhs list, use

    lhs_list->push_tail(&entry->lhs_node);

Similar for rhs list:

    rhs_list->push_tail(&entry->rhs_node);

Walk the lists like:

   foreach_in_list_safe(acp_ref, ref, rhs_list) {
      acp_entry *entry = ref->entry;

      ...
   }

I think this would be a lot more clear because both lists are handled in
the same way.  It also avoids the overhead of allocating the boxes.

>>> +{
>>> +public:
>>> +   acp_ref(acp_entry *e)
>>> +   {
>>> +      entry = e;
>>> +   }
>>> +   acp_entry *entry;
>>> +};
>>>
>>>  class kill_entry : public exec_node
>>>  {
>>> @@ -98,14 +111,42 @@ public:
>>>        this->killed_all = false;
>>>        this->mem_ctx = ralloc_context(NULL);
>>>        this->shader_mem_ctx = NULL;
>>> -      this->acp = new(mem_ctx) exec_list;
>>>        this->kills = new(mem_ctx) exec_list;
>>> +
>>> +      create_acp();
>>>     }
>>>     ~ir_copy_propagation_elements_visitor()
>>>     {
>>>        ralloc_free(mem_ctx);
>>>     }
>>>
>>> +   void create_acp()
>>> +   {
>>> +      lhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
>>> +                                       _mesa_key_pointer_equal);
>>> +      rhs_ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
>>> +                                       _mesa_key_pointer_equal);
>>> +   }
>>> +
>>> +   void destroy_acp()
>>> +   {
>>> +      _mesa_hash_table_destroy(lhs_ht, NULL);
>>> +      _mesa_hash_table_destroy(rhs_ht, NULL);
>>> +   }
>>> +
>>> +   void populate_acp(hash_table *lhs, hash_table *rhs)
>>> +   {
>>> +      struct hash_entry *entry;
>>> +      hash_table_foreach(lhs, entry)
>>> +      {
>>
>> Opening { on the hash_table_foreach line.
>>
>>> +         _mesa_hash_table_insert(lhs_ht, entry->key, entry->data);
>>> +      }
>>
>> Blank line here.
>>
>>> +      hash_table_foreach(rhs, entry)
>>> +      {
>>
>> Opening { on the hash_table_foreach line.
>>
>>> +         _mesa_hash_table_insert(rhs_ht, entry->key, entry->data);
>>> +      }
>>> +   }
> 
> thanks, will fix these
> 
>>> +
>>>     void handle_loop(ir_loop *, bool keep_acp);
>>>     virtual ir_visitor_status visit_enter(class ir_loop *);
>>>     virtual ir_visitor_status visit_enter(class ir_function_signature
>>> *);
>>> @@ -120,8 +161,10 @@ public:
>>>     void kill(kill_entry *k);
>>>     void handle_if_block(exec_list *instructions);
>>>
>>> -   /** List of acp_entry: The available copies to propagate */
>>> -   exec_list *acp;
>>> +   /** Hash of acp_entry: The available copies to propagate */
>>> +   hash_table *lhs_ht;
>>> +   hash_table *rhs_ht;
>>> +
>>>     /**
>>>      * List of kill_entry: The variables whose values were killed in
>>> this
>>>      * block.
>>> @@ -147,23 +190,29 @@
>>> ir_copy_propagation_elements_visitor::visit_enter(ir_function_signature
>>> *ir)
>>>      * block.  Any instructions at global scope will be shuffled into
>>>      * main() at link time, so they're irrelevant to us.
>>>      */
>>> -   exec_list *orig_acp = this->acp;
>>>     exec_list *orig_kills = this->kills;
>>>     bool orig_killed_all = this->killed_all;
>>>
>>> -   this->acp = new(mem_ctx) exec_list;
>>> +   hash_table *orig_lhs_ht = lhs_ht;
>>> +   hash_table *orig_rhs_ht = rhs_ht;
>>> +
>>>     this->kills = new(mem_ctx) exec_list;
>>
>> Orthogonal to this patch, there is some efficiency to be gained by not
>> doing this extra memory allocation.  Doing
>>
>>       this->kills.move_nodes_to(&orig_kills);
>>
>> here, and
>>
>>       assert(this->kills.is_empty());
>>       orig_kills.move_nodes_to(&this->kills);
>>
>> below is sufficient.  You'll also have to convert kills from exec_list*
>> to just exec_list.  One nice thing about that is it will just delete
>> code. :)
> 
> Sounds good to me if we can squeeze a bit more off, I can try adding this.

I'd suggest doing this as a separate patch that comes first.  It should
be a small change, so we ought to be able to land that without too much
debate. :)

>>>     this->killed_all = false;
>>>
>>> +   create_acp();
>>> +
>>>     visit_list_elements(this, &ir->body);
>>>
>>> -   ralloc_free(this->acp);
>>>     ralloc_free(this->kills);
>>>
>>> +   destroy_acp();
>>> +
>>>     this->kills = orig_kills;
>>> -   this->acp = orig_acp;
>>>     this->killed_all = orig_killed_all;
>>>
>>> +   lhs_ht = orig_lhs_ht;
>>> +   rhs_ht = orig_rhs_ht;
>>> +
>>>     return visit_continue_with_parent;
>>>  }
>>>
>>> @@ -249,17 +298,19 @@
>>> ir_copy_propagation_elements_visitor::handle_rvalue(ir_rvalue **ir)
>>>     /* Try to find ACP entries covering swizzle_chan[], hoping they're
>>>      * the same source variable.
>>>      */
>>> -   foreach_in_list(acp_entry, entry, this->acp) {
>>> -      if (var == entry->lhs) {
>>> -     for (int c = 0; c < chans; c++) {
>>> -        if (entry->write_mask & (1 << swizzle_chan[c])) {
>>> -           source[c] = entry->rhs;
>>> -           source_chan[c] = entry->swizzle[swizzle_chan[c]];
>>> +   hash_entry *ht_entry = _mesa_hash_table_search(lhs_ht, var);
>>> +   if (ht_entry) {
>>> +      exec_list *ht_list = (exec_list *) ht_entry->data;
>>> +      foreach_in_list(acp_entry, entry, ht_list) {
>>> +         for (int c = 0; c < chans; c++) {
>>> +            if (entry->write_mask & (1 << swizzle_chan[c])) {
>>> +               source[c] = entry->rhs;
>>> +               source_chan[c] = entry->swizzle[swizzle_chan[c]];
>>>
>>>                 if (source_chan[c] != swizzle_chan[c])
>>>                    noop_swizzle = false;
>>> -        }
>>> -     }
>>> +               }
>>
>> I think this indentation is off.
> 
> will fix
> 
>>> +         }
>>>        }
>>>     }
>>>
>>> @@ -319,7 +370,9 @@
>>> ir_copy_propagation_elements_visitor::visit_enter(ir_call *ir)
>>>     /* Since we're unlinked, we don't (necessarily) know the side
>>> effects of
>>>      * this call.  So kill all copies.
>>>      */
>>> -   acp->make_empty();
>>> +   _mesa_hash_table_clear(lhs_ht, NULL);
>>> +   _mesa_hash_table_clear(rhs_ht, NULL);
>>> +
>>>     this->killed_all = true;
>>>
>>>     return visit_continue_with_parent;
>>> @@ -328,31 +381,36 @@
>>> ir_copy_propagation_elements_visitor::visit_enter(ir_call *ir)
>>>  void
>>>  ir_copy_propagation_elements_visitor::handle_if_block(exec_list
>>> *instructions)
>>>  {
>>> -   exec_list *orig_acp = this->acp;
>>>     exec_list *orig_kills = this->kills;
>>>     bool orig_killed_all = this->killed_all;
>>>
>>> -   this->acp = new(mem_ctx) exec_list;
>>> +   hash_table *orig_lhs_ht = lhs_ht;
>>> +   hash_table *orig_rhs_ht = rhs_ht;
>>> +
>>>     this->kills = new(mem_ctx) exec_list;
>>>     this->killed_all = false;
>>>
>>> +   create_acp();
>>> +
>>>     /* Populate the initial acp with a copy of the original */
>>> -   foreach_in_list(acp_entry, a, orig_acp) {
>>> -      this->acp->push_tail(new(this->acp) acp_entry(a));
>>> -   }
>>> +   populate_acp(orig_lhs_ht, orig_rhs_ht);
>>>
>>>     visit_list_elements(this, instructions);
>>>
>>>     if (this->killed_all) {
>>> -      orig_acp->make_empty();
>>> +      _mesa_hash_table_clear(orig_lhs_ht, NULL);
>>> +      _mesa_hash_table_clear(orig_rhs_ht, NULL);
>>>     }
>>>
>>>     exec_list *new_kills = this->kills;
>>>     this->kills = orig_kills;
>>> -   ralloc_free(this->acp);
>>> -   this->acp = orig_acp;
>>>     this->killed_all = this->killed_all || orig_killed_all;
>>>
>>> +   destroy_acp();
>>> +
>>> +   lhs_ht = orig_lhs_ht;
>>> +   rhs_ht = orig_rhs_ht;
>>> +
>>>     /* Move the new kills into the parent block's list, removing them
>>>      * from the parent's ACP list in the process.
>>>      */
>>> @@ -378,37 +436,42 @@
>>> ir_copy_propagation_elements_visitor::visit_enter(ir_if *ir)
>>>  void
>>>  ir_copy_propagation_elements_visitor::handle_loop(ir_loop *ir, bool
>>> keep_acp)
>>>  {
>>> -   exec_list *orig_acp = this->acp;
>>>     exec_list *orig_kills = this->kills;
>>>     bool orig_killed_all = this->killed_all;
>>>
>>> +   hash_table *orig_lhs_ht = lhs_ht;
>>> +   hash_table *orig_rhs_ht = rhs_ht;
>>> +
>>>     /* FINISHME: For now, the initial acp for loops is totally empty.
>>>      * We could go through once, then go through again with the acp
>>>      * cloned minus the killed entries after the first run through.
>>>      */
>>> -   this->acp = new(mem_ctx) exec_list;
>>>     this->kills = new(mem_ctx) exec_list;
>>>     this->killed_all = false;
>>>
>>> +   create_acp();
>>> +
>>>     if (keep_acp) {
>>>        /* Populate the initial acp with a copy of the original */
>>> -      foreach_in_list(acp_entry, a, orig_acp) {
>>> -         this->acp->push_tail(new(this->acp) acp_entry(a));
>>> -      }
>>> +      populate_acp(orig_lhs_ht, orig_rhs_ht);
>>>     }
>>>
>>>     visit_list_elements(this, &ir->body_instructions);
>>>
>>>     if (this->killed_all) {
>>> -      orig_acp->make_empty();
>>> +      _mesa_hash_table_clear(orig_lhs_ht, NULL);
>>> +      _mesa_hash_table_clear(orig_rhs_ht, NULL);
>>>     }
>>>
>>>     exec_list *new_kills = this->kills;
>>>     this->kills = orig_kills;
>>> -   ralloc_free(this->acp);
>>> -   this->acp = orig_acp;
>>>     this->killed_all = this->killed_all || orig_killed_all;
>>>
>>> +   destroy_acp();
>>> +
>>> +   lhs_ht = orig_lhs_ht;
>>> +   rhs_ht = orig_rhs_ht;
>>> +
>>>     foreach_in_list_safe(kill_entry, k, new_kills) {
>>>        kill(k);
>>>     }
>>> @@ -430,16 +493,33 @@
>>> ir_copy_propagation_elements_visitor::visit_enter(ir_loop *ir)
>>>  void
>>>  ir_copy_propagation_elements_visitor::kill(kill_entry *k)
>>>  {
>>> -   foreach_in_list_safe(acp_entry, entry, acp) {
>>> -      if (entry->lhs == k->var) {
>>> -     entry->write_mask = entry->write_mask & ~k->write_mask;
>>> -     if (entry->write_mask == 0) {
>>> -        entry->remove();
>>> -        continue;
>>> -     }
>>> +   /* removal of lhs entries */
>>> +   hash_entry *ht_entry = _mesa_hash_table_search(lhs_ht, k->var);
>>> +   if (ht_entry) {
>>> +      exec_list *lhs_list = (exec_list *) ht_entry->data;
>>> +      foreach_in_list_safe(acp_entry, entry, lhs_list) {
>>> +         entry->write_mask = entry->write_mask & ~k->write_mask;
>>> +         if (entry->write_mask == 0) {
>>> +            entry->remove();
>>> +            continue;
>>> +         }
>>>        }
>>> -      if (entry->rhs == k->var) {
>>> -     entry->remove();
>>> +   }
>>> +
>>> +   /* removal of rhs entries */
>>> +   ht_entry = _mesa_hash_table_search(rhs_ht, k->var);
>>> +   if (ht_entry) {
>>> +      exec_list *rhs_list = (exec_list *) ht_entry->data;
>>> +      foreach_in_list_safe(acp_ref, ref, rhs_list) {
>>> +        acp_entry *entry = ref->entry;
>>> +        /* If entry is still in a list (not already removed by lhs
>>> entry
>>> +          * removal above), remove it.
>>> +          */
>>> +         if (entry->prev || entry->next)
>>> +            entry->remove();
>>
>> This seems weird.
> 
> Why it's done this way is to separate rhs entry removal from lhs entry
> removal so that we would not end up seeking rhs hash for each lhs entry
> removed. Existing implementation removes from exec_list if lhs matches
> or rhs matches, I did not find way to do that efficiently in one pass.
> 
> So what's done instead is that in the first loop (lhs entries) we remove
> based on lhs. Now when iterating 'rhs_ht' some of the entries have been
> potentially been removed from list already. This checks if that has
> happened, if not we removal here instead.
> 
>>> +
>>> +         /* remove from acp_ref list */
>>> +         ref->remove();
>>
>> What list is ref in?
> 
> This is list of 'acp_ref' in rhs_ht. So first we removed this from
> exec_list that is in lhs_ht and now we remove from rhs_ht lists. I agree
> that this overall structure is a bit confusing but I did not find other
> solution that would be as fast. Comments below (when adding new items to
> hashes) tries to document the overall structure, maybe some more
> comments required there?
> 
> 
>>>        }
>>>     }
>>>
>>> @@ -513,7 +593,30 @@
>>> ir_copy_propagation_elements_visitor::add_copy(ir_assignment *ir)
>>>
>>>     entry = new(this->mem_ctx) acp_entry(lhs->var, rhs->var, write_mask,
>>>                      swizzle);
>>> -   this->acp->push_tail(entry);
>>> +
>>> +   /* lhs hash, hash of lhs -> acp_entry lists */
>>> +   hash_entry *ht_entry = _mesa_hash_table_search(lhs_ht, lhs->var);
>>> +   if (ht_entry) {
>>> +      exec_list *lhs_list = (exec_list *) ht_entry->data;
>>> +      lhs_list->push_tail(entry);
>>> +   } else {
>>> +      exec_list *lhs_list = new(mem_ctx) exec_list;
>>> +      lhs_list->push_tail(entry);
>>> +      _mesa_hash_table_insert(lhs_ht, lhs->var, lhs_list);
>>> +   }
>>> +
>>> +   acp_ref *ref = new(mem_ctx) acp_ref(entry);
>>> +
>>> +   /* rhs hash, hash of rhs -> acp_entry pointers to lhs lists */
>>> +   ht_entry = _mesa_hash_table_search(rhs_ht, rhs->var);
>>> +   if (ht_entry) {
>>> +      exec_list *rhs_list = (exec_list *) ht_entry->data;
>>> +      rhs_list->push_tail(ref);
>>> +   } else {
>>> +      exec_list *rhs_list = new(mem_ctx) exec_list;
>>> +      rhs_list->push_tail(ref);
>>> +      _mesa_hash_table_insert(rhs_ht, rhs->var, rhs_list);
>>> +   }
>>>  }
>>>
>>>  bool



More information about the mesa-dev mailing list