[Mesa-dev] [PATCH 1/2] glsl: Don't rehash the values when copying to new table
Connor Abbott
cwabbott0 at gmail.com
Sat Dec 31 04:39:24 UTC 2016
On Fri, Dec 30, 2016 at 10:04 AM, Vladislav Egorov <vegorov180 at gmail.com> wrote:
> Oh, thanks! A very nice torture case. However, it seems to be related to
> another problem. The test case has no IF or LOOP blocks, so it can't be
> possibly improved with speeding up the cloning of the table (the torture
> case can be modified with a few ifs and fors on top though...).
>
> Anyway, on Gallium the time goes down from 1.91s before my WIP patch to
> 0.10s after my patch-series mostly due to improvements in
> glsl_to_tgsi_visitor::copy_propagation(). On Intel I don't see much of
> improvement however: 157s (!) -> 132s.
>
> Seems to be another case of linear scanning of the ACP table. Mesa has
> several variations of the same copy propagation algorithm, and all of them
> (with exception do_copy_propagation_elements that uses a hash-table)
> linearly scan the entire table when they want to kill all the copies of a
> variable. Which makes the implementation O(N^2) and blows up execution time
> on huge functions like the ones from shadertoy. However most of the time it
> still spends in tree_grafting that also uses all types of linear scans and
> tree walks instead of lookup tables (ir_expression::get_num_operands() gets
> executed > 6 billion times!) -- I'll try to look into it.
Since the problem seemed interesting, I thought about it a little bit
and realized that we could do something even better than what
copy_propagation_elements does. Instead of having a hash table from
ir_variables to ir_variables, we could have a hash table from
ir_variable to acp_entry, where acp_entry contains the original
variable as well as the acp_entry (*not* just the variable) we want to
replace it with. That is, when we see a copy where we've never seen
the lhs or rhs before, we insert two new acp_entry's and make one
point to the other. To kill a variable, we would simply mark its
associated acp_entry as dead and remove it from the table. When we go
to look up a variable's replacement, we check if the acp_entry for the
replacement is dead and ignore it if it is. That should make
everything O(1) (assuming hash table lookup is O(1)) while being much
simpler and probably faster than keeping a separate hash table of rhs
variables.
Another thing that would be potentially interesting is to recursively
replace variables. Right now, with something like:
bar = foo
baz = bar
... = baz
we need two iterations of copy propagation in order to replace "baz"
with "foo" on the rhs. If instead of looking up baz -> bar we looped
through the acp_entry's until the replacement is killed or NULL, we
could look up baz -> bar -> foo and do the whole thing in one shot.
That might make some shaders require less iterations to reach a fixed
point, helping compile times further.
I haven't thought too much about copy_propagation_elements, but we
could probably adapt it to do a similar thing too. Of course, we'd get
less of a benefit out of that.
>
> Also, reading the bug report, I don't understand why did you treat
> do_copy_propagation_elements() and do_copy_propagation() differently -- in
> one case added hash-table for both sides of COPY(to, from), in another only
> for one side?
>
>
> 30.12.2016 09:08, Tapani Pälli пишет:
>
>>
>>
>> On 12/30/2016 05:53 AM, Vladislav Egorov wrote:
>>>
>>> I've looked into it recently (I'm working on series of many various
>>> trivial optimizations)
>>>
>>> and it's faster to just memcpy() it. Just throwing out superfluous
>>> hashing still keeps slow
>>>
>>> hash-table insertion around -- with resizing, rehashing, memory
>>> allocation/deallocation, internal
>>>
>>> hash-function through integer division, collisions and so on. It
>>> produces a nice speed improvement
>>>
>>> actually. It's possible to explore approaches without any copying at
>>> LOOP/IF entering at all,
>>>
>>> but I am not sure it will improve performance.
>>
>>
>> When profiling copy_propagation(_elements) pass you can use Martina's
>> testcase from this bug:
>>
>> https://bugs.freedesktop.org/show_bug.cgi?id=94477
>>
>> We still have a very long compile time for the WebGL case mentioned in the
>> bug so would be cool to have some optimizations there.
>>
>>
>>>
>>> 30.12.2016 02:49, Thomas Helland пишет:
>>>>
>>>> Really, we should have some kind of function for copying the whole
>>>> table,
>>>> but this will work for now.
>>>> ---
>>>> src/compiler/glsl/opt_copy_propagation.cpp | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/compiler/glsl/opt_copy_propagation.cpp
>>>> b/src/compiler/glsl/opt_copy_propagation.cpp
>>>> index 247c498..e9f82e0 100644
>>>> --- a/src/compiler/glsl/opt_copy_propagation.cpp
>>>> +++ b/src/compiler/glsl/opt_copy_propagation.cpp
>>>> @@ -210,7 +210,8 @@
>>>> ir_copy_propagation_visitor::handle_if_block(exec_list *instructions)
>>>> /* Populate the initial acp with a copy of the original */
>>>> struct hash_entry *entry;
>>>> hash_table_foreach(orig_acp, entry) {
>>>> - _mesa_hash_table_insert(acp, entry->key, entry->data);
>>>> + _mesa_hash_table_insert_pre_hashed(acp, entry->hash,
>>>> + entry->key, entry->data);
>>>> }
>>>> visit_list_elements(this, instructions);
>>>> @@ -259,7 +260,8 @@ ir_copy_propagation_visitor::handle_loop(ir_loop
>>>> *ir, bool keep_acp)
>>>> if (keep_acp) {
>>>> struct hash_entry *entry;
>>>> hash_table_foreach(orig_acp, entry) {
>>>> - _mesa_hash_table_insert(acp, entry->key, entry->data);
>>>> + _mesa_hash_table_insert_pre_hashed(acp, entry->hash,
>>>> + entry->key, entry->data);
>>>> }
>>>> }
>>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list