[Mesa-dev] [PATCH 1/1] nir: Use a freelist in nir_opt_dce to avoid spamming ralloc

Thu Mar 15 06:43:15 UTC 2018

Yup, most definitely. I just have one more thing to test before
sending out a V2. I've toyed around with arrays and sets to see
if there are better options than a linked list. At least for now
the answer is "no, there aren't", but I'm going to test u_vector
for this use later today to see if that is even better.
Expect a new patch this evening CET.

2018-03-14 20:58 GMT+01:00 Dieter Nützel <Dieter at nuetzel-hh.de>:
> Hello Thomas,
>
> is this useful even after '[Mesa-dev] [PATCH 0/2] V2: Use hash table cloning
> in copy propagation' landed?
>
> I'm running both together with Dave's '[Mesa-dev] [PATCH] radv/winsys:
> replace bo list searchs with a hash table.' patch.
>
> Dieter
>
>
> On 24.01.2018 08:33, Thomas Helland wrote:
>>
>> 2018-01-21 23:58 GMT+01:00 Eric Anholt <eric at anholt.net>:
>>>
>>> Thomas Helland <thomashelland90 at gmail.com> writes:
>>>
>>>> Also, allocate worklist_elem in groups of 20, to reduce the burden of
>>>> allocation. Do not use rzalloc, as there is no need. This lets us drop
>>>> the number of calls to ralloc from approximately 10% of all calls to
>>>> ralloc (130 000 calls) down to a mere 2000 calls to ralloc_array_size.
>>>> This cuts the runtime of shader-db by 1%, while at the same time
>>>> reducing the number of stalled cycles, executed cycles, and executed
>>>> instructions by about 1% as reported by perf. I did a five-run
>>>> benchmark pre and post and got a statistical variance less than 0.1% pre
>>>> and post. This was with i965's ir validation polluting the benchmark, so
>>>> the numbers are even better in release builds.
>>>>
>>>> Performance change as found with perf-diff:
>>>> 4.74%     -0.23%  libc-2.26.so            [.] _int_malloc
>>>> 1.88%     -0.21%  libc-2.26.so            [.] malloc
>>>> 2.27%     +0.16%  libmesa_dri_drivers.so  [.] match_value.part.7
>>>> 2.95%     -0.12%  libc-2.26.so            [.] _int_free
>>>>           +0.11%  libmesa_dri_drivers.so  [.] worklist_push
>>>> 1.22%     -0.08%  libc-2.26.so            [.] malloc_consolidate
>>>> 0.16%     -0.06%  libmesa_dri_drivers.so  [.] mark_live_cb
>>>> 1.21%     +0.06%  libmesa_dri_drivers.so  [.] match_expression.part.6
>>>> 0.75%     -0.05%  libc-2.26.so            [.] cfree at GLIBC_2.2.5
>>>> 0.50%     -0.05%  libmesa_dri_drivers.so  [.] ralloc_size
>>>> 0.57%     +0.04%  libmesa_dri_drivers.so  [.] nir_replace_instr
>>>> 1.29%     -0.04%  libmesa_dri_drivers.so  [.] unsafe_free
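
The freelist scheme in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: it uses plain malloc instead of ralloc, and the names worklist, chunk, worklist_push, worklist_pop and the num_allocs counter are hypothetical (only worklist_elem, worklist_push and the group size of 20 appear in the thread). Elements are carved out of blocks of 20, and popped elements go back onto a free list instead of being returned to the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_ELEMS 20  /* group size mentioned in the commit message */

/* Hypothetical element type; the item is an opaque pointer here. */
struct worklist_elem {
   struct worklist_elem *next;
   void *item;
};

/* One allocation holding CHUNK_ELEMS elements. */
struct chunk {
   struct chunk *next;
   struct worklist_elem elems[CHUNK_ELEMS];
};

struct worklist {
   struct worklist_elem *head;  /* pending items, LIFO order */
   struct worklist_elem *free;  /* recycled elements */
   struct chunk *chunks;        /* every chunk, kept for teardown */
   unsigned num_allocs;         /* allocator calls made so far */
};

static void worklist_init(struct worklist *w)
{
   assert(w != NULL);
   w->head = NULL;
   w->free = NULL;
   w->chunks = NULL;
   w->num_allocs = 0;
}

/* Returns 1 on success, 0 if the allocator failed. */
static int worklist_push(struct worklist *w, void *item)
{
   if (!w->free) {
      /* Free list is empty: allocate one chunk and thread all of its
       * elements onto the free list. The memory is not zeroed. */
      struct chunk *c = malloc(sizeof(*c));
      if (!c)
         return 0;
      c->next = w->chunks;
      w->chunks = c;
      w->num_allocs++;
      for (int i = 0; i < CHUNK_ELEMS; i++) {
         c->elems[i].next = w->free;
         w->free = &c->elems[i];
      }
   }
   struct worklist_elem *e = w->free;
   w->free = e->next;
   e->item = item;
   e->next = w->head;
   w->head = e;
   return 1;
}

/* Returns the most recently pushed item, or NULL if the list is empty. */
static void *worklist_pop(struct worklist *w)
{
   struct worklist_elem *e = w->head;
   if (!e)
      return NULL;
   w->head = e->next;
   void *item = e->item;
   e->next = w->free;   /* recycle the element */
   w->free = e;
   return item;
}

static void worklist_fini(struct worklist *w)
{
   while (w->chunks) {
      struct chunk *c = w->chunks;
      w->chunks = c->next;
      free(c);
   }
   w->head = NULL;
   w->free = NULL;
}
```

With this layout, pushing 50 items costs three allocator calls, and pushing again after the list has been drained costs none, which is the kind of reduction the commit message describes.
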
>>>
>>>
>>> I'm curious, since a NIR instruction worklist seems like a generally
>>> useful thing to have:
>>>
>>> Could nir_worklist.c keep the implementation of this?
>>>
>>> Also, I wonder if it wouldn't be even better to have a u_dynarray of
>>> instructions in the worklist, with push/pop on the end of the array, and
>>> a struct set tracking the instructions in the array to avoid
>>> double-adding.  I actually don't know if that would be better or not, so
>>> I'd be happy with the worklist management just moved to nir_worklist.c.
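
The array-plus-set idea suggested above might look something like the sketch below. It does not use Mesa's u_dynarray or struct set: the growable array and the small open-addressing pointer set (ptr_set) are hypothetical stand-ins written only for illustration, as are all function names. Push and pop happen at the end of the array, and the set rejects an instruction that is already queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct set: linear probing over pointers.
 * NULL marks an empty slot, so NULL itself cannot be stored. */
struct ptr_set { void **slots; size_t cap, used; };

/* Hypothetical stand-in for a u_dynarray of instructions plus the set. */
struct instr_worklist { void **items; size_t count, cap; struct ptr_set set; };

static size_t slot_of(const void *p, size_t cap)
{
   return (size_t)(((uintptr_t)p >> 3) * (uintptr_t)2654435761u) & (cap - 1);
}

/* Index of key's slot, or of the empty slot where it would be stored. */
static size_t probe(void **slots, size_t cap, const void *key)
{
   size_t i = slot_of(key, cap);
   while (slots[i] && slots[i] != key)
      i = (i + 1) & (cap - 1);
   return i;
}

static void set_grow(struct ptr_set *s)
{
   size_t new_cap = s->cap ? s->cap * 2 : 16;   /* stays a power of two */
   void **slots = calloc(new_cap, sizeof(*slots));
   if (!slots)
      abort();
   for (size_t i = 0; i < s->cap; i++)
      if (s->slots[i])
         slots[probe(slots, new_cap, s->slots[i])] = s->slots[i];
   free(s->slots);
   s->slots = slots;
   s->cap = new_cap;
}

/* Returns false if key was already in the set. */
static bool set_add(struct ptr_set *s, void *key)
{
   if ((s->used + 1) * 2 > s->cap)   /* keep the table at most half full */
      set_grow(s);
   size_t i = probe(s->slots, s->cap, key);
   if (s->slots[i])
      return false;
   s->slots[i] = key;
   s->used++;
   return true;
}

/* Removal by backward shifting, so no tombstones are needed. */
static void set_remove(struct ptr_set *s, const void *key)
{
   if (s->cap == 0)
      return;
   size_t mask = s->cap - 1;
   size_t i = probe(s->slots, s->cap, key);
   if (!s->slots[i])
      return;
   for (size_t j = (i + 1) & mask; s->slots[j]; j = (j + 1) & mask) {
      size_t k = slot_of(s->slots[j], s->cap);
      /* Leave the entry alone if its home slot k lies cyclically in (i, j]. */
      if (i <= j ? (i < k && k <= j) : (i < k || k <= j))
         continue;
      s->slots[i] = s->slots[j];
      i = j;
   }
   s->slots[i] = NULL;
   s->used--;
}

/* Returns false if instr is already queued. */
static bool instr_worklist_push(struct instr_worklist *w, void *instr)
{
   assert(instr != NULL);
   if (!set_add(&w->set, instr))
      return false;
   if (w->count == w->cap) {
      size_t new_cap = w->cap ? w->cap * 2 : 16;
      void **items = realloc(w->items, new_cap * sizeof(*items));
      if (!items)
         abort();
      w->items = items;
      w->cap = new_cap;
   }
   w->items[w->count++] = instr;
   return true;
}

/* Pops from the end of the array; returns NULL when empty. */
static void *instr_worklist_pop(struct instr_worklist *w)
{
   if (w->count == 0)
      return NULL;
   void *instr = w->items[--w->count];
   set_remove(&w->set, instr);
   return instr;
}

static void instr_worklist_fini(struct instr_worklist *w)
{
   free(w->items);
   free(w->set.slots);
   w->items = NULL;
   w->set.slots = NULL;
   w->count = w->cap = w->set.cap = w->set.used = 0;
}
```

A zero-initialized struct instr_worklist is a valid empty worklist in this sketch. Whether this beats a linked list with a freelist is exactly the open question in the thread; the sketch only shows the shape of the data structure.
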
>>
>>
>> I'll look into this to see what I can do. nir_worklist.c at this time
>> has only a block worklist. This numbers all the blocks, uses a bitset
>> for checking if the item is present, and uses an array with an index
>> pointing to the start of the queue of blocks in the buffer.
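
For readers unfamiliar with that scheme, here is a rough sketch of a worklist over numbered items: a bitset for the presence check and a fixed-size array used as a circular queue with a start index. It is not the code from nir_worklist.c; the type index_worklist and all function names are hypothetical, and items are represented by their index only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct index_worklist {
   unsigned size;           /* total number of numbered items */
   unsigned count;          /* items currently queued */
   unsigned start;          /* position of the queue head in items[] */
   unsigned *items;         /* circular buffer of item indices */
   unsigned char *present;  /* bitset, one bit per index */
};

static bool index_worklist_init(struct index_worklist *w, unsigned size)
{
   assert(size > 0);
   w->size = size;
   w->count = 0;
   w->start = 0;
   w->items = malloc(size * sizeof(*w->items));
   w->present = calloc((size + 7) / 8, 1);
   if (!w->items || !w->present) {
      free(w->items);
      free(w->present);
      return false;
   }
   return true;
}

static bool index_worklist_contains(const struct index_worklist *w,
                                    unsigned idx)
{
   return (w->present[idx / 8] >> (idx % 8)) & 1u;
}

/* Appends idx to the tail unless it is already queued. Each index can be
 * queued at most once, so the buffer of `size` entries cannot overflow. */
static void index_worklist_push_tail(struct index_worklist *w, unsigned idx)
{
   assert(idx < w->size);
   if (index_worklist_contains(w, idx))
      return;
   w->present[idx / 8] |= (unsigned char)(1u << (idx % 8));
   w->items[(w->start + w->count) % w->size] = idx;
   w->count++;
}

/* Removes and returns the index at the head of the queue. */
static unsigned index_worklist_pop_head(struct index_worklist *w)
{
   assert(w->count > 0);
   unsigned idx = w->items[w->start];
   w->start = (w->start + 1) % w->size;
   w->count--;
   w->present[idx / 8] &= (unsigned char)~(1u << (idx % 8));
   return idx;
}

static void index_worklist_fini(struct index_worklist *w)
{
   free(w->items);
   free(w->present);
   w->items = NULL;
   w->present = NULL;
}
```

The presence check is a single bit test, which is why this approach depends on the items having dense numbers.
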
>>
>> The same scheme could be easily used for ssa-defs, as these are
>> also numbered. I actually did this for the VRP pass I wrote years ago.
>>
>> However, for instructions we do not have a way of numbering them,
>> so a different scheme would have to be used. A dynarray + set type
>> of thing, as you're suggesting, might get us where we want.
>> I'll see what I can come up with.