[Mesa-dev] [PATCH 1/1] nir: Use a freelist in nir_opt_dce to avoid spamming ralloc
Dieter Nützel
Dieter at nuetzel-hh.de
Wed Mar 14 19:58:35 UTC 2018
Hello Thomas,
is this useful even after '[Mesa-dev] [PATCH 0/2] V2: Use hash table
cloning in copy propagation' landed?
I'm running both together with Dave's '[Mesa-dev] [PATCH] radv/winsys:
replace bo list searchs with a hash table.' patch.
Dieter
On 24.01.2018 08:33, Thomas Helland wrote:
> 2018-01-21 23:58 GMT+01:00 Eric Anholt <eric at anholt.net>:
>> Thomas Helland <thomashelland90 at gmail.com> writes:
>>
>>> Also, allocate worklist_elem in groups of 20, to reduce the burden of
>>> allocation. Do not use rzalloc, as there is no need. This lets us drop
>>> the number of calls to ralloc from approximately 10% of all calls to
>>> ralloc (130 000 calls), down to a mere 2000 calls to ralloc_array_size.
>>> This cuts the runtime of shader-db by 1%, while at the same time
>>> reducing the number of stalled cycles, executed cycles, and executed
>>> instructions by about 1% as reported by perf. I did a five-run
>>> benchmark pre and post and got a statistical variance less than 0.1%
>>> pre and post. This was with i965's ir validation polluting the
>>> benchmark, so the numbers are even better in release builds.
>>>
>>> Performance change as found with perf-diff:
>>>     4.74%    -0.23%  libc-2.26.so            [.] _int_malloc
>>>     1.88%    -0.21%  libc-2.26.so            [.] malloc
>>>     2.27%    +0.16%  libmesa_dri_drivers.so  [.] match_value.part.7
>>>     2.95%    -0.12%  libc-2.26.so            [.] _int_free
>>>              +0.11%  libmesa_dri_drivers.so  [.] worklist_push
>>>     1.22%    -0.08%  libc-2.26.so            [.] malloc_consolidate
>>>     0.16%    -0.06%  libmesa_dri_drivers.so  [.] mark_live_cb
>>>     1.21%    +0.06%  libmesa_dri_drivers.so  [.] match_expression.part.6
>>>     0.75%    -0.05%  libc-2.26.so            [.] cfree@GLIBC_2.2.5
>>>     0.50%    -0.05%  libmesa_dri_drivers.so  [.] ralloc_size
>>>     0.57%    +0.04%  libmesa_dri_drivers.so  [.] nir_replace_instr
>>>     1.29%    -0.04%  libmesa_dri_drivers.so  [.] unsafe_free
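
For readers following along, the idea in the commit message (hand out
worklist elements from a freelist, and refill that freelist 20 elements at
a time) can be sketched roughly as below. This is only an illustration, not
the code from the patch: it uses plain malloc instead of ralloc, and the
names elem, chunk, elem_pool and pool_* are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ELEMS_PER_CHUNK 20   /* group size mentioned in the commit message */

/* Hypothetical element type; the patch's worklist_elem is not shown here. */
struct elem {
   struct elem *next;        /* links the element into the freelist */
   void *payload;
};

/* One allocation holding a whole group of elements. */
struct chunk {
   struct chunk *next;
   struct elem elems[ELEMS_PER_CHUNK];
};

struct elem_pool {
   struct chunk *chunks;     /* every chunk allocated so far */
   struct elem *free;        /* elements ready to be handed out */
   unsigned num_allocs;      /* number of malloc calls, for illustration */
};

static void
pool_init(struct elem_pool *pool)
{
   pool->chunks = NULL;
   pool->free = NULL;
   pool->num_allocs = 0;
}

/* Take an element from the freelist, refilling it from a new chunk if empty. */
static struct elem *
pool_get(struct elem_pool *pool)
{
   if (!pool->free) {
      struct chunk *c = malloc(sizeof(*c));   /* not zeroed, as in the text */
      if (!c)
         return NULL;
      pool->num_allocs++;
      c->next = pool->chunks;
      pool->chunks = c;
      for (int i = 0; i < ELEMS_PER_CHUNK; i++) {
         c->elems[i].next = pool->free;
         pool->free = &c->elems[i];
      }
   }

   struct elem *e = pool->free;
   pool->free = e->next;
   e->next = NULL;
   e->payload = NULL;
   return e;
}

/* Return an element to the freelist instead of freeing it. */
static void
pool_put(struct elem_pool *pool, struct elem *e)
{
   e->next = pool->free;
   pool->free = e;
}

/* Release all chunks at once. */
static void
pool_finish(struct elem_pool *pool)
{
   while (pool->chunks) {
      struct chunk *next = pool->chunks->next;
      free(pool->chunks);
      pool->chunks = next;
   }
   pool->free = NULL;
}
```

With this shape, the allocator is only hit once per 20 elements, and
elements popped off the worklist are recycled rather than reallocated.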
>>
>> I'm curious, since a NIR instruction worklist seems like a generally
>> useful thing to have:
>>
>> Could nir_worklist.c keep the implementation of this?
>>
>> Also, I wonder if it wouldn't be even better to have a u_dynarray of
>> instructions in the worklist, with push/pop on the end of the array, and
>> a struct set tracking the instructions in the array to avoid
>> double-adding. I actually don't know if that would be better or not, so
>> I'd be happy with the worklist management just moved to nir_worklist.c.
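
A rough sketch of the "array plus set" idea suggested above, for
illustration only. It does not use Mesa's u_dynarray or struct set; a
growable pointer array and a small open-addressing pointer set stand in for
them, and instr_worklist, wl_* and set_* are hypothetical names.
Instructions are treated as opaque pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOMBSTONE ((void *)(uintptr_t)1)   /* marks a deleted set slot */

struct instr_worklist {
   void **stack; size_t len, stack_cap;    /* growable array used as a stack */
   void **slots; size_t used, set_cap;     /* pointer set; used counts tombstones */
};

static size_t
slot_for(const void *p, size_t cap)        /* cap is always a power of two */
{
   uintptr_t v = (uintptr_t)p;
   return (size_t)((v >> 4) ^ (v >> 12)) & (cap - 1);
}

static void
wl_init(struct instr_worklist *wl)
{
   wl->len = 0;  wl->stack_cap = 16;
   wl->used = 0; wl->set_cap = 16;
   wl->stack = malloc(wl->stack_cap * sizeof(void *));
   wl->slots = calloc(wl->set_cap, sizeof(void *));
   assert(wl->stack && wl->slots);
}

/* Linear probing; stops at the first never-used slot. */
static void **
set_find(const struct instr_worklist *wl, const void *p)
{
   for (size_t i = slot_for(p, wl->set_cap); wl->slots[i];
        i = (i + 1) & (wl->set_cap - 1)) {
      if (wl->slots[i] == p)
         return &wl->slots[i];
   }
   return NULL;
}

/* Caller guarantees p is not in the set and that a free slot exists. */
static void
set_add(struct instr_worklist *wl, void *p)
{
   size_t i = slot_for(p, wl->set_cap);
   while (wl->slots[i] && wl->slots[i] != TOMBSTONE)
      i = (i + 1) & (wl->set_cap - 1);
   if (!wl->slots[i])
      wl->used++;
   wl->slots[i] = p;
}

/* Rehash to drop tombstones; double the table only if it is really full. */
static void
set_rehash(struct instr_worklist *wl)
{
   void **old = wl->slots;
   size_t old_cap = wl->set_cap;
   if (wl->len * 4 >= wl->set_cap)
      wl->set_cap *= 2;
   wl->slots = calloc(wl->set_cap, sizeof(void *));
   assert(wl->slots);
   wl->used = 0;
   for (size_t i = 0; i < old_cap; i++) {
      if (old[i] && old[i] != TOMBSTONE)
         set_add(wl, old[i]);
   }
   free(old);
}

/* Returns false if instr is already queued (the double-add case). */
static bool
wl_push(struct instr_worklist *wl, void *instr)
{
   if (set_find(wl, instr))
      return false;
   if ((wl->used + 1) * 2 > wl->set_cap)
      set_rehash(wl);
   set_add(wl, instr);
   if (wl->len == wl->stack_cap) {
      wl->stack_cap *= 2;
      wl->stack = realloc(wl->stack, wl->stack_cap * sizeof(void *));
      assert(wl->stack);
   }
   wl->stack[wl->len++] = instr;
   return true;
}

/* Pops from the end of the array; returns NULL when the worklist is empty. */
static void *
wl_pop(struct instr_worklist *wl)
{
   if (wl->len == 0)
      return NULL;
   void *instr = wl->stack[--wl->len];
   *set_find(wl, instr) = TOMBSTONE;
   return instr;
}

static void
wl_finish(struct instr_worklist *wl)
{
   free(wl->stack);
   free(wl->slots);
}
```

Whether this beats a linked worklist with a freelist is exactly the open
question raised above; the sketch only shows that the bookkeeping is small.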
>
> I'll look into this to see what I can do. nir_worklist.c at this time
> has only a block worklist. This numbers all the blocks, uses a bitset for
> checking if the item is present, and uses an array with an index pointing
> to the start of the queue of blocks in the buffer.
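
The scheme described above (numbered items, a presence bitset, and an array
with a start index) might look roughly like the following. This is a
simplified stand-alone sketch, not the contents of nir_worklist.c; the
names index_worklist and iwl_* are invented, and items are represented by
their index alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct index_worklist {
   unsigned size;       /* number of distinct indices, fixed at creation */
   unsigned start;      /* position of the queue head inside ring[] */
   unsigned count;      /* number of queued indices */
   uint32_t *present;   /* one bit per index: is it currently queued? */
   unsigned *ring;      /* circular buffer of queued indices */
};

static void
iwl_init(struct index_worklist *w, unsigned size)
{
   assert(size > 0);
   w->size = size;
   w->start = 0;
   w->count = 0;
   w->present = calloc((size + 31) / 32, sizeof(uint32_t));
   w->ring = malloc(size * sizeof(unsigned));
   assert(w->present && w->ring);
}

static bool
iwl_is_empty(const struct index_worklist *w)
{
   return w->count == 0;
}

/* Append idx at the tail unless it is already queued. */
static void
iwl_push_tail(struct index_worklist *w, unsigned idx)
{
   assert(idx < w->size);
   if (w->present[idx / 32] & (1u << (idx % 32)))
      return;

   /* Each index is queued at most once, so the ring cannot overflow. */
   w->ring[(w->start + w->count) % w->size] = idx;
   w->count++;
   w->present[idx / 32] |= 1u << (idx % 32);
}

/* Remove and return the index at the head of the queue. */
static unsigned
iwl_pop_head(struct index_worklist *w)
{
   assert(w->count > 0);
   unsigned idx = w->ring[w->start];
   w->start = (w->start + 1) % w->size;
   w->count--;
   w->present[idx / 32] &= ~(1u << (idx % 32));
   return idx;
}

static void
iwl_finish(struct index_worklist *w)
{
   free(w->present);
   free(w->ring);
}
```

Everything here depends on the items having dense indices, which is why it
carries over to anything numbered but not to unnumbered instructions.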
>
> The same scheme could be easily used for ssa-defs, as these are
> also numbered. I actually did this for the VRP pass I wrote years ago.
>
> However, for instructions we do not have a way of numbering them,
> so a different scheme would have to be used. A dynarray + set type
> of thing, as you're suggesting, might get us where we want.
> I'll see what I can come up with.
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev