[Mesa-dev] [PATCH] r600g: rework check_and_set_bank_swizzle
Vadim Girlin
vadimgirlin at gmail.com
Sat Jan 5 10:30:22 PST 2013
On 01/04/2013 09:09 PM, Christian König wrote:
> On 31.12.2012 22:55, Vadim Girlin wrote:
>> On 12/31/2012 05:34 AM, Vadim Girlin wrote:
>>> Optimize it for better performance.
>>>
>>> Signed-off-by: Vadim Girlin <vadimgirlin at gmail.com>
>>> ---
>>>
>>> I did some benchmarking for this patch together with the patch that
>>> introduces
>>> ISA tables, using the texCombine test (IIRC it builds hundreds of
>>> shaders) and
>>> making the driver build each shader 5000 times, total time of the
>>> test was
>>> reduced from 45 to 25 seconds, the time spent in
>>> check_and_set_bank_swizzle
>>> is about 8 times less according to the oprofile, the time of
>>> r600_shader_from_tgsi is reduced by half.
>>>
>>> You can also find the patch in this branch:
>>>
>>> http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-disasm
>>
>> After some other benchmarks I think that texCombine benchmark that I
>> used is close to the worst possible case (when most of the inst groups
>> do not use default swizzles, so we have to perform a lot of iterations
>> to find the correct swizzles). Now with other tests I see that in some
>> cases (where most of the alu groups just use default swizzles) this
>> patch might even have slightly worse performance due to additional
>> overhead. I'm going to improve the handling of the default case and
>> gather some stats over the popular apps, and then update the patch to
>> make it more efficient in every case.
>
> Hi Vadim,
>
> I spend quite some time trying to understand the different restrictions
> and implementing them in an usable algorithm before I gave up and just
> hacked the code how it is currently looks like. So please be aware that
> you're dealing with a quite complicated topic and improving it probably
> won't be done with a first shot.
Hi, Christian,
I understand, but in fact these restrictions are not as complicated as
it seems in the isa docs, though I've also spent quite some time to
understand that when I was implementing bank swizzle check for my old
shader optimization branch. Existing function was not very suitable for
me because I needed to perform a lot more bank swizzle checks in the
scheduler while trying to pack more instructions in the group, and the
performance of existing check_and_set_... function was very bad for that
case. But here we mostly have already built groups (instead of
performing the multiple checks when trying to add every instruction to
the group), so this code needs to be optimized differently. IIRC this
patch had no regressions with piglit on my evergreen (though I think
there is at least one bug), and of course it can be improved further.
I decided to look into it because there are cases where existing
function (check_and_set...) takes more than a half of total shader
processing time, e.g. with the texCombine test that I used as a
benchmark, and I think we have to improve that if we want to minimize
shader compilation time. It's fast when we have "default" case (probably
the most frequent case), that is, when we can use default (0 = VEC_012 =
SCL_210) swizzles for all slots, but it becomes slow when we need to
find out non-default combination of the bank swizzles, and even worse
when there is no working swizzle combination for the group (currently
it's possible when check_and_set_... is used from merge_inst_groups) - I
think in most cases we can detect the fail faster than iterating through
all possible combinations, e.g. we can fail earlier if we have more than
3 different gpr elements for any channel in the source operands. Also it
makes sense to check constants first as it's implemented in my patch.
And then we'll have to perform full iteration over the swizzles only in
some rare cases. Also I considered some more smart algorithms to get rid
of the iteration completely, but the problem is that any smart
algorithms can easily become more time-consuming than a simple iteration
in most cases. Perhaps it makes sense to check for the default case
first, and then try to do something smart if default swizzles don't work.
On the other hand, I'm still going to rework other parts of the backend
too, so probably I'll just leave this code as is for now, because I'll
use different implementation of the bank swizzle check for the scheduler
anyway. (Though I'd like to implement everything in the llvm backend,
but I think it will require more time from me, so reworking the existing
backend first and implementing the scheduler and low-level optimizations
here (similar to what I did in my old branch, but now I'm going to
support more features) looks like a more realistic approach for now, and
then we might move some already implemented and tested logic to the llvm
backend if we want.)
>
> Additional to that it would be great if you could remove the forced bank
> swizzle and just have a single algorithm that handles the whole problem
> at just one point in the code generation path.
>
I don't think there will be any big difference for the algorithm of the
bank swizzle computation. There are special cases where some predefined
swizzles are required by the hardware, and algorithm shouldn't try to
modify swizzles for such instructions at all. AFAIK we need such special
handling when dealing with LDS, either directly with LDS_IDX_OP (we
don't use it yet) or with the INTERP_XY/ZW instructions that are loading
parameter values from LDS.
I think any algorithm will have to take into account these special cases
where we always have to use some predefined bank swizzles. Although we
can get rid of the bank_swizzle_force field and simply check some
instruction flags instead, but I think it won't change the algorithm
itself and additional lookup for flags of the every instruction in this
frequently used function will be less efficient (unless we'll store
instruction flags in the r600_bytecode_alu, but then it looks not very
different from storing bank_swizzle_force).
Vadim
> Christian.
>
>>
>> Vadim
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
More information about the mesa-dev
mailing list