[Mesa-dev] [PATCH] r600g: rework check_and_set_bank_swizzle

Vadim Girlin vadimgirlin at gmail.com
Sat Jan 5 10:30:22 PST 2013


On 01/04/2013 09:09 PM, Christian König wrote:
> On 31.12.2012 22:55, Vadim Girlin wrote:
>> On 12/31/2012 05:34 AM, Vadim Girlin wrote:
>>> Optimize it for better performance.
>>>
>>> Signed-off-by: Vadim Girlin <vadimgirlin at gmail.com>
>>> ---
>>>
>>> I did some benchmarking for this patch together with the patch that
>>> introduces
>>> ISA tables, using the texCombine test (IIRC it builds hundreds of
>>> shaders) and
>>> making the driver build each shader 5000 times, total time of the
>>> test was
>>> reduced from 45 to 25 seconds, the time spent in
>>> check_and_set_bank_swizzle
>>> is about 8 times less according to the oprofile, the time of
>>> r600_shader_from_tgsi is reduced by half.
>>>
>>> You can also find the patch in this branch:
>>>
>>> http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-disasm
>>
>> After some other benchmarks I think that texCombine benchmark that I
>> used is close to the worst possible case (when most of the inst groups
>> do not use default swizzles, so we have to perform a lot of iterations
>> to find the correct swizzles). Now with other tests I see that in some
>> cases (where most of the alu groups just use default swizzles) this
>> patch might even have slightly worse performance due to additional
>> overhead. I'm going to improve the handling of the default case and
>> gather some stats over the popular apps, and then update the patch to
>> make it more efficient in every case.
>
> Hi Vadim,
>
> I spend quite some time trying to understand the different restrictions
> and implementing them in an usable algorithm before I gave up and just
> hacked the code how it is currently looks like. So please be aware that
> you're dealing with a quite complicated topic and improving it probably
> won't be done with a first shot.

Hi, Christian,

I understand, but in fact these restrictions are not as complicated as 
it seems in the isa docs, though I've also spent quite some time to 
understand that when I was implementing bank swizzle check for my old 
shader optimization branch. Existing function was not very suitable for 
me because I needed to perform a lot more bank swizzle checks in the 
scheduler while trying to pack more instructions in the group, and the 
performance of existing check_and_set_... function was very bad for that 
case. But here we mostly have already built groups (instead of 
performing the multiple checks when trying to add every instruction to 
the group), so this code needs to be optimized differently. IIRC this 
patch had no regressions with piglit on my evergreen (though I think 
there is at least one bug), and of course it can be improved further.

I decided to look into it because there are cases where existing 
function (check_and_set...) takes more than a half of total shader 
processing time, e.g. with the texCombine test that I used as a 
benchmark, and I think we have to improve that if we want to minimize 
shader compilation time. It's fast when we have "default" case (probably 
the most frequent case), that is, when we can use default (0 = VEC_012 = 
SCL_210) swizzles for all slots, but it becomes slow when we need to 
find out non-default combination of the bank swizzles, and even worse 
when there is no working swizzle combination for the group (currently 
it's possible when check_and_set_... is used from merge_inst_groups) - I 
think in most cases we can detect the fail faster than iterating through 
all possible combinations, e.g. we can fail earlier if we have more than 
3 different gpr elements for any channel in the source operands. Also it 
makes sense to check constants first as it's implemented in my patch. 
And then we'll have to perform full iteration over the swizzles only in 
some rare cases. Also I considered some more smart algorithms to get rid 
of the iteration completely, but the problem is that any smart 
algorithms can easily become more time-consuming than a simple iteration 
in most cases. Perhaps it makes sense to check for the default case 
first, and then try to do something smart if default swizzles don't work.

On the other hand, I'm still going to rework other parts of the backend 
too, so probably I'll just leave this code as is for now, because I'll 
use different implementation of the bank swizzle check for the scheduler 
anyway. (Though I'd like to implement everything in the llvm backend, 
but I think it will require more time from me, so reworking the existing 
backend first and implementing the scheduler and low-level optimizations 
here (similar to what I did in my old branch, but now I'm going to 
support more features) looks like a more realistic approach for now, and 
then we might move some already implemented and tested logic to the llvm 
backend if we want.)

>
> Additional to that it would be great if you could remove the forced bank
> swizzle and just have a single algorithm that handles the whole problem
> at just one point in the code generation path.
>

I don't think there will be any big difference for the algorithm of the 
bank swizzle computation. There are special cases where some predefined 
swizzles are required by the hardware, and algorithm shouldn't try to 
modify swizzles for such instructions at all. AFAIK we need such special 
handling when dealing with LDS, either directly with LDS_IDX_OP (we 
don't use it yet) or with the INTERP_XY/ZW instructions that are loading 
parameter values from LDS.

I think any algorithm will have to take into account these special cases 
where we always have to use some predefined bank swizzles. Although we 
can get rid of the bank_swizzle_force field and simply check some 
instruction flags instead, but I think it won't change the algorithm 
itself and additional lookup for flags of the every instruction in this 
frequently used function will be less efficient (unless we'll store 
instruction flags in the r600_bytecode_alu, but then it looks not very 
different from storing bank_swizzle_force).

Vadim

> Christian.
>
>>
>> Vadim
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>



More information about the mesa-dev mailing list