[Mesa-dev] [PATCH] r600g: cache shader variants instead of rebuilding v2

Vadim Girlin vadimgirlin at gmail.com
Sun Jun 10 19:45:36 CEST 2012


On Sun, 2012-06-10 at 10:27 +0200, Christian König wrote:
> On 10.06.2012 04:07, Vadim Girlin wrote:
> > Shader variants are stored in the list, the key for lookup is based on the
> > states that require different hw shaders - currently it's rctx->two_side (all
> > gpus) and rctx->nr_cbufs (evergreen/cayman, when writes_all property is set).
> >
> > v2:
> >   - use simple list instead of keymap as suggested by Marek on irc
> >   - call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
> >     (r600_shader_select isn't used for vertex shaders currently)
> >
> > Improves performance for some apps, e.g. FlightGear -
> > see https://bugs.freedesktop.org/show_bug.cgi?id=50360
> >
> > Signed-off-by: Vadim Girlin <vadimgirlin at gmail.com>
> Mhm, I really start wondering if it might not be easier to avoid having 
> different shader variants by using CF_COND_BOOL/CF_COND_NOT_BOOL for 
> those two special cases, e.g. build the shader in a way that it can 
> handle both variants and then select the one we currently want with the 
> CF bool constants.
> 
> If the shader overhead for it is too much we might also try using this 
> implementation only if the application really starts using those 
> features in question.
> 

I agree that we might want to use common shader code for those cases. I
just don't want to use control flow for that. According to the docs, a
single CF instruction costs about 40x as much as an ALU instruction, and
it seems we'd need to add 3 CF instructions to guard color selection for
the two_side case, i.e. roughly the cost of 120 ALU instructions. I'm
not sure how we could use it for the writes_all case, where we need a
varying number of exports.

There are other possible solutions. For the first case, I think we can
pass a bool value (0.0/1.0) to the PS through the SPI by using
SPI_PS_INPUT_CNTL_x:DEFAULT_VAL and a non-existent semantic index, or
put it into a constant buffer - we're already using a special const
buffer to pass clip planes for clipvertex, so we can just add a constant
for that. Then we can MUL that value with front_face to get the
selector value for the colors. The additional MUL instruction per shader
could be merged into some ALU group, so I guess it might have lower
overhead than using control flow.
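
To illustrate the MUL idea, here is a rough CPU-side sketch in C (not
shader code and not r600g code). All names are hypothetical. It assumes
the face value is signed, +1.0 for front-facing and -1.0 for back-facing;
the real hardware encoding may differ and the comparison would need to be
adapted accordingly.

```c
/* Illustrative CPU-side model only; every name here is hypothetical. */
typedef struct { float r, g, b, a; } color4;

/*
 * two_side: 0.0f or 1.0f, the bool value passed in as a constant.
 * face:     ASSUMED +1.0f for front-facing, -1.0f for back-facing.
 */
static float color_selector(float two_side, float face)
{
    return two_side * face;   /* the single extra MUL */
}

/* Pick the back color only when two-sided lighting is on AND the
 * primitive is back-facing, i.e. when the selector is negative. */
static color4 select_color(float two_side, float face,
                           color4 front, color4 back)
{
    float sel = color_selector(two_side, face);
    return (sel < 0.0f) ? back : front;
}
```

With two_side = 0.0 the selector is never negative, so the front color is
always used and no branch in the control-flow sense is required; the
final select maps to a conditional-move style ALU op.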

Regarding the writes_all case, I guess we simply need to try playing
with CB_SHADER_MASK, CB_TARGET_MASK, and some other bits to avoid a
performance regression when the shader exports to all possible CBs,
as Alex implemented it initially. IIRC there were some changes related
to those masks after that, so maybe the problem is already solved.

Anyway, those solutions will require additional time for implementation
and testing, and I'm not sure they would result in better performance
than caching. After all, it's not a high priority for me; I just wanted
to provide a quick fix for the performance problem with FlightGear. I
don't know of any other apps that are affected by rebuilding. I think we
can improve it later if we need to.
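
For reference, the caching approach boils down to something like the
following simplified sketch. This is not the code from the patch: the
struct and function names are made up for illustration, and the "build"
step is replaced by a counter so the lookup logic can be seen in
isolation. The key holds the two states mentioned in the patch
description (two_side and nr_cbufs).

```c
#include <stdlib.h>

/* Hypothetical key: the states that require a different hw shader. */
struct variant_key {
    unsigned two_side;  /* 0 or 1 */
    unsigned nr_cbufs;  /* matters only when writes_all is set */
};

struct shader_variant {
    struct variant_key key;
    int build_id;                  /* stand-in for the compiled shader */
    struct shader_variant *next;
};

struct shader_selector {
    struct shader_variant *variants;  /* simple singly linked list */
    int builds;                       /* how many variants were built */
};

static int key_equal(const struct variant_key *a,
                     const struct variant_key *b)
{
    return a->two_side == b->two_side && a->nr_cbufs == b->nr_cbufs;
}

/* Return the cached variant for this key, building it on first use. */
static struct shader_variant *
select_variant(struct shader_selector *sel, struct variant_key key)
{
    struct shader_variant *v;

    for (v = sel->variants; v; v = v->next)
        if (key_equal(&v->key, &key))
            return v;              /* cache hit: no rebuild */

    v = malloc(sizeof(*v));
    if (!v)
        return NULL;
    v->key = key;
    v->build_id = sel->builds++;   /* the expensive build would go here */
    v->next = sel->variants;
    sel->variants = v;
    return v;
}
```

Since only a few key combinations are possible, a linear walk over the
list should be cheap compared to rebuilding the shader on every state
change.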

Vadim

> Cheers,
> Christian.
> 




