[Mesa-dev] [PATCH 4/4] llvmpipe: reduce alignment requirement for 1d resources from 4x4 to 4x1

Mon Jun 3 13:26:19 PDT 2013

Jose,

I'm actually seeing a failure of sorts with wgf11rendertargets, but only
with vector size 128. After much head scratching, I have a guess at what
might be happening, but I'm not sure.
What happens is that wgf11rendertargets terminates (I get the dreaded
Windows app crash notice; however, it does not trigger gdb, and even if it
did I'm not sure I'd see anything, as I don't think it happens anywhere
in our code). I do see "Invalid parameter passed to C runtime function"
in the console, though. It seems to happen with the r10g10b10a2 format
only (or maybe with other such oddities too, but that's the first one
where it crashes).
It is actually fixable by using lp_build_zero instead of lp_build_undef
for the undefined fs outputs in the blend function.
This, however, doesn't make much sense to me: I see no way those undef
values can actually get used (not for this format), i.e. LLVM should
have dropped them completely anyway.
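
To illustrate that reasoning, here is a minimal sketch in plain C. It is
not llvmpipe code: the saturating-add "blend", the 8-bit lanes and all
names are hypothetical stand-ins. It blends a full 4x4 block but stores
only the first row, so whatever filler sits in the unused lanes (zero, or
arbitrary garbage standing in for undef) should never reach memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_W 4
#define BLOCK_H 4
#define BLOCK_LANES (BLOCK_W * BLOCK_H)

/* Hypothetical per-lane blend: saturating add of src onto dst. */
static uint8_t blend_lane(uint8_t src, uint8_t dst)
{
    unsigned sum = (unsigned)src + (unsigned)dst;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Blend all 16 lanes of the block, but write back only the first `rows`
 * rows. Lanes in the remaining rows are computed and then discarded. */
static void blend_block_store_rows(const uint8_t src[BLOCK_LANES],
                                   uint8_t dst[BLOCK_LANES],
                                   unsigned rows)
{
    uint8_t result[BLOCK_LANES];
    unsigned i;

    assert(rows <= BLOCK_H);
    for (i = 0; i < BLOCK_LANES; i++)
        result[i] = blend_lane(src[i], dst[i]);
    memcpy(dst, result, (size_t)rows * BLOCK_W);
}

/* Run the 4x1 case: row 0 holds real data, rows 1..3 hold `filler`
 * in both the fs output and the loaded destination values. */
static void run_4x1(uint8_t filler, uint8_t out_row[BLOCK_W])
{
    const uint8_t src_row[BLOCK_W] = {10, 20, 30, 250};
    const uint8_t dst_row[BLOCK_W] = {1, 2, 3, 10};
    uint8_t src[BLOCK_LANES], dst[BLOCK_LANES];

    memset(src, filler, sizeof src);
    memset(dst, filler, sizeof dst);
    memcpy(src, src_row, BLOCK_W);
    memcpy(dst, dst_row, BLOCK_W);

    blend_block_store_rows(src, dst, 1);
    memcpy(out_row, dst, BLOCK_W);
}
```

In this model, run_4x1(0x00, ...) and run_4x1(0xAB, ...) give the same
stored row, which is why zero versus undef making a difference in the
real code is surprising.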

Roland

On 03.06.2013 22:00, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
> 
> For rendering to buffers, we cannot have any y alignment.
> So make sure that tile clear commands only clear up to the fb width/height,
> not more (do this for all resources actually as clearing more seems
> pointless for other resources too). For the jit fs function, skip execution
> of the lower half of the fragment shader for the 4x4 stamp completely,
> for depth/stencil only load/store the values from the first row
> (replace other row with undef).
> For the blend function, also only load half the values from fs output,
> replace the rest with undefs so that everything still operates on the
> full 4x4 block to keep code the same between 4x1 and 4x4 (except for
> load/store of course which also needs to skip (store) or replace these
> values with undefs (load)), at the cost of slightly less optimal code
> being produced in some cases.
> Also reduce 1d and 1d array alignment, because they can be handled the
> same as buffers and so don't need to waste memory.
> 
> v2: don't try to run special blend code for 4x1, (very) slightly less
> complexity if we just use the same code as for 4x4 which may or may not
> make it easier to optimize in the future (as we care a lot more about 4x4
> performance than 1d)
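
The tile clear clamping described in the quoted commit message could be
sketched roughly as follows. This is a standalone illustration under
stated assumptions, not the actual patch: struct clear_rect and
clamp_tile_clear() are hypothetical names, and the tile size is passed
in rather than taken from any llvmpipe definition.

```c
#include <assert.h>

/* Hypothetical rectangle describing the region a tile clear touches. */
struct clear_rect {
    unsigned x, y;          /* top-left corner in pixels */
    unsigned width, height; /* extent in pixels, 0 if fully outside */
};

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Clamp the clear region of tile (tile_x, tile_y) so that it never
 * extends past the framebuffer width/height. For a buffer or 1d
 * resource (fb_height == 1) this yields a single row. */
static struct clear_rect clamp_tile_clear(unsigned tile_x, unsigned tile_y,
                                          unsigned tile_size,
                                          unsigned fb_width,
                                          unsigned fb_height)
{
    struct clear_rect r;

    assert(tile_size > 0);
    r.x = tile_x * tile_size;
    r.y = tile_y * tile_size;
    r.width = r.x < fb_width ? min_u(tile_size, fb_width - r.x) : 0;
    r.height = r.y < fb_height ? min_u(tile_size, fb_height - r.y) : 0;
    return r;
}
```

For example, with an assumed tile size of 64 and a 100x1 framebuffer,
tile (1, 0) is clamped to a 36x1 region starting at x = 64, rather than
clearing a full 64x64 tile.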