[Mesa-dev] [PATCH] gallivm: do per-pixel lod calculations for explicit lod

Wed Jul 3 08:59:08 PDT 2013

On 03.07.2013 17:28, Jose Fonseca wrote:
> I don't fully grasp every detail (many paths), but looks good in principle.
> 
> Where do the 16xf32 vectors come from?
Those are the size vectors. Normally (except for the 1d case) these
contain width/height/depth/_ (or just width/height for a 2d texture).
So, if there's one lod, width/height/depth all get minified in one
step, and the sizes are then extracted from that.
In the 8-wide case, if there are 2 lods, the minification is done
separately for both levels (because sse lacks the true vector shift),
then the vectors are concatenated.
But now we've got (for the 8-wide case) 8 lods, hence 8 of these w/h/d/_
vectors all concatenated into one.
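
To make the layout concrete, here is a rough scalar sketch in plain C.
It is not the gallivm code: the names (minify, build_size_vector), the
clamp to 1 and the zero padding lane are assumptions for illustration.
With w/h/d/_ per lod this gives 32 elements for 8 lods; with only
width/height per lod it would be 16.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LODS 8   /* one lod per pixel in the 8-wide case */
#define COMPS    4   /* w, h, d, padding */

/* Hypothetical helper: shift right by the level, clamp to at least 1. */
static uint32_t minify(uint32_t size, uint32_t level)
{
    assert(level < 32);          /* larger shifts are undefined in C */
    uint32_t s = size >> level;
    return s > 0 ? s : 1;
}

/* Fill out[] with w/h/d/_ repeated once per lod, each minified by its lod. */
static void build_size_vector(const uint32_t base[COMPS],
                              const uint32_t lod[NUM_LODS],
                              uint32_t out[NUM_LODS * COMPS])
{
    for (int i = 0; i < NUM_LODS; i++) {
        for (int c = 0; c < COMPS - 1; c++)
            out[i * COMPS + c] = minify(base[c], lod[i]);
        out[i * COMPS + COMPS - 1] = 0;   /* padding lane, never read */
    }
}
```
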
Now ideally I guess we'd just use separate vectors, but that would mean
even more specialized code. Also, consider that even if you had separate
"width" and "height" vectors, you couldn't minify them by the lod values
with a single vector shift: it would be a buttload of scalar extractions,
scalar shifts and inserts (REALLY missing the true vector shift), unless
you've got AVX2 (or AMD Bulldozer). So it's probably still better to
keep the w/h/d/_ vectors and just hope llvm does something reasonable
with them (we hopefully shouldn't hit intrinsics as the vectors aren't
really used a lot).
In fact the minification code looks so butt ugly there because I
needed to keep separate paths for 4-wide and 8-wide. If you have a
w/h/d/_/w/h/d/_ vector and try to minify (right shift) it by
l0l0l0l0l1l1l1l1, llvm cannot separate that into two 4-wide parts (last
time I checked), and that simple shift becomes a mess of 16 extracts, 8
scalar shifts and 8 inserts... All that because intel forgot the real
vector shift instruction before avx2; I don't think there's any other
simd instruction set which lacks it.
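
As a scalar illustration of the two options (again plain C, not the
actual gallivm code, and all function names here are made up): the
generic per-lane shift needs one extract of the value, one extract of
the count, one scalar shift and one insert per lane, whereas a shift by
l0l0l0l0l1l1l1l1 can be done as two 4-wide shifts with a uniform count
each, which is what the SSE2 shift (psrld) handles. AVX2 added the real
per-lane variable shift (vpsrlvd).

```c
#include <assert.h>
#include <stdint.h>

/* Generic path: every lane has its own count, handled one lane at a time. */
static void shift_per_lane(const uint32_t v[8], const uint32_t count[8],
                           uint32_t out[8])
{
    for (int i = 0; i < 8; i++) {
        assert(count[i] < 32);
        out[i] = v[i] >> count[i];   /* extract, scalar shift, insert */
    }
}

/* One 4-wide half shifted by a single count, as a uniform shift can do. */
static void shift_half_uniform(const uint32_t v[4], uint32_t count,
                               uint32_t out[4])
{
    assert(count < 32);
    for (int i = 0; i < 4; i++)
        out[i] = v[i] >> count;
}

/* 8-wide shift by l0 l0 l0 l0 l1 l1 l1 l1, split into two uniform halves. */
static void shift_two_halves(const uint32_t v[8], uint32_t l0, uint32_t l1,
                             uint32_t out[8])
{
    shift_half_uniform(v, l0, out);
    shift_half_uniform(v + 4, l1, out + 4);
}
```

Both give the same result for that count pattern; the difference is
only in how many instructions the 8-wide case ends up costing.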

> 
> Also, please add a comment somewhere summarizing all the code paths for lod handling:
> 
>  - AVX vs non AVX
>  - SOA vs AOS
>  - scalar lod vs stamp lod
Ok I'll add some more clarifying comments.

Roland


> 
> But I couldn't spot anything wrong.
>