[Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

Thu Nov 9 22:23:10 UTC 2017

On Thu, Nov 2, 2017 at 3:54 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> Register strides higher than 4 are uncommon but they can happen.  For
> instance, if you have a 64-bit extract_u8 operation, we turn that into
> UB -> UQ MOV with a source stride of 8.  Our previous calculation would
> try to generate a stride of <32;8,8>:ub which is invalid because the
> maximum horizontal stride is 4.  To solve this problem, we instead use a
> stride of <8;1,0>.  As noted in the comment, this does not work as a
> destination but that's ok as very few things actually generate that
> stride.
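
The region arithmetic described in the quoted text can be sketched in a
few lines of C. This is only an illustration, not the code from the
patch: struct region, region_offset() and region_for_stride() are
hypothetical names, and the sketch assumes that a region
<vstride;width,hstride> reads channel i from element
(i / width) * vstride + (i % width) * hstride, that the horizontal
stride is limited to 0, 1, 2 or 4, and that the vertical stride is
limited to a power of two up to 32.

```c
#include <assert.h>
#include <stdbool.h>

/* A source region <vstride;width,hstride>, all in units of elements. */
struct region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

/* Element offset read by channel `chan` of a region. */
static unsigned region_offset(struct region r, unsigned chan)
{
    assert(r.width != 0);
    return (chan / r.width) * r.vstride + (chan % r.width) * r.hstride;
}

static bool is_pow2(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: pick a source region for a uniform element stride.
 * Returns false if no encodable region exists under the stated limits. */
static bool region_for_stride(unsigned stride, unsigned width,
                              struct region *out)
{
    if (stride == 0) {
        /* Scalar: every channel reads the same element. */
        *out = (struct region){ 0, 1, 0 };
        return true;
    }
    if (!is_pow2(stride) || !is_pow2(width) || width > 16)
        return false;
    if (stride <= 4 && width * stride <= 32) {
        /* Ordinary case: the horizontal stride carries the step. */
        *out = (struct region){ width * stride, width, stride };
        return true;
    }
    if (stride <= 32) {
        /* Large stride: width 1, so only the vertical stride matters. */
        *out = (struct region){ stride, 1, 0 };
        return true;
    }
    return false;
}
```

With stride 8 and width 8 the helper returns <8;1,0>, and
region_offset() then gives 0, 8, 16, ... for successive channels, which
is the layout the UB -> UQ MOV needs on the source side.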

Please put the tests you fixed in the commit message. It's not okay to
leave that out for all the reasons that I'm sure you know.

Looks like this doesn't work on CHV, BXT, GLK :(

KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks now fails on CHV,
BXT, GLK with:

mov(8)          g21<1>UQ        g19<8,1,0>UB                    { align1 1Q };
        ERROR: Source and destination horizontal stride must equal and a multiple of a qword when the execution type is 64-bit
        ERROR: Vstride must be Width * Hstride when the execution type is 64-bit

Modulo the typo in the first error ("must equal" should read "must be
equal"), I think both of these are correct. I don't think we can
extract_u8 to a 64-bit type on Atom :(
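
To see why both errors fire for this MOV, here is a rough C sketch of
the two checks as I read them from the error text alone. It is not the
validator's actual code: the struct and function names are made up, and
it assumes strides are compared in bytes (hstride times type size).

```c
#include <stdbool.h>

/* Source region <vstride;width,hstride> plus the element size in bytes. */
struct src_operand {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
    unsigned type_size;
};

/* Destination <hstride> plus the element size in bytes. */
struct dst_operand {
    unsigned hstride;
    unsigned type_size;
};

/* First error: source and destination horizontal strides, in bytes,
 * must be equal and a multiple of a qword (8 bytes). */
static bool hstrides_equal_and_qword(struct src_operand src,
                                     struct dst_operand dst)
{
    unsigned src_bytes = src.hstride * src.type_size;
    unsigned dst_bytes = dst.hstride * dst.type_size;
    return src_bytes == dst_bytes &&
           src_bytes % 8 == 0 && dst_bytes % 8 == 0;
}

/* Second error: Vstride must be Width * Hstride. */
static bool vstride_is_width_times_hstride(struct src_operand src)
{
    return src.vstride == src.width * src.hstride;
}
```

For g21<1>UQ and g19<8,1,0>UB the destination steps 8 bytes per
channel while the source hstride is 0 bytes, so the first check fails,
and 8 != 1 * 0, so the second fails as well. A packed UQ -> UQ MOV
with a <4;4,1> source passes both.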

This is filed as https://bugs.freedesktop.org/show_bug.cgi?id=103628