[Mesa-dev] The i965 vec4 backend, exec_masks, and 64-bit types

Tue Nov 3 11:06:58 PST 2015

Hi all,

While working on FP64 for i965, there's an issue that I thought of
with the vec4 backend that I'm not sure how to resolve. From what I
understand, the execmask works the same way in Align16 mode as Align1
mode, except that in practice SIMD4x2 only uses the first 8 channels,
where the first 4 channels always share one value, as do the last 4.
But this doesn't work for 64-bit types, since there we only operate
on 4 components at the same time, so it's more
like SIMD2x2. For example, imagine that only the second vertex is
currently enabled. Then the execmask (written channel 0 first) looks
like 00001111, and if we do something like:

mul(4)          g24<1>DF     g12<4,4,1>DF g13<4,4,1>DF { align16 };

then all 4 channels will be disabled, which is not what we want.
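
To make the mismatch concrete, here is a toy Python model of the
masking. It is only a sketch and not hardware-accurate: the function
name is made up, the mask is a string written channel 0 first, and it
assumes an instruction of width N simply looks at execmask channels
0..N-1.

```python
def enabled_channels(execmask, exec_size):
    """Return the execmask bits seen by an instruction of the given width.

    execmask is a string with channel 0 first, so "00001111" means
    channels 4-7 (the second vertex in SIMD4x2) are enabled.
    Toy assumption: a width-N instruction looks at channels 0..N-1.
    """
    return [bit == "1" for bit in execmask[:exec_size]]


second_vertex_only = "00001111"

# 32-bit SIMD4x2: 8 channels, channels 4-7 belong to the second vertex.
float_op = enabled_channels(second_vertex_only, 8)

# 64-bit "SIMD2x2": 4 channels, channels 2-3 belong to the second vertex,
# but they read execmask bits 2-3, which describe the first vertex.
double_op = enabled_channels(second_vertex_only, 4)

print(float_op)   # [False, False, False, False, True, True, True, True]
print(double_op)  # [False, False, False, False]
```

In this model the 32-bit case masks the right vertex, while the
64-bit case ends up with nothing enabled at all.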

I think the first thing to do is to write a piglit test that tests
this case, since currently all the arb_gpu_shader_fp64 tests only use
uniforms. We need a test that uses non-uniform control flow that
triggers the case described above. Once we do that, and if we
determine there's actually a problem, then we need to figure out how
to solve it. The ideas I had were:

1. Make every FP64 instruction use WE_all. This isn't actually too bad at
the moment, since our notion of interference already assumes
(more-or-less) that everything is WE_all, but it prevents us from
improving it in the future with FP64 things. Unfortunately, it also
means that we can't use writemasks since setting WE_all makes the EU
ignore the writemask, so we'll have to do some trickery to get things
with only 1 channel enabled to work correctly.

2. Use the NibCtrl field, and split each FP64 operation into 2.
Unfortunately, this field only appeared on gen8, and the PRM only says
it works for SIMD4 operations, whereas we need it to work for SIMD2
operations, although there's a chance it'll actually work for SIMD2 as
well. This lets us potentially do better register allocation, but it
might not work and even if it does it won't work for gen7.
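
The two ideas above can be compared in the same kind of toy model.
This is a sketch under assumptions, not a description of the
hardware: the function names are invented, WE_all is modelled as
"every channel executes", and NibCtrl is modelled as selecting which
4-channel group of the execmask a 2-wide half looks at. Whether
NibCtrl really behaves that way for SIMD2 is exactly the open
question.

```python
EXECMASK = "00001111"  # channel 0 first; only the second vertex is live


def run_we_all(exec_size):
    """Idea 1: with WE_all set, every channel executes, execmask ignored."""
    return [True] * exec_size


def run_nibctrl_split(execmask):
    """Idea 2: split the 4-wide DF op into two 2-wide halves, one per vertex.

    Assumption: NibCtrl points each half at its vertex's 4-channel group
    of the execmask, and the 2-wide half reads the first 2 bits of it.
    """
    halves = []
    for nibble in (0, 1):
        group = execmask[4 * nibble:4 * nibble + 4]
        halves.append([bit == "1" for bit in group[:2]])
    return halves


print(run_we_all(4))                # [True, True, True, True]
print(run_nibctrl_split(EXECMASK))  # [[False, False], [True, True]]
```

In the model, idea 1 enables all four channels, including the ones
belonging to the disabled first vertex, while idea 2 keeps the
per-vertex masking as long as the NibCtrl assumption holds.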

#1 sounds like the better solution for now, but who knows... maybe the
HW people magically made it work already, and either I'm not aware of
it or they didn't document it.
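
For the piglit test mentioned earlier, something along these lines
might be a starting point. It is a rough Python sketch of a generator
that emits a shader_runner test with FP64 math inside a branch on a
vertex input. The values, variable names and the shader itself are
only illustrative, and the file layout is written from memory. It
also assumes that the compiler keeps a real branch instead of
flattening it, and that two vertices taking different sides end up in
the same SIMD4x2 thread, neither of which is guaranteed.

```python
# Illustrative inputs; small integers keep the double math exact.
A = (1.0, 2.0, 3.0, 4.0)
B = (2.0, 3.0, 4.0, 5.0)


def dvec4(values):
    """Format four Python floats as a GLSL dvec4 constructor."""
    return "dvec4(" + ", ".join("%rlf" % v for v in values) + ")"


def uniform_values(values):
    """Format values for a shader_runner 'uniform' line."""
    return " ".join(repr(v) for v in values)


def make_shader_test():
    product = tuple(x * y for x, y in zip(A, B))
    total = tuple(x + y for x, y in zip(A, B))
    lines = [
        "[require]",
        "GLSL >= 1.50",
        "GL_ARB_gpu_shader_fp64",
        "",
        "[vertex shader]",
        "#version 150",
        "#extension GL_ARB_gpu_shader_fp64 : require",
        "uniform dvec4 a;",
        "uniform dvec4 b;",
        "in vec4 piglit_vertex;",
        "out vec4 vs_color;",
        "void main()",
        "{",
        "    gl_Position = piglit_vertex;",
        "    dvec4 r, expected;",
        "    /* Branch on a vertex input, so control flow is non-uniform. */",
        "    if (piglit_vertex.x > 0.0) {",
        "        r = a * b;",
        "        expected = " + dvec4(product) + ";",
        "    } else {",
        "        r = a + b;",
        "        expected = " + dvec4(total) + ";",
        "    }",
        "    vs_color = (r == expected) ? vec4(0.0, 1.0, 0.0, 1.0)",
        "                               : vec4(1.0, 0.0, 0.0, 1.0);",
        "}",
        "",
        "[fragment shader]",
        "#version 150",
        "in vec4 vs_color;",
        "out vec4 color;",
        "void main()",
        "{",
        "    color = vs_color;",
        "}",
        "",
        "[test]",
        "uniform dvec4 a " + uniform_values(A),
        "uniform dvec4 b " + uniform_values(B),
        "draw rect -1 -1 2 2",
        "probe all rgba 0.0 1.0 0.0 1.0",
    ]
    return "\n".join(lines) + "\n"


shader_test = make_shader_test()
print(shader_test)
```

The expected values are computed on the Python side so that each
branch checks a different result against its own constant.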

Connor