[Mesa-dev] The i965 vec4 backend, exec_masks, and 64-bit types

Francisco Jerez currojerez at riseup.net
Tue Nov 3 11:53:10 PST 2015


Connor Abbott <cwabbott0 at gmail.com> writes:

> Hi all,
>
> While working on FP64 for i965, there's an issue that I thought of
> with the vec4 backend that I'm not sure how to resolve. From what I
> understand, the execmask works the same way in Align16 mode as in Align1
> mode, except that in practice you only use the first 8 channels for
> SIMD4x2, and the first four channels always hold the same value, as do
> the last four. But this doesn't work for 64-bit types, since
> there we only operate on 4 components at a time, so it's more
> like SIMD2x2. For example, imagine that only the second vertex is
> currently enabled. Then the execmask looks like
> 00001111, and if we do something like:
>
> mul(4)          g24<1>DF     g12<4,4,1>DF g13<4,4,1>DF { align16 };
>
> then all 4 channels will be disabled, which is not what we want.
>
AFAIUI this shouldn't be a problem.  In align16 mode each component of
an instruction with double-precision execution type maps to *two* bits
of the execmask instead of one (one for each 32-bit half), which is
compensated by each logical thread having two components instead of
four, so in your example [assuming 00001111 is little-endian notation
and you actually do 'mul(8)' ;)] the x and y components of the first
logical thread will be disabled while the x and y components of the
second logical thread will be enabled.
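
To make the mapping concrete, here is a minimal Python sketch of the behaviour described above. It is only a model of my reading of it, not anything taken from the driver or the PRM: the function names are hypothetical, the mask string is read with bit 0 leftmost (the "little-endian notation" assumed above), and treating a component as enabled only when both of its execmask bits are set is an assumption (in SIMD4x2 the two bits should agree anyway).

```python
def parse_execmask(text):
    """Parse a mask string written with bit 0 leftmost, e.g. '00001111'."""
    return [ch == "1" for ch in text]


def df_channel_enables(execmask, components_per_thread=2):
    """Model of a DF Align16 instruction: each 64-bit component is gated
    by two consecutive execmask bits (one per 32-bit half), so an 8-bit
    mask covers two logical threads of two components (x, y) each.

    Assumption: a component counts as enabled only if both bits are set.
    """
    enables = {}
    for comp in range(len(execmask) // 2):
        low_half, high_half = execmask[2 * comp], execmask[2 * comp + 1]
        thread = comp // components_per_thread
        name = "xy"[comp % components_per_thread]
        enables[(thread, name)] = low_half and high_half
    return enables


enables = df_channel_enables(parse_execmask("00001111"))
for (thread, name), on in sorted(enables.items()):
    print(f"thread {thread}, component {name}: {'enabled' if on else 'disabled'}")
```

Under these assumptions the x and y components of logical thread 0 come out disabled and those of logical thread 1 come out enabled, which is the outcome you want for "only the second vertex enabled".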

> I think the first thing to do is to write a piglit test that tests
> this case, since currently all the arb_gpu_shader_fp64 tests only use
> uniforms. We need a test that uses non-uniform control flow that
> triggers the case described above. Once we do that, and if we
> determine there's actually a problem, then we need to figure out how
> to solve it. The ideas I had were:
>

I guess a piglit test would be nice, but you're unlikely to have to do
much about it. ;)

> 1. make every FP64 thing use WE_all. This isn't actually too bad at
> the moment, since our notion of interference already assumes
> (more-or-less) that everything is WE_all, but it prevents us from
> improving it in the future with FP64 things. Unfortunately, it also
> means that we can't use writemasks since setting WE_all makes the EU
> ignore the writemask, so we'll have to do some trickery to get things
> with only 1 channel enabled to work correctly.
>
> 2. Use the NibCtrl field, and split each FP64 operation into 2.
> Unfortunately, this field only appeared on gen8, and the PRM only says
> it works for SIMD4 operations, whereas we need it to work for SIMD2
> operations, although there's a chance it'll actually work for SIMD2 as
> well. This lets us potentially do better register allocation, but it
> might not work and even if it does it won't work for gen7.
>
NibCtrl is Gen7+ actually.  I believe that indeed has a good chance of
working for Align16 2-wide DF instructions but I don't know for sure
offhand.

> #1 sounds like the better solution for now, but who knows... maybe the
> HW people magically made it work already, and I'm not aware or they
> didn't document it.
>
> Connor