[Bug 92760] Add FP64 support to the i965 shader backends

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Jun 6 12:34:14 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=92760

--- Comment #88 from Juan A. Suarez <jasuarez at igalia.com> ---
(In reply to Iago Toral from comment #66)
> Curro suggested that we can work around this problem in HSW+ by taking
> advantage of the fact that in these gens it is possible to process 8 DF
> elements in ALU operations, so we can simply not split dvec4 operations in
> dvec2 chunks, which would fix the problem automatically. Unfortunately, this
> also means that we need to figure out a solution to the problem with
> swizzles and writemasks working on 32-bit elements (more on that topic
> below).
> 
> Unfortunately, that solution would not work for IVB, since that hardware
> cannot run ALU instructions operating on more than 4 DF components. Curro
> thinks that in this case we could divide the SIMD4x2 execution in 2 SIMD4x1
> instructions. That is, we generate one dvec4 instruction for each logical
> thread and use NibCtrl to fixup execution masking for the second instruction.
> 
> 3. If we implement any solution that involves getting rid of the double
> splitting pass like the ones suggested above, we need to figure out a new
> solution to address the fact that swizzles operate in 32-bit elements, since
> we are back to a situation where we have to deal with dvec4 operands.
> 


Related to this topic: the current liveness analysis code (and also the DCE
optimization) simply ignores types when dealing with swizzles. This works
fine, except when a channel is read/written as DF (64-bit channel) and
later written/read as F (32-bit channel).

For instance, for this piece of code:

    mov(8) vgrf3.0.x:UD[1], 2576980378U[1] NoMask
    mov(8) vgrf3.0.y:UD[1], 1071225241U[1] NoMask
    mov(8) vgrf3.1.x:UD[1], 2576980378U[1] NoMask
    mov(8) vgrf3.1.y:UD[1], 1071225241U[1] NoMask
    mov(8) vgrf2.0.xy:DF[2], vgrf3.0.xxxx:DF[2]

Our liveness analysis decides that the vgrf3.0.y channel is no longer live, and
thus DCE removes the instruction that writes it. That is because, in the last
instruction, it doesn't realize that vgrf3.0.x:DF reads both vgrf3.0.x:F and
vgrf3.0.y:F.
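
To see why both 32-bit channels matter, note that the two UD immediates in the listing are the low and high halves of one double. A minimal Python sketch, assuming a little-endian layout (so .x holds the low dword and .y the high one); the helper name is made up for illustration:

```python
import struct

# The two 32-bit immediates from the listing above.
LOW_DWORD = 2576980378   # written to vgrf3.0.x
HIGH_DWORD = 1071225241  # written to vgrf3.0.y

def dwords_to_double(low, high):
    """Reinterpret two 32-bit words as one little-endian IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]

value = dwords_to_double(LOW_DWORD, HIGH_DWORD)
print(hex(LOW_DWORD), hex(HIGH_DWORD), value)  # 0x9999999a 0x3fd99999 0.4
```

Dropping the write to .y would therefore corrupt the upper half (sign, exponent and top mantissa bits) of the DF value read by the last mov.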

In order to avoid introducing too many changes in our current code, I've added
a pair of commits[1][2] that basically check whether a 32-bit channel was
previously read as 64-bit (and the other way around). This seems to fix the
above problem.
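
The effect of making the channel tracking type-aware can be sketched with a toy backward DCE. This is not the code from the commits: the `Inst` record, the `dce` function and the rule that 64-bit channel i covers 32-bit channels 2i and 2i+1 are assumptions made for illustration, and execution masking (NoMask, predication) is ignored.

```python
from collections import namedtuple

# Hypothetical, simplified instruction record; not the Mesa IR.
Inst = namedtuple("Inst", "dst dst_mask dst_type src src_swz src_type")

CHANNELS = "xyzw"
TYPE_SIZE = {"UD": 4, "F": 4, "DF": 8}

def channels_32bit(chans, type_name, type_aware):
    """Map a swizzle/writemask to the set of 32-bit channels it touches."""
    touched = set()
    for c in chans:
        i = CHANNELS.index(c)
        if type_aware and TYPE_SIZE[type_name] == 8:
            if i > 1:
                raise ValueError("this model only handles 64-bit x/y")
            # Assumption: 64-bit channel i covers 32-bit channels 2i and 2i+1.
            touched.update(CHANNELS[2 * i:2 * i + 2])
        else:
            touched.add(c)
    return touched

def dce(insts, live_out, type_aware):
    """Backward dead-code elimination over (register, 32-bit channel) pairs."""
    live = set(live_out)
    kept = []
    for inst in reversed(insts):
        written = {(inst.dst, c)
                   for c in channels_32bit(inst.dst_mask, inst.dst_type, type_aware)}
        if not written & live:
            continue  # nothing written here is read later: drop it
        live -= written
        if inst.src is not None:
            live |= {(inst.src, c)
                     for c in channels_32bit(inst.src_swz, inst.src_type, type_aware)}
        kept.append(inst)
    return kept[::-1]

# The vgrf3.0 part of the listing above; immediates are modelled as src=None.
program = [
    Inst("vgrf3.0", "x", "UD", None, "", "UD"),
    Inst("vgrf3.0", "y", "UD", None, "", "UD"),
    Inst("vgrf2.0", "xy", "DF", "vgrf3.0", "xxxx", "DF"),
]
live_out = {("vgrf2.0", c) for c in CHANNELS}

naive = dce(program, live_out, type_aware=False)  # loses the write to .y
fixed = dce(program, live_out, type_aware=True)   # keeps all three movs
print(len(naive), len(fixed))
```

With `type_aware=False` the write to vgrf3.0.y is dropped, which is the problem described above; with `type_aware=True` the DF read of .x marks both .x and .y live and all three instructions survive.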

But, as was said in previous comments, HSW interprets all swizzle channels as
32-bit, no matter the type. Iago added a pass that expands the 64-bit swizzles
into 32-bit ones. But this means that once we expand them, we need to read all
the channels as 32-bit, no matter the type.

In order to control this, I've added a boolean to specify whether all the
channels must be interpreted as 32-bit or not[3]. This is used both in liveness
analysis and DCE.
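
A rough sketch of what such a flag changes, in Python rather than the backend's own code. The flag name `swizzles_are_32bit`, the function, and the assumed expansion of a DF `.xxxx` swizzle into a 32-bit `.xyxy` swizzle are all hypothetical and only meant to illustrate the idea:

```python
CHANNELS = "xyzw"

def channels_read(swizzle, is_64bit_type, swizzles_are_32bit):
    """Return the set of 32-bit channels an operand reads.

    swizzles_are_32bit is a hypothetical flag: True means the swizzle has
    already been expanded, so each letter names exactly one 32-bit channel.
    """
    touched = set()
    for c in swizzle:
        i = CHANNELS.index(c)
        if is_64bit_type and not swizzles_are_32bit:
            # Assumption: 64-bit channel i spans 32-bit channels 2i and 2i+1
            # (only x and y are meaningful for a 64-bit type in this model).
            touched.update(CHANNELS[2 * i:2 * i + 2])
        else:
            touched.add(c)
    return touched

# Before expansion: DF .xxxx reads 32-bit x and y.
before = channels_read("xxxx", is_64bit_type=True, swizzles_are_32bit=False)
# After the (assumed) expansion to .xyxy, the letters are taken literally.
after = channels_read("xyxy", is_64bit_type=True, swizzles_are_32bit=True)
# Forgetting the flag after expansion over-estimates the channels read.
wrong = channels_read("xyxy", is_64bit_type=True, swizzles_are_32bit=False)
print(sorted(before), sorted(after), sorted(wrong))
```

In this model the flag keeps the result identical before and after expansion, whereas applying the 64-bit rule to an already expanded swizzle would mark z and w as read too.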

I'm just dropping this comment here to get feedback on whether the approach
sounds reasonable or whether we should change it.

[1]
https://github.com/Igalia/mesa/commit/7bb5dd6b8263f73ecf2f10ff1efc85130cc87bcd
[2]
https://github.com/Igalia/mesa/commit/06ad62a92515af242ba105e635f24e5c5117dda7
[3]
https://github.com/Igalia/mesa/commit/ca8cf009aa2eb61c145211c3461e29e4bf9a62ac
