[Mesa-dev] [PATCH 09/22] i965/fs: add lowering x2d step for IVB/VLV

Fri Jan 13 22:36:46 UTC 2017

On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
<siglesias at igalia.com> wrote:
> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>
> On Ivybridge/Valleyview, when converting a float (F) to a double-
> precision float (DF), the hardware automatically doubles the source
> horizontal stride, hence converting only every other value (the
> first, third, fifth, ...).
>
> This commit adds a new lowering step, exclusively for IVB/VLV, where the
> sources are first copied into a temporary register with stride 2, and
> then converted from this temporary register. Thus, we do not lose any
> value.
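The lowering described in the patch can be modeled roughly as follows. This is a toy Python sketch, not the i965 backend code: `doubled_stride_read` and `lowered_f_to_df` are hypothetical names, a register is modeled as a flat list of float slots, and the doubled stride is taken as given from the description above.

```python
EXEC_SIZE = 8  # assumption: an 8-wide conversion

def doubled_stride_read(buf, exec_size=EXEC_SIZE):
    # Model of the IVB/VLV quirk as described in the patch: a stride-1 F
    # source behaves as if its stride were 2, so channel i sees buf[2*i].
    return [buf[2 * i] for i in range(exec_size)]

def lowered_f_to_df(src, exec_size=EXEC_SIZE):
    # Hypothetical sketch of the lowering: copy each source float into a
    # temporary at stride 2 (src[i] -> tmp[2*i]) ...
    tmp = [0.0] * (2 * exec_size)
    for i in range(exec_size):
        tmp[2 * i] = src[i]
    # ... then convert from the temporary; the doubled stride now lands
    # exactly on the copied values, so none are lost.
    return [float(v) for v in doubled_stride_read(tmp, exec_size)]

# Two registers' worth of floats; only the first 8 are the intended inputs.
src = [float(i) for i in range(16)]
print(doubled_stride_read(src))  # direct conversion skips every other value
print(lowered_f_to_df(src))      # lowered conversion sees src[0..7]
```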

Curro explained to me how he thinks the hardware works. I'll try to
reproduce that description here.

The FPU channels are 32 bits wide on IVB/BYT. Normally, for example
when operating on 8 float channels, each FPU channel is given one
channel of the source register to operate on, and produces a value
that is written to the corresponding channel of the destination.

But when operating on doubles, each *pair* of FPU channels operates on
one (double-precision) value. Unfortunately, the hardware designers
didn't seem to update the input and output logic, so, for instance,
every pair of float channels from the source region is given as input
to the FPU, even though only the low (even-numbered) channel will
be used. This is why it appears that the hardware doubles the stride,
but it's really just ignoring all of the odd channels.
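A toy Python model of that input logic may make the "doubled stride" concrete. It is a sketch under stated assumptions, not the hardware or Mesa code: `region_offsets` and `ivb_df_source_offsets` are hypothetical helpers, offsets are in float elements, and the model simply assumes the float region is walked over 2 * exec_size 32-bit lanes with only the even lane of each pair reaching the FPU.

```python
def region_offsets(vstride, width, hstride, nchan):
    # Gen-style <vstride,width,hstride> region: channel i sits in row
    # i // width and column i % width; offsets are counted in elements.
    return [(i // width) * vstride + (i % width) * hstride for i in range(nchan)]

def ivb_df_source_offsets(vstride, width, hstride, exec_size):
    # Model (assumption): the unchanged input logic walks the F region
    # over 2 * exec_size 32-bit lanes; each pair of lanes feeds one DF
    # operation, and only the low (even) lane of each pair is used.
    lanes = region_offsets(vstride, width, hstride, 2 * exec_size)
    return lanes[0::2]

# mov(8) gX<1>DF gY<8,8,1>F: the float elements the FPU actually consumes.
print(ivb_df_source_offsets(8, 8, 1, 8))  # every other float: looks like stride 2
```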

A similar thing happens on output. The output elements are 64 bits
wide (even if the output type is float), so a destination stride of 1
means the writes are strided by 64 bits. This explains the
strange-looking behavior you discovered with an instruction like
mov(8) gX<1>F gY<8,8,1>DF.
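The same idea on the output side, as a rough Python model (hypothetical helper `ivb_df_dest_float_slots`, assuming every output element of a DF operation is 64 bits wide regardless of the destination type). It only reports the 32-bit slot at which each write starts; it makes no claim about which half of the 64-bit element carries the float.

```python
ELEM_BYTES = 8   # model assumption: each DF-operation output element is 64 bits
FLOAT_BYTES = 4  # size of one F slot in the register

def ivb_df_dest_float_slots(hstride, exec_size):
    # The destination stride is counted in 64-bit output elements, so
    # channel i starts writing at byte i * hstride * ELEM_BYTES.
    return [(i * hstride * ELEM_BYTES) // FLOAT_BYTES for i in range(exec_size)]

# mov(8) gX<1>F gY<8,8,1>DF: a <1>F destination is not packed in this model.
print(ivb_df_dest_float_slots(1, 8))  # writes start at every other float slot
```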

With that understanding, we can actually read consecutive float
channels and convert them to doubles in one instruction, by using a
<1,2,0> region (width 2 with a horizontal stride of 0, advancing one
element per row). Each float channel is read twice, and the second
read will be ignored by the FPU.
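Plugging the <1,2,0> region into the same toy model shows why it works. As before, `region_offsets` and `ivb_df_source_offsets` are hypothetical helpers, and the input-logic behavior (2 * exec_size lanes walked, only even lanes used) is an assumption taken from the explanation above, not a hardware specification.

```python
def region_offsets(vstride, width, hstride, nchan):
    # Gen-style <vstride,width,hstride> region, offsets in elements.
    return [(i // width) * vstride + (i % width) * hstride for i in range(nchan)]

def ivb_df_source_offsets(vstride, width, hstride, exec_size):
    # Model (assumption): 2 * exec_size lanes are read from the F region;
    # the FPU uses only the even lane of each pair.
    return region_offsets(vstride, width, hstride, 2 * exec_size)[0::2]

# <1,2,0>: hstride 0 reads each float twice, vstride 1 steps to the next
# float per row, so the lane sequence is 0,0,1,1,2,2,...
print(region_offsets(1, 2, 0, 16))
# Dropping the odd (duplicate) lanes leaves consecutive floats 0..7.
print(ivb_df_source_offsets(1, 2, 0, 8))
```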

So we can replace this patch with the one I have attached. A nice side
effect is that it lets us simplify VEC4_OPCODE_TO_DOUBLE.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-i965-Use-source-region-1-2-0-when-converting-to-DF.patch
Type: text/x-patch
Size: 2971 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170113/dae42c9e/attachment-0001.bin>