[Mesa-dev] [PATCH 09/22] i965/fs: add lowering x2d step for IVB/VLV

Mon Jan 16 07:59:56 UTC 2017

On Fri, 2017-01-13 at 14:36 -0800, Matt Turner wrote:
> On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
> <siglesias at igalia.com> wrote:
> > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > 
> > On Ivybridge/Valleyview, when converting a float (F) to a
> > double-precision float (DF), the hardware automatically doubles the
> > source horizontal stride, hence converting only every other value.
> > 
> > This commit adds a new lowering step, exclusively for IVB/VLV, where
> > the sources are first copied into a temporary register with stride 2
> > and then converted from this temporary register. Thus, we do not
> > lose any value.
> 
> Curro explained how he thinks the hardware works to me. I'll try to
> reproduce that description here.
> 
> The FPU channels are 32 bits wide on IVB/BYT. Normally, for example
> when operating on 8 float channels, each FPU channel is given one
> channel of the source register to operate on and produces a value
> which is written to the corresponding channel of the destination.
> 
> But when operating on doubles, each *pair* of FPU channels operates on
> one (double-precision) value. Unfortunately the hardware designers
> didn't seem to update the input and output logic, so for instance
> every pair of float channels from the source region is given as input
> to the FPU, even though only the low (even-numbered) channel will be
> used. This is why it appears that the hardware doubles the stride,
> but it's really just ignoring all of the odd channels.
> 
> A similar thing happens on output. The output elements are 64 bits
> wide (even if the output type is float), and so a destination stride
> of 1 means the writes are strided by 64 bits. This explains the
> strange-looking behavior you discovered of an instruction like
> mov(8) gX<1>F gY<8,8,1>DF.
> 
> With that understanding, we actually can read consecutive float
> channels and convert them to doubles in one instruction -- by using a
> <1,2,0> region. Each float channel is read twice, and the second read
> will be ignored by the FPU.
> 
> So we can replace this patch with the one I have attached. A nice
> side effect of this is that we can simplify VEC4_OPCODE_TO_DOUBLE.

Oh, thanks a lot for this explanation! It helps us a lot to understand
how IvyBridge works :-)
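
To check my understanding, here is a minimal sketch of that model in
Python. It is only a toy based on the description above, not on how the
hardware is actually implemented, and the helper names
(region_indices, floats_seen_by_df_fpu, float_dst_slots) are made up
for illustration.

```python
def region_indices(vstride, width, hstride, count):
    """Element offsets read by a <vstride,width,hstride> source region."""
    return [(i // width) * vstride + (i % width) * hstride
            for i in range(count)]


def floats_seen_by_df_fpu(src, vstride, width, hstride, num_doubles):
    """Floats actually converted in an F -> DF mov, per the model above.

    Assumption: the hardware fetches a pair of 32-bit source channels
    for every double-precision result and only the even-numbered
    channel of each pair is used.
    """
    fetched = region_indices(vstride, width, hstride, 2 * num_doubles)
    return [src[idx] for idx in fetched[0::2]]


def float_dst_slots(dst_hstride, num_values):
    """32-bit slots written by a DF -> F mov, per the model above.

    Assumption: output elements are 64 bits wide even for a float
    destination, so a stride of 1 advances two 32-bit slots.
    """
    return [2 * dst_hstride * i for i in range(num_values)]


src = [float(i) for i in range(8)]

# Packed <8,8,1> region: the odd source channels are silently dropped.
packed = floats_seen_by_df_fpu(src, 8, 8, 1, 4)

# <1,2,0> region: every float is read twice, the second read is ignored.
replicated = floats_seen_by_df_fpu(src, 1, 2, 0, 4)

# mov gX<1>F gY<8,8,1>DF: the floats land in every other 32-bit slot.
slots = float_dst_slots(1, 4)
```

In this model the packed region converts only elements 0, 2, 4 and 6,
while the <1,2,0> region converts elements 0, 1, 2 and 3, which matches
what you describe.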

Thanks for the patch, I will apply it to our -rc2 branch.

Sam