[Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

Sun Jul 5 15:19:09 PDT 2015

On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Heh, I happened to come across this comment yesterday while looking for
> the remaining no16 calls and wondered why on earth it couldn't do the
> same that the normal interpolation code does.  After this patch and a
> series coming up that will remove all SIMD8 fallbacks from the texturing
> code, the only case left still applicable to Gen7 hardware and later
> will be "SIMD16 explicit accumulator operands unsupported".  Anyone?

I can explain the problem:

Prior to Gen7, there were two accumulator registers usable for most
datatypes (acc0, acc1). On Gen7, they removed integer support from
acc1, and that support was necessary to implement SIMD16 integer
multiplication using the normal MUL/MACH sequence. I implemented
32-bit integer multiplication without using the accumulator in:

commit f7df169ba13d22338e9276839a7e9629ca0a6b4f
Author: Matt Turner <mattst88 at gmail.com>
Date:   Wed May 13 18:34:03 2015 -0700

    i965/fs: Implement integer multiply without mul/mach.

The remaining cases of "SIMD16 explicit accumulator operands
unsupported" are ADDC, SUBB, and 32x32 -> high 32-bit multiplication.
The remaining multiplication case can probably be reimplemented
without the accumulator, like I did for the low 32-bit result.
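
As a rough illustration of the arithmetic only (this is not the code
from the commit above, and the function names mul_lo32 and umul_hi32
are made up for this sketch), both halves of a 32x32 unsigned product
can be assembled from 16-bit pieces without any wider intermediate:

```c
#include <stdint.h>

/* Low 32 bits of a * b, built from two 32x16 partial products.
 * Hypothetical helper for illustration, not a Mesa function. */
static uint32_t mul_lo32(uint32_t a, uint32_t b)
{
    uint32_t b_lo = b & 0xffffu;
    uint32_t b_hi = b >> 16;

    /* Unsigned overflow wraps modulo 2^32, which is exactly what we want. */
    return a * b_lo + ((a * b_hi) << 16);
}

/* High 32 bits of the unsigned product a * b, from four 16x16 partial
 * products. Hypothetical helper for illustration, not a Mesa function. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Carries out of bits 16..31 of the full product. */
    uint32_t mid = (ll >> 16) + (lh & 0xffffu) + (hl & 0xffffu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```

The high-half version needs more partial products than the low-half
one because the carries out of the lower bits have to be tracked
rather than simply discarded.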

The ADDC and SUBB instructions implicitly write a bit to the
accumulator if their operations overflowed. The 1Q/2Q quarter control
is supposed to select which register is implicitly written -- except
that there is no acc1 for integer types. Haswell and newer ignore the
quarter control and always write acc0, but IVB (and presumably BYT)
attempt to write to the nonexistent acc1.
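
For reference, the per-channel results being discussed can be modelled
in plain C as below. This is only a sketch: the struct and function
names are hypothetical, and the "flag" field stands in for the bit
that the hardware deposits in the accumulator.

```c
#include <stdint.h>

/* Hypothetical per-channel model: the value written to the destination
 * plus the bit that would be written implicitly to the accumulator. */
struct carry_result {
    uint32_t value;
    uint32_t flag;
};

/* ADDC-like: unsigned add, flag is 1 when the sum wrapped around. */
static struct carry_result addc_u32(uint32_t a, uint32_t b)
{
    struct carry_result r;
    r.value = a + b;
    r.flag = r.value < a;
    return r;
}

/* SUBB-like: unsigned subtract, flag is 1 when a borrow was needed. */
static struct carry_result subb_u32(uint32_t a, uint32_t b)
{
    struct carry_result r;
    r.value = a - b;
    r.flag = a < b;
    return r;
}
```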

You could split the SIMD16 operations into 2x SIMD8s and set
force_writemask_all on the second, followed by a 2Q MOV from the
accumulator. Maybe we'd rather use the .o (overflow) conditional mod
on a result ADD to implement this.
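
The shape of the split approach, modelled on the CPU, might look
something like the following. This is a sketch under assumptions, not
driver code: it ignores the execution mask entirely, pretends there is
a single 8-channel integer accumulator, and the function name
addc_simd16_split is invented for the example.

```c
#include <stdint.h>

enum { SIMD8_WIDTH = 8, SIMD16_WIDTH = 16 };

/* Hypothetical model of a SIMD16 ADDC done as two SIMD8 halves that
 * share one 8-channel accumulator (standing in for acc0). */
static void addc_simd16_split(const uint32_t a[SIMD16_WIDTH],
                              const uint32_t b[SIMD16_WIDTH],
                              uint32_t sum[SIMD16_WIDTH],
                              uint32_t carry[SIMD16_WIDTH])
{
    uint32_t acc0[SIMD8_WIDTH];

    for (int half = 0; half < 2; half++) {
        int base = half * SIMD8_WIDTH;

        /* SIMD8 add-with-carry: sums go to the destination, carry bits
         * land in the shared accumulator. */
        for (int ch = 0; ch < SIMD8_WIDTH; ch++) {
            sum[base + ch] = a[base + ch] + b[base + ch];
            acc0[ch] = sum[base + ch] < a[base + ch];
        }

        /* Copy the accumulator out before the next half overwrites it. */
        for (int ch = 0; ch < SIMD8_WIDTH; ch++)
            carry[base + ch] = acc0[ch];
    }
}
```

The point of the model is just the ordering: the carry bits for each
half have to be moved out of the one usable accumulator before the
other half's operation clobbers them.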

Ideally, we'd recognize and merge the addition and carry operations into a
single ADDC instruction, but it's pretty unimportant. It's all pretty
academic -- I've never seen an application use either operation (or
[iu]mulExtended either).