[Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

Fri May 3 06:05:31 PDT 2013

Not sure if this helps much, but...

In the signatures below, gentype is one of:
char, uchar, short, ushort, int, uint, long, ulong, either scalar or
a vector of 2, 3, 4, 8, or 16 components.

From the OpenCL 1.1 spec:
gentype mul_hi(gentype x, gentype y):
Computes x * y and returns the high half of the product of x and y.

gentype mad_hi(gentype a, gentype b, gentype c):
Returns mul_hi(a, b) + c.
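
For the 32-bit scalar case, that works out to roughly the following C
sketch. The function names (umul_hi32, imul_hi32, umad_hi32) are made up
for illustration and are not OpenCL or TGSI names. The signed version
assumes that >> on a negative int64_t is an arithmetic shift, which C
leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of the 64-bit product of two
 * unsigned 32-bit values (what mul_hi would return for uint). */
static inline uint32_t umul_hi32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}

/* Hypothetical helper: high 32 bits of the 64-bit product of two
 * signed 32-bit values (what mul_hi would return for int).
 * Assumption: >> on a negative int64_t shifts arithmetically. */
static inline int32_t imul_hi32(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return (int32_t)(product >> 32);
}

/* Hypothetical helper: mad_hi(a, b, c) = mul_hi(a, b) + c for uint.
 * Unsigned addition wraps modulo 2^32. */
static inline uint32_t umad_hi32(uint32_t a, uint32_t b, uint32_t c)
{
    return umul_hi32(a, b) + c;
}
```

With both inputs set to the bit pattern 0xFFFFFFFF, the low 32 bits of
the product are 0x00000001 either way, but the unsigned high half is
0xFFFFFFFE while the signed one (-1 * -1) is 0.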

--Aaron

On Fri, May 3, 2013 at 5:31 AM, Marek Olšák <maraeo at gmail.com> wrote:
> FWIW, this maps nicely to r600, which also has separate instructions
> for the low and high 32 bits. As to what option is better, it really
> depends on whether shading languages and OpenCL expose the
> instructions directly through functions, or whether they just have
> 64-bit integers.
>
> Marek
>
> On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Currently, there's no way to get the high bits of a 32x32
>> signed/unsigned integer multiplication with tgsi.
>> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
>> well.
>> There are essentially two ways it could be done:
>> - a 2-destination instruction returning both high and low bits (this is
>> how it looks in d3d10 and glsl)
>> - use the existing umul for the low bits and have another instruction
>> for the high bits (this is how it looks in opencl)
>>
>> Well there's other possibilities but these looked like they'd match both
>> APIs and HW reasonably (well with the exception of things like sse2
>> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>>
>> Actually it's two new instructions because unlike for the low bits it
>> matters for the high bits if the source operands are signed or unsigned.
>>
>> Personally I'm favoring two separate instructions for low and high bits
>> to not have to deal with multi-destination instructions, but if someone
>> makes a strong case for one returning both low and high bits I could be
>> convinced otherwise. I think though two instructions matches most hw
>> very well (with the exception of software renderers and possibly intel
>> graphics but then a good backend could certainly recognize this).
>>
>> So here's what the docs would say about these instructions:
>>
>>
>> .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
>>
>>    The high 32 bits of the product of two signed integers are returned.
>>
>> .. math::
>>
>>   dst.x = (src0.x \times src1.x) >> 32
>>
>>   dst.y = (src0.y \times src1.y) >> 32
>>
>>   dst.z = (src0.z \times src1.z) >> 32
>>
>>   dst.w = (src0.w \times src1.w) >> 32
>>
>>
>> .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
>>
>>    The high 32 bits of the product of two unsigned integers are returned.
>>
>> .. math::
>>
>>   dst.x = (src0.x \times src1.x) >> 32
>>
>>   dst.y = (src0.y \times src1.y) >> 32
>>
>>   dst.z = (src0.z \times src1.z) >> 32
>>
>>   dst.w = (src0.w \times src1.w) >> 32
>>
>>
>> Roland
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev