[Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

Fri May 3 06:50:54 PDT 2013

Yes, that's why I said it looks like separate low and high bits in
OpenCL. So in OpenCL you will get the low and high parts separately anyway.
If we had only one instruction, we would probably also want a way to say
that only one of the two destinations is needed, to avoid extra work, and
I'm not sure we can do that easily without adding some more code.
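
To make the low/high split concrete, here is a rough per-component
sketch in C. The helper names mul_lo, umul_hi and imul_hi are
hypothetical, chosen only for illustration; they are not existing TGSI
or gallium functions. It assumes a 64-bit integer type is available.

```c
#include <stdint.h>

/* Hypothetical reference helpers, for illustration only. */

/* Low 32 bits of the product: the same bit pattern results whether the
 * operands are interpreted as signed or unsigned, so one opcode suffices. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return a * b;
}

/* High 32 bits of the full 64-bit unsigned product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the full 64-bit signed product.
 * The shift is done on the unsigned bit pattern to avoid right-shifting a
 * negative value; the final conversion back to int32_t assumes the usual
 * two's-complement behaviour. */
static int32_t imul_hi(int32_t a, int32_t b)
{
    uint64_t bits = (uint64_t)((int64_t)a * (int64_t)b);
    return (int32_t)(uint32_t)(bits >> 32);
}
```

For the all-ones bit pattern on both operands, the low half is the same
either way, but umul_hi gives 0xFFFFFFFE while imul_hi (reading the
operands as -1) gives 0, which is why the high bits need separate signed
and unsigned variants.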

Roland

Am 03.05.2013 15:05, schrieb Aaron Watry:
> Not sure if this helps much, but...
> 
> With gentype being one of:
> char, uchar, short, ushort, int, uint, long, ulong, and the widths
> being scalar, 2, 3, 4, 8, or 16 components wide.
> 
> From the OpenCL 1.1 spec:
> gentype mul_hi(gentype x, gentype y):
> Computes x * y and returns the high half of the product of x and y
> 
> gentype mad_hi(gentype a, gentype b, gentype c):
> Returns mul_hi(a, b) + c
> 
> --Aaron
> 
> 
> On Fri, May 3, 2013 at 5:31 AM, Marek Olšák <maraeo at gmail.com> wrote:
>> FWIW, this maps nicely to r600, which also has separate instructions
>> for the low and high 32 bits. As to what option is better, it really
>> depends on whether shading languages and OpenCL expose the
>> instructions directly through functions, or whether they just have
>> 64-bit integers.
>>
>> Marek
>>
>> On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Currently, there's no way to get the high bits of a 32x32
>>> signed/unsigned integer multiplication with tgsi.
>>> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
>>> well.
>>> There are essentially two ways it could be done:
>>> - a 2-destination instruction returning both high and low bits (this is
>>> how it looks in d3d10 and glsl)
>>> - use the existing umul for the low bits and have another instruction
>>> for the high bits (this is how it looks in opencl)
>>>
>>> There are other possibilities, but these looked like they'd match both
>>> APIs and HW reasonably well (with the exception of things like sse2,
>>> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>>>
>>> Actually it's two new instructions because unlike for the low bits it
>>> matters for the high bits if the source operands are signed or unsigned.
>>>
>>> Personally I'm favoring two separate instructions for low and high bits
>>> to not have to deal with multi-destination instructions, but if someone
>>> makes a strong case for one returning both low and high bits I could be
>>> convinced otherwise. I think, though, that two instructions match most
>>> hw very well (with the exception of software renderers and possibly intel
>>> graphics, but then a good backend could certainly recognize this).
>>>
>>> So here's what the docs would say about these instructions:
>>>
>>>
>>> .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
>>>
>>>    The high 32 bits of the multiplication of 2 signed integers are returned.
>>>
>>> .. math::
>>>
>>>   dst.x = (src0.x \times src1.x) >> 32
>>>
>>>   dst.y = (src0.y \times src1.y) >> 32
>>>
>>>   dst.z = (src0.z \times src1.z) >> 32
>>>
>>>   dst.w = (src0.w \times src1.w) >> 32
>>>
>>>
>>> .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
>>>
>>>    The high 32 bits of the multiplication of 2 unsigned integers are returned.
>>>
>>> .. math::
>>>
>>>   dst.x = (src0.x \times src1.x) >> 32
>>>
>>>   dst.y = (src0.y \times src1.y) >> 32
>>>
>>>   dst.z = (src0.z \times src1.z) >> 32
>>>
>>>   dst.w = (src0.w \times src1.w) >> 32
>>>
>>>
>>> Roland
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev