[Mesa-dev] [PATCH 00/50] GL_ARB_gpu_shader_int64... this time for sure!

Mon Dec 5 22:20:37 UTC 2016

On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On 12/05, Matt Turner wrote:
>>>
>>> On 11/28, Ian Romanick wrote:
>>>>
>>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>>    Patches 42 through 50 enable the extension on BDW+.
>>>
>>>
>>> 42-48 are
>>>
>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>
>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>> fp64 works.
>>
>>
>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>> emit two CMP instructions, one with 1Q and one with 2Q:
>>
>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g11<4,4,1>DF { align1 1Q };
>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g7<4,4,1>DF { align1 2Q };
>>
>> (from fs-op-add-double-double.shader_test)
>>
>> Makes sense to me. 49 is
>>
>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>
> Actually, it's something a little different. The splitting you're
> talking about is handled just fine by curro's SIMD lowering pass. The
> issue here is that if you don't specify a null destination register
> (with a null destination this is a moot point), CMP will always write
> a destination with the same bitsize as the source. That is, if you compare two
> registers with 8 doubles each (two SIMD8 registers each), the result
> will take up two SIMD8 registers instead of one as you'd expect. I
> couldn't track this down in the PRM, but I definitely remember having
> to implement it and getting wrong results without it. The end result
> is that you have to use a strided move to get the low 32 bits of each
> 64-bit destination channel, which is what subscript() does. This
> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
> Of course, in this case some backend optimizations have managed to
> remove the destination register, so that's why you don't see it here,
> but if you do something trickier, like store the result to a buffer,
> the strided mov will be there.
>
> Anyways, that's what I remember of it... it's been a while.
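
To make the strided move described above concrete, here is a toy model in
Python. It is only a sketch of the data layout, not Mesa code and not a
description of the hardware: the function names are made up for illustration,
and it assumes a "true" channel is written as all ones and that channels are
laid out little-endian.

```python
import struct

def cmp_ge_64(src0, src1):
    """Toy CMP with a non-null destination on 64-bit sources: every channel
    writes a full 64-bit result (assumed all ones if true, else 0)."""
    return [0xFFFFFFFFFFFFFFFF if a >= b else 0 for a, b in zip(src0, src1)]

def as_dwords(qwords):
    """Reinterpret 64-bit channels as a flat little-endian array of dwords."""
    raw = struct.pack("<%dQ" % len(qwords), *qwords)
    return list(struct.unpack("<%dI" % (2 * len(qwords)), raw))

def strided_low_dwords(dwords):
    """Strided move: read every second dword, which is the low 32 bits of
    each 64-bit channel."""
    return dwords[0::2]

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [4.0] * 8

wide = cmp_ge_64(a, b)                        # 8 channels x 64 bits
packed = strided_low_dwords(as_dwords(wide))  # 8 channels x 32 bits
```

The wide result occupies 16 dwords (two SIMD8 registers' worth in this model),
and the stride-2 read packs it back down to the 8 dwords a 32-bit boolean
would normally take.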

Although, the example you gave has a bug, since the second CMP
overwrites the result of the previous one... it looks like
lower_simd_width isn't offsetting the flag register correctly when
splitting the CMP.
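
A toy model of the overwrite I mean, in Python. This is a sketch under
assumptions, not what lower_simd_width actually does: the helper names are
hypothetical, the flag is modelled as a plain 16-bit integer, and each half of
the split compare is assumed to replace the 8 flag bits it targets.

```python
def cmp_half(src0, src1):
    """Compare 8 channels and return an 8-bit mask, one bit per channel."""
    mask = 0
    for i, (a, b) in enumerate(zip(src0, src1)):
        if a >= b:
            mask |= 1 << i
    return mask

def split_cmp(src0, src1, offset_second_half):
    """Run a 16-wide compare as two 8-wide halves writing one 16-bit flag."""
    flag = 0
    for half in range(2):
        lo = half * 8
        mask = cmp_half(src0[lo:lo + 8], src1[lo:lo + 8])
        # Without the offset, both halves target flag bits 0-7.
        shift = lo if offset_second_half else 0
        # Each half replaces the 8 flag bits it targets.
        flag = (flag & ~(0xFF << shift)) | (mask << shift)
    return flag

src0 = list(range(16))
src1 = [4] * 16

correct = split_cmp(src0, src1, offset_second_half=True)
buggy = split_cmp(src0, src1, offset_second_half=False)
```

With the offset applied, the two halves land in separate bits of the flag; when
it is missing, the second half's mask replaces the first half's and the upper
8 bits are never written.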
