[Mesa-dev] [PATCH 00/50] GL_ARB_gpu_shader_int64... this time for sure!

Matt Turner mattst88 at gmail.com
Mon Dec 5 22:48:41 UTC 2016


On Mon, Dec 5, 2016 at 2:20 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>> On 12/05, Matt Turner wrote:
>>>>
>>>> On 11/28, Ian Romanick wrote:
>>>>>
>>>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>>>    Patches 42 through 50 enable the extension on BDW+.
>>>>
>>>>
>>>> 42-48 are
>>>>
>>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>>
>>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>>> fp64 works.
>>>
>>>
>>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>>> emit two CMP instructions, one with 1Q and one with 2Q:
>>>
>>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g11<4,4,1>DF { align1 1Q };
>>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g7<4,4,1>DF { align1 2Q };
>>>
>>> (from fs-op-add-double-double.shader_test)
>>>
>>> Makes sense to me. 49 is
>>>
>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>
>> Actually, it's something a little different. The splitting you're
>> talking about is handled just fine by curro's SIMD lowering pass. The
>> issue here is that if you don't specify a null destination register
>> (in which case this is a moot point), CMP will always output the same
>> destination bitsize as the source bitsize. That is, if you compare two
>> registers with 8 doubles each (two SIMD8 registers each), the result
>> will take up two SIMD8 registers instead of one as you'd expect. I
>> couldn't track this down in the PRM, but I definitely remember having
>> to implement it and getting wrong results without it. The end result
>> is that you have to use a strided move to get the low 32 bits of each
>> 64-bit destination channel, which is what subscript() does. This
>> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
>> Of course, in this case some backend optimizations have managed to
>> remove the destination register, so that's why you don't see it here,
>> but if you do something trickier, like store the result to a buffer,
>> the strided mov will be there.
>>
>> Anyways, that's what I remember of it... it's been a while.
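
To make sure I follow the strided-move part, here is a rough software model
of it in Python. This is only a sketch: the function names are made up for
illustration, and the assumption that a true channel is written as 64 bits of
all ones is mine, not something taken from the PRM. The point is just that a
64-bit CMP result occupies two dwords per channel, so a stride-2 read is
needed to get one 32-bit boolean per channel.

```python
import struct

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF

def cmp_ge_64(a, b):
    """Model a 64-bit CMP with a non-null destination.

    Assumption: each channel gets a full 64-bit result, all ones for
    true and zero for false.
    """
    return [ALL_ONES_64 if x >= y else 0 for x, y in zip(a, b)]

def as_dwords(channels):
    """View the 64-bit channels as consecutive little-endian dwords."""
    raw = struct.pack("<%dQ" % len(channels), *channels)
    return list(struct.unpack("<%dI" % (2 * len(channels)), raw))

def strided_low_dwords(dwords):
    """Stride-2 read: keep the low dword of every 64-bit channel."""
    return dwords[0::2]

a = [1.0, 5.0, 3.0, 0.0]
b = [2.0, 5.0, 1.0, 0.5]
wide = cmp_ge_64(a, b)              # 4 channels, 64 bits each
dwords = as_dwords(wide)            # 8 dwords, twice what a 32-bit CMP needs
narrow = strided_low_dwords(dwords) # 4 dwords, one boolean per channel
```

In this model `narrow` is what a consumer expecting 32-bit booleans would
read, and `dwords` is twice as long as that consumer expects.
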
>
> Although, the example you gave has a bug, since the second CMP
> overwrites the result of the previous one... it looks like
> lower_simd_width isn't offsetting the flag register correctly when
> splitting the CMP.

I assumed that quarter control would select which flag subregister to
write... I sure hope that's how the hardware works.
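
For what it's worth, the assumption I have in mind can be written down as a
small Python model. It is a sketch of the assumption only, not a statement
about what the hardware does; cmp_to_flag and its quarter argument are
hypothetical names. In the model, 1Q writes flag bits 0-7 and 2Q writes bits
8-15, so the two split CMPs don't clobber each other. If both halves landed
in the same 8 bits, the second would overwrite the first, which is the bug
Connor describes.

```python
def cmp_to_flag(flag, results, quarter):
    """Write 8 per-channel results into a 16-bit flag value.

    quarter is 0 for 1Q and 1 for 2Q. Assumption: the quarter selects
    which group of 8 flag bits is written; other bits are preserved.
    """
    assert len(results) == 8
    shift = 8 * quarter
    bits = sum(1 << i for i, r in enumerate(results) if r)
    mask = 0xFF << shift
    return (flag & ~mask & 0xFFFF) | (bits << shift)

lo = [True, False] * 4   # channels 0-7
hi = [True] * 8          # channels 8-15

# Quarter control selects the flag bits: both halves survive.
flag = cmp_to_flag(0, lo, 0)
flag = cmp_to_flag(flag, hi, 1)

# Both CMPs writing the same 8 bits: the second overwrites the first.
clobbered = cmp_to_flag(0, lo, 0)
clobbered = cmp_to_flag(clobbered, hi, 0)
```

With these inputs the first sequence keeps both halves of the result, while
the second sequence loses the channel 0-7 comparison entirely.
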

