[Mesa-dev] [PATCH 00/50] GL_ARB_gpu_shader_int64... this time for sure!

Connor Abbott cwabbott0 at gmail.com
Tue Dec 6 00:18:51 UTC 2016


On Mon, Dec 5, 2016 at 5:48 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Dec 5, 2016 at 2:20 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> On 12/05, Matt Turner wrote:
>>>>>
>>>>> On 11/28, Ian Romanick wrote:
>>>>>>
>>>>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>>>>    Patches 42 through 50 enable the extension on BDW+.
>>>>>
>>>>>
>>>>> 42-48 are
>>>>>
>>>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>>>
>>>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>>>> fp64 works.
>>>>
>>>>
>>>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>>>> emit two CMP instructions, one with 1Q and one with 2Q:
>>>>
>>>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g11<4,4,1>DF { align1 1Q };
>>>> cmp.ge.f0(8)    null<1>DF       g2.2<0,1,0>DF   (abs)g7<4,4,1>DF { align1 2Q };
>>>>
>>>> (from fs-op-add-double-double.shader_test)
>>>>
>>>> Makes sense to me. 49 is
>>>>
>>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>
>>> Actually, it's something a little different. The splitting you're
>>> talking about is handled just fine by curro's SIMD lowering pass. The
>>> issue here is that if you don't specify a null destination register
>>> (in which case this is a moot point), CMP will always output the same
>>> destination bitsize as the source bitsize. That is, if you compare two
>>> registers with 8 doubles each (two SIMD8 registers each), the result
>>> will take up two SIMD8 registers instead of one as you'd expect. I
>>> couldn't track this down in the PRM, but I definitely remember having
>>> to implement it and getting wrong results without it. The end result
>>> is that you have to use a strided move to get the low 32 bits of each
>>> 64-bit destination channel, which is what subscript() does. This
>>> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
>>> Of course, in this case some backend optimizations have managed to
>>> remove the destination register, so that's why you don't see it here,
>>> but if you do something trickier, like store the result to a buffer,
>>> the strided mov will be there.
>>>
>>> Anyways, that's what I remember of it... it's been a while.
>>
>> Although, the example you gave has a bug, since the second CMP
>> overwrites the result of the previous one... it looks like
>> lower_simd_width isn't offsetting the flag register correctly when
>> splitting the CMP.
>
> I assumed that quarter control would select which flag subregister to
> write... I sure hope that's how the hardware works.

Ah, yeah, that's right.
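
For anyone following along, here is a toy model of the strided move
discussed above. It is only a sketch, not Mesa code: the function names
are made up for illustration, and the assumption that a 64-bit CMP writes
an all-ones or all-zeros 64-bit value per channel is mine, not something
confirmed from the PRM. It just shows why the result of comparing 8
doubles spans 16 dwords, and how reading every other dword recovers the
usual 32-bit booleans.

```python
import struct

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF  # assumed "true" pattern for a 64-bit channel


def cmp_ge_df(a, b):
    """Model a CMP on doubles with a non-null destination.

    Each channel produces a 64-bit result, so the destination is as wide
    as the sources (hypothetical model, see the assumptions above).
    """
    return [ALL_ONES_64 if x >= y else 0 for x, y in zip(a, b)]


def as_dwords(qwords):
    """Reinterpret 64-bit channels as little-endian 32-bit dwords."""
    raw = struct.pack("<%dQ" % len(qwords), *qwords)
    return list(struct.unpack("<%dI" % (2 * len(qwords)), raw))


def low_dwords(qwords):
    """Stride-2 read: keep dword 0 of every 64-bit channel."""
    return as_dwords(qwords)[0::2]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    b = [4.0] * 8
    wide = cmp_ge_df(a, b)
    print(len(as_dwords(wide)), "dwords in the raw CMP result")
    print([hex(d) for d in low_dwords(wide)])
```

The stride-2 read in low_dwords() plays the role that the strided mov
plays in the generated code: 8 channels in, 8 dwords out.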

