[Mesa-dev] [PATCH 5/5] ac: use correct LLVM opcodes for ordered comparisons

Fri Feb 16 04:43:59 UTC 2018

On 15/02/18 20:06, Bas Nieuwenhuizen wrote:
> On Thu, Feb 15, 2018 at 8:54 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Wed, Feb 14, 2018 at 11:53 PM, Timothy Arceri <tarceri at itsqueeze.com> wrote:
>>>
>>>
>>> On 15/02/18 04:39, Marek Olšák wrote:
>>>>
>>>> Reviewed-by: Marek Olšák <marek.olsak at amd.com>
>>>>
>>>> Marek
>>>>
>>>> On Wed, Feb 14, 2018 at 7:29 AM, Timothy Arceri <tarceri at itsqueeze.com>
>>>> wrote:
>>>>>
>>>>> Fixes glsl-1.30/execution/isinf-and-isnan* piglit tests for
>>>>> radeonsi and should fix SPIRV errors when LLVM optimises away
>>>>> the workarounds in vtn_handle_alu() for handling ordered
>>>>> comparisons.
>>>>>
>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104905
>>>>> ---
>>>>>    src/amd/common/ac_nir_to_llvm.c | 8 ++++----
>>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/src/amd/common/ac_nir_to_llvm.c
>>>>> b/src/amd/common/ac_nir_to_llvm.c
>>>>> index a0c5680205..e81f86bb08 100644
>>>>> --- a/src/amd/common/ac_nir_to_llvm.c
>>>>> +++ b/src/amd/common/ac_nir_to_llvm.c
>>>>> @@ -1792,16 +1792,16 @@ static void visit_alu(struct ac_nir_context *ctx,
>>>>> const nir_alu_instr *instr)
>>>>>                   result = emit_int_cmp(&ctx->ac, LLVMIntUGE, src[0],
>>>>> src[1]);
>>>>>                   break;
>>>>>           case nir_op_feq:
>>>>> -               result = emit_float_cmp(&ctx->ac, LLVMRealUEQ, src[0],
>>>>> src[1]);
>>>>> +               result = emit_float_cmp(&ctx->ac, LLVMRealOEQ, src[0],
>>>>> src[1]);
>>>>>                   break;
>>>>>           case nir_op_fne:
>>>>> -               result = emit_float_cmp(&ctx->ac, LLVMRealUNE, src[0],
>>>>> src[1]);
>>>>> +               result = emit_float_cmp(&ctx->ac, LLVMRealONE, src[0],
>>>>> src[1]);
>>>
>>>
>>> It seems we need to leave this one as is to avoid regressions. This is also
>>> what radeonsi does.
>>
>> So, the thing you have to understand is that in LLVM unordered
>> comparisons are precisely the inverse of the ordered comparisons. That
>> is, (a <=(ordered) b) == !(a >(unordered b), (a ==(ordered) b) == !(a
>> !=(unordered) b), and  so on. C defines that all comparsions are
>> ordered except !=, so that (a == b) == !(a != b) always holds true.
>> Most hardware follows this convention -- offhand, x86 SSE is the only
>> ISA I know of with separate ordered and unordered comparisons, and
>> LLVM appears to have copied the distinction from them, but no one else
>> has both. I'm not even sure if it's in the IEEE spec. GLSL follows the
>> C convention, so glsl_to_nir just uses nir_op_fne to mean unordered
>> not-equal. spirv_to_nir generates some extra instructions, which then
>> get stripped away later... sigh.
>>
>> I think the right way to untangle this mess is to define that the NIR
>> opcodes should always match the C convention. The separate ordered and
>> unordered opcodes are unnecesary, since one is just the logical
>> negation of the other, and LLVM was a little overzealous -- I'm sure
>> they would get rid of the distinction if they had the chance -- and
>> then they were blindly copied to SPIR-V. spirv_to_nir should just
>> negate the result if necessary rather than emitting the extra code to
>> handle NaN, and ac should use ordered except for not-equals.
> 
> GCN hardware actually has both ordered and unordered instructions,
> though I think
> it could be fair to only introduce them during instruction selection
> (or conversion to LLVM)
> and keep a canonical ordered comparison + not in nir.
> 
> I think the most important part would be to firmly define which one
> the nir instructions are
> and then make nir_opt_algebraic not break that.

Well you guys seem to understand the issue and solutions better then I 
do so I think I'll step back from this for the time being. So feel free 
to tackle the issue.

I've also noticed what seems to be mishandling of nans elsewhere. For 
example on radeonsi the following test produces "vec1 32 ssa_5 = 
load_const (0xffffffff /* -nan */)" which seems to incorrectly handled 
and optimised out at some point along the way.

./glcts --deqp-case=KHR-GL43.compute_shader.atomic-case1