[Mesa-dev] [PATCH 4/6] nir: add shader_clock intrinsic

Thu Oct 8 07:09:15 PDT 2015

On 8 October 2015 at 14:20, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Oct 8, 2015 at 6:54 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> On 7 October 2015 at 23:50, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Wed, Oct 7, 2015 at 4:48 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>> On 7 October 2015 at 18:04, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>> On Wed, Oct 7, 2015 at 7:51 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>> XXX: commit message, comment in nir_intrinsics.h
>>>>>>
>>>>>> Signed-off-by: Emil Velikov <emil.velikov at collabora.com>
>>>>>> ---
>>>>>>  src/glsl/nir/glsl_to_nir.cpp  | 6 ++++++
>>>>>>  src/glsl/nir/nir_intrinsics.h | 2 ++
>>>>>>  2 files changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>>>>>> index efaa73e..231bdbf 100644
>>>>>> --- a/src/glsl/nir/glsl_to_nir.cpp
>>>>>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>>>>>> @@ -698,6 +698,8 @@ nir_visitor::visit(ir_call *ir)
>>>>>>           op = nir_intrinsic_ssbo_atomic_exchange;
>>>>>>        } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
>>>>>>           op = nir_intrinsic_ssbo_atomic_comp_swap;
>>>>>> +      } else if (strcmp(ir->callee_name(), "__intrinsic_shader_clock") == 0) {
>>>>>> +         op = nir_intrinsic_shader_clock;
>>>>>>        } else {
>>>>>>           unreachable("not reached");
>>>>>>        }
>>>>>> @@ -802,6 +804,10 @@ nir_visitor::visit(ir_call *ir)
>>>>>>        case nir_intrinsic_memory_barrier:
>>>>>>           nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>>>>>>           break;
>>>>>> +      case nir_intrinsic_shader_clock:
>>>>>> +         nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>>>>>> +         nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>>>>>> +         break;
>>>>>>        case nir_intrinsic_store_ssbo: {
>>>>>>           exec_node *param = ir->actual_parameters.get_head();
>>>>>>           ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
>>>>>> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
>>>>>> index 263d8c1..4b32215 100644
>>>>>> --- a/src/glsl/nir/nir_intrinsics.h
>>>>>> +++ b/src/glsl/nir/nir_intrinsics.h
>>>>>> @@ -83,6 +83,8 @@ BARRIER(discard)
>>>>>>   */
>>>>>>  BARRIER(memory_barrier)
>>>>>>
>>>>>> +INTRINSIC(shader_clock, 0, ARR(), true, 1, 1, 0, 0 /* flags ? */)
>>>>>
>>>>> This should have NIR_INTRINSIC_CAN_DELETE, since if the result is
>>>>> unused we can safely delete it (i.e. it has no side effects), but we
>>>>> can't safely reorder it.
>>>>>
>>>> Thanks. Will do.
>>>>
>>>>> Side note: NIR's current model, as well as any more flexible memory
>>>>> model we might adopt in the future, assumes that intrinsics which are
>>>>> marked as reorderable, as well as ALU operations which are implicitly
>>>>> reorderable, can be freely reordered with respect to *any* other
>>>>> operation, even one that's explicitly not reorderable. So, for
>>>>> example, if you do:
>>>>>
>>>>> ... = clock();
>>>>> a = b + c;
>>>>> ... = clock();
>>>>>
>>>>> then there are no guarantees that the addition won't get moved outside
>>>>> the clock() calls. Currently, this will only happen if the addition
>>>>> becomes involved in some algebraic optimization or CSE, but in the
>>>>> future with passes like GCM that move code around indiscriminately
>>>>> it's going to be much more of a problem. I don't think we could really
>>>>> solve this problem in a useful and general way without making the rest
>>>>> of NIR significantly more complicated and slower, which I definitely
>>>>> don't want. I think the best answer is to say "really these tools are
>>>>> unreliable and meant mainly for driver developers and people who know
>>>>> what they're doing, and if you use them you have to be prepared to
>>>>> look at the assembly source and see if it matches what you expected."
>>>>>
>>>> I haven't looked at the optimisations closely and I assumed that all
>>>> intrinsics act as motion barriers. Seems like I was mistaking.
>>>> Can we call it a "where sub-group is implementation dependent" and be
>>>> done with it ;-)
>>>
>>> Or even "The units of time are not defined and need not be constant"
>>> -- I guess "return 0;" would be a legal implementation ;).
>>>
>> Bikeshedding aside - the spec is quite clear about the motion barrier
>> part. Personally I'm fine either way - leave it as is or look closer
>> at NIR. Just let know how you feel on the topic.
>
> It's my opinion, and Jason's as well, that implementing a "code motion
> barrier" as the spec describes is practically impossible in NIR or
> really any decent SSA-based IR. So we just can't follow that part of
> the spec.
>
>>
>>> But really the issue isn't with spec lawyering, it's with people
>>> potentially using it without knowing the caveats about the underlying
>>> compiler stack and how it might not always do what they think it does.
>>>
>> Replace "compiler stack" with "object X" and you can apply it to
>> everything in life :-P
>
> True :) although in this case, it's perhaps a bit more serious since
> it's against the "spirit" of the spec. But again, there's nothing we
> can really do about it.
>
Fair enough. I am not going to dig any more on the topic.

Can you suggest some patterns/sequences that won't get optimised (too
heavily) to use as piglit tests ? Would be nice to not have them
regress randomly, when opt X gets added/removed.

Cheers
Emil