[Mesa-dev] [PATCH 4/6] nir: add shader_clock intrinsic

Thu Oct 8 07:05:18 PDT 2015

On 8 October 2015 at 14:26, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Oct 8, 2015 at 8:54 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> On 8 October 2015 at 11:54, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>> On 7 October 2015 at 23:50, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>> On Wed, Oct 7, 2015 at 4:48 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>> On 7 October 2015 at 18:04, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>>> On Wed, Oct 7, 2015 at 7:51 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>>> XXX: commit message, comment in nir_intrinsics.h
>>>>>>>
>>>>>>> Signed-off-by: Emil Velikov <emil.velikov at collabora.com>
>>>>>>> ---
>>>>>>>  src/glsl/nir/glsl_to_nir.cpp  | 6 ++++++
>>>>>>>  src/glsl/nir/nir_intrinsics.h | 2 ++
>>>>>>>  2 files changed, 8 insertions(+)
>>>>>>>
>>>>>>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>>>>>>> index efaa73e..231bdbf 100644
>>>>>>> --- a/src/glsl/nir/glsl_to_nir.cpp
>>>>>>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>>>>>>> @@ -698,6 +698,8 @@ nir_visitor::visit(ir_call *ir)
>>>>>>>           op = nir_intrinsic_ssbo_atomic_exchange;
>>>>>>>        } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
>>>>>>>           op = nir_intrinsic_ssbo_atomic_comp_swap;
>>>>>>> +      } else if (strcmp(ir->callee_name(), "__intrinsic_shader_clock") == 0) {
>>>>>>> +         op = nir_intrinsic_shader_clock;
>>>>>>>        } else {
>>>>>>>           unreachable("not reached");
>>>>>>>        }
>>>>>>> @@ -802,6 +804,10 @@ nir_visitor::visit(ir_call *ir)
>>>>>>>        case nir_intrinsic_memory_barrier:
>>>>>>>           nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>>>>>>>           break;
>>>>>>> +      case nir_intrinsic_shader_clock:
>>>>>>> +         nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>>>>>>> +         nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>>>>>>> +         break;
>>>>>>>        case nir_intrinsic_store_ssbo: {
>>>>>>>           exec_node *param = ir->actual_parameters.get_head();
>>>>>>>           ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
>>>>>>> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
>>>>>>> index 263d8c1..4b32215 100644
>>>>>>> --- a/src/glsl/nir/nir_intrinsics.h
>>>>>>> +++ b/src/glsl/nir/nir_intrinsics.h
>>>>>>> @@ -83,6 +83,8 @@ BARRIER(discard)
>>>>>>>   */
>>>>>>>  BARRIER(memory_barrier)
>>>>>>>
>>>>>>> +INTRINSIC(shader_clock, 0, ARR(), true, 1, 1, 0, 0 /* flags ? */)
>>>>>>
>>>>>> This should have NIR_INTRINSIC_CAN_DELETE, since if the result is
>>>>>> unused we can safely delete it (i.e. it has no side effects), but we
>>>>>> can't safely reorder it.
>>>>>>
>>>>> Thanks. Will do.
>>>>>
>>>>>> Side note: NIR's current model, as well as any more flexible memory
>>>>>> model we might adopt in the future, assumes that intrinsics which are
>>>>>> marked as reorderable, as well as ALU operations which are implicitly
>>>>>> reorderable, can be freely reordered with respect to *any* other
>>>>>> operation, even one that's explicitly not reorderable. So, for
>>>>>> example, if you do:
>>>>>>
>>>>>> ... = clock();
>>>>>> a = b + c;
>>>>>> ... = clock();
>>>>>>
>>>>>> then there are no guarantees that the addition won't get moved outside
>>>>>> the clock() calls. Currently, this will only happen if the addition
>>>>>> becomes involved in some algebraic optimization or CSE, but in the
>>>>>> future with passes like GCM that move code around indiscriminately
>>>>>> it's going to be much more of a problem. I don't think we could really
>>>>>> solve this problem in a useful and general way without making the rest
>>>>>> of NIR significantly more complicated and slower, which I definitely
>>>>>> don't want. I think the best answer is to say "really these tools are
>>>>>> unreliable and meant mainly for driver developers and people who know
>>>>>> what they're doing, and if you use them you have to be prepared to
>>>>>> look at the assembly source and see if it matches what you expected."
>>>>>>
>>>>> I haven't looked at the optimisations closely and I assumed that all
>>>>> intrinsics act as motion barriers. Seems like I was mistaking.
>>>>> Can we call it a "where sub-group is implementation dependent" and be
>>>>> done with it ;-)
>>>>
>>>> Or even "The units of time are not defined and need not be constant"
>>>> -- I guess "return 0;" would be a legal implementation ;).
>>>>
>>> Bikeshedding aside - the spec is quite clear about the motion barrier
>>> part. Personally I'm fine either way - leave it as is or look closer
>>> at NIR. Just let know how you feel on the topic.
>>>
>> Seems that qualifiers like restrict and volatile are not considered
>> when performing optimisations. Won't this impact the code in
>> strange/unpredictable ways ? With this and the original comment in
>> mind, does memoryBarrier work as expected ? Afaict it's a simple
>> intrinsic, yet I cannot see something that prevents us from doing code
>> (and implicitly memory) motion across it.
>>
>> Is there something subtle that I'm missing or the above sounds about right ?
>
> memoryBarrier, accesses to variables without restrict/with volatile,
> etc. all happen through intrinsics which are marked non-reorderable.
> The point is that if we have something like
>
> foo = a + 0
> ...
> ... = foo
>
> and if we want to replace uses of foo with uses of a, we shouldn't
> have to care about what what happens in between now that we're in SSA.
> If doing
>
> clock();
> foo = a + 0;
> clock();
> ...
> ... = foo;
>
> makes that transform invalid, then now potentially every transform we
> would want to do would have to be checked against some nebulous set of
> constraints, which is pretty awful.
>
Thanks for the comprehensive explanation Connor, I see what's happening here.

Essentially all the 'special cases' (without restrict/with
volatile/etc) are intrinsics, in order to prevent combinatorial
explosion of checks for each normal inst/variable. In order to get
clock() working as defined in the spec we'll need to flip the logic
slowing NIR down immensely. And that obviously it's not going to
happen (soon at least).

Thanks again,
Emil