[Mesa-dev] [PATCH 01/17] nir: introduce new convergent and cross-thread attributes
Connor Abbott
cwabbott0 at gmail.com
Fri Jul 7 00:29:48 UTC 2017
On Mon, Jun 12, 2017 at 9:26 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Mon, Jun 12, 2017 at 7:38 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>
>> On Mon, Jun 12, 2017 at 7:19 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>> > wrote:
>> >>
>> >> On 12.06.2017 20:50, Connor Abbott wrote:
>> >>>
>> >>> On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> On 10.06.2017 01:44, Connor Abbott wrote:
>> >>>>>
>> >>>>>
>> >>>>> From: Connor Abbott <cwabbott0 at gmail.com>
>> >>>>>
>> >>>>> These are properties of the instruction that must be respected when
>> >>>>> moving it around, in addition to the usual SSA dominance guarantee.
>> >>>>> Previously, we only had special handling for fddx and fddy, in a
>> >>>>> very
>> >>>>> ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote,
>> >>>>> we'll
>> >>>>> have to start handling a lot more instructions with similar
>> >>>>> constraints,
>> >>>>> so we want to add a more formal model of what the optimizer can and
>> >>>>> cannot do.
>> >>>>>
>> >>>>> v2: don't add attribute for ALU instructions
>> >>>>> v3: special-case derivative ALU instructions
>> >>>>> Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
>> >>>>> ---
>> >>>>> src/compiler/nir/nir.h | 80
>> >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>>> 1 file changed, 80 insertions(+)
>> >>>>>
>> >>>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> >>>>> index 3b827bf..64caccb 100644
>> >>>>> --- a/src/compiler/nir/nir.h
>> >>>>> +++ b/src/compiler/nir/nir.h
>> >>>>> @@ -985,6 +985,25 @@ typedef enum {
>> >>>>> * intrinsic are due to the register reads/writes.
>> >>>>> */
>> >>>>> NIR_INTRINSIC_CAN_REORDER = (1 << 1),
>> >>>>> +
>> >>>>> + /**
>> >>>>> + * Indicates whether this intrinsic is "cross-thread". An
>> >>>>> operation
>> >>>>> is
>> >>>>> + * cross-thread if results in one thread depend on inputs in
>> >>>>> another
>> >>>>> thread,
>> >>>>> + * and therefore optimizations cannot change the execution mask
>> >>>>> when
>> >>>>> the
>> >>>>> + * operation is called. Examples of cross-thread operations
>> >>>>> include
>> >>>>> + * screen-space derivatives, the "any" reduction which returns
>> >>>>> "true"
>> >>>>> in
>> >>>>> + * all threads if any thread inputs "true", etc.
>> >>>>> + */
>> >>>>> + NIR_INTRINSIC_CROSS_THREAD,
>> >>>>> +
>> >>>>> + /**
>> >>>>> + * Indicates that this intrinsic is "convergent". An operation
>> >>>>> is
>> >>>>> + * convergent when it must always be called in convergent
>> >>>>> control
>> >>>>> flow,
>> >>>>> + * that is, control flow with the same execution mask as when
>> >>>>> the
>> >>>>> program
>> >>>>> + * started. If an operation is convergent, it must be
>> >>>>> cross-thread
>> >>>>> as
>> >>>>> well,
>> >>>>> + * since the optimizer must maintain the guarantee.
>> >>>>> + */
>> >>>>> + NIR_INTRINSIC_CONVERGENT,
>> >>>>
>> >>>>
>> >>>>
>> >>>> This is inconsistent with LLVM's definition of 'convergent', and I'd
>> >>>> like
>> >>>> you to change it to match up with LLVM.
>> >>>>
>> >>>> LLVM's definition of convergent is: "The operation must not be made
>> >>>> control-dependent on additional values."
>> >>>>
>> >>>> In the language of execution masks, this means that optimizations
>> >>>> must
>> >>>> guarantee that the execution mask for the instruction can only become
>> >>>> a
>> >>>> superset of what it was originally. This means lifting is actually
>> >>>> okay.
>> >>>>
>> >>>> This is relevant because e.g. texture instructions with implicit
>> >>>> derivatives
>> >>>> are actually convergent operations (in the LLVM sense), but obviously
>> >>>> they
>> >>>> can be called with exec masks that are subsets of the exec mask at
>> >>>> program
>> >>>> start.
>> >>>
>> >>>
>> >>> Actually, according to GLSL (and I think SPIR-V, although I'm not 100%
>> >>> sure), they can't be called that way -- results are undefined if
>> >>> derivatives (or textures that take implicit derivatives) aren't called
>> >>> in uniform control flow, full stop. That's why I changed the
>> >>> definition compared to LLVM - this definition of convergent allows all
>> >>> the optimizations that the LLVM definition does, but it opens up
>> >>> additional optimization opportunities since we can assume that control
>> >>> flow is always uniform when doing divergence analysis. Also, as-is,
>> >>> the definition matches the GLSL/SPIR-V semantics closely, and since
>> >>> the purpose of the convergent attribute is to model derivatives in
>> >>> GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their
>> >>> semantics to allow what you describe, then we can add something
>> >>> something closer to the LLVM convergent semantics. If you want me to
>> >>> change the name to avoid confusion with LLVM, that's fair though --
>> >>> suggestions welcome on what to call it ;)
>> >>
>> >>
>> >> Okay, I'm convinced that it makes sense to have these semantics, but a
>> >> different name would be good.
>> >
>> >
>> > I'm not quite so convinced. :-) The LLVM definition seems, at first
>> > brush,
>> > more powerful than the proposed definition and I think it's actually
>> > what
>> > you want for most optimizations.
>>
>> The LLVM definition is actually less powerful than my definition - see
>> my response to Nicolai in patch 3.
>
>
> I'm still not sure I'm convinced. I'll give it more thought.
>
>>
>> > The only advantage I can see to the strict
>> > uniform definition is that it would let us imply information about
>> > control-flow uniformity from instructions. However, while probably
>> > technically correct, that sounds like a dangerous path to go down. What
>> > specific optimizations were you thinking this stricter definition would
>> > enable?
>>
>> It would let us assume that control flow is uniform in more places,
>> which means that we can infer that more values are uniform. This would
>> be useful, for example, for barriers in compute shaders, since if the
>> author is doing some complicated stuff with loads and stores but is
>> using a barrier, then we can infer that control is stlll uniform, and
>> hopefully use less storage space for registers if we can infer that
>> they're uniform too, enable SPF on Intel, use less instructions for
>> control flow on AMD, etc. The original implementation of the
>> Divergence Analysis paper by Coutinho et. al. actually does this [1].
>
>
> Yeah... That's exactly the kind of inference I'm afraid of. Barrier may be
> safe, but my gut tells me that people get derivatives (whether explicit or
> implicit in texturing) wrong all the time. I'm afraid of doing crazy
> optimizations based on a texture instruction being in our block. Maybe it's
> ok, but it makes me very nervous...
>
> I'll give it more thought
So, I realized that we might still need something like the LLVM
definition of convergent, at least in two places:
1. read_invocation is just convergent, not cross-thread, since if you
call it with a superset of active invocations, you'll still get the
same result.
2. AMD_shader_ballot adds a whole-workgroup reduction operation, which
is guaranteed to work in non-uniform control flow as long as the
subgroups are uniform within a workgroup. Basically this turns into a
subgroup reduction, followed by some atomic operations on shared
memory and a barrier. But the barrier has to execute in non-uniform
control flow, so it really is LLVM-convergent. So if we want to do the
lowering in NIR, we'll need to have a barrier that can occur in
non-uniform control flow.
Should we keep the original thing, rename it to uniform_control or
whatever, and add convergent, or just get rid of it? Thoughts?
>
> --Jason
>
>>
>> [1]
>> https://github.com/gtcasl/gpuocelot/blob/master/ocelot/ocelot/analysis/implementation/DivergenceAnalysis.cpp#L514
>>
>> >
>> >>
>> >> How about NIR_INTRINSIC_UNIFORM_CONTROL?
>> >
>> >
>> > That works.
>> >
>> > --Jason
>> >
>> >>
>> >> Cheers,
>> >> Nicolai
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>>>
>> >>>> LLVM currently has no equivalent to cross_thread, and we hack around
>> >>>> it
>> >>>> as
>> >>>> I'm sure you're well aware. The nightmare is trying to find a sound
>> >>>> definition of "cross_thread" that works in LLVM's execution model.
>> >>>
>> >>>
>> >>> Yeah... this stuff is really tricky to reason about. I think that
>> >>> eventually, we're going to have to add the notions of control flow
>> >>> divergence and re-convergence to LLVM's execution model, even though
>> >>> there's been pushback from some LLVM developers about it. I just don't
>> >>> see any way we'll be able to do stuff like LICM, aggressive CSE, etc.
>> >>> effectively in the presence of this cross-thread operations, when
>> >>> whether you can do those things at all depends on whether branch
>> >>> conditions are uniform.
>> >>>
>> >>>>
>> >>>> Cheers,
>> >>>> Nicolai
>> >>>>
>> >>>>
>> >>>>
>> >>>>> } nir_intrinsic_semantic_flag;
>> >>>>> /**
>> >>>>> @@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy,
>> >>>>> nir_instr,
>> >>>>> type, nir_instr_type_parallel_copy)
>> >>>>> /*
>> >>>>> + * Helpers to determine if an instruction is cross-thread or
>> >>>>> convergent.
>> >>>>> See
>> >>>>> + * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.
>> >>>>> + */
>> >>>>> +static inline bool
>> >>>>> +nir_instr_is_convergent(const nir_instr *instr)
>> >>>>> +{
>> >>>>> + switch (instr->type) {
>> >>>>> + case nir_instr_type_alu:
>> >>>>> + switch (nir_instr_as_alu(instr)->op) {
>> >>>>> + case nir_op_fddx:
>> >>>>> + case nir_op_fddy:
>> >>>>> + case nir_op_fddx_fine:
>> >>>>> + case nir_op_fddy_fine:
>> >>>>> + case nir_op_fddx_coarse:
>> >>>>> + case nir_op_fddy_coarse:
>> >>>>> + /* Partial derivatives are convergent */
>> >>>>> + return true;
>> >>>>> +
>> >>>>> + default:
>> >>>>> + return false;
>> >>>>> + }
>> >>>>> +
>> >>>>> + case nir_instr_type_intrinsic: {
>> >>>>> + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> >>>>> + return nir_intrinsic_infos[intrin->intrinsic].flags &
>> >>>>> + NIR_INTRINSIC_CONVERGENT;
>> >>>>> + }
>> >>>>> +
>> >>>>> + case nir_instr_type_tex:
>> >>>>> + switch (nir_instr_as_tex(instr)->op) {
>> >>>>> + case nir_texop_tex:
>> >>>>> + case nir_texop_txb:
>> >>>>> + case nir_texop_lod:
>> >>>>> + /* These three take implicit derivatives, so they are
>> >>>>> convergent */
>> >>>>> + return true;
>> >>>>> +
>> >>>>> + default:
>> >>>>> + return false;
>> >>>>> + }
>> >>>>> +
>> >>>>> + default:
>> >>>>> + return false;
>> >>>>> + }
>> >>>>> +}
>> >>>>> +
>> >>>>> +static inline bool
>> >>>>> +nir_instr_is_cross_thread(const nir_instr *instr)
>> >>>>> +{
>> >>>>> + switch (instr->type) {
>> >>>>> + case nir_instr_type_intrinsic: {
>> >>>>> + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> >>>>> + return nir_intrinsic_infos[intrin->intrinsic].flags &
>> >>>>> + NIR_INTRINSIC_CROSS_THREAD;
>> >>>>> + }
>> >>>>> +
>> >>>>> + default:
>> >>>>> + return nir_instr_is_convergent(instr);
>> >>>>> + }
>> >>>>> +}
>> >>>>> +
>> >>>>> +/*
>> >>>>> * Control flow
>> >>>>> *
>> >>>>> * Control flow consists of a tree of control flow nodes, which
>> >>>>> include
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Lerne, wie die Welt wirklich ist,
>> >>>> Aber vergiss niemals, wie sie sein sollte.
>> >>>> _______________________________________________
>> >>>> mesa-dev mailing list
>> >>>> mesa-dev at lists.freedesktop.org
>> >>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >>
>> >>
>> >>
>> >> --
>> >> Lerne, wie die Welt wirklich ist,
>> >> Aber vergiss niemals, wie sie sein sollte.
>> >> _______________________________________________
>> >> mesa-dev mailing list
>> >> mesa-dev at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >
>> >
>
>
More information about the mesa-dev
mailing list