[Mesa-dev] [PATCH 3/5] nir: Add new GS intrinsics that maintain a count of emitted vertices.
Jason Ekstrand
jason at jlekstrand.net
Tue Sep 8 11:12:39 PDT 2015
On Tue, Sep 8, 2015 at 2:24 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Monday, September 07, 2015 10:51:37 AM Jason Ekstrand wrote:
>> On Sep 3, 2015 1:49 AM, "Kenneth Graunke" <kenneth at whitecape.org> wrote:
>> >
>> > This patch also introduces a lowering pass to convert the simple GS
>> > intrinsics to the new ones. See the comments above that for the
>> > rationale behind the new intrinsics.
>> >
>> > This should be useful for i965; it's a generic enough mechanism that I
>> > could see other drivers potentially using it as well, so I don't feel
>> > too bad about putting it in the generic code.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> > src/glsl/Makefile.sources | 1 +
>> > src/glsl/nir/nir.h | 2 +
>> > src/glsl/nir/nir_intrinsics.h | 21 ++++
>> > src/glsl/nir/nir_lower_gs_intrinsics.c | 214
>> +++++++++++++++++++++++++++++++++
>> > 4 files changed, 238 insertions(+)
>> > create mode 100644 src/glsl/nir/nir_lower_gs_intrinsics.c
>> >
>> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>> > index c422303..ae2e3e1 100644
>> > --- a/src/glsl/Makefile.sources
>> > +++ b/src/glsl/Makefile.sources
>> > @@ -36,6 +36,7 @@ NIR_FILES = \
>> > nir/nir_lower_alu_to_scalar.c \
>> > nir/nir_lower_atomics.c \
>> > nir/nir_lower_global_vars_to_local.c \
>> > + nir/nir_lower_gs_intrinsics.c \
>> > nir/nir_lower_load_const_to_scalar.c \
>> > nir/nir_lower_locals_to_regs.c \
>> > nir/nir_lower_idiv.c \
>> > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> > index 570a33c..13f123f 100644
>> > --- a/src/glsl/nir/nir.h
>> > +++ b/src/glsl/nir/nir.h
>> > @@ -1809,6 +1809,8 @@ void nir_lower_idiv(nir_shader *shader);
>> > void nir_lower_atomics(nir_shader *shader);
>> > void nir_lower_to_source_mods(nir_shader *shader);
>> >
>> > +void nir_lower_gs_intrinsics(nir_shader *shader);
>> > +
>> > void nir_normalize_cubemap_coords(nir_shader *shader);
>> >
>> > void nir_live_variables_impl(nir_function_impl *impl);
>> > diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
>> > index ed309b6..bcfe70d 100644
>> > --- a/src/glsl/nir/nir_intrinsics.h
>> > +++ b/src/glsl/nir/nir_intrinsics.h
>> > @@ -79,9 +79,30 @@ BARRIER(memory_barrier)
>> > /** A conditional discard, with a single boolean source. */
>> > INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, 0)
>> >
>> > +/**
>> > + * Basic Geometry Shader intrinsics.
>> > + *
>> > + * emit_vertex implements GLSL's EmitStreamVertex() built-in. It takes
>> a single
>> > + * index, which is the stream ID to write to.
>> > + *
>> > + * end_primitive implements GLSL's EndPrimitive() built-in.
>> > + */
>> > INTRINSIC(emit_vertex, 0, ARR(), false, 0, 0, 1, 0)
>> > INTRINSIC(end_primitive, 0, ARR(), false, 0, 0, 1, 0)
>> >
>> > +/**
>> > + * Geometry Shader intrinsics with a vertex count.
>> > + *
>> > + * Alternatively, drivers may implement these intrinsics, and use
>> > + * nir_lower_gs_intrinsics() to convert from the basic intrinsics.
>> > + *
>> > + * These maintain a count of the number of vertices emitted, as an
>> additional
>> > + * unsigned integer source.
>> > + */
>> > +INTRINSIC(emit_vertex_with_counter, 1, ARR(1), false, 0, 0, 1, 0)
>> > +INTRINSIC(end_primitive_with_counter, 1, ARR(1), false, 0, 0, 1, 0)
>> > +INTRINSIC(set_vertex_count, 1, ARR(1), false, 0, 0, 0, 0)
>> > +
>> > /*
>> > * Atomic counters
>> > *
>> > diff --git a/src/glsl/nir/nir_lower_gs_intrinsics.c
>> b/src/glsl/nir/nir_lower_gs_intrinsics.c
>> > new file mode 100644
>> > index 0000000..b866d87
>> > --- /dev/null
>> > +++ b/src/glsl/nir/nir_lower_gs_intrinsics.c
>> > @@ -0,0 +1,214 @@
>> > +/*
>> > + * Copyright © 2015 Intel Corporation
>> > + *
>> > + * Permission is hereby granted, free of charge, to any person obtaining
>> a
>> > + * copy of this software and associated documentation files (the
>> "Software"),
>> > + * to deal in the Software without restriction, including without
>> limitation
>> > + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> > + * and/or sell copies of the Software, and to permit persons to whom the
>> > + * Software is furnished to do so, subject to the following conditions:
>> > + *
>> > + * The above copyright notice and this permission notice (including the
>> next
>> > + * paragraph) shall be included in all copies or substantial portions of
>> the
>> > + * Software.
>> > + *
>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING
>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS
>> > + * IN THE SOFTWARE.
>> > + *
>> > + * Authors:
>> > + * Kenneth Graunke <kenneth at whitecape.org>
>> > + */
>> > +
>> > +#include "nir.h"
>> > +#include "nir_builder.h"
>> > +
>> > +/**
>> > + * \file nir_lower_gs_intrinsics.c
>> > + *
>> > + * Geometry Shaders can call EmitVertex()/EmitStreamVertex() to output an
>> > + * arbitrary number of vertices. However, the shader must declare the
>> maximum
>> > + * number of vertices that it will ever output - further attempts to emit
>> > + * vertices result in undefined behavior according to the GLSL
>> specification.
>> > + *
>> > + * Drivers might use this maximum number of vertices to allocate enough
>> space
>> > + * to hold the geometry shader's output. Some drivers (such as i965)
>> need to
>> > + * implement "safety checks" which ensure that the shader hasn't emitted
>> too
>> > + * many vertices, to avoid overflowing that space and trashing other
>> memory.
>> > + *
>> > + * The count of emitted vertices can also be useful in buffer offset
>> > + * calculations, so drivers know where to write the GS output.
>> > + *
>> > + * However, for simple geometry shaders that emit a statically
>> determinable
>> > + * number of vertices, this extra bookkeeping is unnecessary and
>> inefficient.
>> > + * By tracking the vertex count in NIR, we allow constant
>> folding/propagation
>> > + * and dead control flow optimizations to eliminate most of it where
>> possible.
>> > + *
>> > + * This pass introduces a new global variable which stores the current
>> vertex
>> > + * count (initialized to 0), and converts emit_vertex/end_primitive
>> intrinsics
>> > + * to their *_with_counter variants. emit_vertex is also wrapped in a
>> safety
>> > + * check to avoid buffer overflows. Finally, it adds a set_vertex_count
>> > + * intrinsic at the end of the program, informing the driver of the final
>> > + * vertex count.
>> > + */
>> > +
>> > +struct state {
>> > + nir_builder *builder;
>> > + nir_variable *vertex_count;
>> > +};
>> > +
>> > +/**
>> > + * Replace emit_vertex intrinsics with:
>> > + *
>> > + * if (vertex_count < max_vertices) {
>> > + * emit_vertex_with_counter vertex_count ...
>> > + * vertex_count += 1
>> > + * }
>> > + */
>> > +static void
>> > +rewrite_emit_vertex(nir_intrinsic_instr *intrin, struct state *state)
>>
>> I'm inclined to rename "intrin" to "emit". It may be easier to track with
>> the lowering code.
>
> Why "emit"? Using a verb doesn't make much sense to me...and it's not
> even something we emit. orig_intrin might be clearer...
Just because it's the emit instruction as opposed to an add or
something. Given that we're lowering that instruction, instr/intrin
and lowered works fine too. I don't care too much as long as the
aliases are gone.
>>
>> > +{
>> > + nir_builder *b = state->builder;
>> > + nir_variable *vertex_count_var = state->vertex_count;
>> > + nir_instr *instr = &intrin->instr;
>>
>> Please get rid of these last two aliases ("b" is fine). They don't make
>> things much shorter and, IMHO, they actually make readability worse because
>> there are multiple names for things.
>
> Sure.
>
>> > +
>> > + /* Load the vertex count */
>> > + b->cursor = nir_before_instr(instr);
>> > + nir_ssa_def *count = nir_load_var(b, vertex_count_var);
>> > +
>> > + nir_ssa_def *max_vertices = nir_imm_int(b,
>> b->shader->gs.vertices_out);
>> > +
>> > + /* Create: if (vertex_count < max_vertices) */
>> > + nir_if *if_stmt = nir_if_create(b->shader);
>> > + if_stmt->condition = nir_src_for_ssa(nir_ilt(b, count, max_vertices));
>> > +
>> > + /* Replace the instruction. The new if statement needs to be hooked
>> up
>> > + * to the control flow graph before we start inserting instructions
>> in it.
>> > + */
>> > + nir_builder_cf_insert(b, &if_stmt->cf_node);
>> > + nir_instr_remove(instr);
>>
>> I think we usually delete the instruction at the end. Not that it matters
>> but it does make it more clear that you don't have dangling cursor problems.
>
> Sure, that seems reasonable.
>
>> > +
>> > + /* Fill out the new then-block */
>> > + b->cursor = nir_after_cf_list(&if_stmt->then_list);
>> > +
>> > + nir_intrinsic_instr *lowered =
>> > + nir_intrinsic_instr_create(b->shader,
>> > + nir_intrinsic_emit_vertex_with_counter);
>> > + lowered->const_index[0] = intrin->const_index[0];
>> > + lowered->src[0] = nir_src_for_ssa(count);
>> > + nir_builder_instr_insert(b, &lowered->instr);
>> > +
>> > + /* Increment the vertex count by 1 */
>> > + nir_store_var(b, vertex_count_var, nir_iadd(b, count, nir_imm_int(b,
>> 1)));
>> > +}
>> > +
>> > +/**
>> > + * Replace end_primitive with end_primitive_with_counter.
>> > + */
>> > +static void
>> > +rewrite_end_primitive(nir_intrinsic_instr *intrin, struct state *state)
>> > +{
>> > + nir_builder *b = state->builder;
>> > +
>> > + b->cursor = nir_before_instr(&intrin->instr);
>> > + nir_ssa_def *count = nir_load_var(b, state->vertex_count);
>> > +
>> > + nir_intrinsic_instr *lowered =
>> > + nir_intrinsic_instr_create(b->shader,
>> > +
>> nir_intrinsic_end_primitive_with_counter);
>> > + lowered->const_index[0] = intrin->const_index[0];
>> > + lowered->src[0] = nir_src_for_ssa(count);
>> > + nir_builder_instr_insert(b, &lowered->instr);
>> > +
>> > + nir_instr_remove(&intrin->instr);
>> > +}
>> > +
>> > +static bool
>> > +rewrite_intrinsics(nir_block *block, void *closure)
>> > +{
>> > + struct state *state = closure;
>> > +
>> > + nir_foreach_instr_safe(block, instr) {
>> > + if (instr->type != nir_instr_type_intrinsic)
>> > + continue;
>> > +
>> > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> > + switch (intrin->intrinsic) {
>> > + case nir_intrinsic_emit_vertex:
>> > + rewrite_emit_vertex(intrin, state);
>> > + break;
>> > + case nir_intrinsic_end_primitive:
>> > + rewrite_end_primitive(intrin, state);
>> > + break;
>> > + default:
>> > + /* not interesting; skip this */
>> > + break;
>> > + }
>> > + }
>> > +
>> > + return true;
>> > +}
>> > +
>> > +/**
>> > + * Add a set_vertex_count intrinsic at the end of the program
>> > + * (representing the final vertex count).
>> > + */
>> > +static void
>> > +append_set_vertex_count(nir_block *end_block, struct state *state)
>> > +{
>> > + nir_variable *vertex_count_var = state->vertex_count;
>> > + nir_builder *b = state->builder;
>> > + nir_shader *shader = state->builder->shader;
>> > +
>> > + /* Insert the new intrinsic in all of the predecessors of the end
>> block,
>> > + * but before any jump instructions (return).
>> > + */
>> > + struct set_entry *entry;
>> > + set_foreach(end_block->predecessors, entry) {
>> > + nir_block *pred = (nir_block *) entry->key;
>> > + b->cursor = nir_after_block(pred);
>>
>> Technically, this should be after_block_before_jump(). We should probably
>> just land the patch that adds that. I'll send them out.
>
> Ah, thanks! Long ago I was using a builder function which accomplished
> that, but in the new cursor era I messed that up. I'll be happy to use
> your new function.
>
>>
>> > +
>> > + nir_ssa_def *count = nir_load_var(b, vertex_count_var);
>> > +
>> > + nir_intrinsic_instr *set_vertex_count =
>> > + nir_intrinsic_instr_create(shader,
>> nir_intrinsic_set_vertex_count);
>> > + set_vertex_count->src[0] = nir_src_for_ssa(count);
>> > +
>> > + nir_builder_instr_insert(b, &set_vertex_count->instr);
>> > + }
>> > +}
>> > +
>> > +void
>> > +nir_lower_gs_intrinsics(nir_shader *shader)
>> > +{
>> > + struct state state;
>> > +
>> > + /* Create the counter variable */
>> > + nir_variable *var = rzalloc(shader, nir_variable);
>> > + var->data.mode = nir_var_global;
>> > + var->type = glsl_uint_type();
>> > + var->name = "vertex_count";
>> > + var->constant_initializer = rzalloc(shader, nir_constant); /*
>> initialize to 0 */
>> > +
>> > + exec_list_push_tail(&shader->globals, &var->node);
>> > + state.vertex_count = var;
>> > +
>> > + nir_foreach_overload(shader, overload) {
>> > + if (overload->impl) {
>> > + nir_builder b;
>> > + nir_builder_init(&b, overload->impl);
>> > + state.builder = &b;
>> > +
>> > + nir_foreach_block(overload->impl, rewrite_intrinsics, &state);
>> > +
>> > + /* This only works because we have a single main() function. */
>>
>> Then why don't you predicate it on name == "main" or something? Really, we
>> should probably just add an is_entrypoint bool to nir_function_impl and
>> switch on that. Another option would be to make this take a function impl
>> instead of a shader and trust in function inlining (maybe even assert no
>> function calls).
>
> I don't know. We really only have the pretense of functions today, but
> they're not an actual thing. There will only ever be one, so it seems
> a bit pointless to check for main or add flags indicating the one and
> only thing is the thing we want...
>
> Honestly, I've wondered if we should just delete functions from NIR
> altogether, though I suppose if we had code that arrived through some
> means other than GLSL IR, or moved toward NIR being a thing before final
> linking is finished...then they could be useful.
Yeah, we should keep them. However, there are a lot of passes that
should probably take a nir_function_impl rather than a nir_shader and
the backend should probably work more-or-less in terms of a
nir_function_impl. This is going to take a bit more thinking to be
honest.
More information about the mesa-dev
mailing list