[Mesa-dev] [RFC] nir: add lowering stage for user-clip-planes
Rob Clark
robdclark at gmail.com
Fri Sep 11 08:25:32 PDT 2015
On Fri, Sep 11, 2015 at 1:20 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Thu, Sep 10, 2015 at 10:54 PM, Rob Clark <robdclark at gmail.com> wrote:
>>
>> On Sep 10, 2015 7:39 PM, "Jason Ekstrand" <jason at jlekstrand.net> wrote:
>>>
>>> On Thu, Sep 10, 2015 at 2:39 PM, Rob Clark <robdclark at gmail.com> wrote:
>>> > From: Rob Clark <robclark at freedesktop.org>
>>> >
>>> > So this is basically working as a lowering pass for handling user-clip-
>>> > planes, and frag-shader emulation of CLIPDIST for hardware that needs
>>> > to handle this in the shader.
>>> >
>>> > For user-clip-planes, instructions are inserted to calculate CLIPDIST
>>> > in the VS. And in both cases, discard_if's are inserted in FS to
>>> > handle the clipping.
>>> >
>>> > NOTE: This currently requires a bit of a hack in nir_validate, which
>>> > is unhappy about things like:
>>> >
>>> > decl_var uniform vec4[10] uniform_0 (0, 0)
>>> > decl_var shader_out vec4 out_0 (0, 0)
>>> > decl_var shader_out vec4 clipdist_1 (17, 1)
>>> > decl_var shader_out vec4 clipdist_2 (18, 2)
>>> > decl_overload main returning void
>>> >
>>> > impl main {
>>> > block block_0:
>>> > /* preds: */
>>> > ...
>>> > /* succs: block_1 */
>>> > block block_1:
>>> > /* preds: block_0 */
>>> > ... inserted CLIPDIST calculations ...
>>> > /* succs: block_2 */
>>> > block block_2:
>>> > }
>>> >
>>> > The thing is, repurposing the end_block and creating a new end_block,
>>> > without having to consider all the different potential successors,
>>> > seems like by far the easiest way to insert code at the end of a
>>> > shader. So probably useful in other cases. (I do something similar
>>> > internally in ir3 for transform-feedback).
>>> >
>>> > Seems like the easiest thing would be to relax the restriction in
>>> > nir_validate, but I'm not sure if this will break other assumptions
>>> > elsewhere.
>>>
>>> Yes it does break assumptions. In particular, it breaks the
>>> assumption that the end-block is empty :-) In all seriousness though,
>>> the idea is that once you jump to the end, you're done. The end block
>>> is the place where returns go. You can't put code after a return
>>> (Ok, yeah, exception handling and destructors etc...).
>>
>> No, I insert a new end_block, so that particular assumption should be
>> safe... I was just wondering if there was some less obvious assumption..
>
> In principle there's no reason you can't have two basic blocks next to
> each other, but it would be pointless -- in that case, they could
> always be combined into a single basic block. That being said, I think
> you're trying to fit a square peg in a round hole. The fundamental
> restriction, as Jason said, is that you can't have code that gets
> executed after the return jump, since (on some architectures) the
> return jump corresponds to an actual instruction, whereas if you can
> execute stuff after the return you're essentially turning the return
> instruction into a more general-purpose jump -- you might as well just
> add a nir_jump_branch and call it a day. The end block is only there
> for bookkeeping purposes, to make representing post-dominance easier
> (if we want to add it) and to make it easy to enumerate all the return
> jumps in a function.
>
> So if you do have to insert stuff that always runs before the shader
> ends, and you're worried about doing the right thing, I would simply
> iterate over all the predecessors of the end block and do your thing
> at the end of each of them. As Jason said, in TGSI all the early
> returns have been lowered away anyways so the issue is kind of
> irrelevant for you now, but at least it will make things more
> future-proof.
So I was starting to think of adding a nir_cursor fxn to dtrt when a
lowering pass wants to insert what is basically like a "finally" block
(ie. just insert into last block before end_block if end_block only
has one predecessor, or insert a new block and fix up end_block
otherwise. At least this way we don't have to duplicate that logic in
other lowering passes which want to do something similar. (For
example, it would be nice to eventually convert my stream-out handling
to a generic nir pass.)
I guess that new nir_cursor function could fix up return's to be jumps
too.. although that makes things a bit more complicated if you have
multiple returns. I guess the returns will always be the last
instruction in the N predecessors of the end_block, so maybe it isn't
so bad.
BR,
-R
>>
>>> That said, you're in TGSI land where function inlining and early
>>> return lowering has already happened so there are no early returns.
>>> This means that the end block has exactly one predecessor and you can
>>> put your stuff there. If you want to make it obvious, feel free to
>>> add a nir_end_of_impl constructor for nir_cursor which asserts that
>>> there is exactly one predecessor and calls nir_after_block_before_jump
>>> on that block. That would make it both easy and correct.
>>>
>>> --Jason
>>>
>>> > TODO: sort out hack in nir_validate, reindent, etc.
>>> >
>>> > ---
>>> > src/glsl/Makefile.sources | 1 +
>>> > src/glsl/nir/nir.h | 3 +
>>> > src/glsl/nir/nir_lower_clip.c | 346
>>> > ++++++++++++++++++++++++++++++++++++++++++
>>> > src/glsl/nir/nir_validate.c | 3 +-
>>> > 4 files changed, 351 insertions(+), 2 deletions(-)
>>> > create mode 100644 src/glsl/nir/nir_lower_clip.c
>>> >
>>> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>>> > index da7fdf9..5d98b7b 100644
>>> > --- a/src/glsl/Makefile.sources
>>> > +++ b/src/glsl/Makefile.sources
>>> > @@ -35,6 +35,7 @@ NIR_FILES = \
>>> > nir/nir_live_variables.c \
>>> > nir/nir_lower_alu_to_scalar.c \
>>> > nir/nir_lower_atomics.c \
>>> > + nir/nir_lower_clip.c \
>>> > nir/nir_lower_global_vars_to_local.c \
>>> > nir/nir_lower_load_const_to_scalar.c \
>>> > nir/nir_lower_locals_to_regs.c \
>>> > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>> > index 2ba7731..8544140 100644
>>> > --- a/src/glsl/nir/nir.h
>>> > +++ b/src/glsl/nir/nir.h
>>> > @@ -1811,6 +1811,9 @@ void nir_lower_system_values(nir_shader *shader);
>>> > void nir_lower_tex_projector(nir_shader *shader);
>>> > void nir_lower_idiv(nir_shader *shader);
>>> >
>>> > +void nir_lower_clip_vs(nir_shader *shader, unsigned ucp_enables);
>>> > +void nir_lower_clip_fs(nir_shader *shader, unsigned ucp_enables);
>>> > +
>>> > void nir_lower_atomics(nir_shader *shader);
>>> > void nir_lower_to_source_mods(nir_shader *shader);
>>> >
>>> > diff --git a/src/glsl/nir/nir_lower_clip.c
>>> > b/src/glsl/nir/nir_lower_clip.c
>>> > new file mode 100644
>>> > index 0000000..54ebe7b
>>> > --- /dev/null
>>> > +++ b/src/glsl/nir/nir_lower_clip.c
>>> > @@ -0,0 +1,346 @@
>>> > +/*
>>> > + * Copyright © 2015 Red Hat
>>> > + *
>>> > + * Permission is hereby granted, free of charge, to any person
>>> > obtaining a
>>> > + * copy of this software and associated documentation files (the
>>> > "Software"),
>>> > + * to deal in the Software without restriction, including without
>>> > limitation
>>> > + * the rights to use, copy, modify, merge, publish, distribute,
>>> > sublicense,
>>> > + * and/or sell copies of the Software, and to permit persons to whom
>>> > the
>>> > + * Software is furnished to do so, subject to the following conditions:
>>> > + *
>>> > + * The above copyright notice and this permission notice (including the
>>> > next
>>> > + * paragraph) shall be included in all copies or substantial portions
>>> > of the
>>> > + * Software.
>>> > + *
>>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> > EXPRESS OR
>>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> > MERCHANTABILITY,
>>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>>> > SHALL
>>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> > OTHER
>>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> > ARISING
>>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>>> > DEALINGS
>>> > + * IN THE SOFTWARE.
>>> > + *
>>> > + * Authors:
>>> > + * Rob Clark <robclark at freedesktop.org>
>>> > + */
>>> > +
>>> > +#include "nir.h"
>>> > +#include "nir_builder.h"
>>> > +
>>> > +#define MAX_CLIP_PLANES 8
>>> > +
>>> > +/* Generates the lowering code for user-clip-planes, generating
>>> > CLIPDIST
>>> > + * from UCP[n] + CLIPVERTEX or POSITION. Additionally, an optional
>>> > pass
>>> > + * for fragment shaders to insert conditional kill's based on the
>>> > inter-
>>> > + * polated CLIPDIST
>>> > + *
>>> > + * NOTE: should be run after nir_lower_outputs_to_temporaries() (or at
>>> > + * least in scenarios where you can count on each output written once
>>> > + * and only once).
>>> > + */
>>> > +
>>> > +
>>> > +static nir_variable *
>>> > +create_clipdist_var(nir_shader *shader, unsigned drvloc,
>>> > + bool output, gl_varying_slot slot)
>>> > +{
>>> > + nir_variable *var = rzalloc(shader, nir_variable);
>>> > +
>>> > + var->data.driver_location = drvloc;
>>> > + var->type = glsl_vec4_type();
>>> > + var->data.mode = output ? nir_var_shader_out :
>>> > nir_var_shader_in;
>>> > + var->name = ralloc_asprintf(var, "clipdist_%d", drvloc);
>>> > + var->data.index = 0;
>>> > + var->data.location = slot;
>>> > +
>>> > + if (output) {
>>> > + exec_list_push_tail(&shader->outputs, &var->node);
>>> > + } else {
>>> > + exec_list_push_tail(&shader->inputs, &var->node);
>>> > + }
>>> > + return var;
>>> > +}
>>> > +
>>> > +static void
>>> > +store_clipdist_output(nir_builder *b, nir_variable *out, nir_ssa_def
>>> > **val)
>>> > +{
>>> > + nir_intrinsic_instr *store;
>>> > +
>>> > + store = nir_intrinsic_instr_create(b->shader,
>>> > nir_intrinsic_store_output);
>>> > + store->num_components = 4;
>>> > + store->const_index[0] = out->data.driver_location;
>>> > + store->src[0].ssa = nir_vec4(b, val[0], val[1], val[2], val[3]);
>>> > + store->src[0].is_ssa = true;
>>> > + nir_builder_instr_insert(b, &store->instr);
>>> > +}
>>> > +
>>> > +static void
>>> > +load_clipdist_input(nir_builder *b, nir_variable *in, nir_ssa_def
>>> > **val)
>>> > +{
>>> > + nir_intrinsic_instr *load;
>>> > +
>>> > + load = nir_intrinsic_instr_create(b->shader,
>>> > nir_intrinsic_load_input);
>>> > + load->num_components = 4;
>>> > + load->const_index[0] = in->data.driver_location;
>>> > + nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
>>> > + nir_builder_instr_insert(b, &load->instr);
>>> > +
>>> > + val[0] = nir_channel(b, &load->dest.ssa, 0);
>>> > + val[1] = nir_channel(b, &load->dest.ssa, 1);
>>> > + val[2] = nir_channel(b, &load->dest.ssa, 2);
>>> > + val[3] = nir_channel(b, &load->dest.ssa, 3);
>>> > +}
>>> > +
>>> > +struct find_output_state {
>>> > + unsigned drvloc;
>>> > + nir_ssa_def *def;
>>> > +};
>>> > +
>>> > +static bool
>>> > +find_output_in_block(nir_block *block, void *void_state)
>>> > +{
>>> > + struct find_output_state *state = void_state;
>>> > + nir_foreach_instr(block, instr) {
>>> > +
>>> > + if (instr->type == nir_instr_type_intrinsic) {
>>> > + nir_intrinsic_instr *intr =
>>> > nir_instr_as_intrinsic(instr);
>>> > + if ((intr->intrinsic ==
>>> > nir_intrinsic_store_output) &&
>>> > + intr->const_index[0] ==
>>> > state->drvloc) {
>>> > + assert(state->def == NULL);
>>> > + assert(intr->src[0].is_ssa);
>>> > + state->def = intr->src[0].ssa;
>>> > + }
>>> > + }
>>> > +
>>> > +#if !defined(DEBUG)
>>> > + /* for debug builds, scan entire shader to assert
>>> > + * if output is written multiple times. For release
>>> > + * builds just assume all is well and bail when we
>>> > + * find first:
>>> > + */
>>> > + if (state->def)
>>> > + return false;
>>> > +#endif
>>> > + }
>>> > +
>>> > + return true;
>>> > +}
>>> > +
>>> > +/* TODO: maybe this would be a useful helper?
>>> > + * NOTE: assumes each output is written exactly once (and
>>> > unconditionally)
>>> > + * so if needed nir_lower_outputs_to_temporaries()
>>> > + */
>>> > +static nir_ssa_def *
>>> > +find_output(nir_shader *shader, unsigned drvloc)
>>> > +{
>>> > + struct find_output_state state = {
>>> > + .drvloc = drvloc,
>>> > + };
>>> > +
>>> > + nir_foreach_overload(shader, overload) {
>>> > + if (overload->impl) {
>>> > + nir_foreach_block_reverse(overload->impl,
>>> > + find_output_in_block, &state);
>>> > + }
>>> > + }
>>> > +
>>> > + return state.def;
>>> > +}
>>> > +
>>> > +/*
>>> > + * VS lowering
>>> > + */
>>> > +
>>> > +static void
>>> > +lower_clip_vs(nir_function_impl *impl, unsigned ucp_enables,
>>> > + nir_ssa_def *cv, nir_variable **out)
>>> > +{
>>> > + nir_ssa_def *clipdist[MAX_CLIP_PLANES];
>>> > + nir_builder b;
>>> > +
>>> > + nir_builder_init(&b, impl);
>>> > +
>>> > + /* Push original end_block to end of body. Since all other
>>> > + * blocks flow into end_block, it is a convenient place to
>>> > + * insert our new clipdist calculations. A new empty
>>> > + * end_block is created:
>>> > + *
>>> > + * NOTE: by default nir_validate is unhappy about formations
>>> > + * such as
>>> > + *
>>> > + * block block_0:
>>> > + * // preds:
>>> > + * ...
>>> > + * // succs: block_1
>>> > + * block block_1:
>>> > + * // preds: block_0
>>> > + * ...
>>> > + * // succs: block_2
>>> > + * block block_2:
>>> > + * ...
>>> > + */
>>> > + exec_list_push_tail(&impl->body,
>>> > &impl->end_block->cf_node.node);
>>> > + nir_block *new_end_block = nir_block_create(b.shader);
>>> > + impl->end_block->successors[0] = new_end_block;
>>> > + _mesa_set_add(new_end_block->predecessors, impl->end_block);
>>> > + impl->end_block = new_end_block;
>>> > + b.cursor = nir_after_cf_list(&impl->body);
>>> > +
>>> > + for (int plane = 0; plane < MAX_CLIP_PLANES; plane++) {
>>> > + if (ucp_enables & (1 << plane)) {
>>> > + nir_intrinsic_instr *ucp;
>>> > +
>>> > + /* insert intrinsic to fetch ucp[plane]: */
>>> > + ucp = nir_intrinsic_instr_create(b.shader,
>>> > +
>>> > nir_intrinsic_load_user_clip_plane);
>>> > + ucp->num_components = 4;
>>> > + ucp->const_index[0] = plane;
>>> > + nir_ssa_dest_init(&ucp->instr, &ucp->dest, 4,
>>> > NULL);
>>> > + nir_builder_instr_insert(&b, &ucp->instr);
>>> > +
>>> > + /* calculate clipdist[plane] - dot(ucp, cv): */
>>> > + clipdist[plane] = nir_fdot3(&b, &ucp->dest.ssa,
>>> > cv);
>>> > + } else {
>>> > + /* 0.0 == don't-clip == disabled: */
>>> > + clipdist[plane] = nir_imm_float(&b, 0.0);
>>> > + }
>>> > + }
>>> > +
>>> > + if (ucp_enables & 0x0f)
>>> > + store_clipdist_output(&b, out[0], &clipdist[0]);
>>> > + if (ucp_enables & 0xf0)
>>> > + store_clipdist_output(&b, out[1], &clipdist[4]);
>>> > +
>>> > + nir_metadata_preserve(impl, nir_metadata_dominance);
>>> > +}
>>> > +
>>> > +/* ucp_enables is bitmask of enabled ucp's. Actual ucp values are
>>> > + * passed in to shader via user_clip_plane system-values
>>> > + */
>>> > +void
>>> > +nir_lower_clip_vs(nir_shader *shader, unsigned ucp_enables)
>>> > +{
>>> > + int clipvertex = -1;
>>> > + int position = -1;
>>> > + int maxloc = -1;
>>> > + nir_ssa_def *cv;
>>> > + nir_variable *out[2];
>>> > +
>>> > + if (!ucp_enables)
>>> > + return;
>>> > +
>>> > + /* find clipvertex/position outputs: */
>>> > + foreach_list_typed(nir_variable, var, node, &shader->outputs) {
>>> > + int loc = var->data.driver_location;
>>> > +
>>> > + /* keep track of last used driver-location.. we'll be
>>> > + * appending CLIP_DIST0/CLIP_DIST1 after last existing
>>> > + * output:
>>> > + */
>>> > + maxloc = MAX2(maxloc, loc);
>>> > +
>>> > + switch (var->data.location) {
>>> > + case VARYING_SLOT_POS:
>>> > + position = loc;
>>> > + break;
>>> > + case VARYING_SLOT_CLIP_VERTEX:
>>> > + clipvertex = loc;
>>> > + break;
>>> > + case VARYING_SLOT_CLIP_DIST0:
>>> > + case VARYING_SLOT_CLIP_DIST1:
>>> > + /* if shader is already writing CLIPDIST, then
>>> > + * there should be no user-clip-planes to deal
>>> > + * with.
>>> > + */
>>> > + return;
>>> > + }
>>> > + }
>>> > +
>>> > + if (clipvertex != -1)
>>> > + cv = find_output(shader, clipvertex);
>>> > + else if (position != -1)
>>> > + cv = find_output(shader, position);
>>> > + else
>>> > + return;
>>> > +
>>> > + /* insert CLIPDIST outputs: */
>>> > + if (ucp_enables & 0x0f)
>>> > + out[0] = create_clipdist_var(shader, ++maxloc, true,
>>> > VARYING_SLOT_CLIP_DIST0);
>>> > + if (ucp_enables & 0xf0)
>>> > + out[1] = create_clipdist_var(shader, ++maxloc, true,
>>> > VARYING_SLOT_CLIP_DIST1);
>>> > +
>>> > + nir_foreach_overload(shader, overload) {
>>> > + if (!strcmp(overload->function->name, "main"))
>>> > + lower_clip_vs(overload->impl, ucp_enables, cv,
>>> > out);
>>> > + }
>>> > +}
>>> > +
>>> > +/*
>>> > + * FS lowering
>>> > + */
>>> > +
>>> > +static void
>>> > +lower_clip_fs(nir_function_impl *impl, unsigned ucp_enables,
>>> > + nir_variable **in)
>>> > +{
>>> > + nir_ssa_def *clipdist[MAX_CLIP_PLANES];
>>> > + nir_builder b;
>>> > +
>>> > + nir_builder_init(&b, impl);
>>> > + b.cursor = nir_before_cf_list(&impl->body);
>>> > +
>>> > + if (ucp_enables & 0x0f)
>>> > + load_clipdist_input(&b, in[0], &clipdist[0]);
>>> > + if (ucp_enables & 0xf0)
>>> > + load_clipdist_input(&b, in[1], &clipdist[4]);
>>> > +
>>> > + for (int plane = 0; plane < MAX_CLIP_PLANES; plane++) {
>>> > + if (ucp_enables & (1 << plane)) {
>>> > + nir_intrinsic_instr *discard;
>>> > + nir_ssa_def *cond;
>>> > +
>>> > + cond = nir_flt(&b, clipdist[plane],
>>> > nir_imm_float(&b, 0.0));
>>> > +
>>> > + discard = nir_intrinsic_instr_create(b.shader,
>>> > + nir_intrinsic_discard_if);
>>> > + discard->src[0] = nir_src_for_ssa(cond);
>>> > + nir_builder_instr_insert(&b, &discard->instr);
>>> > + }
>>> > + }
>>> > +}
>>> > +
>>> > +/* insert conditional kill based on interpolated CLIPDIST
>>> > + */
>>> > +void
>>> > +nir_lower_clip_fs(nir_shader *shader, unsigned ucp_enables)
>>> > +{
>>> > + nir_variable *in[2];
>>> > + int maxloc = -1;
>>> > +
>>> > + if (!ucp_enables)
>>> > + return;
>>> > +
>>> > + foreach_list_typed(nir_variable, var, node, &shader->inputs) {
>>> > + int loc = var->data.driver_location;
>>> > +
>>> > + /* keep track of last used driver-location.. we'll be
>>> > + * appending CLIP_DIST0/CLIP_DIST1 after last existing
>>> > + * input:
>>> > + */
>>> > + maxloc = MAX2(maxloc, loc);
>>> > + }
>>> > +
>>> > + /* The shader won't normally have CLIPDIST inputs, so we
>>> > + * must add our own:
>>> > + */
>>> > + /* insert CLIPDIST outputs: */
>>> > + if (ucp_enables & 0x0f)
>>> > + in[0] = create_clipdist_var(shader, ++maxloc, false,
>>> > VARYING_SLOT_CLIP_DIST0);
>>> > + if (ucp_enables & 0xf0)
>>> > + in[1] = create_clipdist_var(shader, ++maxloc, false,
>>> > VARYING_SLOT_CLIP_DIST1);
>>> > +
>>> > + nir_foreach_overload(shader, overload) {
>>> > + if (!strcmp(overload->function->name, "main"))
>>> > + lower_clip_fs(overload->impl, ucp_enables, in);
>>> > + }
>>> > +}
>>> > diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
>>> > index 9938c0e..f682575 100644
>>> > --- a/src/glsl/nir/nir_validate.c
>>> > +++ b/src/glsl/nir/nir_validate.c
>>> > @@ -667,8 +667,7 @@ validate_block(nir_block *block, validate_state
>>> > *state)
>>> > nir_if_first_then_node(if_stmt));
>>> > assert(&block->successors[1]->cf_node ==
>>> > nir_if_first_else_node(if_stmt));
>>> > - } else {
>>> > - assert(next->type == nir_cf_node_loop);
>>> > + } else if (next->type == nir_cf_node_loop) {
>>> > nir_loop *loop = nir_cf_node_as_loop(next);
>>> > assert(&block->successors[0]->cf_node ==
>>> > nir_loop_first_cf_node(loop));
>>> > --
>>> > 2.4.3
>>> >
>>> > _______________________________________________
>>> > mesa-dev mailing list
>>> > mesa-dev at lists.freedesktop.org
>>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list