[Mesa-dev] [PATCH 1/2] nir: Use alloca instead of variable length arrays.

Mon Mar 2 07:40:30 PST 2015

On 27/02/15 22:50, Jason Ekstrand wrote:
>
>
> On Fri, Feb 27, 2015 at 6:04 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>
>     On 26/02/15 18:07, Brian Paul wrote:
>
>         On 02/26/2015 09:51 AM, Jose Fonseca wrote:
>
>             This is to enable the code to build with -Werror=vla in the
>             short term,
>             and enable the code to build with MSVC2013 soon after.
>             ---
>                include/c99_alloca.h                 | 45
>             ++++++++++++++++++++++++++++++++++++
>                src/glsl/nir/nir_from_ssa.c          | 19 +++++++--------
>                src/glsl/nir/nir_live_variables.c    |  5 ++--
>                src/glsl/nir/nir_lower_vars_to_ssa.c | 13 +++++++----
>                4 files changed, 66 insertions(+), 16 deletions(-)
>                create mode 100644 include/c99_alloca.h
>
>             diff --git a/include/c99_alloca.h b/include/c99_alloca.h
>             new file mode 100644
>             index 0000000..6d96d06
>             --- /dev/null
>             +++ b/include/c99_alloca.h
>             @@ -0,0 +1,45 @@
>             +/**************************************************************************
>
>             + *
>             + * Copyright 2015 VMware, Inc.
>             + * All Rights Reserved.
>             + *
>             + * Permission is hereby granted, free of charge, to any person
>             obtaining a
>             + * copy of this software and associated documentation files
>             (the
>             + * "Software"), to deal in the Software without
>             restriction, including
>             + * without limitation the rights to use, copy, modify,
>             merge, publish,
>             + * distribute, sub license, and/or sell copies of the
>             Software, and to
>             + * permit persons to whom the Software is furnished to do
>             so, subject to
>             + * the following conditions:
>             + *
>             + * The above copyright notice and this permission notice
>             (including the
>             + * next paragraph) shall be included in all copies or
>             substantial
>             portions
>             + * of the Software.
>             + *
>             + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
>             ANY KIND,
>             EXPRESS
>             + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>             + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>             NON-INFRINGEMENT.
>             + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
>             + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>             ACTION OF
>             CONTRACT,
>             + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
>             WITH THE
>             + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>             + *
>             +
>             **************************************************************************/
>
>             +
>             +#ifndef _C99_ALLOCA_H_
>             +#define _C99_ALLOCA_H_
>             +
>             +
>             +#if defined(_MSC_VER)
>             +
>             +#  include <malloc.h>
>             +
>             +#  define alloca _alloca
>             +
>             +#else /* !defined(_MSC_VER) */
>             +
>             +#  include <alloca.h>
>             +
>             +#endif /* !defined(_MSC_VER) */
>             +
>             +
>             +#endif
>             diff --git a/src/glsl/nir/nir_from_ssa.c
>             b/src/glsl/nir/nir_from_ssa.c
>             index c695c95..66339f3 100644
>             --- a/src/glsl/nir/nir_from_ssa.c
>             +++ b/src/glsl/nir/nir_from_ssa.c
>             @@ -26,6 +26,7 @@
>                 */
>
>                #include "nir.h"
>             +#include "c99_alloca.h"
>
>                /*
>                 * This file implements an out-of-SSA pass as described
>             in "Revisiting
>             @@ -181,7 +182,7 @@ merge_merge_sets(merge_set *a, merge_set *b)
>                static bool
>                merge_sets_interfere(merge_set *a, merge_set *b)
>                {
>             -   merge_node *dom[a->size + b->size];
>             +   merge_node **dom = alloca((a->size + b->size) * sizeof
>             *dom);
>                   int dom_idx = -1;
>
>                   struct exec_node *an = exec_list_get_head(&a->nodes);
>             @@ -673,21 +674,21 @@
>             resolve_parallel_copy(nir_parallel_copy_instr
>             *pcopy,
>                   }
>
>                   /* The register/source corresponding to the given index */
>             -   nir_src values[num_copies * 2];
>             -   memset(values, 0, sizeof values);
>             +   nir_src *values = alloca(num_copies * 2 * sizeof *values);
>             +   memset(values, 0, num_copies * 2 * sizeof *values);
>
>                   /* The current location of a given piece of data */
>             -   int loc[num_copies * 2];
>             +   int *loc = alloca(num_copies * 2 * sizeof *loc);
>
>                   /* The piece of data that the given piece of data is
>             to be copied
>             from */
>             -   int pred[num_copies * 2];
>             +   int *pred = alloca(num_copies * 2 * sizeof *pred);
>
>
> These three are all pretty small.  < 10 elements in the usual case.
> It's going to be a crazy shader if they get above, say, 50 or 100 entries.
>
>
>                   /* Initialize loc and pred.  We will use -1 for "null" */
>             -   memset(loc, -1, sizeof loc);
>             -   memset(pred, -1, sizeof pred);
>             +   memset(loc, -1, num_copies * 2 * sizeof *loc);
>             +   memset(pred, -1, num_copies * 2 * sizeof *pred);
>
>                   /* The destinations we have yet to properly fill */
>             -   int to_do[num_copies * 2];
>             +   int *to_do = alloca(num_copies * 2 * sizeof *to_do);
>                   int to_do_idx = -1;
>
>                   /* Now we set everything up:
>             @@ -737,7 +738,7 @@
>             resolve_parallel_copy(nir_parallel_copy_instr *pcopy,
>                   }
>
>                   /* Currently empty destinations we can go ahead and
>             fill */
>             -   int ready[num_copies * 2];
>             +   int *ready = alloca(num_copies * 2 * sizeof *ready);
>
>
> Also small.  See above.
>
>                   int ready_idx = -1;
>
>                   /* Mark the ones that are ready for copying.  We know
>             an index is a
>             diff --git a/src/glsl/nir/nir_live_variables.c
>             b/src/glsl/nir/nir_live_variables.c
>             index 7402dc0..b57ca3a 100644
>             --- a/src/glsl/nir/nir_live_variables.c
>             +++ b/src/glsl/nir/nir_live_variables.c
>             @@ -26,6 +26,7 @@
>
>                #include "nir.h"
>                #include "nir_worklist.h"
>             +#include "c99_alloca.h"
>
>                /*
>                 * Basic liveness analysis.  This works only in SSA form.
>             @@ -130,8 +131,8 @@ static bool
>                propagate_across_edge(nir_block *pred, nir_block *succ,
>                                      struct live_variables_state *state)
>                {
>             -   BITSET_WORD live[state->bitset_words];
>             -   memcpy(live, succ->live_in, sizeof live);
>             +   BITSET_WORD *live = alloca(state->bitset_words * sizeof
>             *live);
>             +   memcpy(live, succ->live_in, state->bitset_words * sizeof
>             *live);
>
>
> For this one, we might as well ralloc a single temporary, store it in
> live_variables_state, and re-use it.  We're not really saving anything
> by "reallocating" it each time; especially since we memcpy anyway.
>
>
>                   nir_foreach_instr(succ, instr) {
>                      if (instr->type != nir_instr_type_phi)
>             diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c
>             b/src/glsl/nir/nir_lower_vars_to_ssa.c
>             index 8af7530..f54d1b7 100644
>             --- a/src/glsl/nir/nir_lower_vars_to_ssa.c
>             +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
>             @@ -27,6 +27,9 @@
>
>                #include "nir.h"
>
>             +#include "c99_alloca.h"
>             +
>             +
>                struct deref_node {
>                   struct deref_node *parent;
>                   const struct glsl_type *type;
>             @@ -899,8 +902,8 @@ rename_variables_block(nir_block
>             *block, struct
>             lower_variables_state *state)
>                static void
>                insert_phi_nodes(struct lower_variables_state *state)
>                {
>             -   unsigned work[state->impl->num_blocks];
>             -   unsigned has_already[state->impl->num_blocks];
>             +   unsigned *work = alloca(state->impl->num_blocks * sizeof
>             *work);
>             +   unsigned *has_already = alloca(state->impl->num_blocks *
>             sizeof
>             *has_already);
>
>                   /*
>                    * Since the work flags already prevent us from
>             inserting a node
>             that has
>             @@ -910,10 +913,10 @@ insert_phi_nodes(struct
>             lower_variables_state
>             *state)
>                    * function. So all we need to handle W is an array
>             and a pointer
>             to the
>                    * next element to be inserted and the next element to
>             be removed.
>                    */
>             -   nir_block *W[state->impl->num_blocks];
>             +   nir_block **W = alloca(state->impl->num_blocks * sizeof *W);
>
>             -   memset(work, 0, sizeof work);
>             -   memset(has_already, 0, sizeof has_already);
>             +   memset(work, 0, state->impl->num_blocks * sizeof *work);
>             +   memset(has_already, 0, state->impl->num_blocks * sizeof
>             *has_already);
>
>
> num_blocks shouldn't be large, so you can probably go ahead and put this
> on the stack.  On the other hand, insert_phi_nodes is called once per
> run of the optimization loop and we malloc enough other stuff that we
> won't notice (from a performance perspective) if you just malloc it.
>
>
>                   unsigned w_start, w_end;
>                   unsigned iter_count = 0;
>
>
>         Looks OK to me.
>
>         One thing I might have done would be instead of:
>
>         unsigned *work = alloca(state->impl->num_blocks * sizeof *work);
>         ...
>         memset(work, 0, state->impl->num_blocks * sizeof *work);
>
>         do
>
>         const int work_size = state->impl->num_blocks * sizeof *work;
>         unsigned *work = alloca(work_size);
>
>      > ...
>      > memset(work, 0, work_size);
>
>     It's not a bad idea.  The snafu is that `work` must be declared
>     before work_size so that `sizeof *work` is valid.
>
>     That or we must use the actual type of `*work`, but that is more
>     dangerous as it's very easy to get it wrong, particularly when
>     dealing with pointers of pointers.
>
>     So this would need to look like
>
>        unsigned *work;
>        const size_t work_size = state->impl->num_blocks * sizeof *work;
>        work = alloca(work_size);
>        ...
>        memset(work, 0, work_size);
>
>
>     I'll post this in a follow-up review.
>
>         ...
>         memset(work, 0, work_size);
>
>
>         AFAIK, there's no zeroing version of alloca().
>
>
>     Yes, I also searched.  And unfortunately inline functions can't be
>     used due to alloca semantics.  I'm not sure if there's any
>     C-preprocessor magic.
>
>
>     Anyway, depending on the maximum size of these alloca arrays, we
>     might actually need to use malloc, as Windows stack is 1MB by
>     default, plus the OpenGL driver only gets to use whatever stack is
>     left from the application.
>
>
> I doubt you'll have too much trouble there.  Most of those should be <
> 1K even for really big shaders and none of these functions ever call any
> of the others.  I've made a few comments above about potential specific
> non-alloca solutions that might be better for some of them.

Thanks.
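
For the propagate_across_edge case, the single reusable temporary you
suggest could look roughly like the sketch below.  This is only an
illustration under assumptions: it uses plain calloc/free rather than
ralloc, and the names live_state_sketch, tmp_live, live_state_init,
propagate_sketch and live_state_finish are hypothetical stand-ins, not
the actual NIR code.  The propagation itself is reduced to a simple OR
of the successor's live-in set into the predecessor's live-out set.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned BITSET_WORD_T;   /* stand-in for BITSET_WORD (assumption) */

/* Hypothetical, cut-down version of the pass state. */
struct live_state_sketch {
   unsigned bitset_words;
   BITSET_WORD_T *tmp_live;       /* scratch buffer, allocated once */
};

/* Allocate the scratch buffer once, up front.  Assumes bitset_words > 0. */
static bool
live_state_init(struct live_state_sketch *state, unsigned bitset_words)
{
   state->bitset_words = bitset_words;
   state->tmp_live = calloc(bitset_words, sizeof *state->tmp_live);
   return state->tmp_live != NULL;
}

/* Simplified stand-in for propagate_across_edge: copy the successor's
 * live-in set into the shared scratch buffer, then OR it into the
 * predecessor's live-out set.  Returns true if any bit was added. */
static bool
propagate_sketch(struct live_state_sketch *state,
                 BITSET_WORD_T *pred_live_out,
                 const BITSET_WORD_T *succ_live_in)
{
   BITSET_WORD_T *live = state->tmp_live;   /* no per-call allocation */
   bool progress = false;
   unsigned i;

   memcpy(live, succ_live_in, state->bitset_words * sizeof *live);

   for (i = 0; i < state->bitset_words; i++) {
      if (live[i] & ~pred_live_out[i]) {
         pred_live_out[i] |= live[i];
         progress = true;
      }
   }
   return progress;
}

/* Release the scratch buffer when the pass is done. */
static void
live_state_finish(struct live_state_sketch *state)
{
   free(state->tmp_live);
   state->tmp_live = NULL;
}
```

Since the buffer is overwritten by the memcpy on every call anyway,
nothing needs to be cleared between calls.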

> On a slightly different note, could you be more specific about what MSVC
> can't do here?  I know it's had some issues with variable-length arrays
> in the past (I've seen some of those issues personally), but could you
> be more specific about what they are?

MSVC doesn't support VLAs at all.  That is, just as in C90, the size of 
an array must be a constant expression.

There's a brief mention of this in 
https://msdn.microsoft.com/en-US/library/zb1574zs.aspx

   "Variable length arrays are not currently supported in Visual C++."

but it's in a section about OpenMP.

In practice, any attempt to use a VLA results in error C2057: expected 
constant expression:

   https://msdn.microsoft.com/en-us/library/eff825eh.aspx
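
To make the difference concrete, here is a minimal sketch of the
pattern, together with one possible answer to the "zeroing alloca"
question.  ALLOCA_ZEROED and count_distinct are hypothetical names used
only for illustration; they are not part of the patch.  It has to be a
macro rather than a function because alloca memory is released when the
function that called alloca returns, and the alloca call is kept out of
any function argument list, which some implementations don't handle
well.

```c
#include <stddef.h>
#include <string.h>

/* Same selection logic as the proposed c99_alloca.h. */
#if defined(_MSC_VER)
#  include <malloc.h>
#  define alloca _alloca
#else
#  include <alloca.h>
#endif

/* Hypothetical helper: allocate n elements of the type `ptr` points to
 * in the calling function's stack frame and zero them.  `ptr` must
 * already be declared so that `sizeof *(ptr)` is valid.  The memory
 * stays valid until the enclosing function returns, not merely until
 * the end of the do/while block. */
#define ALLOCA_ZEROED(ptr, n)                              \
   do {                                                    \
      size_t alloca_bytes_ = (size_t)(n) * sizeof *(ptr);  \
      (ptr) = alloca(alloca_bytes_);                       \
      memset((ptr), 0, alloca_bytes_);                     \
   } while (0)

/* Counts how many distinct values appear in ids[0..count).
 * Every value is assumed to be smaller than num_blocks. */
static unsigned
count_distinct(const unsigned *ids, size_t count, unsigned num_blocks)
{
   /* unsigned seen[num_blocks];   <- VLA: MSVC reports C2057 here */
   unsigned *seen;
   unsigned distinct = 0;
   size_t i;

   ALLOCA_ZEROED(seen, num_blocks);

   for (i = 0; i < count; i++) {
      if (!seen[ids[i]]) {
         seen[ids[i]] = 1;
         distinct++;
      }
   }
   return distinct;
}
```

The usual alloca caveat still applies: there is no failure return, so
this is only reasonable when the element count is known to be small.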

Jose