[Mesa-dev] [PATCH 1/2] nir: Use alloca instead of variable length arrays.

Fri Feb 27 14:50:57 PST 2015

On Fri, Feb 27, 2015 at 6:04 AM, Jose Fonseca <jfonseca at vmware.com> wrote:

> On 26/02/15 18:07, Brian Paul wrote:
>
>> On 02/26/2015 09:51 AM, Jose Fonseca wrote:
>>
>>> This is to enable the code to build with -Werror=vla in the short term,
>>> and enable the code to build with MSVC2013 soon after.
>>> ---
>>>   include/c99_alloca.h                 | 45
>>> ++++++++++++++++++++++++++++++++++++
>>>   src/glsl/nir/nir_from_ssa.c          | 19 +++++++--------
>>>   src/glsl/nir/nir_live_variables.c    |  5 ++--
>>>   src/glsl/nir/nir_lower_vars_to_ssa.c | 13 +++++++----
>>>   4 files changed, 66 insertions(+), 16 deletions(-)
>>>   create mode 100644 include/c99_alloca.h
>>>
>>> diff --git a/include/c99_alloca.h b/include/c99_alloca.h
>>> new file mode 100644
>>> index 0000000..6d96d06
>>> --- /dev/null
>>> +++ b/include/c99_alloca.h
>>> @@ -0,0 +1,45 @@
>>> +/**********************************************************
>>> ****************
>>>
>>> + *
>>> + * Copyright 2015 VMware, Inc.
>>> + * All Rights Reserved.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> + * "Software"), to deal in the Software without restriction, including
>>> + * without limitation the rights to use, copy, modify, merge, publish,
>>> + * distribute, sub license, and/or sell copies of the Software, and to
>>> + * permit persons to whom the Software is furnished to do so, subject to
>>> + * the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the
>>> + * next paragraph) shall be included in all copies or substantial
>>> portions
>>> + * of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS
>>> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>> NON-INFRINGEMENT.
>>> + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
>>> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
>>> CONTRACT,
>>> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>>> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> +
>>> ************************************************************
>>> **************/
>>>
>>> +
>>> +#ifndef _C99_ALLOCA_H_
>>> +#define _C99_ALLOCA_H_
>>> +
>>> +
>>> +#if defined(_MSC_VER)
>>> +
>>> +#  include <malloc.h>
>>> +
>>> +#  define alloca _alloca
>>> +
>>> +#else /* !defined(_MSC_VER) */
>>> +
>>> +#  include <alloca.h>
>>> +
>>> +#endif /* !defined(_MSC_VER) */
>>> +
>>> +
>>> +#endif
>>> diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
>>> index c695c95..66339f3 100644
>>> --- a/src/glsl/nir/nir_from_ssa.c
>>> +++ b/src/glsl/nir/nir_from_ssa.c
>>> @@ -26,6 +26,7 @@
>>>    */
>>>
>>>   #include "nir.h"
>>> +#include "c99_alloca.h"
>>>
>>>   /*
>>>    * This file implements an out-of-SSA pass as described in "Revisiting
>>> @@ -181,7 +182,7 @@ merge_merge_sets(merge_set *a, merge_set *b)
>>>   static bool
>>>   merge_sets_interfere(merge_set *a, merge_set *b)
>>>   {
>>> -   merge_node *dom[a->size + b->size];
>>> +   merge_node **dom = alloca((a->size + b->size) * sizeof *dom);
>>>      int dom_idx = -1;
>>>
>>>      struct exec_node *an = exec_list_get_head(&a->nodes);
>>> @@ -673,21 +674,21 @@ resolve_parallel_copy(nir_parallel_copy_instr
>>> *pcopy,
>>>      }
>>>
>>>      /* The register/source corresponding to the given index */
>>> -   nir_src values[num_copies * 2];
>>> -   memset(values, 0, sizeof values);
>>> +   nir_src *values = alloca(num_copies * 2 * sizeof *values);
>>> +   memset(values, 0, num_copies * 2 * sizeof *values);
>>>
>>>      /* The current location of a given piece of data */
>>> -   int loc[num_copies * 2];
>>> +   int *loc = alloca(num_copies * 2 * sizeof *loc);
>>>
>>>      /* The piece of data that the given piece of data is to be copied
>>> from */
>>> -   int pred[num_copies * 2];
>>> +   int *pred = alloca(num_copies * 2 * sizeof *pred);
>>>
>>
These three are all pretty small.  < 10 elements in the usual case.  It's
going to be a crazy shader if they get above, say, 50 or 100 entries.

>
>>>      /* Initialize loc and pred.  We will use -1 for "null" */
>>> -   memset(loc, -1, sizeof loc);
>>> -   memset(pred, -1, sizeof pred);
>>> +   memset(loc, -1, num_copies * 2 * sizeof *loc);
>>> +   memset(pred, -1, num_copies * 2 * sizeof *pred);
>>>
>>>      /* The destinations we have yet to properly fill */
>>> -   int to_do[num_copies * 2];
>>> +   int *to_do = alloca(num_copies * 2 * sizeof *to_do);
>>>      int to_do_idx = -1;
>>>
>>>      /* Now we set everything up:
>>> @@ -737,7 +738,7 @@ resolve_parallel_copy(nir_parallel_copy_instr
>>> *pcopy,
>>>      }
>>>
>>>      /* Currently empty destinations we can go ahead and fill */
>>> -   int ready[num_copies * 2];
>>> +   int *ready = alloca(num_copies * 2 * sizeof *ready);
>>>
>>
Also small.  See above.

>      int ready_idx = -1;
>>>
>>>      /* Mark the ones that are ready for copying.  We know an index is a
>>> diff --git a/src/glsl/nir/nir_live_variables.c
>>> b/src/glsl/nir/nir_live_variables.c
>>> index 7402dc0..b57ca3a 100644
>>> --- a/src/glsl/nir/nir_live_variables.c
>>> +++ b/src/glsl/nir/nir_live_variables.c
>>> @@ -26,6 +26,7 @@
>>>
>>>   #include "nir.h"
>>>   #include "nir_worklist.h"
>>> +#include "c99_alloca.h"
>>>
>>>   /*
>>>    * Basic liveness analysis.  This works only in SSA form.
>>> @@ -130,8 +131,8 @@ static bool
>>>   propagate_across_edge(nir_block *pred, nir_block *succ,
>>>                         struct live_variables_state *state)
>>>   {
>>> -   BITSET_WORD live[state->bitset_words];
>>> -   memcpy(live, succ->live_in, sizeof live);
>>> +   BITSET_WORD *live = alloca(state->bitset_words * sizeof *live);
>>> +   memcpy(live, succ->live_in, state->bitset_words * sizeof *live);
>>>
>>
For this one, we might as well ralloc a single temporary, store it in
live_variables_state, and re-use it.  We're not really saving anything by
"reallocating" it each time; especially since we memcpy anyway.

>
>>>      nir_foreach_instr(succ, instr) {
>>>         if (instr->type != nir_instr_type_phi)
>>> diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c
>>> b/src/glsl/nir/nir_lower_vars_to_ssa.c
>>> index 8af7530..f54d1b7 100644
>>> --- a/src/glsl/nir/nir_lower_vars_to_ssa.c
>>> +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
>>> @@ -27,6 +27,9 @@
>>>
>>>   #include "nir.h"
>>>
>>> +#include "c99_alloca.h"
>>> +
>>> +
>>>   struct deref_node {
>>>      struct deref_node *parent;
>>>      const struct glsl_type *type;
>>> @@ -899,8 +902,8 @@ rename_variables_block(nir_block *block, struct
>>> lower_variables_state *state)
>>>   static void
>>>   insert_phi_nodes(struct lower_variables_state *state)
>>>   {
>>> -   unsigned work[state->impl->num_blocks];
>>> -   unsigned has_already[state->impl->num_blocks];
>>> +   unsigned *work = alloca(state->impl->num_blocks * sizeof *work);
>>> +   unsigned *has_already = alloca(state->impl->num_blocks * sizeof
>>> *has_already);
>>>
>>>      /*
>>>       * Since the work flags already prevent us from inserting a node
>>> that has
>>> @@ -910,10 +913,10 @@ insert_phi_nodes(struct lower_variables_state
>>> *state)
>>>       * function. So all we need to handle W is an array and a pointer
>>> to the
>>>       * next element to be inserted and the next element to be removed.
>>>       */
>>> -   nir_block *W[state->impl->num_blocks];
>>> +   nir_block **W = alloca(state->impl->num_blocks * sizeof *W);
>>>
>>> -   memset(work, 0, sizeof work);
>>> -   memset(has_already, 0, sizeof has_already);
>>> +   memset(work, 0, state->impl->num_blocks * sizeof *work);
>>> +   memset(has_already, 0, state->impl->num_blocks * sizeof
>>> *has_already);
>>>
>>
num_blocks shouldn't be large, so you can probably go ahead and put this on
the stack.  On the other hand, insert_phi_nodes is called once per run of
the optimization loop and we malloc enough other stuff that we won't notice
(from a performance perspective) if you just malloc it.

>
>>>      unsigned w_start, w_end;
>>>      unsigned iter_count = 0;
>>>
>>>
>> Looks OK to me.
>>
>> One thing I might have done would be instead of:
>>
>> unsigned *work = alloca(state->impl->num_blocks * sizeof *work);
>> ...
>> memset(work, 0, state->impl->num_blocks * sizeof *work);
>>
>> do
>>
>> const int work_size = state->impl->num_blocks * sizeof *work;
>> unsigned *work = alloca(work_size);
>>
> > ...
> > memset(work, 0, work_size);
>
> It's not a bad idea.  The snafu is that `work` must be declared before
> word_size so that `sizeof *work` is valid.
>
> That or we must use the actual type of `*work`, but that is more dangerous
> as it's very easy to get it wrong, particularly when dealing with pointers
> of pointers.
>
> So this would need to look like
>
>   unsigned *work;
>   const size_t work_size = state->impl->num_blocks * sizeof *work;
>   work = alloca(work_size);
>   ...
>   memset(work, 0, work_size);
>
>
> I'll post this in a follow-up review.
>
>  ...
>> memset(work, 0, work_size);
>>
>>
>> AFAIK, there's no zeroing version of alloca().
>>
>
> Yes, I also searched.  And unfortunately inline functions can't be used
> due to alloca semantics.  I'm not sure if there's any C-preprocessor magic.
>
>
> Anyway, depending on the maximum size of these alloca arrays, we might
> actually need to use malloc, as Windows stack is 1MB by default, plus the
> OpenGL driver only gets to use whatever stack is left from the application.
>

I doubt you'll have too much trouble there.  Most of those should be < 1K
even for really big shaders and none of these functions ever call any of
the others.  I've made a few comments above about potential specific
non-alloca solutions that might be better for some of them.

On a slightly different note, could you be more specific about what MSVC
can't do here?  I know it's had some issues with variable-length arrays in
the past (I've seen some of those issues personally), but could you be more
specific about what they are?
--Jason

> If using C++ on NIR is acceptable, std::vector<> would be a nice solution
> overall: practically the same syntax/semantics as C99 VLAs, plus no risk of
> stack overflow.
>
>
>> Reviewed-by: Brian Paul <brianp at vmware.com>
>>
>>
> Thanks.
>
> Jose
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150227/d1b77727/attachment-0001.html>