[Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table

Iago Toral itoral at igalia.com
Thu Oct 29 00:32:33 PDT 2015


On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote:
> On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote:
> > There is no opinions about this issue or reviews of the proposed patch
> > after one week.
> > 
> > This is just a reminder in case you have missed it :-)
> 
> Thanks for the reminder! How about something like this instead?

Yeah, that works too. I was a bit concerned that this same problem may
be affecting other places and this would only address it for
brw_send_indirect_message, but after a quick review we don't generally
need to hold pointers to previous instructions and the places where we
do, like in brw_ENDIF or brw_WHILE we are careful to create the
instructions we need before we look for pointers to others (which we do
using indices into the store anyway).

Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>

I'll push this patch tomorrow if nobody else objects.

Thanks Kristian!

> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index ebd811f..cd5c726 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p,
>                            struct brw_reg desc)
>  {
>     const struct brw_device_info *devinfo = p->devinfo;
> -   struct brw_inst *send, *setup;
> +   struct brw_inst *send;
> +   int setup;
>  
>     assert(desc.type == BRW_REGISTER_TYPE_UD);
>  
> +   /* We hold on to the setup instruction (the SEND in the direct case, the OR
> +    * in the indirect case) by its index in the instruction store.  The
> +    * pointer returned by next_insn() may become invalid if emitting the SEND
> +    * in the indirect case reallocs the store.
> +    */
> +
>     if (desc.file == BRW_IMMEDIATE_VALUE) {
> -      setup = send = next_insn(p, BRW_OPCODE_SEND);
> +      setup = p->nr_insn;
> +      send = next_insn(p, BRW_OPCODE_SEND);
>        brw_set_src1(p, send, desc);
>  
>     } else {
> @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p,
>         * caller can specify additional descriptor bits with the usual
>         * brw_set_*_message() helper functions.
>         */
> -      setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> +      setup = p->nr_insn;
> +      brw_OR(p, addr, desc, brw_imm_ud(0));
>  
>        brw_pop_insn_state(p);
>  
> @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p,
>     brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
>     brw_inst_set_sfid(devinfo, send, sfid);
>  
> -   return setup;
> +   return &p->store[setup];
>  }
>  
>  static struct brw_inst *
> 
> 
> > Sam
> > 
> > On 21/10/15 12:23, Iago Toral wrote:
> > > Hi,
> > > 
> > > The problem is with code like this (see brw_send_indirect_message):
> > > 
> > > setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> > > send = next_insn(p, BRW_OPCODE_SEND);
> > > ...
> > > return setup;
> > > 
> > > If next_insn triggers a realloc of the instruction store, then the setup
> > > instruction pointer is no longer valid. Notice that this can happen
> > > anywhere where we keep pointers to previous instructions before creating
> > > new ones (!)
> > > 
> > > The patch from Samuel fixes this by special-casing this for SEND
> > > instructions only (since we know that the indirect versions can hit
> > > this, maybe there are more situations though). It does so by trying to
> > > make sure that we never realloc the store with a SEND instruction. For
> > > this, we realloc before we reach the end of the current store (32
> > > instructions before the limit) as long as the instruction is not a SEND
> > > (so that if it is a SEND we still have up to 32 opportunities to do the
> > > realloc without a different instruction before running out of space in
> > > the store).
> > > 
> > > Iago
> > > 
> > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote:
> > >> Hello,
> > >>
> > >> I have found several invalid memory accesses when running
> > >> dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That
> > >> invalid memory accesses were unluckily happening when generating the
> > >> assembly instructions for SSBO stores for different compute shaders.
> > >>
> > >> However it looks like this problem could happen to other shaders and
> > >> situations. Because of that, I am going to explain the problem here:
> > >>
> > >> When generating a untyped surface write/read, i965 driver will end up
> > >> calling brw_send_indirect_message() through
> > >> brw_send_indirect_surface_message(). At brw_send_indirect_message()'s
> > >> 'else' branch, the code generates a load of the indirect descriptor to
> > >> an address register using an OR instruction and it also generates a new
> > >> SEND instruction; if this case happens, the OR instruction is returned.
> > >> brw_send_indirect_surface_message() uses that OR instruction to set mlen
> > >> and rlen's descriptor bits later.
> > >>
> > >> Just to give more context, when generating instructions in fs/vec4
> > >> generators, i965 driver uses pointers to elements in the 'store' table
> > >> inside struct brw_codegen. That table has an initial size of 1024 but,
> > >> when it's full, it is resized (doubling its size each time,
> > >> see brw_next_insn()). This resize operation ends up calling
> > >> realloc(). However the returned pointer by realloc() could be different
> > >> and the old allocated memory would be free'd as part of the process.
> > >>
> > >> Back to the issue, if the p->store's resize happens when we get the
> > >> pointer to the SEND instruction at brw_send_indirect_message()'s 'else'
> > >> branch, we could have the following problem:
> > >>
> > >> The realloc() returns a new pointer and *free'd* the old allocation, so
> > >> the pointer we previously saved for the OR instruction at
> > >> brw_send_indirect_surface_message() becomes invalid (because it is a
> > >> pointer of the old allocation). Then, we access to that invalid pointer
> > >> when setting up rlen/mlen bits at the end of
> > >> brw_send_indirect_surface_message() and we would have undefined results.
> > >>
> > >> This issue is quite unlikely to happen but it is reproducible on
> > >> ~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have
> > >> the same shaders except the buffer variable's data type. Those tests
> > >> were failing intermittently at different rates but valgrind helped to
> > >> find what was happening.
> > >>
> > >> I would like to expose publicly the problem and analyse possible
> > >> solutions for it along with the community. For the time being, a patch
> > >> is proposed to mitigate this issue, which is sent as a reply to this
> > >> one.
> > >>
> > >> What this work-around patch does is: book enough room for one or more
> > >> consecutive SEND* instructions at the end of the p->store table in order
> > >> to avoid reallocating p->store in the aforementioned case. The 32 value
> > >> was chosen arbitrary because it has low impact in p->store
> > >> (its initial size is 1024) and makes this issue much less likely to
> > >> happen. We could tune this number to a less conservative value if
> > >> needed. If you want to test it, that patch should be applied on top of
> > >> this Curro's patch [0] as it fixes a lot of compute shader compilation
> > >> errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch
> > >> with both patches in [1].
> > >>
> > >> Feel free to comment about other solutions, ideas, opinions, etc.
> > >>
> > >> Thanks,
> > >>
> > >> Sam
> > >>
> > >> [0] See attachment at
> > >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html
> > >> [1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \
> > >> https://github.com/Igalia/mesa.git
> > >>
> > >> Samuel Iglesias Gonsalvez (1):
> > >>   i965: book space at the end of p->store for SEND opcodes to avoid
> > >>     invalid memory access
> > >>
> > >>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++-
> > >>  1 file changed, 11 insertions(+), 1 deletion(-)
> > >>
> > > 
> > > 
> > > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 




More information about the mesa-dev mailing list