[Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table
Iago Toral
itoral at igalia.com
Thu Oct 29 00:32:33 PDT 2015
On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote:
> On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote:
> > There is no opinions about this issue or reviews of the proposed patch
> > after one week.
> >
> > This is just a reminder in case you have missed it :-)
>
> Thanks for the reminder! How about something like this instead?
Yeah, that works too. I was a bit concerned that this same problem may
be affecting other places and this would only address it for
brw_send_indirect_message, but after a quick review we don't generally
need to hold pointers to previous instructions and the places where we
do, like in brw_ENDIF or brw_WHILE we are careful to create the
instructions we need before we look for pointers to others (which we do
using indices into the store anyway).
Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
I'll push this patch tomorrow if nobody else objects.
Thanks Kristian!
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index ebd811f..cd5c726 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p,
> struct brw_reg desc)
> {
> const struct brw_device_info *devinfo = p->devinfo;
> - struct brw_inst *send, *setup;
> + struct brw_inst *send;
> + int setup;
>
> assert(desc.type == BRW_REGISTER_TYPE_UD);
>
> + /* We hold on to the setup instruction (the SEND in the direct case, the OR
> + * in the indirect case) by its index in the instruction store. The
> + * pointer returned by next_insn() may become invalid if emitting the SEND
> + * in the indirect case reallocs the store.
> + */
> +
> if (desc.file == BRW_IMMEDIATE_VALUE) {
> - setup = send = next_insn(p, BRW_OPCODE_SEND);
> + setup = p->nr_insn;
> + send = next_insn(p, BRW_OPCODE_SEND);
> brw_set_src1(p, send, desc);
>
> } else {
> @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p,
> * caller can specify additional descriptor bits with the usual
> * brw_set_*_message() helper functions.
> */
> - setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> + setup = p->nr_insn;
> + brw_OR(p, addr, desc, brw_imm_ud(0));
>
> brw_pop_insn_state(p);
>
> @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p,
> brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
> brw_inst_set_sfid(devinfo, send, sfid);
>
> - return setup;
> + return &p->store[setup];
> }
>
> static struct brw_inst *
>
>
> > Sam
> >
> > On 21/10/15 12:23, Iago Toral wrote:
> > > Hi,
> > >
> > > The problem is with code like this (see brw_send_indirect_message):
> > >
> > > setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> > > send = next_insn(p, BRW_OPCODE_SEND);
> > > ...
> > > return setup;
> > >
> > > If next_insn triggers a realloc of the instruction store, then the setup
> > > instruction pointer is no longer valid. Notice that this can happen
> > > anywhere where we keep pointers to previous instructions before creating
> > > new ones (!)
> > >
> > > The patch from Samuel fixes this by special-casing this for SEND
> > > instructions only (since we know that the indirect versions can hit
> > > this, maybe there are more situations though). It does so by trying to
> > > make sure that we never realloc the store with a SEND instruction. For
> > > this, we realloc before we reach the end of the current store (32
> > > instructions before the limit) as long as the instruction is not a SEND
> > > (so that if it is a SEND we still have up to 32 opportunities to do the
> > > realloc without a different instruction before running out of space in
> > > the store).
> > >
> > > Iago
> > >
> > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote:
> > >> Hello,
> > >>
> > >> I have found several invalid memory accesses when running
> > >> dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That
> > >> invalid memory accesses were unluckily happening when generating the
> > >> assembly instructions for SSBO stores for different compute shaders.
> > >>
> > >> However it looks like this problem could happen to other shaders and
> > >> situations. Because of that, I am going to explain the problem here:
> > >>
> > >> When generating a untyped surface write/read, i965 driver will end up
> > >> calling brw_send_indirect_message() through
> > >> brw_send_indirect_surface_message(). At brw_send_indirect_message()'s
> > >> 'else' branch, the code generates a load of the indirect descriptor to
> > >> an address register using an OR instruction and it also generates a new
> > >> SEND instruction; if this case happens, the OR instruction is returned.
> > >> brw_send_indirect_surface_message() uses that OR instruction to set mlen
> > >> and rlen's descriptor bits later.
> > >>
> > >> Just to give more context, when generating instructions in fs/vec4
> > >> generators, i965 driver uses pointers to elements in the 'store' table
> > >> inside struct brw_codegen. That table has an initial size of 1024 but,
> > >> when it's full, it is resized (doubling its size each time,
> > >> see brw_next_insn()). This resize operation ends up calling
> > >> realloc(). However the returned pointer by realloc() could be different
> > >> and the old allocated memory would be free'd as part of the process.
> > >>
> > >> Back to the issue, if the p->store's resize happens when we get the
> > >> pointer to the SEND instruction at brw_send_indirect_message()'s 'else'
> > >> branch, we could have the following problem:
> > >>
> > >> The realloc() returns a new pointer and *free'd* the old allocation, so
> > >> the pointer we previously saved for the OR instruction at
> > >> brw_send_indirect_surface_message() becomes invalid (because it is a
> > >> pointer of the old allocation). Then, we access to that invalid pointer
> > >> when setting up rlen/mlen bits at the end of
> > >> brw_send_indirect_surface_message() and we would have undefined results.
> > >>
> > >> This issue is quite unlikely to happen but it is reproducible on
> > >> ~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have
> > >> the same shaders except the buffer variable's data type. Those tests
> > >> were failing intermittently at different rates but valgrind helped to
> > >> find what was happening.
> > >>
> > >> I would like to expose publicly the problem and analyse possible
> > >> solutions for it along with the community. For the time being, a patch
> > >> is proposed to mitigate this issue, which is sent as a reply to this
> > >> one.
> > >>
> > >> What this work-around patch does is: book enough room for one or more
> > >> consecutive SEND* instructions at the end of the p->store table in order
> > >> to avoid reallocating p->store in the aforementioned case. The 32 value
> > >> was chosen arbitrary because it has low impact in p->store
> > >> (its initial size is 1024) and makes this issue much less likely to
> > >> happen. We could tune this number to a less conservative value if
> > >> needed. If you want to test it, that patch should be applied on top of
> > >> this Curro's patch [0] as it fixes a lot of compute shader compilation
> > >> errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch
> > >> with both patches in [1].
> > >>
> > >> Feel free to comment about other solutions, ideas, opinions, etc.
> > >>
> > >> Thanks,
> > >>
> > >> Sam
> > >>
> > >> [0] See attachment at
> > >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html
> > >> [1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \
> > >> https://github.com/Igalia/mesa.git
> > >>
> > >> Samuel Iglesias Gonsalvez (1):
> > >> i965: book space at the end of p->store for SEND opcodes to avoid
> > >> invalid memory access
> > >>
> > >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++-
> > >> 1 file changed, 11 insertions(+), 1 deletion(-)
> > >>
> > >
> > >
> > >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
More information about the mesa-dev
mailing list