[Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table

Kristian Høgsberg krh at bitplanet.net
Thu Oct 29 12:28:31 PDT 2015


On Thu, Oct 29, 2015 at 12:32 AM, Iago Toral <itoral at igalia.com> wrote:
> On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote:
>> On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote:
>> > There is no opinions about this issue or reviews of the proposed patch
>> > after one week.
>> >
>> > This is just a reminder in case you have missed it :-)
>>
>> Thanks for the reminder! How about something like this instead?
>
> Yeah, that works too. I was a bit concerned that this same problem may
> be affecting other places and this would only address it for
> brw_send_indirect_message, but after a quick review we don't generally
> need to hold pointers to previous instructions and the places where we
> do, like in brw_ENDIF or brw_WHILE we are careful to create the
> instructions we need before we look for pointers to others (which we do
> using indices into the store anyway).

Yea, it's not a general fix, but I'm not sure how we'd do that. In
your patch you were special casing the send instruction, which also
only covers some cases. What I had in mind when I wrote this
alternative fix was that: 1) it's a lot simpler and 2) it's local to
the cause of the problem and 3) doesn't suggest that you can safely
hold on to inst pointers (shows the opposite, in fact)

> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
>
> I'll push this patch tomorrow if nobody else objects.

Thanks, good find.

> Thanks Kristian!
>
>> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> index ebd811f..cd5c726 100644
>> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p,
>>                            struct brw_reg desc)
>>  {
>>     const struct brw_device_info *devinfo = p->devinfo;
>> -   struct brw_inst *send, *setup;
>> +   struct brw_inst *send;
>> +   int setup;
>>
>>     assert(desc.type == BRW_REGISTER_TYPE_UD);
>>
>> +   /* We hold on to the setup instruction (the SEND in the direct case, the OR
>> +    * in the indirect case) by its index in the instruction store.  The
>> +    * pointer returned by next_insn() may become invalid if emitting the SEND
>> +    * in the indirect case reallocs the store.
>> +    */
>> +
>>     if (desc.file == BRW_IMMEDIATE_VALUE) {
>> -      setup = send = next_insn(p, BRW_OPCODE_SEND);
>> +      setup = p->nr_insn;
>> +      send = next_insn(p, BRW_OPCODE_SEND);
>>        brw_set_src1(p, send, desc);
>>
>>     } else {
>> @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p,
>>         * caller can specify additional descriptor bits with the usual
>>         * brw_set_*_message() helper functions.
>>         */
>> -      setup = brw_OR(p, addr, desc, brw_imm_ud(0));
>> +      setup = p->nr_insn;
>> +      brw_OR(p, addr, desc, brw_imm_ud(0));
>>
>>        brw_pop_insn_state(p);
>>
>> @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p,
>>     brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
>>     brw_inst_set_sfid(devinfo, send, sfid);
>>
>> -   return setup;
>> +   return &p->store[setup];
>>  }
>>
>>  static struct brw_inst *
>>
>>
>> > Sam
>> >
>> > On 21/10/15 12:23, Iago Toral wrote:
>> > > Hi,
>> > >
>> > > The problem is with code like this (see brw_send_indirect_message):
>> > >
>> > > setup = brw_OR(p, addr, desc, brw_imm_ud(0));
>> > > send = next_insn(p, BRW_OPCODE_SEND);
>> > > ...
>> > > return setup;
>> > >
>> > > If next_insn triggers a realloc of the instruction store, then the setup
>> > > instruction pointer is no longer valid. Notice that this can happen
>> > > anywhere where we keep pointers to previous instructions before creating
>> > > new ones (!)
>> > >
>> > > The patch from Samuel fixes this by special-casing this for SEND
>> > > instructions only (since we know that the indirect versions can hit
>> > > this, maybe there are more situations though). It does so by trying to
>> > > make sure that we never realloc the store with a SEND instruction. For
>> > > this, we realloc before we reach the end of the current store (32
>> > > instructions before the limit) as long as the instruction is not a SEND
>> > > (so that if it is a SEND we still have up to 32 opportunities to do the
>> > > realloc without a different instruction before running out of space in
>> > > the store).
>> > >
>> > > Iago
>> > >
>> > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote:
>> > >> Hello,
>> > >>
>> > >> I have found several invalid memory accesses when running
>> > >> dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That
>> > >> invalid memory accesses were unluckily happening when generating the
>> > >> assembly instructions for SSBO stores for different compute shaders.
>> > >>
>> > >> However it looks like this problem could happen to other shaders and
>> > >> situations. Because of that, I am going to explain the problem here:
>> > >>
>> > >> When generating a untyped surface write/read, i965 driver will end up
>> > >> calling brw_send_indirect_message() through
>> > >> brw_send_indirect_surface_message(). At brw_send_indirect_message()'s
>> > >> 'else' branch, the code generates a load of the indirect descriptor to
>> > >> an address register using an OR instruction and it also generates a new
>> > >> SEND instruction; if this case happens, the OR instruction is returned.
>> > >> brw_send_indirect_surface_message() uses that OR instruction to set mlen
>> > >> and rlen's descriptor bits later.
>> > >>
>> > >> Just to give more context, when generating instructions in fs/vec4
>> > >> generators, i965 driver uses pointers to elements in the 'store' table
>> > >> inside struct brw_codegen. That table has an initial size of 1024 but,
>> > >> when it's full, it is resized (doubling its size each time,
>> > >> see brw_next_insn()). This resize operation ends up calling
>> > >> realloc(). However the returned pointer by realloc() could be different
>> > >> and the old allocated memory would be free'd as part of the process.
>> > >>
>> > >> Back to the issue, if the p->store's resize happens when we get the
>> > >> pointer to the SEND instruction at brw_send_indirect_message()'s 'else'
>> > >> branch, we could have the following problem:
>> > >>
>> > >> The realloc() returns a new pointer and *free'd* the old allocation, so
>> > >> the pointer we previously saved for the OR instruction at
>> > >> brw_send_indirect_surface_message() becomes invalid (because it is a
>> > >> pointer of the old allocation). Then, we access to that invalid pointer
>> > >> when setting up rlen/mlen bits at the end of
>> > >> brw_send_indirect_surface_message() and we would have undefined results.
>> > >>
>> > >> This issue is quite unlikely to happen but it is reproducible on
>> > >> ~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have
>> > >> the same shaders except the buffer variable's data type. Those tests
>> > >> were failing intermittently at different rates but valgrind helped to
>> > >> find what was happening.
>> > >>
>> > >> I would like to expose publicly the problem and analyse possible
>> > >> solutions for it along with the community. For the time being, a patch
>> > >> is proposed to mitigate this issue, which is sent as a reply to this
>> > >> one.
>> > >>
>> > >> What this work-around patch does is: book enough room for one or more
>> > >> consecutive SEND* instructions at the end of the p->store table in order
>> > >> to avoid reallocating p->store in the aforementioned case. The 32 value
>> > >> was chosen arbitrary because it has low impact in p->store
>> > >> (its initial size is 1024) and makes this issue much less likely to
>> > >> happen. We could tune this number to a less conservative value if
>> > >> needed. If you want to test it, that patch should be applied on top of
>> > >> this Curro's patch [0] as it fixes a lot of compute shader compilation
>> > >> errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch
>> > >> with both patches in [1].
>> > >>
>> > >> Feel free to comment about other solutions, ideas, opinions, etc.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Sam
>> > >>
>> > >> [0] See attachment at
>> > >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html
>> > >> [1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \
>> > >> https://github.com/Igalia/mesa.git
>> > >>
>> > >> Samuel Iglesias Gonsalvez (1):
>> > >>   i965: book space at the end of p->store for SEND opcodes to avoid
>> > >>     invalid memory access
>> > >>
>> > >>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++-
>> > >>  1 file changed, 11 insertions(+), 1 deletion(-)
>> > >>
>> > >
>> > >
>> > >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>


More information about the mesa-dev mailing list