[Mesa-stable] [Mesa-dev] [PATCH 1/1] i965: Make sure the shadow buffers have enough space

Fri Apr 13 21:08:40 UTC 2018

On Fri, 13 Apr 2018 13:51:02 -0700
Kenneth Graunke <kenneth at whitecape.org> wrote:

> On Friday, April 13, 2018 1:35:45 PM PDT Kenneth Graunke wrote:
> > On Monday, April 9, 2018 4:06:16 PM PDT James Xiong wrote:  
> > > From: "Xiong, James" <james.xiong at intel.com>
> > > 
> > > On non-LLC platforms, we malloc shadow batch/state buffers
> > > of the same sizes as our batch/state buffers' GEM allocations.
> > > However the buffer allocator reuses similar-sized gem objects,
> > > it returns a buffer larger than we asked for in some cases
> > > and we end up with smaller shadow buffers. If we utilize the
> > > full-size of the over-allocated batch/state buffers, we may wind
> > > up accessing beyond the bounds of the shadow buffers and cause
> > > segmentation fault and/or memory corruption.  
> > 
> > Oh, good catch!  We do indeed malloc too little if the bufmgr
Thank you for taking time to review, Kenneth.
> >   
> > > A few examples:
> > >      case            batch                  state
> > >                  request bo   shadow     request bo  shadow
> > > init        0    20K     20K  20K        16K     16K 16K
> > > grow_buffer 1    30K     32K  30K        24K     24K 24K
> > > grow_buffer 2    48K     48K  48K        36K     40K 36K
> > > grow_buffer 3    72K     80K  72K        60K     64K 60K
> > > grow_buffer 4    120K    128K 120K       -       -   -
> > > 
> > > batch #1, #3, #4; state #2 and #3 are problematic. We can change
> > > the order to allocate the bo first, then allocate the shadow
> > > buffer using the bo's size so that the shadow buffer have at
> > > least an equivalent size of the gem allocation.
> > > 
> > > Another problem: even though the state/batch buffer could grow,
> > > when checking if it runs out space, we always compare with the
> > > initial batch/state sizes. To utilize the entire buffers, change
> > > to compare with the actual sizes.  
> > 
> > This is actually intentional.  Our goal is to flush batches when the
> > amount of commands or state reaches those thresholds.  Occasionally,
> > we'll be in the middle of emitting a draw, and unable to stop.  In
> > that case, we grow the batch and keep going.  But after that, we're
> > beyond our original target, so we flush next time.  We don't want
> > to grow without bounds...it's meant more for emergencies, or if
> > we've badly estimated the size of the draw call.
I am not sure I get it. Let me give an example: the state buffer
gets grown once from 16K to 24K in brw_state_batch(), the used_size
becomes 20K, then brw_require_statebuffer_space(1024) gets called to
ask for 1K space, with the original logical, it compares the used size
with 16K and flush the batch even though the state buffer still has 4K
space available?
> > 
> > I've sent a simpler patch which I think should hopefully fix your
> > bug: https://patchwork.freedesktop.org/patch/217107/  
> 
> Lionel noticed that I botched that patch.  Here's an actual one:
> 
> https://patchwork.freedesktop.org/patch/217108/
Yes it will fix the existing bug. However the assumption here is
that the init allocation size will NOT be rounded up as it happens to
be the bucket size.
I am working on an optimization to improve memory usage(that's how
I find out this bug), this assumption is no longer true. Essentially the
bufmgr could return a buffer with the same or larger size whether it is
same as the bucket's or not. Anyway I guess I can send the fix
later along with the optimization patches.