[Mesa-stable] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"

Pohjolainen, Topi topi.pohjolainen at intel.com
Thu Aug 27 23:56:43 PDT 2015


On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
> On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
> > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
> > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
> > > Author: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > Date:   Thu Jun 25 14:00:41 2015 +0300
> > > 
> > >     i965: Stop aux data compare preventing program binary re-use
> > > 
> > > This fixes an intermittent failure in
> > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
> > > platforms as well, but it is harder to reproduce). I can usually hit the failure
> > > within 10 runs of the test. This is a very hairy commit to debug. I'll let Topi
> > > handle it, or else we should go with the revert. I am open to either. I got
> > > lucky that Jenkins caught this on a run.
> > > 
> > > Here was the script I used for bisect:
> > > 
> > > i=0
> > > while [ $i -lt 40 ] ; do
> > >    ./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
> > >    [[ $? != 0 ]] && echo fail && exit 1
> > >    ((i++))
> > > done
> > > 
> > > exit 0
> > 
> > Should I use different piglit version than the current master? I'm asking
> > because I get this with both Mesa master and my patch reverted.
> 
> My piglit was pretty old. I just updated that, but mesa was master as of
> yesterday (output is below)
> 
> The one bit of advice I can add is that you make sure your system is updated to
> the very latest.
> 
> > 
> > testrunner at skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
> > Using test set: Core formats
> > texsubimage failed
> >   target: GL_TEXTURE_2D
> >   internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
> >   region: 68, 12  32 x 48
> > texsubimage failed
> >   target: GL_TEXTURE_2D
> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
> >   region: 0, 28  116 x 20
> > texsubimage failed
> >   target: GL_TEXTURE_2D
> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
> >   region: 16, 4  60 x 36
> > texsubimage failed
> >   target: GL_TEXTURE_2D
> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
> >   region: 8, 0  104 x 60
> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> > PIGLIT: {"result": "fail" }
> 
> Interesting. It so happens I have a patch that purports to fix some of these
> things. I did not try this patch myself for this issue.
> http://patchwork.freedesktop.org/patch/54025/
> 
> (I've been hanging on to this since I needed to do a bit of research to address
> Matt's feedback, specifically regarding render compression).
> 
> > 
> > 
> > Could you include the console output of the failure you get?
> > 
> 
> Here are two sample failures (occurred in 7 runs)
> 
> Using test set: Core formats
> texsubimage failed
>   target: GL_TEXTURE_2D
>   internal format: GL_INTENSITY
>   region: 27, 1  13 x 61
> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO access)
> 
> Using test set: Core formats
> texsubimage failed
>   target: GL_TEXTURE_2D
>   internal format: GL_INTENSITY
>   region: 80, 38  24 x 25
> texsubimage failed
>   target: GL_TEXTURE_2D
>   internal format: GL_LUMINANCE8
>   region: 50, 5  26 x 58
> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> PIGLIT: {"result": "fail" }
> 
> 
> > > 
> > > Cc: <mesa-stable at lists.freedesktop.org>
> > > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > > Cc: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > Reported-by: Mark Janes <mark.a.janes at intel.com> (jenkins)
> > > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_state_cache.c | 52 ++++++++++++++++++-----------
> > >  1 file changed, 32 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > index fbc0419..e50d6a0 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > @@ -200,23 +200,36 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
> > >  }
> > >  
> > >  /**
> > > - * Attempts to find an item in the cache with identical data.
> > > + * Attempts to find an item in the cache with identical data and aux
> > > + * data to use
> > >   */
> > > -static const struct brw_cache_item *
> > > -brw_lookup_prog(const struct brw_cache *cache,
> > > -                enum brw_cache_id cache_id,
> > > -                const void *data, unsigned data_size)
> > > +static bool
> > > +brw_try_upload_using_copy(struct brw_cache *cache,
> > > +			  struct brw_cache_item *result_item,
> > > +			  const void *data,
> > > +			  const void *aux)
> > >  {
> > > -   const struct brw_context *brw = cache->brw;
> > > +   struct brw_context *brw = cache->brw;
> > >     unsigned i;
> > > -   const struct brw_cache_item *item;
> > > +   struct brw_cache_item *item;
> > >  
> > >     for (i = 0; i < cache->size; i++) {
> > >        for (item = cache->items[i]; item; item = item->next) {
> > > +	 const void *item_aux = item->key + item->key_size;
> > >  	 int ret;
> > >  
> > > -	 if (item->cache_id != cache_id || item->size != data_size)
> > > +	 if (item->cache_id != result_item->cache_id ||
> > > +	     item->size != result_item->size ||
> > > +	     item->aux_size != result_item->aux_size) {
> > > +	    continue;
> > > +	 }
> > > +
> > > +         if (cache->aux_compare[result_item->cache_id]) {
> > > +            if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> > > +               continue;
> > > +         } else if (memcmp(item_aux, aux, item->aux_size) != 0) {
> > >  	    continue;
> > > +	 }
> > >  
> > >           if (!brw->has_llc)
> > >              drm_intel_bo_map(cache->bo, false);
> > > @@ -226,11 +239,13 @@ brw_lookup_prog(const struct brw_cache *cache,
> > >  	 if (ret)
> > >  	    continue;
> > >  
> > > -	 return item;

I'm getting stronger feeling that this is somehow timing related. I forced
the cache lookup to always miss by replacing the line above by

         return NULL;

And with Ben's machine against gbm backend the error observed by Ben still
happens. This should be the safest mechanism as nothing is re-used.

I'll keep looking.

> > > +	 result_item->offset = item->offset;
> > > +
> > > +	 return true;
> > >        }
> > >     }
> > >  
> > > -   return NULL;
> > > +   return false;
> > >  }
> > >  
> > >  static uint32_t
> > > @@ -279,8 +294,6 @@ brw_upload_cache(struct brw_cache *cache,
> > >  {
> > >     struct brw_context *brw = cache->brw;
> > >     struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
> > > -   const struct brw_cache_item *matching_data =
> > > -      brw_lookup_prog(cache, cache_id, data, data_size);
> > >     GLuint hash;
> > >     void *tmp;
> > >  
> > > @@ -292,15 +305,14 @@ brw_upload_cache(struct brw_cache *cache,
> > >     hash = hash_key(item);
> > >     item->hash = hash;
> > >  
> > > -   /* If we can find a matching prog in the cache already, then reuse the
> > > -    * existing stuff without creating new copy into the underlying buffer
> > > -    * object. This is notably useful for programs generating shaders at
> > > -    * runtime, where multiple shaders may compile to the same thing in our
> > > -    * backend.
> > > +   /* If we can find a matching prog/prog_data combo in the cache
> > > +    * already, then reuse the existing stuff.  This will mean not
> > > +    * flagging CACHE_NEW_* when transitioning between the two
> > > +    * equivalent hash keys.  This is notably useful for programs
> > > +    * generating shaders at runtime, where multiple shaders may
> > > +    * compile to the thing in our backend.
> > >      */
> > > -   if (matching_data) {
> > > -      item->offset = matching_data->offset;
> > > -   } else {
> > > +   if (!brw_try_upload_using_copy(cache, item, data, aux)) {
> > >        item->offset = brw_alloc_item_data(cache, data_size);
> > >  
> > >        /* Copy data to the buffer */
> > > -- 
> > > 2.5.0
> > > 


More information about the mesa-stable mailing list