[Mesa-dev] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"

Pohjolainen, Topi topi.pohjolainen at intel.com
Tue Sep 1 00:54:12 PDT 2015


On Fri, Aug 28, 2015 at 09:34:12AM -0700, Ben Widawsky wrote:
> On Fri, Aug 28, 2015 at 10:15:30AM +0300, Pohjolainen, Topi wrote:
> > On Fri, Aug 28, 2015 at 09:56:43AM +0300, Pohjolainen, Topi wrote:
> > > On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
> > > > On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
> > > > > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
> > > > > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
> > > > > > Author: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > > > > Date:   Thu Jun 25 14:00:41 2015 +0300
> > > > > > 
> > > > > >     i965: Stop aux data compare preventing program binary re-use
> > > > > > 
> > > > > > This fixes an intermittent failure in
> > > > > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
> > > > > > platforms as well, but it is harder to reproduce). I can usually hit the failure
> > > > > > within 10 runs of the test. This is a very hairy commit to debug. I'll let Topi
> > > > > > handle it, or else we should go with the revert. I am open to either. I got
> > > > > > lucky that Jenkins caught this on a run.
> > > > > > 
> > > > > > Here was the script I used for bisect:
> > > > > > 
> > > > > > i=0
> > > > > > while [ $i -lt 40 ] ; do
> > > > > >    ./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
> > > > > >    [[ $? != 0 ]] && echo fail && exit 1
> > > > > >    ((i++))
> > > > > > done
> > > > > > 
> > > > > > exit 0
> > > > > 
> > > > > Should I use different piglit version than the current master? I'm asking
> > > > > because I get this with both Mesa master and my patch reverted.
> > > > 
> > > > My piglit was pretty old. I just updated that, but mesa was master as of
> > > > yesterday (output is below)
> > > > 
> > > > The one bit of advice I can add is that you make sure your system is updated to
> > > > the very latest.
> > > > 
> > > > > 
> > > > > testrunner at skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
> > > > > Using test set: Core formats
> > > > > texsubimage failed
> > > > >   target: GL_TEXTURE_2D
> > > > >   internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
> > > > >   region: 68, 12  32 x 48
> > > > > texsubimage failed
> > > > >   target: GL_TEXTURE_2D
> > > > >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
> > > > >   region: 0, 28  116 x 20
> > > > > texsubimage failed
> > > > >   target: GL_TEXTURE_2D
> > > > >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
> > > > >   region: 16, 4  60 x 36
> > > > > texsubimage failed
> > > > >   target: GL_TEXTURE_2D
> > > > >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
> > > > >   region: 8, 0  104 x 60
> > > > > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> > > > > PIGLIT: {"result": "fail" }
> > > > 
> > > > Interesting. It so happens I have a patch that purports to fix some of these
> > > > things. I did not try this patch myself for this issue.
> > > > http://patchwork.freedesktop.org/patch/54025/
> > > > 
> > > > (I've been hanging on to this since I needed to do a bit of research to address
> > > > Matt's feedback, specifically regarding render compression).
> > > > 
> > > > > 
> > > > > 
> > > > > Could you include the console output of the failure you get?
> > > > > 
> > > > 
> > > > Here are two sample failures (occurred in 7 runs)
> > > > 
> > > > Using test set: Core formats
> > > > texsubimage failed
> > > >   target: GL_TEXTURE_2D
> > > >   internal format: GL_INTENSITY
> > > >   region: 27, 1  13 x 61
> > > > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO access)
> > > > 
> > > > Using test set: Core formats
> > > > texsubimage failed
> > > >   target: GL_TEXTURE_2D
> > > >   internal format: GL_INTENSITY
> > > >   region: 80, 38  24 x 25
> > > > texsubimage failed
> > > >   target: GL_TEXTURE_2D
> > > >   internal format: GL_LUMINANCE8
> > > >   region: 50, 5  26 x 58
> > > > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> > > > PIGLIT: {"result": "fail" }
> > > > 
> > > > 
> > > > > > 
> > > > > > Cc: <mesa-stable at lists.freedesktop.org>
> > > > > > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > > > > > Cc: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > > > > Reported-by: Mark Janes <mark.a.janes at intel.com> (jenkins)
> > > > > > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > > > > > ---
> > > > > >  src/mesa/drivers/dri/i965/brw_state_cache.c | 52 ++++++++++++++++++-----------
> > > > > >  1 file changed, 32 insertions(+), 20 deletions(-)
> > > > > > 
> > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > > > > index fbc0419..e50d6a0 100644
> > > > > > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > > > > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > > > > @@ -200,23 +200,36 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > > - * Attempts to find an item in the cache with identical data.
> > > > > > + * Attempts to find an item in the cache with identical data and aux
> > > > > > + * data to use
> > > > > >   */
> > > > > > -static const struct brw_cache_item *
> > > > > > -brw_lookup_prog(const struct brw_cache *cache,
> > > > > > -                enum brw_cache_id cache_id,
> > > > > > -                const void *data, unsigned data_size)
> > > > > > +static bool
> > > > > > +brw_try_upload_using_copy(struct brw_cache *cache,
> > > > > > +			  struct brw_cache_item *result_item,
> > > > > > +			  const void *data,
> > > > > > +			  const void *aux)
> > > > > >  {
> > > > > > -   const struct brw_context *brw = cache->brw;
> > > > > > +   struct brw_context *brw = cache->brw;
> > > > > >     unsigned i;
> > > > > > -   const struct brw_cache_item *item;
> > > > > > +   struct brw_cache_item *item;
> > > > > >  
> > > > > >     for (i = 0; i < cache->size; i++) {
> > > > > >        for (item = cache->items[i]; item; item = item->next) {
> > > > > > +	 const void *item_aux = item->key + item->key_size;
> > > > > >  	 int ret;
> > > > > >  
> > > > > > -	 if (item->cache_id != cache_id || item->size != data_size)
> > > > > > +	 if (item->cache_id != result_item->cache_id ||
> > > > > > +	     item->size != result_item->size ||
> > > > > > +	     item->aux_size != result_item->aux_size) {
> > > > > > +	    continue;
> > > > > > +	 }
> > > > > > +
> > > > > > +         if (cache->aux_compare[result_item->cache_id]) {
> > > > > > +            if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> > > > > > +               continue;
> > > > > > +         } else if (memcmp(item_aux, aux, item->aux_size) != 0) {
> > > > > >  	    continue;
> > > > > > +	 }
> > > > > >  
> > > > > >           if (!brw->has_llc)
> > > > > >              drm_intel_bo_map(cache->bo, false);
> > > > > > @@ -226,11 +239,13 @@ brw_lookup_prog(const struct brw_cache *cache,
> > > > > >  	 if (ret)
> > > > > >  	    continue;
> > > > > >  
> > > > > > -	 return item;
> > > 
> > > I'm getting stronger feeling that this is somehow timing related. I forced
> > > the cache lookup to always miss by replacing the line above by
> > > 
> > >          return NULL;
> > > 
> > > And with Ben's machine against gbm backend the error observed by Ben still
> > > happens. This should be the safest mechanism as nothing is re-used.
> > > 
> > > I'll keep looking.
> > 
> > Actually, when I revert the patch, I still get:
> > 
> > ATTENTION: default value of option vblank_mode overridden by environment.
> > ATTENTION: default value of option vblank_mode overridden by environment.
> > Using test set: Core formats
> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO access)
> > Unexpected GL error: GL_INVALID_OPERATION 0x502
> 
> Sorry for not being clearer. That's right. but that's not the failure I was
> referring to. The INVALID_OPERATION will result in a piglit skip, not a failure.
> 
> The failure I am asking you to look at is different.

Right. We discussed this quite a bit more. Here is a little more insight.
First, this fires a few hundreds of times during one round of test. I guess
this is not intended.

diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
index 711a190..0759bdc 100644
--- a/src/mesa/main/image.c
+++ b/src/mesa/main/image.c
@@ -192,6 +192,17 @@ _mesa_image_offset( GLuint dimensions,
                  + topOfImage
                  + (skiprows + row) * bytes_per_row
                  + (skippixels + column) * bytes_per_pixel;
+
+      if (offset > packing->BufferObj->Size) {
+         printf("_mesa_image_offset: bytes_per_image = %u, %u:%u:%u\n",
+                bytes_per_image, skipimages, skiprows, skippixels);
+         printf("_mesa_image_offset: %u:%u:%u:%u\n",
+                (skipimages + img) * bytes_per_image,
+                topOfImage,
+                (skiprows + row) * bytes_per_row,
+                (skippixels + column) * bytes_per_pixel);
+      }
+
    }
 
    return offset;


Also observing tests/texturing/texsubimage.c::test_format() tells us that
the test tries to check the image contents (by calling equal_images())
regardless if the texture uploading succeeded or not. So if I'm interpreting
this correctly, in the "passing" case we are just lucky, nothing gets uploaded
so we are comparing against data that is not written by the test.


More information about the mesa-dev mailing list