[Mesa-dev] [Mesa-stable] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"

Ben Widawsky ben at bwidawsk.net
Thu Sep 24 10:14:56 PDT 2015


On Thu, Sep 24, 2015 at 11:41:15AM +0100, Emil Velikov wrote:
> Hi guys,
> 
> On 2 September 2015 at 20:10, Pohjolainen, Topi
> <topi.pohjolainen at intel.com> wrote:
> > On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
> >> On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
> >> > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
> >> > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
> >> > > Author: Topi Pohjolainen <topi.pohjolainen at intel.com>
> >> > > Date:   Thu Jun 25 14:00:41 2015 +0300
> >> > >
> >> > >     i965: Stop aux data compare preventing program binary re-use
> >> > >
> >> > > This fixes an intermittent failure in
> >> > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
> >> > > platforms as well, but it is harder to reproduce). I can usually hit the failure
> >> > > within 10 runs of the test. This is a very hairy commit to debug. I'll let Topi
> >> > > handle it, or else we should go with the revert. I am open to either. I got
> >> > > lucky that Jenkins caught this on a run.
> >> > >
> >> > > Here was the script I used for bisect:
> >> > >
> >> > > i=0
> >> > > while [ $i -lt 40 ] ; do
> >> > >    ./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
> >> > >    [[ $? != 0 ]] && echo fail && exit 1
> >> > >    ((i++))
> >> > > done
> >> > >
> >> > > exit 0
> >> >
> >> > Should I use different piglit version than the current master? I'm asking
> >> > because I get this with both Mesa master and my patch reverted.
> >>
> >> My piglit was pretty old. I just updated that, but mesa was master as of
> >> yesterday (output is below)
> >>
> >> The one bit of advice I can add is that you make sure your system is updated to
> >> the very latest.
> >>
> >> >
> >> > testrunner at skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
> >> > Using test set: Core formats
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
> >> >   region: 68, 12  32 x 48
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
> >> >   region: 0, 28  116 x 20
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
> >> >   region: 16, 4  60 x 36
> >> > texsubimage failed
> >> >   target: GL_TEXTURE_2D
> >> >   internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
> >> >   region: 8, 0  104 x 60
> >> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> >> > PIGLIT: {"result": "fail" }
> >>
> >> Interesting. It so happens I have a patch that purports to fix some of these
> >> things. I did not try this patch myself for this issue.
> >> http://patchwork.freedesktop.org/patch/54025/
> >>
> >> (I've been hanging on to this since I needed to do a bit of research to address
> >> Matt's feedback, specifically regarding render compression).
> >>
> >> >
> >> >
> >> > Could you include the console output of the failure you get?
> >> >
> >>
> >> Here are two sample failures (occurred in 7 runs)
> >>
> >> Using test set: Core formats
> >> texsubimage failed
> >>   target: GL_TEXTURE_2D
> >>   internal format: GL_INTENSITY
> >>   region: 27, 1  13 x 61
> >> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO access)
> >
> > I got more and more a feeling that this is timing related. As the current
> > logic fails to ever re-use the items in the cache, the comparisons are
> > redundant and we can hardcode the search to always fail. This changes the
> > runtime dynamics closer to the patch being reverted:
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/
> > index f5f279c..746fe7b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > @@ -224,6 +224,8 @@ brw_try_upload_using_copy(struct brw_cache *cache,
> >             continue;
> >          }
> >
> > +         return false;
> > +
> >           if (cache->aux_compare[result_item->cache_id]) {
> >              if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> >                 continue;
> >
> >
> > I can reproduce the error on IVB when I disable fast clears (I noticed that
> > the error is not SKL specific - it depends on the fast clears being disabled).
> Based on the discussion so far, I take it that it's unlikely that
> we'll revert the commit at this point ?
> Should we drop this patch for now ?
> 
> Thanks
> Emil

Yes. So far as we can tell, the regression is real and fixed by the revert - but
it's only the one test, and only on SKL. The bug is being tracked here:
https://bugs.freedesktop.org/show_bug.cgi?id=91926

Topi and I discussed a few things...
1. Merging the patch.
2. Merging the patch to stable only
3. Doing nothing. There is a bugzilla people can find.

I am in favor of #3


More information about the mesa-dev mailing list