[Mesa-stable] [PATCH] Revert "i965: Stop aux data compare preventing program binary re-use"
Pohjolainen, Topi
topi.pohjolainen at intel.com
Thu Aug 27 23:56:43 PDT 2015
On Thu, Aug 27, 2015 at 10:05:14AM -0700, Ben Widawsky wrote:
> On Thu, Aug 27, 2015 at 10:51:59AM +0300, Pohjolainen, Topi wrote:
> > On Wed, Aug 26, 2015 at 03:46:05PM -0700, Ben Widawsky wrote:
> > > This reverts commit 1bba29ed403e735ba0bf04ed8aa2e571884fcaaf
> > > Author: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > Date: Thu Jun 25 14:00:41 2015 +0300
> > >
> > > i965: Stop aux data compare preventing program binary re-use
> > >
> > > This fixes an intermittent failure in
> > > piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64 (maybe other
> > > platforms as well, but it is harder to reproduce). I can usually hit the failure
> > > within 10 runs of the test. This is a very hairy commit to debug. I'll let Topi
> > > handle it, or else we should go with the revert. I am open to either. I got
> > > lucky that Jenkins caught this on a run.
> > >
> > > Here was the script I used for bisect:
> > >
> > > i=0
> > > while [ $i -lt 40 ] ; do
> > > ./bin/texsubimage pbo -auto -fbo > /dev/null 2>&1
> > > [[ $? != 0 ]] && echo fail && exit 1
> > > ((i++))
> > > done
> > >
> > > exit 0
> >
> > Should I use a different piglit version than the current master? I'm asking
> > because I get this both with Mesa master and with my patch reverted.
>
> My piglit was pretty old. I just updated that, but mesa was master as of
> yesterday (output is below)
>
> The one bit of advice I can add is that you make sure your system is updated to
> the very latest.
>
> >
> > testrunner at skl-y:~/topi/piglit$ ./bin/texsubimage pbo -auto -fbo
> > Using test set: Core formats
> > texsubimage failed
> > target: GL_TEXTURE_2D
> > internal format: GL_COMPRESSED_RGB_S3TC_DXT1_EXT
> > region: 68, 12 32 x 48
> > texsubimage failed
> > target: GL_TEXTURE_2D
> > internal format: GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
> > region: 0, 28 116 x 20
> > texsubimage failed
> > target: GL_TEXTURE_2D
> > internal format: GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
> > region: 16, 4 60 x 36
> > texsubimage failed
> > target: GL_TEXTURE_2D
> > internal format: GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
> > region: 8, 0 104 x 60
> > Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> > PIGLIT: {"result": "fail" }
>
> Interesting. It so happens I have a patch that purports to fix some of these
> things. I did not try this patch myself for this issue.
> http://patchwork.freedesktop.org/patch/54025/
>
> (I've been hanging on to this since I needed to do a bit of research to address
> Matt's feedback, specifically regarding render compression).
>
> >
> >
> > Could you include the console output of the failure you get?
> >
>
> Here are two sample failures (occurred in 7 runs)
>
> Using test set: Core formats
> texsubimage failed
> target: GL_TEXTURE_2D
> internal format: GL_INTENSITY
> region: 27, 1 13 x 61
> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage3D(out of bounds PBO access)
>
> Using test set: Core formats
> texsubimage failed
> target: GL_TEXTURE_2D
> internal format: GL_INTENSITY
> region: 80, 38 24 x 25
> texsubimage failed
> target: GL_TEXTURE_2D
> internal format: GL_LUMINANCE8
> region: 50, 5 26 x 58
> Mesa: User error: GL_INVALID_OPERATION in glTexSubImage2D(out of bounds PBO access)
> PIGLIT: {"result": "fail" }
>
>
> > >
> > > Cc: <mesa-stable at lists.freedesktop.org>
> > > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > > Cc: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > Reported-by: Mark Janes <mark.a.janes at intel.com> (jenkins)
> > > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > > ---
> > > src/mesa/drivers/dri/i965/brw_state_cache.c | 52 ++++++++++++++++++-----------
> > > 1 file changed, 32 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > index fbc0419..e50d6a0 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> > > @@ -200,23 +200,36 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
> > > }
> > >
> > > /**
> > > - * Attempts to find an item in the cache with identical data.
> > > + * Attempts to find an item in the cache with identical data and aux
> > > + * data to use
> > > */
> > > -static const struct brw_cache_item *
> > > -brw_lookup_prog(const struct brw_cache *cache,
> > > - enum brw_cache_id cache_id,
> > > - const void *data, unsigned data_size)
> > > +static bool
> > > +brw_try_upload_using_copy(struct brw_cache *cache,
> > > + struct brw_cache_item *result_item,
> > > + const void *data,
> > > + const void *aux)
> > > {
> > > - const struct brw_context *brw = cache->brw;
> > > + struct brw_context *brw = cache->brw;
> > > unsigned i;
> > > - const struct brw_cache_item *item;
> > > + struct brw_cache_item *item;
> > >
> > > for (i = 0; i < cache->size; i++) {
> > > for (item = cache->items[i]; item; item = item->next) {
> > > + const void *item_aux = item->key + item->key_size;
> > > int ret;
> > >
> > > - if (item->cache_id != cache_id || item->size != data_size)
> > > + if (item->cache_id != result_item->cache_id ||
> > > + item->size != result_item->size ||
> > > + item->aux_size != result_item->aux_size) {
> > > + continue;
> > > + }
> > > +
> > > + if (cache->aux_compare[result_item->cache_id]) {
> > > + if (!cache->aux_compare[result_item->cache_id](item_aux, aux))
> > > + continue;
> > > + } else if (memcmp(item_aux, aux, item->aux_size) != 0) {
> > > continue;
> > > + }
> > >
> > > if (!brw->has_llc)
> > > drm_intel_bo_map(cache->bo, false);
> > > @@ -226,11 +239,13 @@ brw_lookup_prog(const struct brw_cache *cache,
> > > if (ret)
> > > continue;
> > >
> > > - return item;
I'm getting a stronger feeling that this is somehow timing related. I forced
the cache lookup to always miss by replacing the line above with
    return NULL;
On Ben's machine, against the gbm backend, the error Ben observed still
happens. A forced miss should be the safest configuration, since nothing is
re-used, which suggests the re-use itself is not the cause.
I'll keep looking.
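For anyone who wants to repeat the forced-miss experiment without editing the
return statement by hand, here is a minimal standalone sketch of the idea. It
is not the driver code: struct item, struct cache, lookup_prog() and the
force_miss flag are all hypothetical, simplified stand-ins, and a plain data
pointer takes the place of the mapped buffer object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the driver's cache structures. */
struct item {
   int cache_id;
   size_t size;
   const void *data;      /* stands in for the program bytes in the BO */
   unsigned offset;
   struct item *next;
};

struct cache {
   struct item **buckets;
   unsigned n_buckets;
   bool force_miss;       /* assumed debug knob: never report a match */
};

/* Walk every bucket looking for an item with identical program data.
 * With force_miss set, a match is still detected but NULL is returned,
 * mirroring the effect of replacing "return item;" with "return NULL;". */
static const struct item *
lookup_prog(const struct cache *cache, int cache_id,
            const void *data, size_t data_size)
{
   assert(cache != NULL);

   for (unsigned i = 0; i < cache->n_buckets; i++) {
      for (const struct item *it = cache->buckets[i]; it; it = it->next) {
         if (it->cache_id != cache_id || it->size != data_size)
            continue;

         if (memcmp(it->data, data, data_size) != 0)
            continue;

         return cache->force_miss ? NULL : it;
      }
   }

   return NULL;
}
```

With force_miss set, every caller takes the allocate-and-copy path, so no
program binary is shared between cache items. If a failure still reproduces in
that mode, the sharing itself is unlikely to be the trigger.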
> > > + result_item->offset = item->offset;
> > > +
> > > + return true;
> > > }
> > > }
> > >
> > > - return NULL;
> > > + return false;
> > > }
> > >
> > > static uint32_t
> > > @@ -279,8 +294,6 @@ brw_upload_cache(struct brw_cache *cache,
> > > {
> > > struct brw_context *brw = cache->brw;
> > > struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
> > > - const struct brw_cache_item *matching_data =
> > > - brw_lookup_prog(cache, cache_id, data, data_size);
> > > GLuint hash;
> > > void *tmp;
> > >
> > > @@ -292,15 +305,14 @@ brw_upload_cache(struct brw_cache *cache,
> > > hash = hash_key(item);
> > > item->hash = hash;
> > >
> > > - /* If we can find a matching prog in the cache already, then reuse the
> > > - * existing stuff without creating new copy into the underlying buffer
> > > - * object. This is notably useful for programs generating shaders at
> > > - * runtime, where multiple shaders may compile to the same thing in our
> > > - * backend.
> > > + /* If we can find a matching prog/prog_data combo in the cache
> > > + * already, then reuse the existing stuff. This will mean not
> > > + * flagging CACHE_NEW_* when transitioning between the two
> > > + * equivalent hash keys. This is notably useful for programs
> > > + * generating shaders at runtime, where multiple shaders may
> > > + * compile to the thing in our backend.
> > > */
> > > - if (matching_data) {
> > > - item->offset = matching_data->offset;
> > > - } else {
> > > + if (!brw_try_upload_using_copy(cache, item, data, aux)) {
> > > item->offset = brw_alloc_item_data(cache, data_size);
> > >
> > > /* Copy data to the buffer */
> > > --
> > > 2.5.0
> > >