[Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads

Ben Widawsky ben at bwidawsk.net
Thu Jun 4 17:35:11 PDT 2015


On Wed, Jun 03, 2015 at 09:32:55PM +0300, Pohjolainen, Topi wrote:
> On Wed, Jun 03, 2015 at 09:21:11PM +0300, Topi Pohjolainen wrote:
> > Shader programs are cached per stage (FS, VS, GS) using the
> > corresponding shader source identifier and compile time choices
> > as key. However, one not only stores the program binary but
> > a pair consisting of program binary and program data. The latter
> > represents the store of constants (such as uniforms) used by
> > the program.
> > 
> > However, when programs are searched in the cache for reloading
> > only the program key representing the binary is considered
> > (see for example, brw_upload_wm_prog() and brw_search_cache()).
> > Hence, when programs are re-loaded from cache the first program
> > binary, program data pair is extracted without considering if
> > the program data matches the currently in use uniform storage
> > as well.
> > 
> > My reasoning Why this actually works is because the key
> > contains the identifier of the corresponding gl_program that
> > represents the source code for the shader program. Hence,
> > two programs having identical source code still have unique
> > keys.
> > And therefore brw_try_upload_using_copy() never encounters
> > a case where a matching binary is found but the program data
> > doesn't match.
> 
> In fact, thinking some more I think this is possible when the
> same, say fragment shader, is used with two different vertex
> shaders. This results into there being matching binaries but
> program data pointing to different storage. Looking at
> brw_upload_cache() I still can't see how failing
> brw_try_upload_using_copy() makes a difference. We only upload
> the program binary again (even though that is the part that
> actually matches). And then proceed the same way regardless
> of the result of brw_try_upload_using_copy(). The program data
> gets augmented with the key.
> 
> But the point remains that when a program is reloaded through
> the brw_search_cache() only the key (and not the program data)
> is considered returning the first matching pair.
> 
> I probably need to write a piglit test for this.
> 
> > 
> > My ultimate goal is to stop storing pointers to the individual
> > components of a uniform but to store only a pointer to the
> > "struct gl_uniform_storage" instead, and allow
> > gen6_upload_push_constants() to iterate over individual
> > components and array elements. This is needed to be able to
> > convert 32-bits floats to fp16 - otherwise there is only
> > pointer to 32-bits without knowing its type (int, float, etc)
> > let alone its target precision.
> > 
> > No regression in jenkins. However, we talked about this with
> > Ken and this doesn't really tell much as piglit doesn't really
> > re-use shader sources during one execution.
> > 
> > Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > CC: Kenneth Graunke <kenneth at whitecape.org>
> > CC: Tapani P\344lli <tapani.palli at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_program.c | 6 ------
> >  1 file changed, 6 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> > index e5c0d3c..7f5fde8 100644
> > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
> >     if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param)))
> >        return false;
> >  
> > -   if (memcmp(a->param, b->param, a->nr_params * sizeof(void *)))
> > -      return false;
> > -
> > -   if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
> > -      return false;
> > -
> >     return true;
> >  }
> >  

I am looking at a lot of this code for the first time, and I have a kind of wild
guess.

The first time you upload a program, the program (kinda annoying that
brw_upload_item_data doesn't seem to actually do that). Malloc a pointer (tmp,
item->key), store the program and aux there. Set that pointer as the key.

The aux data lives at key + key_size.

Indeed search_cache() only checks the key. For WM it does contain the
urb_entry data that I think would change if number of uniforms differed. So for
your example above with 2 VS sharing an FS, if the number of uniforms are the
same, then the program should be identical in the FS, right? Similarly for the
GS with input_varyings. I think generally this is the behavior you'd want.

brw_try_upload_using_copy() seems correct to me as it does do the aux_compare
(and falls back to memcmp).

I didn't verify aux is really only uniforms and/or varyings though.

Hopefully this wasn't too stupid.


More information about the mesa-dev mailing list